Nvidia CEO Jensen Huang talks about his company’s latest innovations at CES 2026 (YouTube Video Transcript)

Duration: 01:50:40
(00:00:42) Please take your seats. Our event is (00:00:44) about to begin. (00:00:54) [music] (00:19:36) >> Welcome to the stage, Nvidia founder and (00:19:39) CEO, Jensen Huang. (00:19:49) Hello, Las Vegas. (00:19:53) Happy New Year. (00:19:55) >> Welcome to CES. (00:19:58) Well, we have about 15 keynotes' worth of (00:20:01) material to pack in here. I'm so happy (00:20:03) to see all of you. You got 3,000 people (00:20:05) in this auditorium. There's 2,000 people (00:20:07) in a courtyard watching us. There's (00:20:09) another thousand people apparently in (00:20:11) the fourth floor where there were (00:20:13) supposed to be Nvidia show floors all (00:20:15) watching this keynote and of course (00:20:17) millions around the world are going to (00:20:18) be watching this to kick off this new (00:20:21) year. Well, every 10 to 15 years the (00:20:25) computer industry resets. (00:20:28) A new platform shift happens (00:20:31) from mainframe to PC, PC to internet, (00:20:34) internet to cloud, cloud to mobile. Each (00:20:37) time (00:20:39) the world of applications (00:20:42) targets a new platform, that's why it's (00:20:43) called a platform shift. 
You write new (00:20:45) applications for a new computer. (00:20:49) Except this time (00:20:51) there are two simultaneous platform (00:20:53) shifts in fact happening at the same (00:20:55) time. (00:20:58) While we now move to AI, applications (00:21:01) are now going to be built on top of AI. (00:21:04) At first, people thought AIs are (00:21:06) applications. And in fact, AIs are (00:21:08) applications. But you're going to build (00:21:10) applications on top of AIs. (00:21:13) But in addition to that, (00:21:16) how you run the software, how you (00:21:19) develop the software (00:21:21) fundamentally changed. The entire (00:21:24) five-layer stack of the computer industry (00:21:26) is being reinvented. (00:21:28) You no longer program the software, you (00:21:31) train the software. You don't run it on (00:21:34) CPUs, you run it on GPUs. (00:21:38) And whereas applications were (00:21:41) pre-recorded, pre-compiled (00:21:43) and run on your device, now applications (00:21:47) understand the context and generate (00:21:50) every single pixel, every single token (00:21:52) completely from scratch every single (00:21:55) time. (00:21:56) Computing has been fundamentally (00:21:58) reshaped as a result of accelerated (00:22:00) computing, as a result of artificial (00:22:02) intelligence. Every single layer of that (00:22:04) five-layer cake is now being (00:22:07) reinvented. (00:22:08) Well, what that means is some 10 (00:22:11) trillion dollars or so of the last (00:22:13) decade of computing is now being (00:22:15) modernized to this new way of doing (00:22:17) computing. What that means is hundreds (00:22:21) of billions of dollars, a couple hundred (00:22:22) billion dollars in VC funding each year (00:22:24) is going into modernizing and inventing (00:22:27) this new world. 
And what it means is a (00:22:30) hundred trillion dollars of industry, (00:22:33) several percent of which is R&D budget, (00:22:35) is shifting over to artificial (00:22:38) intelligence. People ask where is the (00:22:40) money coming from? That's where the (00:22:42) money is coming from. The modernization (00:22:45) of AI to AI, the shifting of R&D budgets (00:22:49) from classical methods to now artificial (00:22:51) intelligence methods. Enormous amounts (00:22:54) of investments coming into this (00:22:56) industry, which explains why we're so (00:22:58) busy. And this last year was no (00:23:00) different. This last year was (00:23:02) incredible. (00:23:03) This last year, there's a slide coming. (00:23:07) This is what happens when you don't (00:23:08) practice. (00:23:11) It's the first keynote of the year. I (00:23:13) hope it's your first keynote of the (00:23:14) year. Otherwise, you have (00:23:15) been pretty busy. This is our (00:23:17) first keynote of the year. We're going (00:23:18) to get the spiderwebs out. And so 2025 (00:23:23) was an incredible year. (00:23:26) It just seemed like everything (00:23:27) was happening all at the same time. And (00:23:29) in fact it probably was. The first (00:23:31) thing of course is scaling laws. (00:23:35) In 2015, (00:23:38) the first language model that I thought (00:23:40) was really going to make a difference (00:23:42) made a huge difference. It was called (00:23:43) BERT. 2017, Transformers came. It wasn't (00:23:48) until 5 years later, 2022, that the ChatGPT (00:23:51) moment happened and it awakened the (00:23:53) world to the possibilities of artificial (00:23:55) intelligence. (00:23:57) Something very important happened a year (00:23:59) after that. 
(00:24:01) The first o1 model from ChatGPT, the (00:24:04) first reasoning model, completely (00:24:06) revolutionary, invented this idea called (00:24:09) test-time scaling, which is a very (00:24:10) commonsensical thing. Not (00:24:13) only do we pre-train a model to learn, (00:24:15) we post-train it with reinforcement (00:24:18) learning so that it could learn skills. (00:24:20) And now we also have test-time scaling, (00:24:22) which is another way of saying thinking. (00:24:25) You think in real time. Each one of (00:24:27) these phases of artificial intelligence (00:24:30) requires an enormous amount of compute and (00:24:32) the computing law continued to scale. (00:24:34) Large language models continue to get (00:24:36) better. Meanwhile, another breakthrough (00:24:39) happened and this breakthrough happened (00:24:41) in 2024. (00:24:43) Agentic systems started to emerge. In (00:24:46) 2025, they started to (00:24:49) proliferate just about everywhere. (00:24:51) Agentic models that have the ability to (00:24:54) reason, (00:24:55) look up information, do research, use (00:24:58) tools, plan futures, simulate outcomes (00:25:03) all of a sudden started to solve very, (00:25:05) very important problems. One of my (00:25:07) favorite agentic models is called Cursor, (00:25:10) which revolutionized the way we do (00:25:12) software programming at NVIDIA. Agentic (00:25:14) systems are going to really take off (00:25:16) from here. Of course, there were other (00:25:19) types of AI. We know that large language (00:25:20) models aren't the only type of (00:25:22) information. Wherever the universe has (00:25:24) information, wherever the universe has (00:25:26) structure, we could teach a large (00:25:29) language model, a form of language model, (00:25:32) to go understand that information, to (00:25:35) understand its representation and to (00:25:38) turn that into an AI. 
One of the biggest, (00:25:40) most important ones is physical AI. AI (00:25:44) that understands the laws of nature. And (00:25:47) then of course physical AI is about AIs (00:25:50) interacting with the world, but the world (00:25:52) itself has information, encoded (00:25:54) information, and that's called AI (00:25:56) physics. In the case of physical (00:25:59) AI you have AI that interacts with the (00:26:01) physical world, and you have AI physics, (00:26:04) AI that understands the laws of physics. (00:26:07) And then lastly, one of the most (00:26:09) important things that happened last year: (00:26:11) the advancement of open models. We can (00:26:14) now know that AI is going to proliferate (00:26:17) everywhere when open source, when open (00:26:20) innovation, when innovation across every (00:26:23) single company and every industry around (00:26:24) the world is activated. At the same (00:26:26) time, open models really took off last (00:26:29) year. In fact, (00:26:31) last year we saw the advance of DeepSeek (00:26:36) R1, the first open model that's a (00:26:40) reasoning system. It caught the world by (00:26:44) surprise and it activated literally this (00:26:48) entire movement. Really, really exciting (00:26:50) work. We're so happy with it. Now we (00:26:53) have open model systems all (00:26:56) over the world of all different kinds, (00:26:57) and we now know that open models have (00:27:00) also reached the frontier. 
They're still solidly (00:27:03) six months behind the frontier models, (00:27:06) but every single six months a new model (00:27:08) is emerging and these models are getting (00:27:11) smarter and smarter. Because of that you (00:27:15) could see the number of downloads has (00:27:17) exploded. (00:27:19) The number of downloads is growing so (00:27:21) fast because startups want to (00:27:23) participate in the AI revolution, large (00:27:26) companies want to, researchers want to, (00:27:28) students want to, just about every single (00:27:30) country wants to. (00:27:31) How is it possible that intelligence, (00:27:34) the digital form of intelligence, will (00:27:36) leave anyone behind? And so open models (00:27:40) have really revolutionized artificial (00:27:42) intelligence last year. This entire (00:27:44) industry is going to be reshaped as a (00:27:46) result of that. Now, we had this inkling (00:27:48) some time ago. You might have heard that (00:27:51) several years ago, we started to build (00:27:55) and operate our own AI supercomputers. (00:27:57) We call them DGX Cloud. A lot of people (00:28:00) asked, are you going into (00:28:02) the cloud business? The answer is no. (00:28:04) We're building these DGX supercomputers (00:28:06) for our own use. Well, it turns out we (00:28:09) have billions of dollars of (00:28:11) supercomputers in operation so that we (00:28:13) could develop our open models. I am so (00:28:17) pleased with the work that we're doing. (00:28:19) It is starting to attract attention all (00:28:21) over the world and all over the (00:28:22) industries because we are doing frontier (00:28:25) AI model work in so many different (00:28:27) domains. The work that we did in (00:28:29) proteins, in digital biology: La-Proteina, (00:28:32) to be able to synthesize and generate (00:28:34) proteins. OpenFold3, to (00:28:37) understand the structure of proteins. 
(00:28:40) Evo 2, how to understand and (00:28:43) generate (00:28:45) multiple proteins, otherwise the (00:28:47) beginnings of cellular (00:28:49) representation. Earth-2, AI that (00:28:52) understands laws of physics. The work (00:28:54) that we did with FourCastNet, the work (00:28:56) that we did with CorrDiff, really (00:28:58) revolutionized the way that people are (00:29:00) doing weather prediction. Nemotron: (00:29:03) we're now doing groundbreaking work (00:29:05) there. The first hybrid transformer SSM (00:29:09) model, that's incredibly fast and (00:29:11) therefore can think for a very long time, (00:29:14) or can think very quickly for (00:29:17) not a very long time, and produce very, (00:29:19) very smart, intelligent answers. (00:29:20) Nemotron 3 is groundbreaking work and (00:29:23) you can expect us to deliver other (00:29:25) versions of Nemotron 3 in the near (00:29:26) future. Cosmos, (00:29:30) a frontier open world foundation model, (00:29:34) one that understands how the world works. (00:29:37) GR00T, a humanoid robotic system: (00:29:39) articulation, mobility, locomotion. These (00:29:43) models, these technologies are now being (00:29:46) integrated and in each one of these (00:29:48) cases open to the world. Frontier humanoid (00:29:51) robotics models open to the world. (00:29:53) And then today we're going to talk a (00:29:55) little bit about Alpamayo, the work that (00:29:56) we've been doing in self-driving cars. (00:29:58) Not only do we open source the models, (00:30:01) we also open source the data that we use (00:30:04) to train those models because in (00:30:07) that way, only in that way, can you truly (00:30:10) trust how the models came to be. We open (00:30:14) source all the models. We help you make (00:30:16) derivatives from them. We have a whole (00:30:18) suite of libraries we call the NeMo (00:30:20) libraries, PhysicsNeMo (00:30:23) libraries and the Clara NeMo libraries. 
(00:30:25) BioNeMo libraries. Each one of these (00:30:27) libraries is a life cycle management (00:30:29) system of AIs so that you could process (00:30:32) the data, you could generate data, you (00:30:33) could train the model, you could create (00:30:35) the model, evaluate the model, guardrail (00:30:37) the model, all the way to deploying the (00:30:39) model. Each one of these libraries is (00:30:42) incredibly complex and all of it is open (00:30:44) sourced. And so now on top of this (00:30:47) platform NVIDIA is a frontier AI model (00:30:51) builder, and we build it in a very (00:30:54) special way. We build it completely in (00:30:56) the open so that we can enable every (00:30:59) company, every industry, every country (00:31:01) to be part of this AI revolution. I'm (00:31:04) incredibly proud of the work that we're (00:31:06) doing there. In fact, if you notice (00:31:08) the charts, the chart shows that our (00:31:12) contribution to this industry is bar (00:31:15) none, and you're going to see us in fact (00:31:16) continue to do that, if not accelerate. (00:31:19) These models are also world class. (00:31:24) All systems are down. (00:31:28) This never happens in Santa Clara. (00:31:32) Is it because of Las Vegas? (00:31:40) Somebody must have won a jackpot (00:31:42) outside. (00:31:44) [clears throat] All systems are down. (00:31:49) Okay, I think my system's still down, (00:31:52) but that's okay. I make it up (00:31:55) as I go. And so not only are these (00:31:58) models frontier capable, not only are (00:32:02) they open, they also top the (00:32:04) leaderboards. This is an area where (00:32:05) we're very proud. They top leaderboards (00:32:07) in intelligence. We have (00:32:10) important models that understand (00:32:12) multimodal documents, otherwise known (00:32:15) as PDFs. 
The most valuable content in (00:32:18) the world is captured in PDFs, but (00:32:20) it takes artificial intelligence (00:32:22) to find out what's inside, interpret (00:32:25) what's inside, and help you read it. And (00:32:27) so our PDF retrievers, our PDF parsers (00:32:30) are world-class. (00:32:32) Our speech recognition models, absolutely (00:32:34) world-class. Our retrieval models, (00:32:37) basically search, semantic search, AI (00:32:40) search, the database engine of the (00:32:42) modern AI era, world-class. So, we're on (00:32:46) top of leaderboards constantly. This is (00:32:48) an area we're very proud of. And all of (00:32:50) that is in service of your ability to (00:32:54) build AI agents. This is really a (00:32:58) groundbreaking area of development. You (00:33:00) know, at first when ChatGPT (00:33:01) came out, people said, you know, (00:33:04) gosh, it produced really interesting (00:33:06) results, but it hallucinated greatly. (00:33:08) And the reason why it hallucinated, of (00:33:10) course: it could memorize everything (00:33:12) in the past, but it can't memorize (00:33:14) everything in the future, in the (00:33:15) current. And so it needs to be grounded (00:33:17) in research. It has to do fundamental (00:33:19) research before it answers a question. (00:33:22) The ability to reason about: do I have to (00:33:25) do research? Do I have to use tools? How (00:33:26) do I break up a problem into steps? Each (00:33:29) one of these steps is something that (00:33:31) the AI model knows how to do. And (00:33:34) together it is able to compose it into a (00:33:37) sequence of steps to perform something (00:33:39) it's never done before, never been (00:33:40) trained to do. This is the wonderful (00:33:43) capability of reasoning. 
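[Illustrative sketch, not from the keynote. The passage above describes an agent that decides whether it needs tools and breaks a problem into steps it already knows how to do. The toy below shows that control flow only. The names `NOTES`, `TOOLS`, `plan`, and `run_agent` are hypothetical, and `plan` is a hand-written stand-in for what a reasoning model would produce.]

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy facts the "model" has not memorized and must look up (hypothetical data).
NOTES: Dict[str, float] = {"unit_price": 4.0, "quantity": 3.0}


def lookup(key: str) -> float:
    """Tool 1: fetch a fact instead of guessing it."""
    return NOTES[key]


def multiply(a: float, b: float) -> float:
    """Tool 2: a calculation the agent delegates."""
    return a * b


TOOLS: Dict[str, Callable[..., float]] = {"lookup": lookup, "multiply": multiply}


def plan(question: str) -> List[Tuple[str, tuple]]:
    """Stand-in for model reasoning: split a question into known tool steps.

    An argument such as "$0" refers to the result of step 0.
    """
    if "total cost" in question.lower():
        return [
            ("lookup", ("unit_price",)),
            ("lookup", ("quantity",)),
            ("multiply", ("$0", "$1")),
        ]
    return []  # no plan: the agent does not know how to proceed


def run_agent(question: str) -> Optional[float]:
    """Execute the planned steps in order, feeding earlier results forward."""
    results: List[float] = []
    for tool_name, args in plan(question):
        resolved = [
            results[int(a[1:])] if isinstance(a, str) and a.startswith("$") else a
            for a in args
        ]
        results.append(TOOLS[tool_name](*resolved))
    return results[-1] if results else None


if __name__ == "__main__":
    print(run_agent("What is the total cost?"))
```

[Each step is something the toy already knows how to do; only the sequence is new, which is the point the speaker makes about composing known steps.]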
We (00:33:45) can encounter a circumstance (00:33:47) we've never seen before and break it (00:33:49) down into circumstances and knowledge or (00:33:52) rules that we know how to do because (00:33:55) we've experienced it in the past. And so (00:33:58) the ability for AI models now to be able (00:33:59) to reason is incredibly powerful. (00:34:02) The reasoning capability of agents opens (00:34:04) the doors to all of these different (00:34:06) applications. We no longer have to train (00:34:08) an AI model to know everything on day (00:34:11) one, just as we don't have to know (00:34:13) everything on day one, that we should be (00:34:16) able to in every circumstance reason (00:34:18) about how to solve that problem. Large (00:34:21) language models have now made this (00:34:23) fundamental leap. The ability to use (00:34:25) reinforcement learning and chain of (00:34:26) thought and, you know, search and planning (00:34:29) and all these different techniques in (00:34:30) reinforcement learning has made it (00:34:32) possible for us to have this basic (00:34:34) capability, and it is also now completely (00:34:37) open sourced. But the thing that's (00:34:39) really terrific is another breakthrough (00:34:41) that happened, and the first time I saw (00:34:43) it was with Aravind's Perplexity. (00:34:46) Perplexity, the search company, the AI (00:34:49) search company, really (00:34:51) innovative company. And the first time I (00:34:53) realized they were using multiple models (00:34:55) at the same time, I thought it was (00:34:57) completely genius. Of course, we would (00:34:58) do that. Of course, an AI would also (00:35:02) call upon all of the world's great AIs (00:35:05) to solve the problem it wants to solve (00:35:07) at any part of the reasoning chain. 
And (00:35:10) this is the reason why AIs are really (00:35:14) multi-modal, (00:35:16) meaning they understand speech and (00:35:19) images and text and videos and 3D (00:35:22) graphics and proteins. It's multimodal. (00:35:25) It's also multi-model, (00:35:28) meaning that it should be able to use (00:35:29) any model that best fits the task. It is (00:35:35) multicloud by definition, therefore, (00:35:37) because these AI models are sitting in (00:35:39) all these different places. And it also (00:35:42) is hybrid cloud because if you're an (00:35:44) enterprise company or you've built a (00:35:47) robot or whatever that device is, (00:35:49) sometimes it's at the edge, sometimes at a (00:35:51) radio cell tower, maybe sometimes it's (00:35:54) in an enterprise or maybe it's a place (00:35:56) like a hospital where you need to have (00:35:58) the data in real time right next to (00:36:01) you. Whatever those applications are, we (00:36:04) know now this is what an AI application (00:36:07) looks like in the future. Or another way (00:36:09) to think about that: because future (00:36:12) applications are built on AIs, (00:36:15) this is the basic framework of future (00:36:17) applications. (00:36:19) This basic framework, this basic (00:36:22) structure of agentic AIs that could do (00:36:25) the things that I'm talking about, that (00:36:26) is multi-model, (00:36:28) has now turbocharged (00:36:31) AI startups of all kinds. And now you (00:36:34) can also, because of all of the open (00:36:37) models and all the tools that we (00:36:38) provided you, you could also customize (00:36:41) your AIs to teach your AI skills that (00:36:44) nobody else is teaching. Nobody else is (00:36:47) causing their AI to become intelligent (00:36:49) or smart in that way. You could do it (00:36:51) for yourself. And that's what the work that (00:36:53) we do with Nemotron, NeMo, and all of (00:36:56) the things that we do with open models (00:36:58) is intended to do. 
You put a smart (00:37:00) router in front of it. And that router (00:37:02) is essentially a manager that decides, (00:37:04) based on the (00:37:06) intention of the prompts that you give (00:37:08) it, which one of the models is the best fit (00:37:11) for that application, for solving (00:37:13) that problem. Okay. So now when you (00:37:16) think about this architecture, what do (00:37:18) you have? (00:37:20) When you think about this architecture, (00:37:21) all of a sudden you have an AI that's on (00:37:24) the one hand completely customizable by (00:37:27) you. Something that you could teach to (00:37:29) do your own very skills for your (00:37:31) company, something that's domain secret, (00:37:35) something where you have deep domain (00:37:36) expertise. Maybe you've got all of the (00:37:38) data that you need to train that AI (00:37:41) model. On the other hand, your AI is (00:37:45) always at the frontier by definition. (00:37:48) You're always at the frontier on the one (00:37:50) hand. You're always customized on the (00:37:51) other hand. It should just run. And so (00:37:54) we thought we would make the simplest of (00:37:56) examples to make it available to you. (00:37:59) This entire framework we call a (00:38:01) blueprint, and we have blueprints that (00:38:03) are integrated into enterprise SaaS (00:38:06) platforms all over the world, and we're (00:38:08) really pleased with the progress. But (00:38:09) what we'll do is show you a short example (00:38:11) of something that anybody can do. (00:38:16) Let's build a personal assistant. (00:38:19) I want it to help me with my calendar, (00:38:21) emails, [music] to-do lists, and even (00:38:23) keep an eye on my home. I use Brev to (00:38:26) turn my DGX Spark into a personal cloud. (00:38:29) So, I can use the same interface whether (00:38:31) I'm using a cloud GPU or a DGX Spark. I (00:38:34) use a frontier model API to easily get (00:38:36) started. 
[music] (00:38:40) I want it to help me with my emails. (00:38:42) So, I create an email tool for my agent (00:38:44) to call. (00:38:46) I want my emails to stay private. So, (00:38:48) I'll add an open model that's running (00:38:49) locally on the Spark. (00:38:53) Now, for any job, [music] I want the (00:38:55) agent to use the right model for the (00:38:57) right task. So, I'll use an intent-based (00:38:59) model router. (00:39:02) This way, prompts that need email will (00:39:04) stay on my Spark, and everything else (00:39:07) can call the frontier model. I want my (00:39:10) assistant to interact with my world, so (00:39:11) I'll hook it up to Hugging Face's Reachy (00:39:13) Mini robot. (00:39:15) My agent controls the head, ears, and (00:39:18) camera of the Reachy with tool calls. I (00:39:20) want to give Reachy a voice, and I really (00:39:22) like ElevenLabs, so I'll hook up their API. (00:39:25) >> Hi, I'm Reachy running on DGX Spark. (00:39:28) [music] (00:39:28) >> Hey Reachy, what's on my to-do list (00:39:30) today? (00:39:31) >> Your to-do list today: (00:39:34) grab groceries (eggs, milk, butter), and (00:39:37) send Jensen the new script. (00:39:40) >> Okay, let's send Jensen an update. Tell (00:39:42) him we'll have it for him by the end of (00:39:43) the day. (00:39:44) >> Will do. (00:39:45) >> Reachy, there's a sketch, too. Can you (00:39:47) turn it into an architectural rendering? (00:39:50) >> Sure. (00:39:59) >> Nice. Now make a video and show me (00:40:01) around the room. (00:40:04) >> Here you go. (00:40:05) >> That's great. (00:40:08) >> With Brev, I can share access to my (00:40:10) Spark and Reachy, so I'm going to share (00:40:11) it with Anna. (00:40:14) >> Hey Reachy, what's Potato up to? (00:40:18) >> He's on the couch. (00:40:20) I remember you don't like this. I'll (00:40:22) tell him to get off. Potato, off the (00:40:25) couch. 
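[Illustrative sketch, not from the keynote. The demo describes an intent-based model router: prompts that touch email stay on a local open model, and everything else goes to a frontier model API. The toy below shows that routing decision with a simple keyword check. The functions `local_open_model` and `frontier_model_api` are hypothetical placeholders, and the keyword list is an assumption; a real router would likely classify intent with a model rather than with keywords.]

```python
from typing import Callable, Dict, Tuple

# Assumed keywords that mark a prompt as private; chosen for illustration only.
PRIVATE_KEYWORDS: Tuple[str, ...] = ("email", "inbox")


def local_open_model(prompt: str) -> str:
    """Placeholder for an open model running on the user's own machine."""
    return f"[local] {prompt}"


def frontier_model_api(prompt: str) -> str:
    """Placeholder for a call to a hosted frontier model."""
    return f"[frontier] {prompt}"


def classify_intent(prompt: str) -> str:
    """Label a prompt 'private' if it mentions any private keyword."""
    text = prompt.lower()
    return "private" if any(word in text for word in PRIVATE_KEYWORDS) else "general"


# Routing table: one model per intent label.
ROUTES: Dict[str, Callable[[str], str]] = {
    "private": local_open_model,
    "general": frontier_model_api,
}


def route(prompt: str) -> str:
    """Send the prompt to whichever model the intent label selects."""
    return ROUTES[classify_intent(prompt)](prompt)


if __name__ == "__main__":
    print(route("Draft an email update for Jensen"))
    print(route("Turn this sketch into an architectural rendering"))
```

[Keeping the routing table separate from the classifier means a new model can be added by registering one more intent label, without touching the calling code.]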
(00:40:26) With all the progress in open source, (00:40:28) it's incredible to see what you can (00:40:29) build. I'd love to see what you create. (00:40:37) >> Isn't that incredible? (00:40:39) Now, the amazing thing is that is (00:40:42) utterly trivial now. That is utterly (00:40:45) trivial now. And yet, just a couple (00:40:48) years ago, all of that would have been (00:40:50) impossible. Absolutely unimaginable. (00:40:52) Well, this basic framework, this basic (00:40:55) way of building applications using (00:40:58) language models (00:41:13) that are (00:41:15) pre-trained and proprietary, (00:41:18) that are frontier, combined with (00:41:20) customized language models into an agentic (00:41:24) framework, a reasoning framework that (00:41:26) allows you to access tools and files and (00:41:29) maybe even connect to other agents: (00:41:32) this is basically the architecture of AI (00:41:36) applications, or applications in the (00:41:39) modern age, and the ability for us to (00:41:41) create these applications is incredibly (00:41:43) fast. And notice, (00:41:46) if you give this application (00:41:49) information that it's never seen before, (00:41:52) or in a structure that is not (00:41:54) represented exactly as you thought, it (00:41:58) can still reason through it and make its (00:42:01) best effort to reason through the data, (00:42:03) the information, to try to understand how (00:42:05) to solve the problem. Artificial (00:42:07) intelligence. Okay, so this basic (00:42:09) framework is now being integrated, and (00:42:11) for everything that I just described, we had (00:42:12) the benefit of working with some of the (00:42:14) world's leading enterprise platform (00:42:16) companies. 
Palantir, for example: (00:42:20) their entire AI and data (00:42:23) processing platform is being integrated and (00:42:25) accelerated by Nvidia today. ServiceNow, (00:42:28) the world's leading customer service and (00:42:31) employee service platform. Snowflake, (00:42:33) the world's top data platform in the (00:42:36) cloud. Incredible work that (00:42:39) is being done there. CodeRabbit, we're (00:42:42) using CodeRabbit all over Nvidia. (00:42:44) CrowdStrike, creating AIs to detect, to (00:42:47) find AI threats. NetApp, (00:42:51) their data platform now has NVIDIA's (00:42:53) semantic AI on top of it and agentic (00:42:56) systems on top of it for (00:42:59) them to do customer service. But the (00:43:01) important thing is this. Not only is (00:43:03) this the way that you develop (00:43:04) applications now, this is going to be (00:43:07) the user interface of your platform. So (00:43:09) whether it's Palantir or ServiceNow or (00:43:12) Snowflake and many other companies that (00:43:14) we're working with, the agentic system (00:43:17) is the interface. It's no longer Excel (00:43:21) with a bunch of, you know, squares that (00:43:23) you enter information into. Maybe (00:43:26) it's no longer just a command line. 
(00:43:28) All of that multimodal (00:43:31) information is now possible, and the way (00:43:33) you interact with your platform is much (00:43:36) more, well, if you will, simple, like you're (00:43:38) interacting with people. And so that's (00:43:41) enterprise AI being revolutionized by (00:43:44) agentic systems. The next thing is (00:43:47) physical AI. This is an area that you've (00:43:49) seen me talk about for several years. In (00:43:51) fact, we've been working on this for (00:43:52) eight years. The question is how do you (00:43:56) take something that is intelligent (00:43:58) inside a computer and interacts with you (00:44:01) with screens and speakers to something (00:44:05) that can interact with the world? (00:44:07) Meaning it can understand the common (00:44:10) sense of how the world works. Object (00:44:13) permanence: if I look away and I look (00:44:14) back, that object is still there. (00:44:18) Causality: if I push it, it tips over. (00:44:20) It understands friction and gravity. It (00:44:23) understands inertia: that a heavy truck (00:44:26) rolling down the road is going to need a (00:44:28) little bit more time to stop, that a (00:44:31) ball is going to keep on rolling. (00:44:34) These ideas are common sense to even a (00:44:36) little child, but for AI, it's (00:44:39) completely unknown. And so we have to (00:44:42) create a system that allows AIs to learn (00:44:45) the common sense of the physical (00:44:46) world, learn its laws, but also to be (00:44:52) able to of course learn from data, and (00:44:54) the data is quite scarce, and to be able (00:44:57) to evaluate whether that AI is working, (00:44:59) meaning it has to simulate in an (00:45:02) environment. How does an AI know that (00:45:05) the actions that it's performing are (00:45:08) consistent with what it should do if it (00:45:10) doesn't have the ability to simulate the (00:45:12) response of the physical world back on (00:45:14) its actions? 
The response of its actions (00:45:16) is really important to simulate, (00:45:18) otherwise there's no way to evaluate it. (00:45:20) It's different every time. And so this (00:45:22) basic system requires three computers. (00:45:26) One computer, of course, the one that we (00:45:28) know that Nvidia builds for training the (00:45:30) AI models. Another computer, that we know, (00:45:34) is to inference (00:45:36) the models. Inferencing the model is (00:45:38) essentially a robotics computer that (00:45:40) runs in a car or runs in a robot or runs (00:45:43) in a factory, runs anywhere at the edge. (00:45:45) But there has to be another computer (00:45:47) that's designed for simulation, and (00:45:50) simulation is at the heart of almost (00:45:52) everything Nvidia does. This is (00:45:54) where we are most comfortable, and (00:45:57) simulation was really the foundation of (00:46:00) almost everything that we've done with (00:46:01) physical AI. So we have three computers (00:46:04) and multiple stacks that run on these (00:46:07) computers, these libraries to make them (00:46:08) useful. Omniverse is our digital twin, (00:46:11) physically based simulation world. (00:46:14) Cosmos, as I mentioned earlier, is our (00:46:17) foundation model, not a foundation model (00:46:19) for language but a foundation model of (00:46:21) the world, (00:46:23) and is also aligned with language. You (00:46:26) could say something like, you know, (00:46:27) what's happening to the ball, and (00:46:28) it'll tell you the ball's rolling down (00:46:30) the street. And so a world foundation (00:46:32) model, and then of course the robotics (00:46:34) models. We have two of them. One of them (00:46:37) is called GR00T. The other one's called (00:46:39) Alpamayo, that I'm going to tell you about. 
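[Illustrative sketch, not from the keynote. The passage above argues that an AI's actions can only be evaluated if the physical world's response to them is simulated. The toy below runs that closed loop for a point-mass vehicle: a policy decides whether to brake, the simulated world responds, and the run is scored on whether the vehicle stops before an obstacle. All masses, forces, distances, and the constant-brake-force model are assumptions chosen for illustration.]

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    mass_kg: float
    brake_force_n: float          # assumed constant braking force
    position_m: float = 0.0
    speed_mps: float = 20.0


def policy(distance_to_obstacle_m: float, brake_at_m: float) -> bool:
    """Toy policy: brake once the obstacle is closer than brake_at_m."""
    return distance_to_obstacle_m <= brake_at_m


def simulate(vehicle: Vehicle, obstacle_m: float = 100.0,
             brake_at_m: float = 60.0, dt: float = 0.01) -> bool:
    """Closed loop: the policy acts, the simulated world responds, repeat.

    Returns True if the vehicle comes to rest before reaching the obstacle.
    """
    while vehicle.speed_mps > 0.0:
        braking = policy(obstacle_m - vehicle.position_m, brake_at_m)
        # Newton's second law: the same force slows a heavier vehicle less.
        accel = -vehicle.brake_force_n / vehicle.mass_kg if braking else 0.0
        vehicle.speed_mps = max(0.0, vehicle.speed_mps + accel * dt)
        vehicle.position_m += vehicle.speed_mps * dt
        if vehicle.position_m >= obstacle_m:
            return False  # reached the obstacle while still moving
    return True


if __name__ == "__main__":
    car = Vehicle(mass_kg=1500.0, brake_force_n=9000.0)
    truck = Vehicle(mass_kg=15000.0, brake_force_n=45000.0)
    print("car stops in time:", simulate(car))
    print("truck stops in time:", simulate(truck))
```

[With these assumed numbers the same braking policy is adequate for the lighter vehicle and too late for the heavier one, which is the kind of failure that only shows up when the world's response is simulated.]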
(00:46:41) Now, one of the most important things (00:46:43) that we have to do with physical AI is (00:46:45) to create the data to train the AI in (00:46:47) the first place. Where does that data (00:46:48) come from? Rather than having (00:46:52) languages, because we created a bunch of (00:46:54) texts that are what we consider ground (00:46:57) truth that the AI can learn from, how do (00:47:00) we teach an AI the ground truth of (00:47:02) physics? There are lots and lots of videos, (00:47:05) lots and lots of videos, but hardly (00:47:07) enough to capture the diversity and the (00:47:09) type of interactions that we need. And (00:47:12) so this is where great minds came (00:47:15) together and transformed (00:47:18) what used to be compute into data. Now, (00:47:23) using synthetic data generation that is (00:47:25) grounded and conditioned by the laws of (00:47:28) physics, grounded and conditioned by (00:47:31) ground truth, we can now selectively, (00:47:35) cleverly generate data that we can then (00:47:38) use to train the AI. So for example, (00:47:41) what comes into this AI, this Cosmos AI (00:47:43) world model, on the left over here, is (00:47:47) the output of a traffic simulator. (00:47:51) Now this traffic simulator (00:47:54) is hardly enough for an AI to learn (00:47:56) from. We can take this, put it into a (00:47:59) Cosmos foundation model and generate (00:48:03) surround video that is physically based (00:48:06) and physically plausible that the AI can (00:48:09) now learn from. And there are so many (00:48:11) examples of this. Let me show you what (00:48:13) Cosmos can do. (00:48:18) The ChatGPT moment for physical AI is (00:48:21) nearly here, but the challenge is clear. (00:48:25) The physical world is diverse and (00:48:27) unpredictable. (00:48:29) Collecting real world training data is (00:48:32) slow and costly and it's never enough. (00:48:35) The answer is synthetic data. 
It starts (00:48:39) with NVIDIA Cosmos, an open frontier (00:48:43) world foundation model for physical AI, (00:48:47) pre-trained on internet-scale video, (00:48:50) real driving and robotics data, and 3D (00:48:52) [music] simulation. (00:48:55) Cosmos learned a unified representation (00:48:57) of the world, able to align language, (00:49:00) images, 3D, and action. (00:49:04) It performs physical AI skills like (00:49:06) generation, reasoning, and trajectory (00:49:09) prediction. (00:49:11) From a single image, Cosmos generates (00:49:14) realistic video; (00:49:17) from 3D scene descriptions, physically (00:49:21) coherent motion; (00:49:24) from driving telemetry and sensor logs, (00:49:26) surround video; (00:49:30) from planning simulators, (00:49:32) multi-camera environments; (00:49:35) or from scenario prompts, it brings edge (00:49:38) cases to life. (00:49:41) Developers can run interactive closed-loop (00:49:43) simulations in Cosmos. When actions (00:49:46) are made, the world responds. (00:49:51) Cosmos reasons. (00:49:54) It analyzes edge scenarios, (00:49:56) breaks them down into familiar physical (00:49:58) interactions, and [music] reasons about (00:50:01) what could happen next. (00:50:04) Cosmos turns compute [music] into data, (00:50:07) training AVs for the long tail and robots (00:50:10) how to adapt for every scenario. (00:50:21) I know, it's incredible. (00:50:24) Cosmos is the world's leading foundation (00:50:28) model, world foundation model. It's been (00:50:30) downloaded millions of times, used all (00:50:32) over the world, getting (00:50:35) the world ready for this new era of (00:50:36) physical AI. We use it ourselves as (00:50:39) well. We use it ourselves to create our (00:50:42) self-driving car, (00:50:45) using it for scenario generation and (00:50:48) using it for evaluation. 
(00:50:50) We could have something that allows us (00:50:53) to effectively travel billions, (00:50:56) trillions of miles, but doing it inside (00:50:59) a computer. And we've made enormous (00:51:01) progress. Today, we're announcing Alpamayo, (00:51:06) the world's first (00:51:08) thinking, reasoning autonomous vehicle (00:51:12) AI. Alpamayo is trained end to end, (00:51:17) literally from camera in to actuation (00:51:20) out. The camera in is lots and lots of (00:51:23) miles that are driven by itself or (00:51:26) where humans drive it, using human (00:51:30) demonstration, (00:51:31) and we have lots and lots of miles that (00:51:33) are generated by Cosmos. In addition to (00:51:36) that, hundreds of thousands of examples (00:51:39) are labeled very, very carefully so that (00:51:42) we could teach the car how to drive. (00:51:44) Alpamayo does something that's really (00:51:46) special. Not only does it take sensor (00:51:49) input and activate the steering wheel, (00:51:53) brakes, and acceleration, it also (00:51:57) reasons about what action it is about to (00:52:01) take. It tells you what action it's (00:52:03) going to take, the reason by which it (00:52:05) came about that action, and then of (00:52:07) course the trajectory. (00:52:10) All of these are coupled directly and (00:52:12) trained very specifically by a large (00:52:15) combination of human-trained as well (00:52:17) as Cosmos-generated data. The result of (00:52:22) it is just really incredible. Not only (00:52:24) does your car drive as you would expect (00:52:26) it to drive, and it drives so naturally (00:52:29) because it learned directly from human (00:52:31) demonstrators, but in every single (00:52:34) scenario, when it comes up to the (00:52:35) scenario, it reasons about it, tells you (00:52:37) what it's going to do, and it reasons (00:52:38) about what it's about to do. 
Now, (00:52:40) the reason why this is so important is (00:52:43) because of the long tail of driving. (00:52:45) It's impossible for us to simply (00:52:48) collect every single possible scenario (00:52:51) for everything that could ever happen in (00:52:53) every single country, in every single (00:52:54) circumstance that's possibly ever going (00:52:57) to happen for all the population. (00:53:00) However, it is very (00:53:03) likely that every scenario, if decomposed (00:53:07) into a whole bunch of other smaller (00:53:09) scenarios, is quite normal for you to (00:53:11) understand. And so these long tails will (00:53:15) be decomposed into quite normal (00:53:17) circumstances that the car knows how to (00:53:19) deal with. It just needs to reason about (00:53:21) it. And so let's take a look. Everything (00:53:22) you're about to see is one shot. It's (00:53:26) no hands. (00:53:31) >> Routing to your destination. (00:53:34) Buckle up. (00:53:38) Heat. Heat. (00:54:03) Heat. Heat. (00:55:01) Hallelujah. (00:55:43) Heat. (00:55:47) Heat. (00:56:06) You have arrived. (00:56:19) >> [applause] (00:56:22) >> We started working on self-driving cars (00:56:23) eight years ago. And the reason for that (00:56:25) is because we reasoned early on that (00:56:29) deep learning and artificial (00:56:30) intelligence were going to reinvent the (00:56:31) entire computing stack. And if we were (00:56:34) ever going to understand how to navigate (00:56:38) ourselves and how to guide the industry (00:56:40) towards this new future, we have to get (00:56:42) good at building the entire stack. Well, (00:56:46) as I mentioned earlier, AI is a five-layer (00:56:49) cake. The lowest layer is land, (00:56:51) power, and shell. In the case of (00:56:53) robotics, the lowest layer is the car. (00:56:55) The next layer above it is chips: GPUs, (00:56:58) networking chips, CPUs, all that kind of (00:57:00) stuff. 
The next layer above that is the (00:57:03) infrastructure. (00:57:05) That infrastructure, in this particular (00:57:07) case, as I mentioned with physical AI, is (00:57:10) Omniverse and Cosmos. (00:57:12) And then above that are the models. And (00:57:16) in the case of the models above that I've (00:57:20) just shown you, (00:57:22) the model here is called Alpamayo. And (00:57:25) Alpamayo today is open sourced. (00:57:28) This incredible body of work, it took (00:57:31) several thousand people. Our AV team is (00:57:34) several thousand people. Just to put it in (00:57:36) perspective, our partner, uh, Ola, I think (00:57:40) Ola's here in the audience somewhere. (00:57:42) Uh, Mercedes agreed to partner with us (00:57:45) five years ago to go make all of this (00:57:48) possible. We imagine that someday a (00:57:51) billion cars on the road will all be (00:57:52) autonomous. You could either have it be (00:57:55) a robotaxi that (00:57:57) you're orchestrating (00:57:58) and renting from somebody, or you (00:58:00) could own it and it's driving (00:58:02) by itself, or you could decide to drive (00:58:04) for yourself. But every single car (00:58:06) will have autonomous vehicle capability. (00:58:08) Every single car will be AI powered. And (00:58:10) so the model layer in this case (00:58:13) is Alpamayo, and the application above (00:58:15) that is the Mercedes-Benz. (00:58:18) Okay. 
And so this entire stack is (00:58:21) our first, Nvidia's first, entire-stack (00:58:24) endeavor, and we've been working on it (00:58:26) for this entire time, and I'm just so (00:58:28) happy that the first AV car from Nvidia (00:58:32) is going to be on the road in Q1, and (00:58:35) then it goes to Europe in Q2. Here in the (00:58:38) United States in Q1, then Europe in Q2, (00:58:40) and I think it's Asia in Q3 and Q4. And (00:58:43) the powerful thing is that we're going (00:58:44) to keep on updating it with (00:58:47) next versions of Alpamayo and versions (00:58:48) after that. There's no question in my (00:58:51) mind now that this is going to be one of (00:58:53) the largest robotics industries, and I'm (00:58:55) so happy that we worked on it, and it (00:58:57) taught us an enormous amount about how to (00:59:00) help the rest of the world build robotic (00:59:02) systems. That deep understanding in (00:59:05) knowing how to build it ourselves, (00:59:06) building the entire infrastructure (00:59:08) ourselves and knowing what kind of chips (00:59:10) a robotic system would need. In (00:59:13) this particular case, dual Orins, (00:59:16) the next generation dual Thors. These (00:59:19) processors are designed for robotic (00:59:21) systems and were designed for the (00:59:24) highest level of safety capability. This (00:59:26) car just got rated. It just went to (00:59:31) production. The Mercedes-Benz CLA was (00:59:34) just rated by NCAP the world's safest (00:59:38) car. (00:59:42) >> [applause] (00:59:44) >> It is the only system that I know that (00:59:46) has every single line of code, the chip, (00:59:50) the system, every line of code safety (00:59:53) certified. The entire model system is (00:59:55) based on sensors that are diverse and (00:59:58) redundant, and so is the self-driving car (01:00:01) stack. The Alpamayo stack is trained (01:00:03) end to end and has incredible skills. 
(01:00:07) However, nobody knows until you drive it (01:00:10) forever that it's going to be perfectly (01:00:12) safe. And so the way we guardrail (01:00:15) that is with another software (01:00:17) stack, an entire AV stack underneath. (01:00:20) That entire AV stack is built to be (01:00:22) fully traceable, and it's taken us some (01:00:25) five years to build that, some six, seven (01:00:27) years actually, to build that second (01:00:28) stack. These two software stacks are (01:00:31) mirroring each other, and then we have a (01:00:34) policy and safety evaluator to decide: is (01:00:36) this something that I'm very confident (01:00:38) and can reason about driving very (01:00:41) safely? If so, I'm going to have Alpamayo (01:00:43) do it. If it's a circumstance that I'm (01:00:44) not very confident in, and the safety (01:00:47) policy evaluator decides that we're going (01:00:50) to go back to a simpler, safer (01:00:52) guardrail system, then it goes back to (01:00:54) the classical AV stack. We're the only (01:00:56) car in the world with both of these AV (01:00:58) stacks running, and all safety systems (01:01:01) should have diversity and redundancy. (01:01:03) Well, our vision is that someday every (01:01:05) single car, every single truck will be (01:01:07) autonomous. And we've been working (01:01:08) towards that future. This entire stack (01:01:11) is vertically integrated. Of course, in (01:01:13) the case of Mercedes-Benz, we built the (01:01:15) entire stack together. We're going to (01:01:16) deploy the car. We're going to operate (01:01:18) the stack. We're going to maintain the (01:01:19) stack for as long as we shall live. (01:01:21) However, like everything else we do as a (01:01:24) company, we build the entire stack, but (01:01:27) the entire stack is open for the (01:01:29) ecosystem. And the ecosystem (01:01:32) working with us to build L4 and (01:01:34) robotaxis is expanding, and it's going (01:01:36) everywhere. 
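
The two-stack arrangement described above (an end-to-end model guarded by a classical stack, with an evaluator choosing between them) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Plan` type, the `confidence` score, and the `threshold` value are hypothetical, and the talk does not say how the real policy and safety evaluator computes its decision.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate driving plan (hypothetical structure, for illustration only)."""
    source: str                            # which stack produced the plan
    trajectory: list[tuple[float, float]]  # (x, y) waypoints
    confidence: float                      # evaluator's confidence in [0, 1]


def choose_plan(learned: Plan, classical: Plan, threshold: float = 0.9) -> Plan:
    """Use the learned plan only when its confidence clears the threshold.

    Otherwise fall back to the simpler, traceable classical plan.
    The threshold of 0.9 is an assumed value, not one given in the talk.
    """
    if learned.confidence >= threshold:
        return learned
    return classical


# Example: a confident case and an uncertain case.
classical = Plan("classical", [(0.0, 0.0), (1.0, 0.0)], confidence=1.0)
confident = Plan("end_to_end", [(0.0, 0.0), (1.0, 0.2)], confidence=0.97)
uncertain = Plan("end_to_end", [(0.0, 0.0), (1.0, 0.9)], confidence=0.40)

print(choose_plan(confident, classical).source)
print(choose_plan(uncertain, classical).source)
```

Both stacks run all the time in this sketch, and the fallback is always available, which is the diversity-and-redundancy point made in the talk.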
(01:01:38) I fully expect this to be, well, this is (01:01:40) already a giant business for us. It's a (01:01:42) giant business for us because they use (01:01:44) it for training, for (01:01:46) processing data and training their (01:01:48) models. They use it for synthetic data (01:01:50) generation in some cases. (01:01:52) In some companies, they pretty much just (01:01:55) build, uh, the computers, the chips that (01:01:57) are inside the car. And some companies (01:01:59) work with us full stack. Some companies (01:02:01) work with us on some partial part of that. (01:02:03) Okay? So, it doesn't matter, uh, how much (01:02:06) you decide to use. You know, my only (01:02:07) request is use a little bit of Nvidia (01:02:09) wherever you can, and, uh, you know, but, uh, (01:02:13) the entire thing is open. Now, this is (01:02:17) going to be the first large-scale (01:02:20) mainstream (01:02:21) physical AI market, and this is now, (01:02:25) I think we can all agree, fully here. And (01:02:28) this inflection point of going from not (01:02:31) autonomous vehicles to autonomous (01:02:33) vehicles is probably happening right (01:02:35) about this time. In the next 10 years, (01:02:38) I'm fairly certain a very, very large (01:02:41) percentage of the world's cars will be (01:02:43) autonomous or highly autonomous. But (01:02:45) this basic technique that I just (01:02:47) described, in using the three computers, (01:02:50) using synthetic data generation and (01:02:52) simulation, applies to every form of (01:02:55) robotic system. It could be a robot (01:02:57) that is just an articulator, a (01:02:59) manipulator, maybe it's a mobile robot, (01:03:01) maybe it's a fully humanoid robot. And (01:03:04) so the next journey, (01:03:07) the next era for robotic systems is (01:03:10) going to be, you know, robots. And these (01:03:12) robots are going to come in all kinds of (01:03:13) different sizes and, uh, I invited (01:03:16) some friends. 
Did they come? (01:03:24) >> Hey guys, (01:03:26) hurry up. I got a lot of stuff to cover. (01:03:30) >> Come on, hurry. (01:03:35) Did you tell R2-D2 you were going to be (01:03:37) here? (01:03:38) >> Did you? And C-3PO. (01:03:44) Okay. All right. Come here. Now, (01:03:47) one of the things (01:03:48) that's really... You have Jetsons. They (01:03:50) have little Jetson computers inside (01:03:51) them. They're trained inside Omniverse. (01:03:55) And how about this? Let's show everybody (01:03:58) the simulator that you (01:04:00) guys learned how to be robots in. (01:04:03) You guys want to look at that? (01:04:05) >> Okay, let's look at that. Run it, (01:04:06) please. (01:04:10) See, (01:04:28) Okay. (01:05:08) Isn't that amazing? (01:05:13) That's how you learn to be a robot. You (01:05:15) did it all inside Omniverse. And the (01:05:18) robot simulator is called Isaac, Isaac (01:05:20) Sim and Isaac Lab. And anybody who wants (01:05:23) to build a robot, you know, (01:05:26) nobody's going to be as cute as you. (01:05:28) But now we have, look at all these, (01:05:31) look at all these friends that we have (01:05:32) building robots. We're building (01:05:34) big ones. No, like I said, nobody's as (01:05:36) cute as you guys are. But we have (01:05:38) Neura Robotics and we have Agibot. (01:05:40) Agibot over there, you know. We have, uh, (01:05:44) LG over here. They just announced a new (01:05:46) robot. Caterpillar, they've got the (01:05:48) largest robots ever. That one delivers (01:05:52) food to your house. That's connected to (01:05:54) Uber Eats. And that's Serve Robotics. I love (01:05:56) those guys. Agility, Boston Dynamics, (01:06:01) incredible. You got surgical robots, you (01:06:03) got manipulator robots from Franka, (01:06:07) you got a Universal Robots robot, (01:06:09) an incredible number of different robots. (01:06:11) And so this is the next chapter. 
We're (01:06:14) going to talk a lot more about robotics (01:06:15) in the future, but it's not just about (01:06:18) the robots in the end. I know (01:06:19) everything's about you guys. It's about (01:06:21) getting there. And one of (01:06:24) the most important industries in the (01:06:25) world that will be revolutionized by (01:06:27) physical AI and AI physics (01:06:31) is the industry that started all of us (01:06:34) at NVIDIA. It wouldn't be possible if (01:06:37) not for the companies that I'm about to (01:06:39) talk about. And I'm so happy that all of (01:06:41) them, starting with Cadence, are going to (01:06:43) accelerate everything. Cadence has CUDA-X (01:06:46) integrated into all of their simulations (01:06:48) and solvers. They've got, uh, Nvidia (01:06:51) physical AIs that they're going (01:06:53) to use for, uh, different physical (01:06:56) plants and plant simulations. You got AI (01:06:58) physics being integrated into these (01:07:00) systems. So whether it's an EDA or STA, (01:07:04) and in the future robotic systems, (01:07:06) we're going to have basically the same (01:07:08) technology that made you guys possible (01:07:11) now completely revolutionize these (01:07:13) design stacks. Synopsys. Without Synopsys, (01:07:16) you know, Synopsys and Cadence are (01:07:19) completely, completely indispensable in (01:07:22) the world of chip design. Synopsys (01:07:25) leads in, uh, logic design and (01:07:29) IP. Uh, in the case of Cadence, they lead (01:07:32) physical design, the place and route, uh, (01:07:35) and emulation and verification. Cadence (01:07:37) is incredible at emulation and (01:07:39) verification. Both of them are moving (01:07:41) into the world of system design and (01:07:42) system simulation. And so in the future, (01:07:46) we're going to design your chips inside (01:07:49) Cadence and inside Synopsys. 
We're going (01:07:51) to design your systems and emulate the (01:07:53) whole thing and simulate everything (01:07:56) inside these tools. That's your future. (01:07:58) We're going to give... Yeah, you're going (01:07:59) to be born inside these (01:08:02) platforms. Pretty amazing, right? And so (01:08:05) we're so happy that we're working with (01:08:06) these industries. Just as we've (01:08:08) integrated NVIDIA into Palantir and (01:08:11) ServiceNow, we're integrating NVIDIA (01:08:13) into the most computationally intensive (01:08:16) simulation industries, Synopsys and (01:08:19) Cadence. And today we're announcing that (01:08:22) Siemens is also doing the same thing. (01:08:24) We're going to integrate CUDA-X, physical (01:08:27) AI, agentic AI, Nemotron, deeply (01:08:30) integrated into the world of Siemens. (01:08:33) And the reason for that is this. First, (01:08:36) we design the chips, (01:08:39) and all of it in the future will be (01:08:40) accelerated by Nvidia. You're going to (01:08:42) be very happy about that. We're going to (01:08:44) have agentic chip designers and system (01:08:46) designers working with us, helping us do (01:08:49) design, just as we have agentic software (01:08:52) engineers helping our software engineers (01:08:54) code today. And so, we'll have agentic (01:08:56) chip designers and system designers. (01:08:58) We're going to create you inside this. (01:09:01) But then we have to build you. We have (01:09:04) to build the plants, the factories that (01:09:07) manufacture you. We have to design (01:09:11) the manufacturing lines that assemble (01:09:13) all of you. And these manufacturing (01:09:16) plants are going to be essentially (01:09:18) gigantic robots. Incredible, isn't that (01:09:21) right? (01:09:22) I know. I know. And so you're going to (01:09:25) be designed in a computer. You're going (01:09:28) to be made in a computer. 
You're gonna (01:09:30) be tested and evaluated in a computer (01:09:32) long before you have to (01:09:35) spend any time dealing with gravity. (01:09:38) I know. (01:09:40) Do you know how to deal with gravity? (01:09:43) Can you jump? (01:09:46) Can you jump? (01:09:56) >> [laughter] (01:09:58) >> Okay. All right. Don't show off. Okay. (01:10:00) So, so now (01:10:04) the industry that made (01:10:06) Nvidia possible, I'm just so happy (01:10:09) that now the technology that we're (01:10:11) creating is at a level of sophistication (01:10:13) and capability that we can now help them (01:10:16) revolutionize their industry. And so (01:10:18) what started with them, we (01:10:21) now have the opportunity to go back and (01:10:22) help them revolutionize theirs. (01:10:25) Let's take a look at the stuff that (01:10:26) we're going to do with Siemens. (01:10:28) Come on. (01:10:30) Breakthroughs in physical AI are letting (01:10:33) AI move from screens to our physical (01:10:36) world. (01:10:38) And just in time, as the world builds (01:10:41) factories of every kind for chips, (01:10:44) computers, life-saving drugs, and AI. As (01:10:48) the global labor shortage worsens, we (01:10:50) need automation powered [music] by (01:10:52) physical AI and robotics more than ever. (01:10:57) This, where AI meets the world's largest (01:11:00) physical industries, is the foundation (01:11:02) of the NVIDIA and Siemens partnership. For (01:11:05) nearly two centuries, Siemens has built (01:11:08) the world's industries. (01:11:10) And now [music] it is reinventing them for (01:11:12) the age of AI. (01:11:15) Siemens is integrating NVIDIA CUDA-X (01:11:18) libraries, AI models, and Omniverse (01:11:22) into its portfolio of EDA, (01:11:27) CAE, (01:11:29) and digital [music] twin tools and (01:11:31) platforms. (01:11:33) Together, we're bringing physical AI to (01:11:36) the full industrial life cycle. 
(01:11:39) From design and simulation (01:11:42) to production (01:11:46) and operations, (01:11:48) we stand at the beginning of a new (01:11:50) industrial revolution, the age of (01:11:52) physical AI, built by Nvidia and Siemens (01:11:56) for the next age of industries. (01:12:02) Incredible, right guys? (01:12:06) What do you think? All right, (01:12:08) hang on tight. Just hang on tight. And so (01:12:11) this is, you know, if you look at (01:12:13) the world's models, there's no (01:12:16) question OpenAI is the leading (01:12:19) token generator today. More OpenAI (01:12:22) tokens are generated than just about (01:12:23) anything else. The second largest group, (01:12:26) the second largest, is probably open (01:12:28) models. And my guess is that over time, (01:12:30) because there are so many companies, so (01:12:32) many researchers, so many different (01:12:34) types of domains and modalities, (01:12:36) open-source models will be by far the (01:12:38) largest. Let's talk about somebody (01:12:40) really special. You guys want to do (01:12:42) that? Let's talk about Vera Rubin. (01:12:46) Vera Rubin. Yeah, go ahead. She's an (01:12:49) American astronomer. (01:12:51) She was the first to observe, she (01:12:53) noticed, that the tails of the galaxies (01:12:56) were moving about as fast (01:12:59) as the center of the galaxies. Well, I (01:13:03) know it makes no sense. It makes no (01:13:05) sense. Newtonian physics would say, just (01:13:07) like the solar system, the planets (01:13:09) further away from the sun are (01:13:13) circling the sun slower than (01:13:16) the planets closer to the sun. 
And (01:13:19) therefore it makes no sense that this (01:13:21) happens unless there are (01:13:24) invisible bodies. We call them, she (01:13:26) discovered, dark matter, (01:13:31) which occupies space even though we don't see (01:13:33) it. And so Vera Rubin is the person that (01:13:35) we named our next computer after. Isn't (01:13:39) that a good idea? (01:13:41) I know. (01:13:45) Okay. Okay, Vera Rubin is designed to (01:13:47) address this fundamental challenge that (01:13:49) we have. The amount of computation (01:13:51) necessary for AI is skyrocketing. The (01:13:55) demand for NVIDIA GPUs is skyrocketing. (01:13:58) It's skyrocketing because models are (01:14:00) increasing by a factor of 10, an order (01:14:02) of magnitude, every single year. And (01:14:06) not to mention, as I mentioned, o1's (01:14:08) introduction was an inflection point for (01:14:11) AI. Instead of a one-shot answer, (01:14:14) inference is now a thinking process. And (01:14:17) in order to teach the AI how to think, (01:14:20) reinforcement learning and very (01:14:23) significant computation was introduced (01:14:25) into post-training. (01:14:28) It's no longer supervised fine-tuning, (01:14:31) otherwise known as imitation learning or (01:14:33) supervised training. (01:14:35) You now have reinforcement learning, (01:14:37) essentially the computer trying (01:14:40) different iterations itself, learning how (01:14:42) to perform a task. The amount of (01:14:45) computation for pre-training, for (01:14:48) post-training, for test-time scaling has (01:14:50) exploded as a result of that. And now, (01:14:53) every single inference that we do, (01:14:55) instead of just one shot, the number of (01:14:57) tokens, you can just see the AI think, (01:14:59) which we appreciate. The longer it (01:15:01) thinks, oftentimes it produces a better (01:15:02) answer. 
And so test-time scaling causes (01:15:05) the number of tokens to be generated to (01:15:07) increase by 5x every single year. Not to (01:15:10) mention, (01:15:12) meanwhile, the race is on for AI. (01:15:15) Everybody's trying to get to the next (01:15:17) level. Everybody's trying to get to the (01:15:18) next frontier. And every time they get (01:15:20) to the next frontier, the last (01:15:22) generation AI tokens, the cost (01:15:26) starts to decline about a factor of 10x (01:15:29) every year. The 10x decline every year (01:15:31) is actually telling you something (01:15:33) different. It's saying that the race is (01:15:35) so intense. Everybody's trying to get to (01:15:37) the next level and somebody is getting (01:15:39) to the next level. And so therefore, all (01:15:42) of it is a computing problem. The faster (01:15:44) you compute, the sooner you can get to (01:15:46) the next level of the next frontier. All (01:15:49) of these things are (01:15:50) happening at the same time. And so we (01:15:53) decided that we have to advance (01:15:56) the state of the art of computation (01:15:59) every single year. Not one year left (01:16:02) behind. And now, we've been shipping (01:16:05) GB200s since a (01:16:07) year and a half ago. Right now we're in (01:16:09) full-scale manufacturing of GB300. (01:16:13) And if Vera Rubin is going to be in time (01:16:16) for this year, it must be in production (01:16:19) by now. And so today I can tell you that (01:16:22) Vera Rubin is in full production. (01:16:30) You guys want to take a look at Vera (01:16:31) Rubin? (01:16:32) >> All right. Come on. (01:16:34) >> Play it, please. (01:16:38) Vera Rubin arrives just in time for the (01:16:41) next frontier of AI. (01:16:44) This is [music] the story of how we (01:16:45) built it. The architecture, a system of (01:16:49) six chips [music] engineered to work as (01:16:51) one, born from extreme co-design. 
It (01:16:54) begins with Vera, [music] a (01:16:55) custom-designed CPU, double the (01:16:57) performance of the previous generation. (01:16:59) And the Rubin GPU. Vera and Rubin are (01:17:02) co-designed from the [music] start to (01:17:04) bidirectionally and coherently share data (01:17:07) faster and with lower latency. (01:17:10) Then 17,000 components come together on (01:17:14) a Vera Rubin compute board. (01:17:18) High-speed robots place components with (01:17:21) micron precision before the Vera CPU and (01:17:24) two Rubin GPUs complete the assembly, (01:17:28) capable of delivering 100 petaflops of AI, (01:17:32) five times that of its predecessor. (01:17:36) AI needs data fast. (01:17:39) ConnectX-9 delivers 1.6 terabits per (01:17:42) second of scale-out bandwidth to each (01:17:45) GPU. (01:17:48) BlueField-4 DPU offloads storage and (01:17:50) security [music] so compute stays fully (01:17:53) focused on AI. (01:17:55) The Vera Rubin compute tray, completely (01:17:58) redesigned with no cables, hoses, or (01:18:01) fans, featuring a BlueField-4 DPU, eight (01:18:05) ConnectX-9 NICs, two Vera CPUs, and four (01:18:10) Rubin GPUs. The compute building block (01:18:13) of the Vera Rubin AI supercomputer. (01:18:16) Next, the sixth-generation NVLink (01:18:20) switch, moving more data than the global (01:18:23) internet, connecting 18 compute nodes, (01:18:26) scaling up to 72 Rubin GPUs, operating (01:18:29) as one. (01:18:33) Then Spectrum-X Ethernet Photonics, (01:18:38) the world's first Ethernet [music] (01:18:40) switch with 512 lanes and 200-Gbit-capable (01:18:43) co-packaged optics, scales out (01:18:46) thousands of racks into an AI factory. (01:18:51) 15,000 engineer-years since design (01:18:53) began, the first Vera Rubin NVL72 (01:18:57) [music] rack comes online. Six (01:19:00) breakthrough chips, 18 compute trays, (01:19:03) nine NVLink switch trays, 220 trillion (01:19:06) transistors, weighing nearly two tons. 
(01:19:12) One giant leap to the next frontier of (01:19:15) AI. (01:19:17) Rubin is here. (01:19:24) What do you guys think? (01:19:29) This is a Rubin pod. 1152 GPUs (01:19:35) in 16 racks. Each one of the racks, as (01:19:39) you know, has 72 (01:19:44) Vera Rubins, or 72 Rubins. Each one of (01:19:48) the Rubins is two actual GPU dies (01:19:51) connected together. (01:19:53) I'm going to show it to you, but there (01:19:55) are several things that... Well, I'll tell (01:19:58) you later. (01:20:00) I can't tell you everything right away. (01:20:04) Well, we designed six different chips. (01:20:07) First of all, we have a rule inside our (01:20:08) company, and it's a good rule. No new (01:20:11) generation should have more than one or (01:20:14) two chips change. But the problem is (01:20:16) this. As you could see, we were (01:20:19) describing the total number of (01:20:20) transistors in each one of the chips (01:20:22) that were being described. And we know (01:20:24) that Moore's law has largely slowed. And (01:20:26) so, the number of transistors we can get (01:20:29) year after year after year can't (01:20:32) possibly keep up with the 10 times (01:20:35) larger models. It can't possibly keep up (01:20:38) with five times per year more tokens (01:20:41) generated. It can't possibly keep up (01:20:43) with the fact that the (01:20:45) cost decline of the tokens is going to (01:20:47) be so aggressive. It is impossible to (01:20:50) keep up with those kinds of rates (01:20:52) for the industry to continue to (01:20:54) advance unless we deploy aggressive, (01:20:58) extreme co-design, basically innovating (01:21:01) across all of the chips, across the (01:21:03) entire stack, all at the same time. Which (01:21:06) is the reason why we decided that this (01:21:08) generation we had no choice but to (01:21:10) design every chip over again. 
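
The rack and pod figures quoted in the keynote are consistent with each other, and a few lines of arithmetic make that easy to check. This is only a sanity-check sketch: the constant names are made up for illustration, and the per-tray and per-rack counts are taken from the video narration earlier in the talk (four Rubin GPUs per compute tray, 18 compute trays per rack).

```python
# Figures as quoted in the keynote; the names are illustrative only.
GPUS_PER_TRAY = 4      # "two Vera CPUs, and four Rubin GPUs" per compute tray
TRAYS_PER_RACK = 18    # "18 compute trays" per NVL72 rack
RACKS_PER_POD = 16     # "1152 GPUs in 16 racks"
DIES_PER_GPU = 2       # "two actual GPU dies connected together"

gpus_per_rack = GPUS_PER_TRAY * TRAYS_PER_RACK
gpus_per_pod = gpus_per_rack * RACKS_PER_POD
dies_per_pod = gpus_per_pod * DIES_PER_GPU

print(f"GPUs per rack: {gpus_per_rack}")
print(f"GPUs per pod:  {gpus_per_pod}")
print(f"GPU dies per pod: {dies_per_pod}")
```

The per-rack product matches the "72" in the NVL72 name, and the per-pod product matches the 1152 GPUs stated on stage.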
Now, every (01:21:14) single chip that we were describing just (01:21:16) now can be a press conference in and of (01:21:18) itself, and there's an entire company (01:21:20) who's probably dedicated to doing that (01:21:21) back in the old days. Each one of them (01:21:23) is completely revolutionary and the (01:21:25) best of its kind. (01:21:28) The Vera CPU, I'm so proud of it. In a (01:21:31) power constrained world, Gray CPU is two (01:21:35) times the performance. In a power (01:21:38) constrained world, it's twice the (01:21:40) performance per watt of the world's most (01:21:42) advanced CPUs. Its data rate is insane. (01:21:45) It was designed to process (01:21:48) supercomputers, and Vera was an (01:21:51) incredible... Grace was an incredible (01:21:53) CPU. Now Vera increases the single-threaded (01:21:57) performance, increases the (01:21:59) capacity of the memory, increases (01:22:01) everything just dramatically. It's a (01:22:03) giant chip. This is the Vera CPU. (01:22:06) This is one CPU. (01:22:09) And this is connected to (01:22:14) the Rubin GPU. Look at that thing. (01:22:18) It's a giant chip. Now, the thing that's (01:22:20) really special, and I'll go through (01:22:23) these. It's going to take three hands, I (01:22:25) think four hands, to do this. Okay. So, (01:22:28) this is the Vera CPU. It's got 88 CPU (01:22:31) cores. And the CPU cores are designed to (01:22:33) be multi-threaded. But the (01:22:34) multi-threaded nature of Vera was (01:22:37) designed so that each one of the 176 (01:22:40) threads could get its full (01:22:43) performance. So it's essentially as if (01:22:45) there are 176 cores but only 88 physical (01:22:48) cores. So these cores were designed (01:22:50) using a technology called spatial (01:22:52) multi-threading. But the IO performance (01:22:55) is incredible. This is the Rubin GPU. (01:22:57) It's 5x Blackwell in floating-point (01:23:00) performance. 
But the important thing is, (01:23:02) go to the bottom line. The bottom line: (01:23:03) it's only 1.6 times the number of (01:23:06) transistors in Blackwell. That kind of (01:23:07) tells you something about the levels (01:23:10) of semiconductor physics today. If we (01:23:12) don't do co-design, if we don't do (01:23:15) extreme co-design at the level of (01:23:17) basically every single chip (01:23:20) across the entire system, how is it (01:23:22) possible we deliver performance levels (01:23:25) that are, you know, at best (01:23:28) 1.6 times each year? Because that's the (01:23:30) total number of transistors you have. (01:23:32) And even if you were to have a little (01:23:34) bit more performance per transistor, say (01:23:36) 25%, it's impossible to get a (01:23:39) 100% yield out of the number of (01:23:41) transistors you get. And so 1.6x kind of (01:23:44) puts a ceiling on how far performance (01:23:46) can go each year unless you do something (01:23:48) extreme. And we call it extreme (01:23:50) co-design. Well, one of the things (01:23:52) that we did, and it was a (01:23:53) great invention, is called the NVFP4 (01:23:56) tensor core. The transformer engine (01:23:59) inside our chip is not just a 4-bit (01:24:02) floating-point number somehow that we put (01:24:04) into the data path. It is an entire (01:24:06) processor, a processing unit that (01:24:10) understands how to dynamically, (01:24:12) adaptively adjust its precision and (01:24:15) structure to deal with different levels (01:24:18) of the transformer so that you can (01:24:19) achieve higher throughput wherever it's (01:24:22) possible to lose precision and go (01:24:25) back to the highest possible precision (01:24:26) wherever you need to. That ability to (01:24:29) dynamically do that, you can't do this (01:24:32) in software because obviously it's just (01:24:34) running too fast.
And so you have to be (01:24:37) able to do it adaptively inside the (01:24:39) processor. That's what NVFP4 is. (01:24:42) When somebody says FP4 or FP8, it almost (01:24:45) means nothing to us. And the reason for (01:24:47) that is because it's the tensor core (01:24:49) structure and all of the algorithms that (01:24:50) make it work. NVFP4, we've (01:24:53) published papers on this already. The (01:24:55) level of (01:24:57) throughput and precision it's able to (01:24:58) retain is completely incredible. This (01:25:01) is groundbreaking work. I would not be (01:25:03) surprised if the industry would like (01:25:05) us to make this format and this (01:25:06) structure an industry standard in the (01:25:08) future. This is completely (01:25:10) revolutionary. This is how we were able (01:25:12) to deliver such a gigantic step up in (01:25:15) performance even though we only have 1.6 (01:25:18) times the number of transistors. Okay. (01:25:21) So this is, and now once you have a great (01:25:23) processing node, and this is the (01:25:25) processor node, and inside, so this is, (01:25:29) this is, for example, here, let me do this. (01:25:36) This is, this is, wow, it's super heavy. (01:25:40) You have to be a CEO in really good shape (01:25:42) to do this job. (01:25:46) Okay. All right. So, this thing is, I'm (01:25:50) gonna guess this is probably, I don't (01:25:53) know, a couple of hundred pounds. (01:25:59) [laughter] (01:26:01) I thought that was funny, too. (01:26:04) Come on. It could have been. Everybody's (01:26:07) gone, no, I don't think so. (01:26:10) All right. [clears throat] (01:26:11) So, so look at this. This is the last (01:26:13) one. We revolutionized the entire MGX (01:26:17) chassis. This node, (01:26:20) 43 cables, (01:26:22) zero cables. Six tubes, (01:26:28) just two of them here. It takes two (01:26:32) hours to assemble this. (01:26:35) If you're lucky, it takes two hours.
And (01:26:38) of course, you're probably going to (01:26:39) assemble it wrong. You're going to have (01:26:40) to retest it, test it, reassemble it. So (01:26:43) the assembly process is incredibly (01:26:45) complicated, and it was understandable as (01:26:47) one of our first supercomputers that's (01:26:49) deconstructed in this way. This, from 2 (01:26:52) hours to 5 minutes. (01:27:00) 80% liquid cooled. (01:27:03) 100% liquid cooled. (01:27:06) Yeah. Really, really a breakthrough. (01:27:09) Okay. So, so this is the new compute (01:27:12) chassis, and what connects all of these (01:27:16) to the top of rack switches, the east-west (01:27:18) traffic, is called the Spectrum-X (01:27:20) NIC. This is the world's best NIC. (01:27:23) Unquestionably. Nvidia's Mellanox, the (01:27:26) acquisition, Mellanox that joined us a (01:27:28) long time ago now. Um, their (01:27:30) networking technology for high (01:27:31) performance computing is the world's (01:27:33) best, bar none. The algorithms, the chip (01:27:36) design, all of the interconnects, all (01:27:38) the software stacks that run on top of (01:27:39) it, their RDMA, absolutely, absolutely (01:27:42) bar none, the world's best. And now it (01:27:44) has the ability to do programmable RDMA (01:27:46) and data path accelerator so that our (01:27:49) partners like AI labs could create their (01:27:52) own algorithms for how they want to move (01:27:54) data around the system. But this is (01:27:55) completely world class. ConnectX-9 (01:27:58) and the Vera CPU were co-designed and (01:28:02) we never revealed it, never (01:28:04) released it until CX9 came along because (01:28:08) we co-designed it for a new type of (01:28:10) processor. (01:28:12) You know, ConnectX-9, or CX8, and Spectrum-X (01:28:15) revolutionized how Ethernet was done (01:28:19) for artificial intelligence. Ethernet (01:28:21) traffic for AI is much, much more (01:28:24) intense, requires much lower latency.
(01:28:27) The instantaneous surge of traffic (01:28:30) is unlike anything Ethernet sees. And so (01:28:32) we created Spectrum-X, which is AI (01:28:35) Ethernet. (01:28:37) Two years ago we announced Spectrum-X. (01:28:39) NVIDIA today is the largest networking (01:28:42) company the world has ever seen. So it's (01:28:45) been so successful and used in so many (01:28:47) different installations. It is just (01:28:49) sweeping, uh, the AI landscape. The (01:28:52) performance is incredible, especially (01:28:54) when you have a 200, um, megawatt data (01:28:59) center or if you have a gigawatt data (01:29:00) center. These are billions of dollars. (01:29:03) Let's say a gigawatt data center is $50 (01:29:05) billion. If the networking (01:29:07) performance allows you to deliver an (01:29:10) extra 10%, (01:29:13) and in the case of Spectrum-X, delivering (01:29:15) 25% higher throughput is not uncommon, (01:29:18) if we were to just deliver 10%, that's (01:29:20) worth $5 billion. The networking is (01:29:23) completely free, which is the reason (01:29:25) why, well, (01:29:28) everybody uses Spectrum-X. It's just an (01:29:30) incredible thing. And now we're going to (01:29:32) invent a new type of, uh, (01:29:35) data processing. And so Spectrum-X is for (01:29:38) east-west traffic. We now have a new (01:29:41) processor called BlueField-4. (01:29:43) BlueField-4 allows us to take a large, (01:29:45) very large data center, isolate different (01:29:48) parts of it so that different users (01:29:50) could use different parts of it, make (01:29:51) sure that everything could be (01:29:52) virtualized if they decide to be (01:29:54) virtualized. So you offload a lot of the, (01:29:57) um, virtualization software, the security (01:29:59) software, the networking software for (01:30:01) your north-south traffic. And so (01:30:03) BlueField-4 comes standard with every (01:30:06) single one of these compute nodes.
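The "networking is free" argument earlier in this passage is simple arithmetic, sketched below in Python. The function name is hypothetical, and the $50 billion cost and the 10% and 25% throughput gains are the speaker's round numbers, not measured data.

```python
def throughput_gain_value(datacenter_cost_usd, gain_percent):
    """Dollar value of extra throughput, treating output as proportional to cost.

    Integer percent keeps the arithmetic exact for round dollar figures.
    """
    return datacenter_cost_usd * gain_percent // 100


GIGAWATT_DC_COST = 50_000_000_000  # the speaker's estimate for a gigawatt site

print(throughput_gain_value(GIGAWATT_DC_COST, 10))  # 5000000000
print(throughput_gain_value(GIGAWATT_DC_COST, 25))  # 12500000000
```

Under the same proportionality assumption, the 25% case mentioned in the talk would be worth $12.5 billion.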
(01:30:08) BlueField-4 has a second application I'm (01:30:10) going to talk about in just a second. (01:30:12) This is a revolutionary processor and (01:30:14) I'm so excited about it. This is the (01:30:16) NVLink 6 switch, (01:30:19) and (01:30:21) it's right here. (01:30:22) This is the switch chip. (01:30:26) There are four of them inside the NVLink (01:30:28) switch here. (01:30:30) Each one of these switch chips has the (01:30:32) fastest SerDes in history. The world is (01:30:35) barely getting to 200 gigabits. This is a (01:30:38) 400 gigabits per second switch. The (01:30:41) reason why this is so important is so (01:30:43) that we could have every single GPU talk (01:30:46) to every other GPU at exactly the same (01:30:48) time. This switch, this switch on the (01:30:52) backplane of one of these racks, enables (01:30:56) us to move the equivalent of twice the (01:30:59) amount of the global internet data, (01:31:03) twice as much as all of the world's internet (01:31:05) data, at twice the speed. You take the (01:31:09) cross-sectional bandwidth of the entire (01:31:11) planet's internet, it's about 100 (01:31:13) terabytes per second. This is 240 (01:31:16) terabytes per second. So it kind of puts (01:31:18) it in perspective. This is so that every (01:31:20) single GPU can work with every single (01:31:22) other GPU at exactly the same time. (01:31:24) Okay. Then on top of that, (01:31:29) on top of that, okay, so this is one rack. (01:31:31) This is one rack. Each one of the racks, (01:31:33) as you could see, the number of (01:31:35) transistors in this one rack (01:31:39) is 1.7 times. (01:31:44) Yeah. Could you do this for me? So, this (01:31:46) is, it's usually about two tons, but (01:31:49) today it's two and a half tons because, (01:31:52) um, when they shipped it, they forgot to (01:31:54) drain the water out of it. (01:31:58) So we shipped a lot of water from (01:32:00) California. (01:32:02) [clears throat] (01:32:05) Can you hear it squealing?
(01:32:07) You know, when you're rotating two and a (01:32:08) half tons, (01:32:11) you're going to squeal a little. (01:32:14) Oh, you could do it. Wow. (01:32:19) Okay, we won't make you do that (01:32:21) twice. All right. So, um, (01:32:25) behind this are the NVLink spines. (01:32:29) Basically, two miles of copper cables. (01:32:32) Copper is the best conductor we know. (01:32:34) And these are all shielded copper (01:32:36) cables, structured copper cables, the (01:32:38) most the world's ever used in computing (01:32:40) systems, ever. And, um, our SerDes (01:32:45) drive the copper cables from the top of (01:32:47) the rack all the way to the bottom of (01:32:48) the rack at 400 gigabits per second. (01:32:51) It's incredible. And so, uh, this has two (01:32:54) miles of total copper cables, 5,000 (01:32:56) copper cables, and this makes the NVLink (01:33:00) spine possible. This is the (01:33:02) revolution that really started the (01:33:05) MGX system. Now we decided that we (01:33:09) would create an industry standard system (01:33:11) so that the entire ecosystem, all of our (01:33:13) supply chain, could standardize on these (01:33:16) components. There are some 80,000 (01:33:20) different components that make up (01:33:23) these MGX systems, and it's a total waste (01:33:26) if we were to change it every single year. (01:33:28) Every single major computer company, from (01:33:30) Foxconn to Quanta to Wistron, you know, (01:33:32) the list goes on and on and on, to HP and (01:33:35) Dell and Lenovo, everybody knows how to (01:33:38) build these systems. And so the fact (01:33:40) that we could squeeze Rubin, Vera Rubin, (01:33:43) into this even though the performance is (01:33:46) so much higher and, very (01:33:49) importantly, the power is twice as high. (01:33:52) The power of Vera Rubin is twice as high (01:33:54) as Grace Blackwell.
And yet, and this is (01:33:58) the miracle, (01:33:59) the air that goes into it, the air (01:34:03) flow, is about the same. And very (01:34:05) importantly, the water that goes into it (01:34:07) is the same temperature, 45°C. With (01:34:11) 45°C, no water chillers are necessary for (01:34:15) data centers. We're basically cooling (01:34:18) this supercomputer with hot water. It's so (01:34:22) incredibly efficient. And so (01:34:25) this is, um, this is the new rack. (01:34:28) 1.7 times more transistors but five (01:34:31) times more peak inference performance. (01:34:34) Three and a half times more peak, um, (01:34:37) training performance. (01:34:40) Okay. (01:34:42) They're connected on top using Spectrum-X. (01:34:44) Oh, thank you. (01:34:52) This is the world's first (01:34:53) manufactured chip using (01:34:56) TSMC's (01:34:58) new process that we co-innovated, called (01:35:01) COUPE. It's an integrated (01:35:03) silicon photonics process technology. And (01:35:06) this allows us to take silicon photonics (01:35:09) directly right to the chip. And this is (01:35:12) 512 ports at 200 Gbits per second. And (01:35:16) this is the new Ethernet AI switch, the (01:35:20) Spectrum-X Ethernet switch. And look at (01:35:22) this giant chip. But what's really (01:35:24) amazing, it's got silicon photonics (01:35:26) directly connected to it. And lasers (01:35:29) come in (01:35:33) through here. Lasers come (01:35:35) in through here. The optics are here, and (01:35:38) they connect out to the rest of the (01:35:41) data center. This I'll show you in a (01:35:42) second, but this is on top of the rack. (01:35:44) And this is the new Spectrum-X (01:35:47) um (01:35:49) silicon photonics switch. Okay. (01:35:54) And we have something new I want to tell (01:35:56) you about.
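The rack-level claim above (1.7 times the transistors, 5 times peak inference, 3.5 times peak training) implies a gain per transistor that can be worked out directly. A minimal Python sketch follows; the function name is illustrative, and the ratios are the keynote's headline figures rather than independent measurements.

```python
def gain_per_transistor(perf_ratio, transistor_ratio):
    """How much more performance each transistor delivers, generation over generation."""
    return perf_ratio / transistor_ratio


TRANSISTOR_RATIO = 1.7  # new rack vs. previous rack, as stated on stage

print(round(gain_per_transistor(5.0, TRANSISTOR_RATIO), 2))  # 2.94 (inference)
print(round(gain_per_transistor(3.5, TRANSISTOR_RATIO), 2))  # 2.06 (training)
```

Read this way, roughly two to three times of the improvement per transistor would have to come from architecture, number formats, and system design rather than from transistor count, which is the co-design argument made in the talk.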
So just as I mentioned, a (01:35:58) couple years ago (01:36:00) we introduced Spectrum-X so that we (01:36:03) could reinvent the way that networking (01:36:05) is done. Um, Ethernet is really easy to (01:36:08) manage and everybody has an Ethernet (01:36:09) stack and every data center in the world (01:36:11) knows how to deal with Ethernet. Um, and (01:36:13) the only thing that we were (01:36:15) using at the time was called InfiniBand, (01:36:17) which is used for supercomputers. (01:36:19) InfiniBand is very low latency. Um, but (01:36:23) of course the software stack, the entire (01:36:25) manageability of InfiniBand, is very (01:36:27) alien to the people who use Ethernet. So (01:36:29) we decided to enter the Ethernet switch (01:36:31) market for the very first time. Spectrum-X, (01:36:33) that just took off, and it made us the (01:36:37) largest networking company in the world, (01:36:38) as I mentioned. This next generation (01:36:41) Spectrum-X is going to carry on that (01:36:42) tradition. But just as I said earlier, AI (01:36:46) has reinvented the whole computing (01:36:48) stack, every layer of the computing (01:36:49) stack. It stands to reason that when AI (01:36:53) starts to get deployed in the world's (01:36:55) enterprises, it's going to also reinvent (01:36:57) the way storage is done. Well, AI (01:36:59) doesn't use SQL. AI uses semantic (01:37:02) information. And when AI is being used, (01:37:05) it creates this temporary knowledge, (01:37:08) temporary memory called KV (01:37:10) cache. KV, key-value combinations, but (01:37:14) it's a KV cache. Basically, the cache of (01:37:16) the AI, the working memory of the AI. (01:37:18) And the working memory of the AI is (01:37:20) stored in the HBM memory.
Every single (01:37:24) token, for every single token, (01:37:27) the GPU reads in the model, the (01:37:32) entire model, it reads in the entire (01:37:34) working memory, and it produces one token (01:37:38) and it stores that one token back into (01:37:40) the KV cache. And then the (01:37:43) next time it does that, it reads in the (01:37:45) entire memory, reads it, and it streams (01:37:48) it through our GPU and then generates (01:37:50) another token. Well, it does this (01:37:52) repeatedly, token after token after (01:37:54) token. And obviously, if you have a long (01:37:56) conversation with that AI, over time (01:37:58) that memory, that context memory, is (01:38:00) going to grow tremendously. Not to (01:38:02) mention the models are growing, the (01:38:03) number of turns that we're using the AI (01:38:06) are increasing. We would like to (01:38:07) have this AI stay with us our entire (01:38:09) life and remember every single (01:38:11) conversation we've ever had with it, (01:38:13) right? Every single lick of research (01:38:14) that I've asked it for. Of course, (01:38:16) the number of people that will be (01:38:18) sharing the supercomputers is going to (01:38:19) continue to grow. And so this context (01:38:22) memory, which started out fitting inside (01:38:24) an HBM, is no longer large enough. Last (01:38:27) year we created Grace Blackwell's (01:38:32) very fast memory.
we called fast context (01:38:35) memory. That's the reason why we (01:38:37) connected Grace directly to Hopper, (01:38:40) that's why we connected Grace directly (01:38:42) to Blackwell, so that we can expand the (01:38:44) context memory. But even that is not (01:38:46) enough. And so the next solution, of (01:38:48) course, is to go off onto the network, the (01:38:51) north-south network, off to the storage (01:38:54) of the company. But if you have a whole (01:38:57) lot of AI running at the same time, that (01:39:00) network is no longer going to be fast (01:39:02) enough. So the answer is very clearly to (01:39:04) do it differently. And so we (01:39:06) created BlueField-4 so that we could (01:39:09) essentially have a very fast KV cache (01:39:13) context memory store right in the rack. (01:39:17) And so I'll show you in just one second, (01:39:20) but there's a whole new category of (01:39:22) storage systems. And the industry is so (01:39:25) excited because this is a pain point for (01:39:27) just about everybody who does a lot of (01:39:29) token generation today. The AI labs, the (01:39:31) cloud service providers, they're really (01:39:34) suffering from the amount of network (01:39:36) traffic that's being caused by (01:39:38) KV cache moving around. And so the idea (01:39:41) that we would create a new platform, a (01:39:43) new processor, to run the entire Dynamo (01:39:47) KV cache context memory management (01:39:50) system and to put it very close to the (01:39:53) rest of the rack is completely (01:39:54) revolutionary. So this is it. (01:39:58) It sits right here. (01:40:00) So this is all the compute nodes. (01:40:04) Each one of these is NVLink 72. So this (01:40:07) is Vera Rubin NVLink 72.4 (01:40:12) U Rubin GPUs. This is the context (01:40:16) memory that's stored here. Behind each (01:40:18) one of these are four BlueFields.
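The per-token loop described above (read the whole model and the whole KV cache, emit one token, append it to the cache) can be sketched as a toy cost model in Python. Everything here is illustrative: the function name, the byte sizes, and the assumption that every step streams the full weights plus the full cache are simplifications, not a description of NVIDIA's or Dynamo's actual implementation.

```python
def decode_bytes_read(model_bytes, kv_bytes_per_token, prompt_tokens, new_tokens):
    """Bytes streamed through the GPU for each generated token (toy model).

    Each step reads all model weights plus the KV cache for the current
    context, then the cache grows by one token's worth of entries.
    """
    reads = []
    context = prompt_tokens
    for _ in range(new_tokens):
        reads.append(model_bytes + kv_bytes_per_token * context)
        context += 1  # the newly generated token is appended to the KV cache
    return reads


# Tiny made-up sizes so the growth is easy to see.
print(decode_bytes_read(model_bytes=100, kv_bytes_per_token=2,
                        prompt_tokens=3, new_tokens=3))  # [106, 108, 110]
```

The model term stays constant while the cache term grows with every token, which is why long conversations and many concurrent users eventually outgrow on-package memory and motivate a larger context store close to the rack.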
(01:40:20) Behind each BlueField is 150 (01:40:24) terabytes, (01:40:25) 150 terabytes of context memory. (01:40:29) And for each GPU, once you allocate it (01:40:32) across, each GPU will get an additional (01:40:35) 16 terabytes. Now inside this node each (01:40:40) GPU essentially has one terabyte. And (01:40:43) now with this backing store here, (01:40:47) directly on the same east-west traffic, (01:40:49) at exactly the same data rate, 200 (01:40:51) gigabits per second, across literally the (01:40:55) entire fabric of this compute node, (01:40:58) you're going to get an additional 16 (01:41:00) terabytes of memory. Okay. And this is (01:41:02) the management plane. These are (01:41:06) the Spectrum-X (01:41:11) switches that connect all of them (01:41:13) together. (01:41:15) And over here, these switches at the end (01:41:20) connect them to the rest of the data (01:41:21) center. Okay? And so this is Vera (01:41:25) Rubin. Now there are several things that are (01:41:27) really incredible about it. So the first (01:41:29) thing that I mentioned is that this (01:41:33) entire system is twice the energy (01:41:37) efficiency, essentially twice the (01:41:40) temperature performance, in the (01:41:42) sense that even though the power is (01:41:45) twice as high, the amount of energy used (01:41:47) is twice as high, the amount of (01:41:49) computation is many times higher than (01:41:50) that, but the liquid that goes into it is (01:41:54) still 45 degrees C. That enables us to (01:41:57) save about 6% of the world's data (01:41:59) center power. So that's a very big deal. (01:42:02) The second very big deal (01:42:05) is that this entire system is now (01:42:07) confidential computing safe. Meaning (01:42:09) everything is encoded in transit, at rest, (01:42:12) and during compute, and every single bus (01:42:15) is now encrypted.
Every PCI Express, (01:42:19) every NVLink, every, you know, NVLink (01:42:22) between CPU and GPU, between GPU (01:42:25) to GPU, everything is now encrypted, and (01:42:29) so it's confidential computing safe. (01:42:31) This allows companies to feel safe that (01:42:36) their models are being deployed by (01:42:37) somebody else, but it will never be seen (01:42:39) by anybody else. Okay? And so this (01:42:42) particular system is not only incredibly (01:42:45) energy efficient, and there's one other (01:42:47) thing that's incredible. (01:42:49) Because of the nature of the workload of (01:42:52) AI, it spikes instantaneously. (01:42:55) With this computation layer called all-reduce, (01:42:57) the amount of current, the amount (01:43:00) of energy that is used simultaneously (01:43:04) is really off the charts. Oftentimes (01:43:06) it'll spike up 25%. We now have power (01:43:09) smoothing across the entire system so (01:43:12) that you don't have to overprovision by (01:43:14) 25 times, or if you overprovision by 25 (01:43:18) times you don't have to leave 25 times, (01:43:21) 25%, 25% not 25 times, 25% of the energy, (01:43:26) um, squandered or unused. And so now you (01:43:30) could fill up the entire power budget (01:43:32) and you don't have to over-, you don't (01:43:34) have to, you don't have to (01:43:35) provision beyond that. And then the last (01:43:37) thing, of course, is performance. So let's (01:43:40) take a look at the performance of this. (01:43:42) These are only charts that people who (01:43:44) build AI supercomputers would (01:43:46) love. It took every single (01:43:49) one of these chips, a complete redesign of (01:43:51) every single one of the systems, and (01:43:52) rewriting the entire stack for us to (01:43:54) make this possible. Basically, (01:43:58) this is training the AI model.
This (01:44:00) first column: the faster you train AI (01:44:03) models, the faster you can get the next (01:44:05) frontier out to the world. This is your (01:44:07) time to market. This is technology (01:44:09) leadership. This is your pricing power. (01:44:12) And so in the case of the green, this is (01:44:15) essentially (01:44:18) a 10 trillion parameter model. We scaled (01:44:22) it up from DeepSeek. That's why we call it (01:44:24) DeepSeek++. Training a 10 trillion (01:44:26) parameter model on 100 trillion (01:44:30) tokens. Okay. And this is our (01:44:33) simulation projection of what it would (01:44:35) take for us to build the next frontier (01:44:37) model. The next frontier model, uh, Elon's (01:44:39) already mentioned that the next version (01:44:41) of Grok, Grok 5 I think, is 7 trillion (01:44:43) parameters. This is 10. And in the green (01:44:46) is Blackwell, and here in the case of, um, (01:44:51) Rubin, notice the throughput is so much (01:44:54) higher and therefore it only takes one-fourth (01:44:58) as many of these systems in order to (01:45:00) train the model in the time that we gave (01:45:04) it here, which is one month. Okay. And so (01:45:08) time is the same for everybody. Now, (01:45:10) how fast you can train that (01:45:12) model and how large of a model you can (01:45:13) train is how you're going to get to the (01:45:15) frontier first. The second part is your (01:45:17) factory throughput. (01:45:20) Blackwell is green again. And factory (01:45:22) throughput is important because your (01:45:24) factory is, in the case of a gigawatt, (01:45:27) $50 billion. A $50 billion data center (01:45:32) can only consume one gigawatt of power. (01:45:35) And so if your performance, your (01:45:39) throughput per watt, is very good versus (01:45:43) quite poor, that directly translates to (01:45:46) your revenues. The revenues of your (01:45:49) data center are directly related to the (01:45:51) second column.
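To give a sense of scale for the training scenario above (10 trillion parameters, 100 trillion tokens, one month), here is a rough Python estimate. It relies on the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token; that rule, the 30-day month, and all names below are assumptions added for illustration and do not come from the keynote.

```python
SECONDS_PER_DAY = 86_400


def training_flops(params, tokens):
    """Approximate total training compute using the 6 * N * D rule of thumb."""
    return 6 * params * tokens


def required_flops_per_second(params, tokens, days):
    """Sustained throughput needed to finish training in the given number of days."""
    return training_flops(params, tokens) / (days * SECONDS_PER_DAY)


PARAMS = 10 * 10**12   # 10 trillion parameters, as stated in the talk
TOKENS = 100 * 10**12  # 100 trillion tokens, as stated in the talk

print(training_flops(PARAMS, TOKENS) == 6 * 10**27)              # True
print(f"{required_flops_per_second(PARAMS, TOKENS, 30):.2e}")    # 2.31e+21
```

Under these assumptions the job needs on the order of 10^21 sustained FLOP/s for a month; if one system delivers four times the training throughput of another, a quarter as many systems meet the same deadline, which is the comparison made on stage.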
And in the case of (01:45:55) Blackwell, it was about 10 times over (01:45:57) Hopper. In the case of Rubin, it's (01:45:59) going to be about 10 times higher again. (01:46:01) Okay? And in the case of, now, the, um, (01:46:06) cost of the tokens, how cost effective (01:46:09) it is to generate the token. This is (01:46:12) Rubin, about one-tenth, just as in the (01:46:14) case of... Yep. [clears throat] (01:46:20) >> [applause] (01:46:22) >> So this is how we're going to (01:46:23) get everybody to the next frontier, (01:46:26) to, um, push AI to the next level, and of (01:46:30) course to build these data centers (01:46:32) energy efficiently and cost efficiently. (01:46:36) So this is it. This is Nvidia today. You (01:46:40) know, we mentioned that we build chips, (01:46:43) but as you know, Nvidia builds entire (01:46:45) systems now. And AI is a full stack. (01:46:50) We're reinventing AI across everything, (01:46:52) from chips to infrastructure to models (01:46:55) to applications. And our job is to (01:46:58) create the entire stack so that all of (01:47:00) you could create incredible applications, (01:47:03) uh, for the rest of the world. Thank you (01:47:05) all for coming. Have a great CES. (01:47:08) Now, before [applause and cheering] (01:47:10) before I let you guys go, uh, (01:47:13) there were a whole bunch of slides we (01:47:14) had to cut, we had to leave on the (01:47:16) cutting floor, and so we have some (01:47:18) outtakes here. I think it'll be fun for (01:47:19) you. Have a great CES, guys. (01:47:26) And cut. (01:47:31) Nvidia live at CES. Take four. Marker. (01:47:35) >> Boom mic. (01:47:37) Action. (01:47:40) >> Sorry guys. Platform shift, huh? (01:47:47) >> That should do it. (01:47:49) >> And let's [music] roll camera. (01:47:53) >> A shade of green. A bright happy green. (01:47:58) >> World's most powerful AI supercomputer you (01:48:01) can plug into the wall. Next to my (01:48:04) toaster.
(01:48:07) >> Hey guys, I'm stuck again. I'm so (01:48:09) sorry. (01:48:10) >> This slide is never going to work. Let's (01:48:11) just cut it. (01:48:12) >> Hello. Can you hear me? (01:48:17) >> So, like [music] I was saying, the (01:48:19) router. Because not every problem needs (01:48:21) the biggest, smartest model. Just the (01:48:24) right one. (01:48:26) >> No, no, don't lose any of them. This new (01:48:30) six-chip Rubin [music] platform makes (01:48:32) one amazing AI supercomputer. (01:48:36) >> There you go, little guy. (01:48:38) >> Oh no, no, not the scaling laws. (01:48:41) >> There is a squirrel on the car. Be ready (01:48:43) to make the squirrel [music] go away. (01:48:45) Ask the squirrel gently to move away. (01:48:48) >> Did you know the best models today are (01:48:50) all mixture of experts? (01:48:55) >> Hey (01:48:57) >> [music] (01:49:06) >> Where'd everybody go? (01:49:50) Hey, (01:49:53) hey, (01:49:56) hey. [music] (01:50:03) Hey. (01:50:20) Hey. Hey. (01:50:33) Hey. (01:50:39) Hey. Hey.
