Title: Nvidia CEO Jensen Huang talks about his company’s latest innovations at CES 2026
Duration: 01:50:40
(00:00:42)
Please take your seats. Our event is
(00:00:44)
about to begin.
(00:00:54)
[music]
>> [music]
(00:19:36)
>> Welcome to the stage, Nvidia founder and
(00:19:39)
CEO, Jensen Huang.
(00:19:49)
Hello, Las Vegas.
(00:19:53)
Happy New Year.
(00:19:55)
>> Welcome to CES.
(00:19:58)
Well, we have about 15 keynotes' worth of
(00:20:01)
material to pack in here. I'm so happy
(00:20:03)
to see all of you. You got 3,000 people
(00:20:05)
in this auditorium. There's 2,000 people
(00:20:07)
in a courtyard watching us. There's
(00:20:09)
another thousand people apparently in
(00:20:11)
the fourth floor where there were
(00:20:13)
supposed to be Nvidia show floors all
(00:20:15)
watching this keynote and of course
(00:20:17)
millions around the world are going to
(00:20:18)
be watching this to kick off this new
(00:20:21)
year. Well, every 10 to 15 years the
(00:20:25)
computer industry resets.
(00:20:28)
A new platform shift happens
(00:20:31)
from mainframe to PC, PC to internet,
(00:20:34)
internet to cloud, cloud to mobile. Each
(00:20:37)
time
(00:20:39)
the world of applications
(00:20:42)
targets a new platform. That's why it's
(00:20:43)
called a platform shift. You write new
(00:20:45)
applications for a new computer.
(00:20:49)
Except this time
(00:20:51)
there are two simultaneous platform
(00:20:53)
shifts in fact happening at the same
(00:20:55)
time.
(00:20:58)
While we now move to AI, applications
(00:21:01)
are now going to be built on top of AI.
(00:21:04)
At first, people thought AIs are
(00:21:06)
applications. And in fact, AIs are
(00:21:08)
applications. But you're going to build
(00:21:10)
applications on top of AIs.
(00:21:13)
But in addition to that,
(00:21:16)
how you run the software, how you
(00:21:19)
develop the software
(00:21:21)
fundamentally changed. The entire
(00:21:24)
five-layer stack of the computer industry
(00:21:26)
is being reinvented.
(00:21:28)
You no longer program the software, you
(00:21:31)
train the software. You don't run it on
(00:21:34)
CPUs, you run it on GPUs.
(00:21:38)
And whereas applications were
(00:21:41)
pre-recorded, pre-compiled
(00:21:43)
and run on your device, now applications
(00:21:47)
understand the context and generate
(00:21:50)
every single pixel, every single token
(00:21:52)
completely from scratch every single
(00:21:55)
time.
(00:21:56)
Computing has been fundamentally
(00:21:58)
reshaped as a result of accelerated
(00:22:00)
computing, as a result of artificial
(00:22:02)
intelligence. Every single layer of that
(00:22:04)
five-layer cake is now being
(00:22:07)
reinvented.
(00:22:08)
Well, what that means is some 10
(00:22:11)
trillion dollars or so of the last
(00:22:13)
decade of computing is now being
(00:22:15)
modernized to this new way of doing
(00:22:17)
computing. What that means is hundreds
(00:22:21)
of billions of dollars, a couple hundred
(00:22:22)
billion dollars in VC funding each year
(00:22:24)
is going into modernizing and inventing
(00:22:27)
this new world. And what it means is a
(00:22:30)
hundred trillion dollars of industry,
(00:22:33)
several percent of which is R&D budget
(00:22:35)
is shifting over to artificial
(00:22:38)
intelligence. People ask where is the
(00:22:40)
money coming from? That's where the
(00:22:42)
money is coming from. The modernization
(00:22:45)
of AI to AI, the shifting of R&D budgets
(00:22:49)
from classical methods to now artificial
(00:22:51)
intelligence methods. Enormous amounts
(00:22:54)
of investments coming into this
(00:22:56)
industry, which explains why we're so
(00:22:58)
busy. And this last year was no
(00:23:00)
different. This last year was
(00:23:02)
incredible.
(00:23:03)
This last year, there's a slide coming.
(00:23:07)
This is what happens when you don't
(00:23:08)
practice.
(00:23:11)
It's the first keynote of the year. I
(00:23:13)
hope it's your first keynote of the
(00:23:14)
year. Otherwise, you have
(00:23:15)
been pretty, pretty busy. This is our
(00:23:17)
first keynote of the year. We're going
(00:23:18)
to get the spiderwebs out. And so 2025
(00:23:23)
was an incredible year.
(00:23:26)
It seemed like everything
(00:23:27)
was happening all at the same time. And
(00:23:29)
it in fact it probably was. The first
(00:23:31)
thing, of course, is scaling laws.
(00:23:35)
In 2015,
(00:23:38)
the first language model that I thought
(00:23:40)
was really going to make a difference
(00:23:42)
made a huge difference. It was called
(00:23:43)
BERT. 2017, Transformers came. It wasn't
(00:23:48)
until 5 years later, 2022, that the ChatGPT
(00:23:51)
moment happened and it awakened the
(00:23:53)
world to the possibilities of artificial
(00:23:55)
intelligence.
(00:23:57)
Something very important happened a year
(00:23:59)
after that.
(00:24:01)
The first o1 model from ChatGPT, the
(00:24:04)
first reasoning model, completely
(00:24:06)
revolutionary, invented this idea called
(00:24:09)
test time scaling, which is a very
(00:24:10)
common-sense thing. Not
(00:24:13)
only do we pre-train a model to learn,
(00:24:15)
we post-train it with reinforcement
(00:24:18)
learning so that it could learn skills.
(00:24:20)
And now we also have test time scaling,
(00:24:22)
which is another way of saying thinking.
(00:24:25)
You think in real time. Each one of
(00:24:27)
these phases of artificial intelligence
(00:24:30)
requires enormous amount of compute and
(00:24:32)
the computing law continued to scale.
(00:24:34)
Large language models continue to get
(00:24:36)
better. Meanwhile, another breakthrough
(00:24:39)
happened and this breakthrough happened
(00:24:41)
in 2024.
(00:24:43)
Agentic systems started to emerge in
(00:24:46)
2025. It started to
(00:24:49)
proliferate just about everywhere.
(00:24:51)
Agentic models that have the ability to
(00:24:54)
reason,
(00:24:55)
look up information, do research, use
(00:24:58)
tools, plan futures, simulate outcomes.
(00:25:03)
All of a sudden started to solve very
(00:25:05)
very important problems. One of my
(00:25:07)
favorite agentic models is called Cursor,
(00:25:10)
which revolutionized the way we do
(00:25:12)
software programming at NVIDIA. Agentic
(00:25:14)
systems are going to really take off
(00:25:16)
from here. Of course, there were other
(00:25:19)
types of AI. We know that large language
(00:25:20)
models aren't the only type of
(00:25:22)
information. Wherever the universe has
(00:25:24)
information, wherever the universe has
(00:25:26)
structure, we could teach a large
(00:25:29)
language model, a form of language model,
(00:25:32)
to go understand that information to
(00:25:35)
understand its representation and to
(00:25:38)
turn that into an AI. One of the biggest
(00:25:40)
most important ones is physical AI: AI
(00:25:44)
that understands the laws of nature. And
(00:25:47)
then of course physical AI is about AIs
(00:25:50)
interacting with the world but the world
(00:25:52)
itself has information encoded
(00:25:54)
information and that's called AI
(00:25:56)
physics. AI that in the case of physical
(00:25:59)
AI you have AI that interacts with the
(00:26:01)
physical world and you have AI physics
(00:26:04)
AI that understands the laws of physics.
(00:26:07)
And then lastly one of the most
(00:26:09)
important things that happened last year
(00:26:11)
the advancement of open models. We
(00:26:14)
now know that AI is going to proliferate
(00:26:17)
everywhere when open source, when open
(00:26:20)
innovation, when innovation across every
(00:26:23)
single company and every industry around
(00:26:24)
the world is activated. At the same
(00:26:26)
time, open models really took off last
(00:26:29)
year. In fact,
(00:26:31)
last year we saw the advance of DeepSeek
(00:26:36)
R1, the first open model that's a
(00:26:40)
reasoning system. It caught the world by
(00:26:44)
surprise and it activated literally this
(00:26:48)
entire movement. Really, really exciting
(00:26:50)
work. We're so happy with it. Now we
(00:26:53)
have open model systems all
(00:26:56)
over the world of all different kinds
(00:26:57)
and we now know that open models have
(00:27:00)
also reached the frontier. Still solidly
(00:27:03)
six months behind the frontier models,
(00:27:06)
but every single six months a new model
(00:27:08)
is emerging and these models are getting
(00:27:11)
smarter and smarter. Because of that, you
(00:27:15)
could see the number of downloads has
(00:27:17)
exploded.
(00:27:19)
The number of downloads is growing so
(00:27:21)
fast because startups want to
(00:27:23)
participate in the AI revolution. Large
(00:27:26)
companies want to, researchers want to,
(00:27:28)
students want to, just about every single
(00:27:30)
country wants to.
(00:27:31)
How is it possible that intelligence,
(00:27:34)
the digital form of intelligence will
(00:27:36)
leave anyone behind? And so open models
(00:27:40)
have really revolutionized artificial
(00:27:42)
intelligence last year. This entire
(00:27:44)
industry is going to be reshaped as a
(00:27:46)
result of that. Now, we had this inkling
(00:27:48)
some time ago. You might have heard that
(00:27:51)
several years ago, we started to build
(00:27:55)
and operate our own AI supercomputers.
(00:27:57)
We call them DGX clouds. A lot of people
(00:28:00)
asked, are you going into
(00:28:02)
the cloud business? The answer is no.
(00:28:04)
We're building these DGX supercomputers
(00:28:06)
for our own use. Well, it turns out we
(00:28:09)
have billions of dollars of
(00:28:11)
supercomputers in operation so that we
(00:28:13)
could develop our open models. I am so
(00:28:17)
pleased with the work that we're doing.
(00:28:19)
It is starting to attract attention all
(00:28:21)
over the world and all over the
(00:28:22)
industries because we are doing frontier
(00:28:25)
AI model work in so many different
(00:28:27)
domains. The work that we did in
(00:28:29)
proteins, in digital biology: La-Proteina,
(00:28:32)
to be able to synthesize and generate
(00:28:34)
proteins. OpenFold3, to understand the
(00:28:37)
structure of proteins.
(00:28:40)
Evo 2, how to understand and
(00:28:43)
generate
(00:28:45)
multiple proteins, otherwise the
(00:28:47)
beginnings of cellular
(00:28:49)
representation. Earth-2, AI that
(00:28:52)
understands laws of physics. The work
(00:28:54)
that we did with FourCastNet, the work
(00:28:56)
that we did with CorrDiff, really
(00:28:58)
revolutionized the way that people are
(00:29:00)
doing weather prediction. Nemotron:
(00:29:03)
we're now doing groundbreaking work
(00:29:05)
there. The first hybrid transformer SSM
(00:29:09)
model that's incredibly fast and
(00:29:11)
therefore can think for a very long time
(00:29:14)
or can think very quickly with that for
(00:29:17)
not a very long time and produce very
(00:29:19)
very smart intelligent answers.
(00:29:20)
Nemotron 3 is groundbreaking work and
(00:29:23)
you can expect us to deliver other
(00:29:25)
versions of Nemotron 3 in the near
(00:29:26)
future. Cosmos
(00:29:30)
a frontier open world foundation model
(00:29:34)
one that understands how the world works.
(00:29:37)
GR00T, a humanoid robotic system:
(00:29:39)
articulation mobility locomotion. These
(00:29:43)
models, these technologies are now being
(00:29:46)
integrated and, in each one of these
(00:29:48)
cases, open to the world. Frontier humanoid
(00:29:51)
robotics models open to the world.
(00:29:53)
And then today we're going to talk a
(00:29:55)
little bit about Alpamayo, the work that
(00:29:56)
we've been doing in self-driving cars.
(00:29:58)
Not only do we open source the models,
(00:30:01)
we also open source the data that we use
(00:30:04)
to train those models, because in
(00:30:07)
that way, and only in that way, can you truly
(00:30:10)
trust how the models came to be. We open
(00:30:14)
source all the models. We help you make
(00:30:16)
derivatives from them. We have a whole
(00:30:18)
suite of libraries we call the NeMo
(00:30:20)
libraries, PhysicsNeMo
(00:30:23)
libraries, and the Clara NeMo libraries,
(00:30:25)
BioNeMo libraries. Each one of these
(00:30:27)
libraries is a life cycle management
(00:30:29)
system for AIs, so that you could process
(00:30:32)
the data, you could generate data, you
(00:30:33)
could train the model, you could create
(00:30:35)
the model, evaluate the model, guardrail
(00:30:37)
the model, all the way to deploying the
(00:30:39)
model. Each one of these libraries is
(00:30:42)
incredibly complex, and all of it is open
(00:30:44)
sourced. And so now, on top of this
(00:30:47)
platform NVIDIA is a frontier AI model
(00:30:51)
builder and we build it in a very
(00:30:54)
special way we build it completely in
(00:30:56)
the open so that we can enable every
(00:30:59)
company, every industry, every country
(00:31:01)
to be part of this AI revolution. I'm
(00:31:04)
incredibly proud of the work that we're
(00:31:06)
doing there. In fact, if you notice the
(00:31:08)
charts, the chart shows that our
(00:31:12)
contribution to this industry is bar
(00:31:15)
none and you're going to see us in fact
(00:31:16)
continue to do that if not accelerate.
(00:31:19)
These models are also world class.
(00:31:24)
All systems are down.
(00:31:28)
This never happens in Santa Clara.
(00:31:32)
Is it because of Las Vegas?
(00:31:40)
Somebody must have won a jackpot
(00:31:42)
outside.
(00:31:44)
[clears throat] All systems are down.
(00:31:49)
Okay, I think my system's still down,
(00:31:52)
but that's okay. I make it up
(00:31:55)
as I go. And so not only are these
(00:31:58)
models frontier capable, not only are
(00:32:02)
they open, they also top the
(00:32:04)
leaderboards. This is an area where
(00:32:05)
we're very proud. They top leaderboards
(00:32:07)
in intelligence. We have
(00:32:10)
important models that understand
(00:32:12)
multimodal documents, otherwise known
(00:32:15)
as PDFs. The most valuable content in
(00:32:18)
the world is captured in PDFs, but
(00:32:20)
it takes artificial intelligence
(00:32:22)
to find out what's inside, interpret
(00:32:25)
what's inside, and help you read it. And
(00:32:27)
so our PDF retrievers, our PDF parsers
(00:32:30)
are world-class.
(00:32:32)
Our speech recognition models absolutely
(00:32:34)
world-class. Our retrieval models,
(00:32:37)
basically search, semantic search, AI
(00:32:40)
search, the database engine of the
(00:32:42)
modern AI era, world-class. So, we're on
(00:32:46)
top of leaderboards constantly. This is
(00:32:48)
an area we're very proud of. And all of
(00:32:50)
that is in service of your ability to
(00:32:54)
build AI agents. This is really a
(00:32:58)
groundbreaking area of development. You
(00:33:00)
know, at first when ChatGPT
(00:33:01)
came out, people said, you know,
(00:33:04)
gosh, it produced really interesting
(00:33:06)
results, but it hallucinated greatly.
(00:33:08)
And the reason why it hallucinated, of
(00:33:10)
course, is that it could memorize everything
(00:33:12)
in the past, but it can't memorize
(00:33:14)
everything in the future, in the
(00:33:15)
current. And so it needs to be grounded
(00:33:17)
in research. It has to do fundamental
(00:33:19)
research before it answers a question.
(00:33:22)
The ability to reason about do I have to
(00:33:25)
do research? Do I have to use tools? How
(00:33:26)
do I break up a problem into steps? Each
(00:33:29)
one of these steps is something that
(00:33:31)
the AI model knows how to do. And
(00:33:34)
together it is able to compose it into a
(00:33:37)
sequence of steps to perform something
(00:33:39)
it's never done before, never been
(00:33:40)
trained to do. This is the wonderful
(00:33:43)
capability of reasoning. We
(00:33:45)
can encounter a circumstance
(00:33:47)
we've never seen before and break it
(00:33:49)
down into circumstances and knowledge or
(00:33:52)
rules that we know how to do because
(00:33:55)
we've experienced it in the past. And so
(00:33:58)
the ability for AI models now to be able
(00:33:59)
to reason is incredibly powerful.
(00:34:02)
The reasoning capability of agents opens
(00:34:04)
the doors to all of these different
(00:34:06)
applications. We no longer have to train
(00:34:08)
an AI model to know everything on day
(00:34:11)
one. Just as we don't have to know
(00:34:13)
everything on day one, but we should be
(00:34:16)
able to in every circumstance reason
(00:34:18)
about how to solve that problem. Large
(00:34:21)
language models have now made this
(00:34:23)
fundamental leap. The ability to use
(00:34:25)
reinforcement learning and chain of
(00:34:26)
thought and you know search and planning
(00:34:29)
and all these different techniques in
(00:34:30)
reinforcement learning has made it
(00:34:32)
possible for us to have this basic
(00:34:34)
capability and is also now completely
(00:34:37)
open sourced. But the thing that's
(00:34:39)
really terrific is another breakthrough
(00:34:41)
that happened and the first time I saw
(00:34:43)
it was with Aravind's Perplexity.
(00:34:46)
Perplexity, the search company, the AI
(00:34:49)
search company, really
(00:34:51)
innovative company. And the first time I
(00:34:53)
realized they were using multiple models
(00:34:55)
at the same time, I thought it was
(00:34:57)
completely genius. Of course, we would
(00:34:58)
do that. Of course, an AI would also
(00:35:02)
call upon all of the world's great AIs
(00:35:05)
to solve the problem it wants to solve
(00:35:07)
at any part of the reasoning chain. And
(00:35:10)
this is the reason why AIs are really
(00:35:14)
multi-modal
(00:35:16)
meaning they understand speech and
(00:35:19)
images and text and videos and 3D
(00:35:22)
graphics and proteins. It's multimodal.
(00:35:25)
It's also multi-model
(00:35:28)
meaning that it should be able to use
(00:35:29)
any model that best fits the task. It is
(00:35:35)
multicloud by definition. Therefore,
(00:35:37)
because these AI models are sitting in
(00:35:39)
all these different places and it also
(00:35:42)
is hybrid cloud because if you're an
(00:35:44)
enterprise company or you've built a
(00:35:47)
robot or whatever that device is,
(00:35:49)
sometimes it's at the edge, sometimes a
(00:35:51)
radio cell tower, maybe sometimes it's
(00:35:54)
in an enterprise, or maybe it's a place
(00:35:56)
like a hospital where you need to have
(00:35:58)
the data in real time right next to
(00:36:01)
you. Whatever those applications are, we
(00:36:04)
know now this is what an AI application
(00:36:07)
looks like in the future. Or another way
(00:36:09)
to think about that because future
(00:36:12)
applications are built on AIs.
(00:36:15)
This is the basic framework of future
(00:36:17)
applications.
(00:36:19)
This basic framework, this basic
(00:36:22)
structure of agentic AIs that could do
(00:36:25)
the things that I'm talking about that
(00:36:26)
is multi-model
(00:36:28)
has now turbocharged
(00:36:31)
AI startups of all kinds. And now you
(00:36:34)
can also because of all of the open
(00:36:37)
models and all the tools that we
(00:36:38)
provided you, you could also customize
(00:36:41)
your AIs to teach your AI skills that
(00:36:44)
nobody else is teaching. Nobody else is
(00:36:47)
causing their AI to become intelligent
(00:36:49)
or smart in that way. You could do it
(00:36:51)
for yourself. And that's the work that
(00:36:53)
we do with Nemotron, NeMo, and all of
(00:36:56)
the things that we do with open models
(00:36:58)
is intended to do. You put a smart
(00:37:00)
router in front of it. And that router
(00:37:02)
is essentially a manager that decides,
(00:37:04)
based on the
(00:37:06)
intention of the prompts that you give
(00:37:08)
it, which one of the models is the best fit
(00:37:11)
for that application, for solving
(00:37:13)
that problem. Okay. So now when you
(00:37:16)
think about this architecture, what do
(00:37:18)
you have?
(00:37:20)
When you think about this architecture,
(00:37:21)
all of a sudden you have an AI that's on
(00:37:24)
the one hand completely customizable by
(00:37:27)
you. Something that you could teach to
(00:37:29)
do your very own skills for your
(00:37:31)
company, something that's domain secret,
(00:37:35)
something where you have deep domain
(00:37:36)
expertise. Maybe you've got all of the
(00:37:38)
data that you need to train that AI
(00:37:41)
model. On the other hand, your AI is
(00:37:45)
always at the frontier by definition.
(00:37:48)
You're always at the frontier on the one
(00:37:50)
hand. You're always customized. On the
(00:37:51)
other hand, it should just run. And so
(00:37:54)
we thought we would make the simplest of
(00:37:56)
examples to make it available to you.
(00:37:59)
This entire framework we call a
(00:38:01)
blueprint and we have blueprints that
(00:38:03)
are integrated into enterprise SaaS
(00:38:06)
platforms all over the world and we're
(00:38:08)
really pleased with the progress. But
(00:38:09)
what we'll do is show you a short example
(00:38:11)
of something that anybody can do.
(00:38:16)
Let's build a personal assistant.
(00:38:19)
I want it to help me with my calendar,
(00:38:21)
emails, to-do lists, and even
(00:38:23)
keep an eye on my home. I use Brev to
(00:38:26)
turn my DGX Spark into a personal cloud.
(00:38:29)
So, I can use the same interface whether
(00:38:31)
I'm using a cloud GPU or a DGX Spark. I
(00:38:34)
use a frontier model API to easily get
(00:38:36)
started.
(00:38:40)
I want it to help me with my emails.
(00:38:42)
So, I create an email tool for my agent
(00:38:44)
to call.
(00:38:46)
I want my emails to stay private. So,
(00:38:48)
I'll add an open model that's running
(00:38:49)
locally on the Spark.
(00:38:53)
Now, for any job, I want the
(00:38:55)
agent to use the right model for the
(00:38:57)
right task. So, I'll use an intent-based
(00:38:59)
model router.
(00:39:02)
This way, prompts that need email will
(00:39:04)
stay on my Spark, and everything else
(00:39:07)
can call the frontier model. I want my
(00:39:10)
assistant to interact with my world, so
(00:39:11)
I'll hook it up to Hugging Face's Reachy
(00:39:13)
Mini robot.
(00:39:15)
My agent controls the head, ears, and
(00:39:18)
camera of the Reachy with tool calls. I
(00:39:20)
want to give Reachy a voice, and I really
(00:39:22)
like ElevenLabs, so I'll hook up their API.
(00:39:25)
>> Hi, I'm Reachy running on DGX Spark.
(00:39:28)
[music]
(00:39:28)
>> Hey Reachy, what's on my to-do list
(00:39:30)
today?
(00:39:31)
>> Your to-do list today:
(00:39:34)
Grab groceries, eggs, milk, butter, and
(00:39:37)
send Jensen the new script.
(00:39:40)
>> Okay, let's send Jensen an update. Tell
(00:39:42)
him we'll have it for him by the end of
(00:39:43)
the day.
(00:39:44)
>> Will do.
(00:39:45)
>> Reachy, there's a sketch, too. Can you
(00:39:47)
turn it into an architectural rendering?
(00:39:50)
>> Sure.
(00:39:59)
>> Nice. Now make a video and show me
(00:40:01)
around the room.
(00:40:04)
>> Here you go.
(00:40:05)
>> That's great.
(00:40:08)
>> With Brev, I can share access to my
(00:40:10)
Spark and Reachy, so I'm going to share
(00:40:11)
it with Anna.
(00:40:14)
>> Hey Reachy, what's Potato up to?
(00:40:18)
>> He's on the couch.
(00:40:20)
I remember you don't like this. I'll
(00:40:22)
tell him to get off. Potato, off the
(00:40:25)
couch.
(00:40:26)
With all the progress in open source,
(00:40:28)
it's incredible to see what you can
(00:40:29)
build. I'd love to see what you create.
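The intent-based model router described in the demo can be sketched roughly as follows. This is a minimal illustration, not the code used in the demo: the keyword list, the `Model` class, and both handler functions are hypothetical stand-ins, and a real router would likely use a classifier model rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list standing in for a real intent classifier.
PRIVATE_KEYWORDS = ("email", "inbox")


@dataclass
class Model:
    name: str
    handler: Callable[[str], str]


def local_model(prompt: str) -> str:
    # Placeholder for an open model running locally (assumption).
    return f"[local] {prompt}"


def frontier_model(prompt: str) -> str:
    # Placeholder for a call to a hosted frontier model API (assumption).
    return f"[frontier] {prompt}"


LOCAL = Model("local-open-model", local_model)
FRONTIER = Model("frontier-api", frontier_model)


def route(prompt: str) -> Model:
    """Pick a model from the prompt's intent: private data stays local."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in PRIVATE_KEYWORDS):
        return LOCAL
    return FRONTIER


def answer(prompt: str) -> str:
    """Route the prompt, then let the chosen model handle it."""
    return route(prompt).handler(prompt)
```

With this sketch, `answer("Summarize my unread emails")` is handled by the local placeholder, while a prompt with no email intent goes to the frontier placeholder.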
(00:40:37)
>> Isn't that incredible?
(00:40:39)
Now, the amazing thing is that is
(00:40:42)
utterly trivial now. That is utterly
(00:40:45)
trivial now. And yet, just a couple
(00:40:48)
years ago, all of that would have been
(00:40:50)
impossible. Absolutely unimaginable.
(00:40:52)
Well, this basic framework, this basic
(00:40:55)
way of building applications using
(00:40:58)
language models
(00:41:13)
that are
(00:41:15)
pre-trained and they're proprietary.
(00:41:18)
They're frontier. Combine them with
(00:41:20)
customized language models into an agentic
(00:41:24)
framework, a reasoning framework that
(00:41:26)
allows you to access tools and files and
(00:41:29)
maybe even connect to other agents.
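A reasoning framework that breaks a request into steps and calls tools for each step might look something like the sketch below. Everything here is an illustrative assumption: the tool names, the `plan` function (a stand-in for the reasoning model), and the canned return values are hypothetical and do not come from any NVIDIA library.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools; a real agent would call a calendar, mail, or file service.
def read_todo_list(_: str) -> str:
    return "groceries; send Jensen the new script"


def send_email(body: str) -> str:
    return f"sent: {body}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "read_todo_list": read_todo_list,
    "send_email": send_email,
}


def plan(request: str) -> List[Tuple[str, str]]:
    """Stand-in for the reasoning model: split a request into (tool, argument) steps."""
    lowered = request.lower()
    steps: List[Tuple[str, str]] = []
    if "to-do" in lowered:
        steps.append(("read_todo_list", ""))
    if "update" in lowered or "email" in lowered:
        steps.append(("send_email", request))
    return steps


def run_agent(request: str) -> List[str]:
    """Execute each planned step with the matching tool and collect the results."""
    results: List[str] = []
    for tool_name, argument in plan(request):
        tool = TOOLS.get(tool_name)
        if tool is None:
            results.append(f"unknown tool: {tool_name}")
            continue
        results.append(tool(argument))
    return results
```

The point of the structure is that each step is something the system already knows how to do, and the planner composes those steps for a request it has not seen before.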
(00:41:32)
This is basically the architecture of AI
(00:41:36)
applications or applications in the
(00:41:39)
modern age and the ability for us to
(00:41:41)
create these applications is incredibly
(00:41:43)
fast. And notice
(00:41:46)
if you give this application
(00:41:49)
information that it's never seen before,
(00:41:52)
or in a structure that is not
(00:41:54)
represented exactly as you thought, it
(00:41:58)
can still reason through it and make its
(00:42:01)
best effort to reason through the data,
(00:42:03)
the information, to try to understand how
(00:42:05)
to solve the problem. Artificial
(00:42:07)
intelligence. Okay, so this basic
(00:42:09)
framework is now being integrated and
(00:42:11)
everything that I just described, we had
(00:42:12)
the benefit of working with some of the
(00:42:14)
world's leading enterprise platform
(00:42:16)
companies. Palantir, for example:
(00:42:20)
their entire AI and data
(00:42:23)
processing platform is being integrated and
(00:42:25)
accelerated by Nvidia today. ServiceNow,
(00:42:28)
the world's leading customer service and
(00:42:31)
employee service platform. Snowflake,
(00:42:33)
the world's top data platform in the
(00:42:36)
cloud. Incredible work is
(00:42:39)
being done there. CodeRabbit, we're
(00:42:42)
using CodeRabbit all over Nvidia.
(00:42:44)
CrowdStrike, creating AIs to detect, to
(00:42:47)
find AI threats. NetApp,
(00:42:51)
their data platform now has NVIDIA's
(00:42:53)
semantic AI on top of it and agentic
(00:42:56)
systems on top of it for
(00:42:59)
them to do customer service. But the
(00:43:01)
important thing is this. Not only is
(00:43:03)
this the way that you develop
(00:43:04)
applications now, this is going to be
(00:43:07)
the user interface of your platform. So
(00:43:09)
whether it's Palantir or ServiceNow or
(00:43:12)
Snowflake and many other companies that
(00:43:14)
we're working with, the agentic system
(00:43:17)
is the interface. It's no longer Excel
(00:43:21)
with a bunch of, you know, squares that
(00:43:23)
you enter information into. Maybe
(00:43:26)
it's no longer just a command line.
(00:43:28)
All of that multimodal
(00:43:31)
information is now possible and the way
(00:43:33)
you interact with your platform is much
(00:43:36)
more, well, if you will, simple, like you're
(00:43:38)
interacting with people. And so that's
(00:43:41)
enterprise AI being revolutionized by
(00:43:44)
agentic systems. The next thing is
(00:43:47)
physical AI. This is an area that you've
(00:43:49)
seen me talk about for several years. In
(00:43:51)
fact, we've been working on this for
(00:43:52)
eight years. The question is how do you
(00:43:56)
take something that is intelligent
(00:43:58)
inside a computer and interacts with you
(00:44:01)
with screens and speakers to something
(00:44:05)
that can interact with the world.
(00:44:07)
Meaning it can understand the common
(00:44:10)
sense of how the world works. Object
(00:44:13)
permanence. If I look away and I look
(00:44:14)
back, that object is still there.
(00:44:18)
Causality. If I push it, it tips over.
(00:44:20)
It understands friction and gravity. It
(00:44:23)
understands inertia: that a heavy truck
(00:44:26)
rolling down the road is going to need a
(00:44:28)
little bit more time to stop, that a
(00:44:31)
ball is going to keep on rolling.
(00:44:34)
These ideas are common sense to even a
(00:44:36)
little child, but for AI, it's
(00:44:39)
completely unknown. And so we have to
(00:44:42)
create a system that allows AIs to learn
(00:44:45)
the common sense of the physical
(00:44:46)
world, learn its laws, but also to be
(00:44:52)
able to of course learn from data and
(00:44:54)
the data is quite scarce and to be able
(00:44:57)
to evaluate whether that AI is working,
(00:44:59)
meaning it has to simulate in an
(00:45:02)
environment. How does an AI know that
(00:45:05)
the actions that it's performing are
(00:45:08)
consistent with what it should do if it
(00:45:10)
doesn't have the ability to simulate the
(00:45:12)
response of the physical world back on
(00:45:14)
its actions. The response of its actions
(00:45:16)
is really important to simulate
(00:45:18)
otherwise there's no way to evaluate it.
(00:45:20)
It's different every time. And so this
(00:45:22)
basic system requires three computers.
(00:45:26)
One computer of course the one that we
(00:45:28)
know that Nvidia builds for training the
(00:45:30)
AI models. Another computer that we know
(00:45:34)
is to inference
(00:45:36)
the models. Inferencing the model is
(00:45:38)
essentially a robotics computer that
(00:45:40)
runs in a car or runs in a robot or runs
(00:45:43)
in a factory, runs anywhere at the edge.
(00:45:45)
But there has to be another computer
(00:45:47)
that's designed for simulation and
(00:45:50)
simulation is at the heart of almost
(00:45:52)
everything Nvidia does. This is
(00:45:54)
where we are most comfortable and
(00:45:57)
simulation was really the foundations of
(00:46:00)
almost everything that we've done with
(00:46:01)
physical AI. So we have three computers
(00:46:04)
and multiple stacks that run on these
(00:46:07)
computers, these libraries to make them
(00:46:08)
useful. Omniverse is our digital twin
(00:46:11)
physically based simulation world.
(00:46:14)
Cosmos, as I mentioned earlier, is our
(00:46:17)
foundation model, not a foundation model
(00:46:19)
for language, but a foundation model of
(00:46:21)
the world,
(00:46:23)
and it is also aligned with language. You
(00:46:26)
could say something like, you know,
(00:46:27)
what's happening to the ball, and it'll
(00:46:28)
tell you the ball's rolling down
(00:46:30)
the street. And so a world foundation
(00:46:32)
model and then of course the robotics
(00:46:34)
models. We have two of them. One of them
(00:46:37)
is called GR00T. The other one's called
(00:46:39)
Alpamayo, which I'm going to tell you about.
(00:46:41)
Now, one of the most important things
(00:46:43)
that we have to do with physical AI is
(00:46:45)
to create the data to train the AI in
(00:46:47)
the first place. Where does that data
(00:46:48)
come from? Rather than having
(00:46:52)
language, where we created a bunch of
(00:46:54)
texts that are what we consider ground
(00:46:57)
truth that the AI can learn from, how do
(00:47:00)
we teach an AI the ground truth of
(00:47:02)
physics? There are lots and lots of videos,
(00:47:05)
lots and lots of videos, but hardly
(00:47:07)
enough to capture the diversity and the
(00:47:09)
type of interactions that we need. And
(00:47:12)
so this is where great minds came
(00:47:15)
together and transformed
(00:47:18)
what used to be compute into data. Now
(00:47:23)
using synthetic data generation that is
(00:47:25)
grounded and conditioned by the laws of
(00:47:28)
physics, grounded and conditioned by
(00:47:31)
ground truth, we can now selectively
(00:47:35)
cleverly generate data that we can then
(00:47:38)
use to train the AI. So for example,
(00:47:41)
what comes into this AI, this Cosmos AI
(00:47:43)
world model on the left on over here is
(00:47:47)
the output of a traffic simulator.
(00:47:51)
Now this traffic simulator
(00:47:54)
is hardly enough for an AI to learn
(00:47:56)
from. We can take this, put it into a
(00:47:59)
Cosmos foundation model and generate
(00:48:03)
surround video that is physically based
(00:48:06)
and physically plausible that the AI can
(00:48:09)
now learn from. And there are so many
(00:48:11)
examples of this. Let me show you what
(00:48:13)
Cosmos can do.
(00:48:18)
The ChatGPT moment for physical AI is
(00:48:21)
nearly here, but the challenge is clear.
(00:48:25)
The physical world is diverse and
(00:48:27)
unpredictable.
(00:48:29)
Collecting real world training data is
(00:48:32)
slow and costly and it's never enough.
(00:48:35)
The answer is synthetic data. It starts
(00:48:39)
with NVIDIA Cosmos, an open frontier
(00:48:43)
world foundation model for physical AI
(00:48:47)
pre-trained on internet scale video,
(00:48:50)
real driving and robotics data, and 3D
(00:48:52)
[music] simulation.
(00:48:55)
Cosmos learned a unified representation
(00:48:57)
of the world, able to align language,
(00:49:00)
images, 3D, and action.
(00:49:04)
It performs physical AI skills like
(00:49:06)
generation, reasoning, and trajectory
(00:49:09)
prediction
(00:49:11)
from a single image. Cosmos generates
(00:49:14)
realistic video
(00:49:17)
from 3D scene descriptions, physically
(00:49:21)
coherent motion,
(00:49:24)
from driving telemetry and sensor logs,
(00:49:26)
surround video
(00:49:30)
from planning simulators,
(00:49:32)
multi-camera environments,
(00:49:35)
or from scenario prompts. It brings edge
(00:49:38)
cases to life.
(00:49:41)
Developers can run interactive closed
(00:49:43)
loop simulations in Cosmos. When actions
(00:49:46)
are made, the world responds.
(00:49:51)
Cosmos reasons.
(00:49:54)
It analyzes edge scenarios,
(00:49:56)
breaks them down into familiar physical
(00:49:58)
interactions, and [music] reasons about
(00:50:01)
what could happen next.
(00:50:04)
Cosmos turns compute [music] into data,
(00:50:07)
training AVs for the long tail and robots
(00:50:10)
how to adapt for every scenario.
(00:50:21)
I know it's incredible.
(00:50:24)
Cosmos is the world's leading foundation
(00:50:28)
model. World foundation model. It's been
(00:50:30)
downloaded millions of times, used all
(00:50:32)
over the world, getting world getting
(00:50:35)
the world ready for this new era of
(00:50:36)
physical AI. We use it ourselves as
(00:50:39)
well. We use it ourselves to create our
(00:50:42)
self-driving car,
(00:50:45)
using it for scenario generation and
(00:50:48)
using it for evaluation.
(00:50:50)
We could have something that allows us
(00:50:53)
to effectively travel billions,
(00:50:56)
trillions of miles, but doing it inside
(00:50:59)
a computer. And we've made enormous
(00:51:01)
progress. Today, we're announcing Alpamayo,
(00:51:06)
the world's first
(00:51:08)
thinking reasoning autonomous vehicle
(00:51:12)
AI. Alpamayo is trained end to end.
(00:51:17)
Literally from camera in to actuation
(00:51:20)
out. The camera in lots and lots of
(00:51:23)
miles that are driven by itself
(00:51:26)
where human drive it dri using human
(00:51:30)
demonstration
(00:51:31)
and we have lots and lots of miles that
(00:51:33)
are generated by cosmos. In addition to
(00:51:36)
that, hundreds of thousands of examples
(00:51:39)
are labeled very, very carefully so that
(00:51:42)
we could teach the car how to drive.
(00:51:44)
Alpamayo does something that's really
(00:51:46)
special. Not only does it take sensor
(00:51:49)
input and activates steering wheel,
(00:51:53)
brakes, and and acceleration, it also
(00:51:57)
reasons about what action it is about to
(00:52:01)
take. It tells you what action it's
(00:52:03)
going to take. the reason by which it
(00:52:05)
came about that action and then of
(00:52:07)
course the trajectory.
(00:52:10)
All of these are coupled directly and
(00:52:12)
trained very specifically by a large
(00:52:15)
combination of human trained and as well
(00:52:17)
as Cosmos generated data. The result of
(00:52:22)
it is just really incredible. Not only
(00:52:24)
does your car drive as you would expect
(00:52:26)
it to drive and it drives so naturally
(00:52:29)
because it learned directly from human
(00:52:31)
demonstrators but in every single
(00:52:34)
scenario when it comes up to the
(00:52:35)
scenario it reasons about it tells you
(00:52:37)
what it's going to do and it reasons
(00:52:38)
about what you what's about to do. Now
(00:52:40)
the reason why this is so important is
(00:52:43)
because of the long tail of driving
(00:52:45)
there. It's impossible for us to simply
(00:52:48)
collect every single possible scenario
(00:52:51)
for everything that could ever happen in
(00:52:53)
every single country in every single
(00:52:54)
circumstance that's possibly ever going
(00:52:57)
to happen for all the population.
(00:53:00)
However, it is very unlikely is very
(00:53:03)
likely that every scenario if decomposed
(00:53:07)
into a whole bunch of other smaller
(00:53:09)
scenarios are quite normal for you to
(00:53:11)
understand. And so these long tails will
(00:53:15)
be decomposed into quite normal
(00:53:17)
circumstances that the car knows how to
(00:53:19)
deal with. It just needs to reason about
(00:53:21)
it. And so let's take a look. Everything
(00:53:22)
you're about to see is one shot. It's a
(00:53:26)
no hands.
(00:53:31)
>> Routing to your destination.
(00:53:34)
Buckle up.
(00:53:38)
Heat. Heat.
(00:54:03)
Heat. Heat.
(00:55:01)
Hallelujah.
(00:55:43)
Heat.
(00:55:47)
Heat.
(00:56:06)
You have arrived.
(00:56:19)
>> [applause]
(00:56:22)
>> We started working on self-driving cars
(00:56:23)
eight years ago. And the reason for that
(00:56:25)
is because we reasoned early on that
(00:56:29)
deep learning and artificial
(00:56:30)
intelligence was going to reinvent the
(00:56:31)
entire computing stack. And if we were
(00:56:34)
ever going to understand how to navigate
(00:56:38)
ourselves and how to guide the industry
(00:56:40)
towards this new future, we have to get
(00:56:42)
good at building the entire stack. Well,
(00:56:46)
as I mentioned earlier, AI is a five
(00:56:49)
layer cake. The lowest layer is land,
(00:56:51)
power, and shell. In the case of
(00:56:53)
robotics, the lowest layer is the car.
(00:56:55)
The next layer above it is chips, GPUs,
(00:56:58)
networking chips, CPUs, all that kind of
(00:57:00)
stuff. The next layer above that is the
(00:57:03)
infrastructure.
(00:57:05)
That infrastructure in this particular
(00:57:07)
case as I mentioned with physical AI is
(00:57:10)
omniverse and cosmos.
(00:57:12)
And then above that are the models. And
(00:57:16)
in the case of the models above that I
(00:57:20)
just shown you,
(00:57:22)
the model here is called Alpamayo. And
(00:57:25)
Alpamayo today is open sourced. We
(00:57:28)
this incredible body of work. It took
(00:57:31)
several thousand people. Our AV team is
(00:57:34)
several thousand people. Just to put in
(00:57:36)
perspective, our partner uh Ola, I think
(00:57:40)
Ola's here in the audience somewhere.
(00:57:42)
Uh, Mercedes agreed to partner with us
(00:57:45)
five years ago to go make all of this
(00:57:48)
possible. We imagine that someday a
(00:57:51)
billion cars on the road will all be
(00:57:52)
autonomous. You could either have it be
(00:57:55)
a robotaxi that you're
(00:57:57)
orchestrating
(00:57:58)
and and renting from somebody or you
(00:58:00)
could own it and is driving for driving
(00:58:02)
by itself or you could decide to drive
(00:58:04)
for yourself and so but every single car
(00:58:06)
will have autonomous vehicle capability.
(00:58:08)
every single car will be AI powered. And
(00:58:10)
so the the the model layer in this case
(00:58:13)
is Alpamayo and the application above
(00:58:15)
that is the Mercedes-Benz.
(00:58:18)
Okay. And so, so this entire stack is
(00:58:21)
our first Nvidia first entire stack
(00:58:24)
endeavor and we've been working on it
(00:58:26)
for this entire time and I'm just so
(00:58:28)
happy that the first AV car from Nvidia
(00:58:32)
is going to be on the road in Q1 and
(00:58:35)
then it goes Europe in Q2 here in the
(00:58:38)
United States in Q1 then Europe in Q2
(00:58:40)
and I think it's Asia in Q3 and Q4 and
(00:58:43)
the powerful thing is that we're going
(00:58:44)
to keep on updating it with next ver
(00:58:47)
next versions of Alpamayo and versions
(00:58:48)
after that. There's no question in my
(00:58:51)
mind now that this is going to be one of
(00:58:53)
the largest robotics industries and I'm
(00:58:55)
so happy that we worked on it and it
(00:58:57)
taught us enormous amount about how to
(00:59:00)
help the rest of the world build robotic
(00:59:02)
systems. That deep understanding in
(00:59:05)
knowing how to build it ourselves,
(00:59:06)
building the entire infrastructure
(00:59:08)
ourselves and knowing what kind of chips
(00:59:10)
a robotic system would would need. In
(00:59:13)
this partic particular case, dual Orins,
(00:59:16)
the next generation dual Thors. These
(00:59:19)
processors are designed for robotic
(00:59:21)
systems and was designed for the sa
(00:59:24)
highest level of safety capability. This
(00:59:26)
car just got rated. It just went to
(00:59:31)
production. The Mercedes-Benz CLA was
(00:59:34)
just rated by NCAP, the world's safest
(00:59:38)
car.
(00:59:42)
>> [applause]
(00:59:44)
>> It is the only system that I know that
(00:59:46)
has every single line of code, the chip,
(00:59:50)
the system, every line of code safety
(00:59:53)
certified. The entire model system is
(00:59:55)
based on a sensors are diverse and
(00:59:58)
redundant and so is the self-driving car
(01:00:01)
stack. The Alpamayo stack is trained
(01:00:03)
end to end and has incredible skills.
(01:00:07)
However, nobody knows until you drive it
(01:00:10)
forever that it's going to be perfectly
(01:00:12)
safe. And so the way we guardrail
(01:00:15)
that is with another software
(01:00:17)
stack, an entire AV stack underneath.
(01:00:20)
That entire AV stack is built to be
(01:00:22)
fully traceable and it's taken us some
(01:00:25)
five years to build that some six, seven
(01:00:27)
years actually to build that second
(01:00:28)
stack. These two software stacks are
(01:00:31)
mirroring each other and then we have a
(01:00:34)
policy and safety evaluator to decide is
(01:00:36)
this something that I'm very confident
(01:00:38)
and can reason about driving very
(01:00:41)
safely. If so, I'm going to have Alpamayo
(01:00:43)
do it. If it's a circumstance that I'm
(01:00:44)
not very confident in and the safety um
(01:00:47)
policy evaluator decides that we're going
(01:00:50)
to go back to a a very a simpler, safer
(01:00:52)
guardrail system, then it goes back to
(01:00:54)
the classical AV stack. We're the only
(01:00:56)
car in the world with both of these AV
(01:00:58)
stacks running and all safety systems
(01:01:01)
should have diversity and redundancy.
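
The arbitration described here, an evaluator choosing between the end-to-end model and the classical stack, can be sketched roughly as follows. This is a minimal illustration, not Nvidia's implementation: the `Plan` type, the `choose_plan` function, the single confidence score, and the 0.9 threshold are all hypothetical, since the talk does not say how confidence is measured.

```python
from dataclasses import dataclass

# Hypothetical threshold; the talk gives no number or metric for confidence.
CONFIDENCE_THRESHOLD = 0.9


@dataclass(frozen=True)
class Plan:
    source: str         # which stack produced the plan
    trajectory: tuple   # placeholder waypoints, e.g. ((x, y), ...)
    confidence: float   # evaluator's confidence in this plan, in [0, 1]


def choose_plan(end_to_end: Plan, classical: Plan,
                threshold: float = CONFIDENCE_THRESHOLD) -> Plan:
    """Return the end-to-end plan only when the evaluator is confident
    enough; otherwise fall back to the classical, traceable stack."""
    if end_to_end.confidence >= threshold:
        return end_to_end
    return classical


if __name__ == "__main__":
    learned = Plan("end_to_end", ((0, 0), (1, 0)), confidence=0.95)
    fallback = Plan("classical", ((0, 0), (0.5, 0)), confidence=1.0)
    print(choose_plan(learned, fallback).source)
```

Both stacks run on every cycle in this sketch, which matches the point about diversity and redundancy: the fallback plan is always available when the evaluator rejects the learned one.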
(01:01:03)
Well, our vision is that someday every
(01:01:05)
single car, every single truck will be
(01:01:07)
autonomous. And we've been working
(01:01:08)
towards that future. This entire stack
(01:01:11)
is vertically integrated. Of course, in
(01:01:13)
the case of Mercedes-Benz, we built the
(01:01:15)
entire stack together. We're going to
(01:01:16)
deploy the car. We're going to operate
(01:01:18)
the stack. We're going to maintain the
(01:01:19)
stack for as long as we shall live.
(01:01:21)
However, like everything else we do as a
(01:01:24)
company, we build the entire stack, but
(01:01:27)
the entire stack is open for the
(01:01:29)
ecosystem. And these the ecosystem
(01:01:32)
working with us to build L4 and robo
(01:01:34)
taxis is expanding and it's going
(01:01:36)
everywhere.
(01:01:38)
I fully expect this to be well this is
(01:01:40)
already a giant business for us. It's a
(01:01:42)
giant business for us because they use
(01:01:44)
it for training our training data,
(01:01:46)
processing data and training their
(01:01:48)
models. They use it for synthetic data
(01:01:50)
generation in some cases. In some car,
(01:01:52)
in some companies, they pretty much just
(01:01:55)
build uh the computers, the chips that
(01:01:57)
are inside the car. And some companies
(01:01:59)
work with us full stack. Some companies
(01:02:01)
work with us some partial part of that.
(01:02:03)
Okay? So, it doesn't matter uh how much
(01:02:06)
you decide to use. You know, my only
(01:02:07)
request is use a little bit of Nvidia
(01:02:09)
wherever you can and uh you know, but uh
(01:02:13)
the entire thing is open. Now this is
(01:02:17)
going to be the first large-scale
(01:02:20)
mainstream
(01:02:21)
um AI physical AI market and this is now
(01:02:25)
I think we can all agree fully here and
(01:02:28)
this inflection point of going from not
(01:02:31)
autonomous vehicles to autonomous
(01:02:33)
vehicles is probably happening right
(01:02:35)
about this time in in the next 10 years
(01:02:38)
I'm fairly certain a very very large
(01:02:41)
percentage of the world's cars will be
(01:02:43)
autonomous or highly autonomous but this
(01:02:45)
This basic technique that I just
(01:02:47)
described in using the three computers
(01:02:50)
using synthetic data generation and
(01:02:52)
simulation applies to every form of
(01:02:55)
robotic systems. It could be a robot
(01:02:57)
that is just an articulator, a
(01:02:59)
manipulator, maybe it's a mobile robot,
(01:03:01)
maybe it's a fully humanoid robot. And
(01:03:04)
so the next journey,
(01:03:07)
the next era for robotic systems is
(01:03:10)
going to be, you know, robots. And these
(01:03:12)
robots are going to come in all kinds of
(01:03:13)
different sizes and and uh I invited
(01:03:16)
some friends. Did they come?
(01:03:24)
>> Hey guys,
(01:03:26)
hurry up. I got a lot of stuff to cover.
(01:03:30)
>> Come on, hurry.
(01:03:35)
Did you tell R2-D2 you were going to be
(01:03:37)
here?
(01:03:38)
>> Did you? And C3PO.
(01:03:44)
Okay. All right. Come here. Before now,
(01:03:47)
one of the things that one of the things
(01:03:48)
that's really... You have Jetsons. They
(01:03:50)
have little Jetson computers inside
(01:03:51)
them. They're trained inside Omniverse.
(01:03:55)
And how about this? Let's show everybody
(01:03:58)
the simulator that you were that you
(01:04:00)
guys learned how to how to be robots in.
(01:04:03)
You you guys want to look at that?
(01:04:05)
>> Okay, let's look at that. Run it,
(01:04:06)
please.
(01:04:10)
See,
(01:04:28)
Okay.
(01:05:08)
Isn't that amazing?
(01:05:13)
That's how you learn to be a robot. You
(01:05:15)
did it all inside Omniverse. And the
(01:05:18)
robot simulator is called Isaac. Isaac
(01:05:20)
Sim and Isaac Lab. And anybody who wants
(01:05:23)
to build a robot, you know, nobody could
(01:05:26)
nobody's going to be as cute as you.
(01:05:28)
But now we have all look at all these
(01:05:31)
look at all these friends that we have
(01:05:32)
building robots. We have we're building
(01:05:34)
big ones. No, like I said, nobody's as
(01:05:36)
cute as you guys are. But we have
(01:05:38)
Neura Robotics and we have Agibot.
(01:05:40)
Agibot over there, you know. We have uh
(01:05:44)
LG over here. They just announced a new
(01:05:46)
robot, Caterpillar. They've got the
(01:05:48)
largest robots ever. That one delivers
(01:05:52)
food to your house. That's connected to
(01:05:54)
Uber Eats. And that's Serve Robotics. I love
(01:05:56)
those guys. Agility, Boston Dynamics,
(01:06:01)
incredible. You got surgical robots, you
(01:06:03)
got manipulator robots from Franka,
(01:06:07)
you got a Universal Robots robot,
(01:06:09)
incredible number of different robots.
(01:06:11)
And so this is the next chapter. We're
(01:06:14)
going to talk a lot more about robotics
(01:06:15)
in the future, but it's not just about
(01:06:18)
the robots in the end. I know
(01:06:19)
everything's about you guys. It's about
(01:06:21)
getting there. And one of the air one of
(01:06:24)
the most important industries in the
(01:06:25)
world that will be revolutionized by
(01:06:27)
physical AI and AI physics
(01:06:31)
is the industry that started all of us
(01:06:34)
at NVIDIA. It wouldn't be possible if
(01:06:37)
not for the companies that I'm about to
(01:06:39)
talk to. And I'm so happy that all of
(01:06:41)
them starting with Cadence is going to
(01:06:43)
accelerate everything. Cadence CUDA-X
(01:06:46)
integrated into all of their simulations
(01:06:48)
and solvers. They've got uh Nvidia
(01:06:51)
physical physical AIs that they're going
(01:06:53)
to use for uh for different um physical
(01:06:56)
plants and plant simulations. You got AI
(01:06:58)
physics being integrated into these
(01:07:00)
systems. So whether it's an EDA or STA
(01:07:04)
um and in the future robotic systems,
(01:07:06)
we're going to have basically the same
(01:07:08)
technology that made you guys possible
(01:07:11)
now completely revolutionize these
(01:07:13)
design stacks. Synopsys, without Synopsys,
(01:07:16)
you know, Synopsys and Cadence are
(01:07:19)
completely, completely indispensable in
(01:07:22)
the world of chip design. Synopsys uh
(01:07:25)
leads in uh and uh logic design and
(01:07:29)
IP uh in the case of Cadence they lead
(01:07:32)
physical design the place and route uh
(01:07:35)
and emulation and verification. Cadence
(01:07:37)
is incredible at emulation and
(01:07:39)
verification. Both of them are moving
(01:07:41)
into the world of system design and
(01:07:42)
system simulation. And so in the future,
(01:07:46)
we're going to design your chips inside
(01:07:49)
Cadence and inside Synopsys. We're going
(01:07:51)
to design your systems and emulate the
(01:07:53)
whole thing and simulate everything
(01:07:56)
inside these tools. That's your future.
(01:07:58)
We're going to give Yeah, you're going
(01:07:59)
to be born inside these inside these
(01:08:02)
platforms. Pretty amazing, right? And so
(01:08:05)
we're so happy that we're working with
(01:08:06)
these these industries just as we've
(01:08:08)
integrated NVIDIA into Palantir and
(01:08:11)
ServiceNow, we're integrating NVIDIA
(01:08:13)
into the most computationally intensive
(01:08:16)
simulation industries, Synopsys and
(01:08:19)
Cadence. And today we're announcing that
(01:08:22)
Siemens is also doing the same thing.
(01:08:24)
We're going to integrate CUDA-X, physical
(01:08:27)
AI, agentic AI, Nemotron, deeply
(01:08:30)
integrated into the world of Siemens.
(01:08:33)
And the reason for that is this. First,
(01:08:36)
we designed the chips
(01:08:39)
and all of it in the future will be
(01:08:40)
accelerated by Nvidia. You're going to
(01:08:42)
be very happy about that. We're going to
(01:08:44)
have agentic chip designers and system
(01:08:46)
designers working with us, helping us do
(01:08:49)
design just as we have agentic software
(01:08:52)
engineers helping our software engineers
(01:08:54)
code today. And so, we'll have agentic
(01:08:56)
chip designers and system designers.
(01:08:58)
We're going to create you inside this.
(01:09:01)
But then we have to build you. We have
(01:09:04)
to build the plants, the factories that
(01:09:07)
make manufacture you. We have to design
(01:09:11)
the manufacturing lines that assemble
(01:09:13)
all of you. And these manufacturing
(01:09:16)
plants are going to be essentially
(01:09:18)
gigantic robots. Incredible, isn't that
(01:09:21)
right?
(01:09:22)
I know. I know. And so you're going to
(01:09:25)
be designed in a computer. You're going
(01:09:28)
to be made in a computer. You're gonna
(01:09:30)
be tested and evaluated in a computer
(01:09:32)
long before long before you have to
(01:09:35)
spend any time dealing with gravity.
(01:09:38)
I know.
(01:09:40)
Do you know how to deal with gravity?
(01:09:43)
Can you jump?
(01:09:46)
Can you jump?
(01:09:56)
>> [laughter]
(01:09:58)
>> Okay. All right. Don't show off. Okay.
(01:10:00)
So, so this so now
(01:10:04)
the industry the industry that made
(01:10:06)
Nvidia possible, we're I'm just so happy
(01:10:09)
that that now the technology that we're
(01:10:11)
creating is at a level of sophistication
(01:10:13)
and capability that we can now help them
(01:10:16)
revolutionize their industry. And so
(01:10:18)
what started with with uh with them, we
(01:10:21)
now have the opportunity to go back and
(01:10:22)
and help them revolutionize theirs.
(01:10:25)
Let's take a look at the stuff that
(01:10:26)
we're going to do with Siemens.
(01:10:28)
Come on.
(01:10:30)
Breakthroughs in physical AI are letting
(01:10:33)
AI move from screens to our physical
(01:10:36)
world.
(01:10:38)
And just in time, as the world builds
(01:10:41)
factories of every kind for chips,
(01:10:44)
computers, life-saving drugs, and AI, as
(01:10:48)
the global labor shortage worsens, we
(01:10:50)
need automation powered [music] by
(01:10:52)
physical AI and robotics more than ever.
(01:10:57)
This, where AI meets the world's largest
(01:11:00)
physical industries, is the foundation
(01:11:02)
of NVIDIA and Siemens' partnership. For
(01:11:05)
nearly two centuries, Siemens has built
(01:11:08)
the world's industries.
(01:11:10)
And now [music] it is reinventing it for
(01:11:12)
the age of AI.
(01:11:15)
Siemens is integrating NVIDIA CUDA-X
(01:11:18)
libraries, AI models, and Omniverse
(01:11:22)
into its portfolio of EDA,
(01:11:27)
CAE,
(01:11:29)
and digital [music] twin tools and
(01:11:31)
platforms.
(01:11:33)
Together, we're bringing physical AI to
(01:11:36)
the full industrial life cycle.
(01:11:39)
From design and simulation
(01:11:42)
to production
(01:11:46)
and operations,
(01:11:48)
we stand at the beginning of a new
(01:11:50)
industrial revolution, the age of
(01:11:52)
physical AI built by Nvidia and Siemens
(01:11:56)
for the next age of industries.
(01:12:02)
Incredible, right guys?
(01:12:06)
What do you think? All right, I'll hang
(01:12:08)
on tight. Just hang on tight. And so so
(01:12:11)
this is, you know, if you look at look
(01:12:13)
at the world's models, there's no
(01:12:16)
question OpenAI is the leading
(01:12:19)
token generator today. More OpenAI
(01:12:22)
tokens are generated than just about
(01:12:23)
anything else. The second largest group,
(01:12:26)
the second largest is probably open
(01:12:28)
models. And my guess is that over time
(01:12:30)
because there are so many companies, so
(01:12:32)
many researchers, so many different
(01:12:34)
types of domains and modalities that
(01:12:36)
open-source models will be by far the
(01:12:38)
largest. Let's talk about somebody
(01:12:40)
really special. You guys want to do
(01:12:42)
that? Let's talk about Vera Rubin.
(01:12:46)
Vera Rubin. Yeah, go ahead. She's an
(01:12:49)
American astronomer.
(01:12:51)
She was the first to observe. She
(01:12:53)
noticed that the tails of the galaxies
(01:12:56)
were moving about as fast
(01:12:59)
as the center of the galaxies. Well, I
(01:13:03)
know it makes no sense. It makes no
(01:13:05)
sense. Newtonian physics would say just
(01:13:07)
like the solar system, the planets
(01:13:09)
further away from the sun are
(01:13:13)
circling the sun slower than
(01:13:16)
the planets closer to the sun. And
(01:13:19)
therefore it makes no sense that this
(01:13:21)
happens unless there's
(01:13:24)
invisible bodies we call them she
(01:13:26)
discovered dark body dark matter um that
(01:13:31)
occupies space even though we don't see
(01:13:33)
it and so Vera Rubin is the person that
(01:13:35)
we named our next computer after. Isn't
(01:13:39)
that a good idea?
(01:13:41)
I know.
(01:13:45)
Okay. Okay, Vera Rubin is designed to
(01:13:47)
address this fundamental challenge that
(01:13:49)
we have. The amount of computation
(01:13:51)
necessary for AI is skyrocketing. The
(01:13:55)
demand for NVIDIA GPUs is skyrocketing.
(01:13:58)
It's skyrocketing because models are
(01:14:00)
increasing by a factor of 10, an order
(01:14:02)
of a magnitude every single year. And
(01:14:06)
not to mention, as I mentioned, o1's
(01:14:08)
introduction was an inflection point for
(01:14:11)
AI. Instead of a one-shot answer,
(01:14:14)
inference is now a thinking process. And
(01:14:17)
in order to teach the AI how to think,
(01:14:20)
reinforcement learning and very
(01:14:23)
significant computation was introduced
(01:14:25)
into post training. It wasn't no long
(01:14:28)
it's no longer supervised fine-tuning or
(01:14:31)
otherwise known as imitation learning or
(01:14:33)
supervision training.
(01:14:35)
You now have reinforcement learning.
(01:14:37)
Essentially the computer trial it trying
(01:14:40)
different iterations itself learning how
(01:14:42)
to perform a task. The amount of
(01:14:45)
computation for pre-training, for
(01:14:48)
post-training, for test-time scaling has
(01:14:50)
exploded as a result of that. And now
(01:14:53)
every single inference that we do
(01:14:55)
instead of just one shot the number of
(01:14:57)
tokens you can just see the AI think
(01:14:59)
which we appreciate. The longer it
(01:15:01)
thinks oftentimes it produces a better
(01:15:02)
answer. And so test time scaling causes
(01:15:05)
the number of tokens to be generated to
(01:15:07)
increase by 5x every single year. Not to
(01:15:10)
mention,
(01:15:12)
meanwhile, the race is on for AI.
(01:15:15)
Everybody's trying to get to the next
(01:15:17)
level. Everybody's trying to get to the
(01:15:18)
next frontier. And every time they get
(01:15:20)
to the next frontier, the last
(01:15:22)
generation AI tokens, the cost starts to
(01:15:26)
starts to decline about a factor of 10x
(01:15:29)
every year. The 10x decline every year
(01:15:31)
is actually telling you something
(01:15:33)
different. It's saying that the race is
(01:15:35)
so intense. Everybody's trying to get to
(01:15:37)
the next level and somebody is getting
(01:15:39)
to the next level. And so therefore, all
(01:15:42)
of it is a computing problem. The faster
(01:15:44)
you compute, the sooner you can get to
(01:15:46)
the next level of the next frontier. All
(01:15:49)
of these things are simultaneously
(01:15:50)
happening at the same time. And so we
(01:15:53)
decided that we have to advance
(01:15:56)
the state-of-the-art of computation
(01:15:59)
every single year. Not one year left
(01:16:02)
behind. And now we've been shipping
(01:16:05)
GB200s
(01:16:07)
year and a half ago. Right now we're in
(01:16:09)
fullscale manufacturing of GB300.
(01:16:13)
And if Vera Rubin is going to be in time
(01:16:16)
for this year, it must be in production
(01:16:19)
by now. And so today I can tell you that
(01:16:22)
Vera Rubin is in full production.
(01:16:30)
You guys want to take a look at Vera
(01:16:31)
Rubin?
(01:16:32)
>> All right. Come on.
(01:16:34)
>> Play it, please.
(01:16:38)
Vera Rubin arrives just in time for the
(01:16:41)
next frontier of AI.
(01:16:44)
This is [music] the story of how we
(01:16:45)
built it. The architecture, a system of
(01:16:49)
six chips [music] engineered to work as
(01:16:51)
one, born from extreme co-design. It
(01:16:54)
begins with Vera, [music] a
(01:16:55)
custom-designed CPU, double the
(01:16:57)
performance of the previous generation.
(01:16:59)
And the Rubin GPU, Vera and Rubin are
(01:17:02)
co-designed from the [music] start to
(01:17:04)
bidirectionally and coherently share data
(01:17:07)
faster and with lower latency.
(01:17:10)
Then 17,000 components come together on
(01:17:14)
a Vera Rubin compute board.
(01:17:18)
High-speed robots place components with
(01:17:21)
micron precision before the Vera CPU and
(01:17:24)
two Rubin GPUs complete the assembly.
(01:17:28)
Capable of delivering 100 petaflops of AI,
(01:17:32)
five times that of its predecessor.
(01:17:36)
AI needs data fast.
(01:17:39)
ConnectX-9 delivers 1.6 terabits per
(01:17:42)
second of scale out bandwidth to each
(01:17:45)
GPU.
(01:17:48)
BlueField-4 DPU offloads storage and
(01:17:50)
security [music] so compute stays fully
(01:17:53)
focused on AI.
(01:17:55)
The Vera Rubin compute tray completely
(01:17:58)
redesigned with no cables, hoses, or
(01:18:01)
fans. Featuring a BlueField-4 DPU, eight
(01:18:05)
ConnectX-9 NICs, two Vera CPUs, and four
(01:18:10)
Rubin GPUs. The compute building block
(01:18:13)
of the Vera Rubin AI supercomputer.
(01:18:16)
Next, the sixth generation NVLink
(01:18:20)
switch. Moving more data than the global
(01:18:23)
internet, connecting 18 compute nodes,
(01:18:26)
scaling up to 72 Rubin GPUs, operating
(01:18:29)
as one.
(01:18:33)
Then Spectrum-X Ethernet Photonics,
(01:18:38)
the world's first Ethernet [music]
(01:18:40)
switch with 512 lanes and 200 Gbit
(01:18:43)
capable co-packaged optics scale out
(01:18:46)
thousands of racks into an AI factory.
(01:18:51)
15,000 engineer years since design
(01:18:53)
began, the first Vera Rubin NVL72
(01:18:57)
[music] rack comes online. Six
(01:19:00)
breakthrough chips, 18 compute trays,
(01:19:03)
nine NVLink switch trays, 220 trillion
(01:19:06)
transistors weighing nearly two tons.
(01:19:12)
One giant leap to the next frontier of
(01:19:15)
AI.
(01:19:17)
Rubin is here.
(01:19:24)
What do you guys think?
(01:19:29)
This is a Rubin pod. 1152 GPUs
(01:19:35)
in 16 racks. Each one of the racks, as
(01:19:39)
you know, has 72
(01:19:44)
Vera Rubin or 72 Rubins. Each one of
(01:19:48)
the Rubins is two actual GPU dies
(01:19:51)
connected together. I'm going to show
(01:19:53)
I'm going to show it to you, but there
(01:19:55)
are several things that Well, I'll tell
(01:19:58)
you later.
(01:20:00)
I can't tell you everything right away.
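
The rack and pod counts quoted above are consistent with each other, which a few lines of arithmetic can confirm. This is only a sanity check on the numbers as spoken: the variable names are illustrative, the two-boards-per-tray figure is inferred from the tray holding two Vera CPUs and four Rubin GPUs, and treating "100 petaflops" as a per-board figure is an assumption about how the video narration should be read.

```python
# Figures as quoted in the keynote; the variable names are illustrative.
GPUS_PER_BOARD = 2          # "the Vera CPU and two Rubin GPUs"
BOARDS_PER_TRAY = 2         # inferred: tray has two Vera CPUs, four Rubin GPUs
TRAYS_PER_RACK = 18         # "18 compute trays"
RACKS_PER_POD = 16          # "1152 GPUs in 16 racks"
DIES_PER_GPU = 2            # "two actual GPU dies connected together"
PETAFLOPS_PER_BOARD = 100   # assumed to be a per-board figure

gpus_per_tray = GPUS_PER_BOARD * BOARDS_PER_TRAY
gpus_per_rack = gpus_per_tray * TRAYS_PER_RACK
gpus_per_pod = gpus_per_rack * RACKS_PER_POD
gpu_dies_per_pod = gpus_per_pod * DIES_PER_GPU
petaflops_per_rack = PETAFLOPS_PER_BOARD * BOARDS_PER_TRAY * TRAYS_PER_RACK

print("GPUs per tray:", gpus_per_tray)
print("GPUs per rack:", gpus_per_rack)
print("GPUs per pod:", gpus_per_pod)
print("GPU dies per pod:", gpu_dies_per_pod)
print("Petaflops per rack (under the per-board assumption):", petaflops_per_rack)
```

Four GPUs per tray times 18 trays gives the 72 GPUs per rack, and 72 times 16 racks gives the 1152 GPUs in the pod.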
(01:20:04)
Well, we designed six different chips.
(01:20:07)
First of all, we have a rule inside our
(01:20:08)
company and it's a good rule. No new
(01:20:11)
generation should have more than one or
(01:20:14)
two chips change. But the problem is
(01:20:16)
this. As you could see, we were
(01:20:19)
describing the total number of
(01:20:20)
transistors in each one of the chips
(01:20:22)
that were being described. And we know
(01:20:24)
that Moore's law has largely slowed. And
(01:20:26)
so, the number of transistors we can get
(01:20:29)
year after year after year can't
(01:20:32)
possibly keep up with the 10 times
(01:20:35)
larger models. It can't possibly keep up
(01:20:38)
with five times per year more tokens
(01:20:41)
generated. It can't possibly keep up
(01:20:43)
with the fact that
(01:20:45)
cost decline of the tokens are going to
(01:20:47)
be so aggressive. It is impossible to
(01:20:50)
keep up with those kind of rates if the
(01:20:52)
indust for the industry to continue to
(01:20:54)
advance unless we deploy aggressive
(01:20:58)
extreme co-design basically innovating
(01:21:01)
across all of the chips across the
(01:21:03)
entire stack all at the same time. which
(01:21:06)
is the reason why we decided that this
(01:21:08)
generation we had no choice but to
(01:21:10)
design every chip over again. Now every
(01:21:14)
single chip that we were describing just
(01:21:16)
now can be a press conference in all in
(01:21:18)
itself and there's an entire company
(01:21:20)
who's probably dedicated to doing that
(01:21:21)
back in the old days. Each one of them
(01:21:23)
is completely revolutionary and the
(01:21:25)
best of its kind.
(01:21:28)
The Vera CPU, I'm so proud of it. In a
(01:21:31)
power-constrained world, the Vera CPU is two
(01:21:35)
times the performance. In a
(01:21:38)
power-constrained world, it's twice the
(01:21:40)
performance per watt of the world's most
(01:21:42)
advanced CPUs. Its data rate is insane.
(01:21:45)
It was designed to process
(01:21:48)
supercomputers and Vera was an
(01:21:51)
incredible CPU. Grace was an incredible
(01:21:53)
CPU. Now Vera increases the single
(01:21:57)
threaded performance, increases the
(01:21:59)
capacity of the memory, increases
(01:22:01)
everything just dramatically. It's a
(01:22:03)
giant chip. This is the Vera CPU.
(01:22:06)
This is one CPU.
(01:22:09)
And this is connected to
(01:22:14)
the Rubin GPU. Look at that thing.
(01:22:18)
It's a giant chip. Now, the thing that's
(01:22:20)
really special, and I'll go through
(01:22:23)
these. It's going to take three hands. I
(01:22:25)
think four hands to do this. Okay. So,
(01:22:28)
this is the Vera CPU. It's got 88 CPU
(01:22:31)
cores. And the CPU cores are designed to
(01:22:33)
be multi-threaded. But the
(01:22:34)
multi-threaded nature of Vera was
(01:22:37)
designed so that each one of the 176
(01:22:40)
threads could get its full
(01:22:43)
performance. So it's essentially as if
(01:22:45)
there's 176 cores but only 88 physical
(01:22:48)
cores. So these cores were designed
(01:22:50)
using a technology called spatial
(01:22:52)
multi-threading. But the IO performance
(01:22:55)
is incredible. This is the Rubin GPU.
(01:22:57)
It's 5x Blackwell in floating-point
(01:23:00)
performance. But the important thing is
(01:23:02)
go to the bottom line. The bottom line
(01:23:03)
it's only 1.6 times the number of
(01:23:06)
transistors in Blackwell. That kind of
(01:23:07)
tells you something about the levels
(01:23:10)
of semiconductor physics today. If we
(01:23:12)
don't do co-design, if we don't do
(01:23:15)
extreme co-design at the level of
(01:23:17)
basically every single chip
(01:23:20)
across the entire system, how is it
(01:23:22)
possible to deliver these performance levels
(01:23:25)
when you get, you know, at best
(01:23:28)
1.6 times each year? Because that's the
(01:23:30)
total number of transistors you have.
(01:23:32)
And even if you were to have a little
(01:23:34)
bit more performance per transistor, say
(01:23:36)
25%, it's impossible to get a
(01:23:39)
100% yield out of the number of
(01:23:41)
transistors you get. And so 1.6x kind of
(01:23:44)
puts a ceiling on how far performance
(01:23:46)
can go each year unless you do something
(01:23:48)
extreme. And we call it extreme
(01:23:50)
co-design. Well, one of the things
(01:23:52)
that we did, and it was a
(01:23:53)
great invention, is called the NVFP4
(01:23:56)
tensor core. The transformer engine
(01:23:59)
inside our chip is not just a 4-bit
(01:24:02)
floating-point number that we somehow put
(01:24:04)
into the data path. It is an entire
(01:24:06)
processor, a processing unit that
(01:24:10)
understands how to dynamically,
(01:24:12)
adaptively adjust its precision and
(01:24:15)
structure to deal with different levels
(01:24:18)
of the transformer so that you can
(01:24:19)
achieve higher throughput wherever it's
(01:24:22)
possible to lose precision and to go
(01:24:25)
back to the highest possible precision
(01:24:26)
wherever you need to. That ability to
(01:24:29)
dynamically do that. You can't do this
(01:24:32)
in software because obviously it's just
(01:24:34)
running too fast. And so you have to be
(01:24:37)
able to do it adaptively inside the
(01:24:39)
processor. That's what NVFP4 is.
(01:24:42)
When somebody says FP4 or FP8, it almost
(01:24:45)
means nothing to us. And the reason for
(01:24:47)
that is because it's the tensor core
(01:24:49)
structure and all of the algorithms that
(01:24:50)
make it work. NVFP4, we've
(01:24:53)
published papers on this already. The
(01:24:55)
level of throughput and
(01:24:57)
precision it is able to
(01:24:58)
retain is completely incredible. This
(01:25:01)
is groundbreaking work. I would not be
(01:25:03)
surprised if the industry would like
(01:25:05)
us to make this format and this
(01:25:06)
structure an industry standard in the
(01:25:08)
future. This is completely
(01:25:10)
revolutionary. This is how we were able
(01:25:12)
to deliver such a gigantic step up in
(01:25:15)
performance even though we only have 1.6
(01:25:18)
times the number of transistors. Okay.
(01:25:21)
So this is and now once you have a great
(01:25:23)
processing node and this is the
(01:25:25)
processor node and inside so this is
(01:25:29)
this is for example here let me do this.
(01:25:36)
This is this is wow it's super heavy.
(01:25:40)
You have to be a CEO in really good shape
(01:25:42)
to do this job.
(01:25:46)
Okay. All right. So, this thing is I'm
(01:25:50)
gonna guess this is probably I don't
(01:25:53)
know couple of hundred pounds.
(01:25:59)
[laughter]
(01:26:01)
I thought that was funny, too.
(01:26:04)
Come on. It could have been. Everybody's
(01:26:07)
gone. No, I don't think so.
(01:26:10)
All right. [clears throat]
(01:26:11)
So, so look at this. This is the last
(01:26:13)
one. We revolutionized the entire MGX
(01:26:17)
chassis. This node,
(01:26:20)
43 cables,
(01:26:22)
zero cables, six tubes,
(01:26:28)
just two of them here. It takes two
(01:26:32)
hours to assemble this.
(01:26:35)
If you're lucky, it takes two hours. And
(01:26:38)
of course, you're probably going to
(01:26:39)
assemble it wrong. You're going to have
(01:26:40)
to retest it, test it, reassemble it. So
(01:26:43)
the assembly process is incredibly
(01:26:45)
complicated and it was understandable as
(01:26:47)
one of our first supercomputers that's
(01:26:49)
deconstructed in this way. This goes from 2
(01:26:52)
hours to 5 minutes
(01:27:00)
80% liquid cooled.
(01:27:03)
100% liquid cooled.
(01:27:06)
Yeah. Really really a breakthrough.
(01:27:09)
Okay. So, so this is the new compute
(01:27:12)
chassis and what connects all of these
(01:27:16)
to the top of rack switches, the east
(01:27:18)
west traffic, is called the Spectrum X
(01:27:20)
NIC. This is the world's best NIC,
(01:27:23)
unquestionably. Nvidia's Mellanox, the
(01:27:26)
acquisition, Mellanox, that joined us a
(01:27:28)
long time ago now, their
(01:27:30)
networking technology for high
(01:27:31)
performance computing is the world's
(01:27:33)
best bar none. the algorithms, the chip
(01:27:36)
design, all of the interconnects, all
(01:27:38)
the software stacks that run on top of
(01:27:39)
it, their RDMA, absolutely absolutely
(01:27:42)
bar none, the world's best. And now it
(01:27:44)
has the ability to do programmable RDMA
(01:27:46)
and data path accelerator so that our
(01:27:49)
partners like AI labs could create their
(01:27:52)
own algorithms for how they want to move
(01:27:54)
data around the system. But this is
(01:27:55)
completely world-class. ConnectX-9
(01:27:58)
and the Vera CPU were co-designed, and
(01:28:02)
we never revealed it, never
(01:28:04)
released it until CX9 came along, because
(01:28:08)
we co-designed it for a new type of
(01:28:10)
processor.
(01:28:12)
You know, ConnectX-9, or CX8, and Spectrum
(01:28:15)
X revolutionized how Ethernet was done
(01:28:19)
for artificial intelligence. Ethernet
(01:28:21)
traffic for AI is much much more
(01:28:24)
intense, requires much lower latency.
(01:28:27)
The instantaneous surge of traffic
(01:28:30)
is unlike anything Ethernet sees. And so
(01:28:32)
we created Spectrum X which is AI
(01:28:35)
Ethernet.
(01:28:37)
Two years ago we announced Spectrum X.
(01:28:39)
NVIDIA today is the largest networking
(01:28:42)
company the world has ever seen. So it's
(01:28:45)
been so successful and used in so many
(01:28:47)
different installations. It is just
(01:28:49)
sweeping uh the AI landscape. The
(01:28:52)
performance is incredible especially
(01:28:54)
when you have a 200 um megawatt data
(01:28:59)
center or if you have a gigawatt data
(01:29:00)
center. These are billions of dollars.
(01:29:03)
Let's say a gigawatt data center is $50
(01:29:05)
billion. If the networking
(01:29:07)
performance allows you to deliver an
(01:29:10)
extra 10%
(01:29:13)
in the case of Spectrum X, delivering
(01:29:15)
25% higher throughput is not uncommon.
(01:29:18)
If we were to just deliver 10% that's
(01:29:20)
worth $5 billion. The networking is
(01:29:23)
completely free, which is the reason
(01:29:25)
why, well,
(01:29:28)
everybody uses Spectrum X. It's just an
(01:29:30)
incredible thing. And now we're going to
(01:29:32)
invent a new type of
(01:29:35)
data processing. And so Spectrum X is for
(01:29:38)
east west traffic. We now have a new
(01:29:41)
processor called BlueField 4.
(01:29:43)
BlueField 4 allows us to take a
(01:29:45)
very large data center, isolate different
(01:29:48)
parts of it so that different users
(01:29:50)
could use different parts of it. Make
(01:29:51)
sure that everything could be
(01:29:52)
virtualized if they decide to be
(01:29:54)
virtualized. So you offload a lot of the
(01:29:57)
um virtualization software, the security
(01:29:59)
software, the networking software for
(01:30:01)
your north south traffic. And so
(01:30:03)
BlueField 4 comes standard with every
(01:30:06)
single one of these compute nodes.
(01:30:08)
BlueField 4 has a second application I'm
(01:30:10)
going to talk about in just a second.
(01:30:12)
This is a revolutionary processor and
(01:30:14)
I'm so excited about it. This is the
(01:30:16)
NVLink 6 switch
(01:30:19)
and
(01:30:21)
it's right here.
(01:30:22)
This is the switch chip.
(01:30:26)
There are four of them inside the NVLink
(01:30:28)
switch here.
(01:30:30)
Each one of these switch chips has the
(01:30:32)
fastest SerDes in history. The world is
(01:30:35)
barely getting to 200 gigabits. This is
(01:30:38)
a 400 gigabits per second switch. The
(01:30:41)
reason why this is so important is so
(01:30:43)
that we could have every single GPU talk
(01:30:46)
to every other GPU at exactly the same
(01:30:48)
time. This switch, this switch on the
(01:30:52)
back plane of one of these racks enables
(01:30:56)
us to move the equivalent of twice the
(01:30:59)
amount of the global internet data,
(01:31:03)
twice all of the world's internet
(01:31:05)
data at twice the speed. You take the
(01:31:09)
cross-sectional bandwidth of the entire
(01:31:11)
planet's internet, it's about 100
(01:31:13)
terabytes per second. This is 240
(01:31:16)
terabytes per second. So it kind of puts
(01:31:18)
it in perspective. This is so that every
(01:31:20)
single GPU can work with every single
(01:31:22)
other GPU at exactly the same time.
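The comparison above is simple arithmetic on the two figures quoted in the talk (about 100 terabytes per second for the internet, 240 terabytes per second for the rack). A small sketch, assuming the rack bandwidth is shared evenly across the 72 Rubins per rack mentioned earlier; the even split is an assumption for illustration, not a stated specification, and the names are hypothetical:

```python
INTERNET_TB_PER_S = 100.0      # figure quoted in the talk
NVLINK_RACK_TB_PER_S = 240.0   # figure quoted in the talk
GPUS_PER_RACK = 72             # "72 Rubins" per rack, per the talk


def ratio_to_internet(rack_tb_per_s, internet_tb_per_s):
    """How many times the quoted internet bandwidth the rack fabric carries."""
    return rack_tb_per_s / internet_tb_per_s


def even_share_per_gpu(rack_tb_per_s, gpus):
    """Bandwidth per GPU if the rack total were divided evenly (an assumption)."""
    return rack_tb_per_s / gpus


if __name__ == "__main__":
    print(ratio_to_internet(NVLINK_RACK_TB_PER_S, INTERNET_TB_PER_S))
    print(round(even_share_per_gpu(NVLINK_RACK_TB_PER_S, GPUS_PER_RACK), 2))
```

On the quoted numbers the ratio comes out a little above the "twice" used in the talk.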
(01:31:24)
Okay. Then on top of that
(01:31:29)
on top of that okay so this is one rack.
(01:31:31)
This is one rack. Each one of the racks
(01:31:33)
as you could see the number of
(01:31:35)
transistors in this one rack
(01:31:39)
is 1.7 times.
(01:31:44)
Yeah. Could you do this for me? So, this
(01:31:46)
is it's usually about two tons, but
(01:31:49)
today it's two and a half tons because
(01:31:52)
um when they shipped it, they forgot to
(01:31:54)
drain the water out of it.
(01:31:58)
So, we we shipped a lot of water from
(01:32:00)
California.
(01:32:02)
[clears throat]
(01:32:05)
Can you hear it squealing?
(01:32:07)
You know, when you're rotating two and a
(01:32:08)
half tons,
(01:32:11)
you're going to squeal a little.
(01:32:14)
Oh, you could do it. Wow.
(01:32:19)
Okay, we just we won't make you do that
(01:32:21)
twice. All right. So, so um so behind
(01:32:25)
behind this are the NVLink spines.
(01:32:29)
Basically, two miles of copper cables.
(01:32:32)
Copper is the best conductor we know.
(01:32:34)
And these are all shielded copper
(01:32:36)
cables, structured copper cables, the
(01:32:38)
most the world's ever used in computing
(01:32:40)
systems ever. And our SerDes
(01:32:45)
drive the copper cables from the top of
(01:32:47)
the rack all the way to the bottom of
(01:32:48)
the rack at 400 gigabits per second.
(01:32:51)
It's incredible. And so uh this has two
(01:32:54)
miles of total copper cables, 5,000
(01:32:56)
copper cables, and this makes the NVLink
(01:33:00)
spine possible. This is the
(01:33:02)
revolution that really started the
(01:33:05)
MGX system. Now we decided that we
(01:33:09)
would create an industry standard system
(01:33:11)
so that the entire ecosystem all of our
(01:33:13)
supply chain could standardize on these
(01:33:16)
components. There are some 80,000
(01:33:20)
different components that make up
(01:33:23)
these MGX systems, and it's a total waste
(01:33:26)
if we were to change it every single year.
(01:33:28)
Every single major computer company from
(01:33:30)
Foxconn to Quanta to Wistron, you know,
(01:33:32)
the list goes on and on and on to HP and
(01:33:35)
Dell and Lenovo, everybody knows how to
(01:33:38)
build these systems. And so the fact
(01:33:40)
that we could squeeze Rubin, Vera Rubin,
(01:33:43)
into this even though the performance is
(01:33:46)
so much so much higher and very
(01:33:49)
importantly the power is twice as high.
(01:33:52)
The power of Vera Rubin is twice as high
(01:33:54)
as Grace Blackwell. And yet, and this is
(01:33:58)
the miracle,
(01:33:59)
the air that goes into it, the air
(01:34:03)
flow is about the same. And very
(01:34:05)
importantly, the water that goes into it
(01:34:07)
is the same temperature, 45° C. With 45°
(01:34:11)
C, no water chillers are necessary for
(01:34:15)
data centers. We're basically cooling
(01:34:18)
this supercomputer with hot water. It's so
(01:34:22)
incredibly efficient. And so
(01:34:25)
this is the new rack.
(01:34:28)
1.7 times more transistors but five
(01:34:31)
times more peak inference performance.
(01:34:34)
Three and a half times more peak
(01:34:37)
training performance.
(01:34:40)
Okay.
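The transistor argument running through this section can be checked with the ratios quoted in the talk: 1.6x transistors and 5x floating-point performance for the chip, 1.7x transistors with 5x peak inference and 3.5x peak training for the rack, and a hypothetical 25% per-transistor improvement. A minimal sketch (function names are illustrative, not from the talk):

```python
def per_transistor_gain(performance_ratio, transistor_ratio):
    """Performance improvement left over after dividing out transistor growth."""
    return performance_ratio / transistor_ratio


def transistor_only_ceiling(transistor_ratio, per_transistor_improvement):
    """Speedup expected from transistor count plus per-transistor gains alone."""
    return transistor_ratio * (1.0 + per_transistor_improvement)


chip_ceiling = transistor_only_ceiling(1.6, 0.25)    # what scaling alone would give
chip_gain = per_transistor_gain(5.0, 1.6)            # Rubin GPU vs Blackwell
rack_inference_gain = per_transistor_gain(5.0, 1.7)  # rack, peak inference
rack_training_gain = per_transistor_gain(3.5, 1.7)   # rack, peak training

if __name__ == "__main__":
    print(round(chip_ceiling, 2), round(chip_gain, 2))
    print(round(rack_inference_gain, 2), round(rack_training_gain, 2))
```

The gap between the scaling-only ceiling and the quoted 5x is the part the talk attributes to co-design across chips and software rather than to transistor count.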
(01:34:42)
They're connected on top using Spectrum
(01:34:44)
X. Oh, thank you.
(01:34:52)
This is the world's first
(01:34:53)
chip manufactured using
(01:34:56)
TSMC's
(01:34:58)
new process that we co-innovated, called
(01:35:01)
COUPE. It's an integrated
(01:35:03)
silicon photonics process technology. And
(01:35:06)
this allows us to take silicon photonics
(01:35:09)
directly right to the chip. And this is
(01:35:12)
512 ports at 200 Gbits per second. And
(01:35:16)
this is the new Ethernet AI switch, the
(01:35:20)
Spectrum X Ethernet switch. And look at
(01:35:22)
this giant chip. But what's really
(01:35:24)
amazing, it's got silicon photonics
(01:35:26)
directly connected to it. And lasers
(01:35:29)
come in
(01:35:33)
Lasers come in through here. Lasers come
(01:35:35)
in through here. The optics are here and
(01:35:38)
they connect out to the rest of the
(01:35:41)
data center. This I'll show you in a
(01:35:42)
second, but this is on top of the rack.
(01:35:44)
And this is the new Spectrum X
(01:35:47)
um
(01:35:49)
silicon photonics switch. Okay.
(01:35:54)
And we have something new I want to tell
(01:35:56)
you about. So just as I mentioned a
(01:35:58)
couple years ago,
(01:36:00)
we introduced Spectrum X so that we
(01:36:03)
could reinvent the way that networking
(01:36:05)
is done. Um Ethernet is really easy to
(01:36:08)
manage and everybody has an Ethernet
(01:36:09)
stack and every data center in the world
(01:36:11)
knows how to deal with Ethernet. Um and
(01:36:13)
the only thing that we were
(01:36:15)
using at the time was called InfiniBand,
(01:36:17)
which is used for supercomputers.
(01:36:19)
InfiniBand is very low latency. Um, but
(01:36:23)
of course the software stack, the entire
(01:36:25)
manageability of InfiniBand, is very
(01:36:27)
alien to the people who use Ethernet. So
(01:36:29)
we decided to enter the Ethernet switch
(01:36:31)
market for the very first time. Spectrum
(01:36:33)
X just took off, and it made us the
(01:36:37)
largest networking company in the world
(01:36:38)
as I mentioned. This next generation
(01:36:41)
Spectrum X is going to carry on that
(01:36:42)
tradition. But just as I said earlier AI
(01:36:46)
has reinvented the whole computing
(01:36:48)
stack, every layer of the computing
(01:36:49)
stack. It stands to reason that when AI
(01:36:53)
starts to get deployed in the world's
(01:36:55)
enterprises, it's going to also reinvent
(01:36:57)
the way storage is done. Well, AI
(01:36:59)
doesn't use SQL. AI uses semantic
(01:37:02)
information. And when AI is being used,
(01:37:05)
it creates this temporary knowledge,
(01:37:08)
temporary memory, called the KV
(01:37:10)
cache, key-value combinations, but
(01:37:14)
it's a KV cache. Basically, the cache of
(01:37:16)
the AI, the working memory of the AI.
(01:37:18)
And the working memory of the AI is
(01:37:20)
stored in the HBM memory. For
(01:37:24)
every single token,
(01:37:27)
the GPU reads in the model, the
(01:37:32)
entire model, it reads in the entire
(01:37:34)
working memory and it produces one token
(01:37:38)
and it stores that one token back into
(01:37:40)
the KV cache. And then the
(01:37:43)
next time it does that, it reads in the
(01:37:45)
entire memory, reads it and it streams
(01:37:48)
it through our GPU and then generates
(01:37:50)
another token. Well, it does this
(01:37:52)
repeatedly, token after token after
(01:37:54)
token. And obviously, if you have a long
(01:37:56)
conversation with that AI over time,
(01:37:58)
that memory, that context memory is
(01:38:00)
going to grow tremendously. Not to
(01:38:02)
mention, the models are growing, the
(01:38:03)
number of turns that we're using the AI
(01:38:06)
for is increasing. We would like to
(01:38:07)
have this AI stay with us our entire
(01:38:09)
life and remember every single
(01:38:11)
conversation we've ever had with it,
(01:38:13)
right? Every single lick of research
(01:38:14)
that I've asked it for. Of course,
(01:38:16)
the number of people that will be
(01:38:18)
sharing the supercomputers is going to
(01:38:19)
continue to grow. And so this context
(01:38:22)
memory which started out fitting inside
(01:38:24)
an HBM is no longer large enough. Last
(01:38:27)
year we created Grace Blackwell's
(01:38:32)
very fast memory. We called it fast context
(01:38:35)
memory. That's the reason why we
(01:38:37)
connected Grace directly to Hopper.
(01:38:40)
That's why we connected Grace directly
(01:38:42)
to Blackwell, so that we can expand the
(01:38:44)
context memory. But even that is not
(01:38:46)
enough. And so the next solution, of
(01:38:48)
course, is to go off onto the network, the
(01:38:51)
north-south network, off to the storage
(01:38:54)
of the company. But if you have a whole
(01:38:57)
lot of AI running at the same time, that
(01:39:00)
network is no longer going to be fast
(01:39:02)
enough. So the answer is very clearly to
(01:39:04)
do it differently. And so we
(01:39:06)
created BlueField 4 so that we could
(01:39:09)
essentially have a very fast KV cache
(01:39:13)
context memory store right in the rack.
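The growth of the KV cache described above can be estimated with the standard accounting for transformer attention: one key and one value vector per layer and per KV head for every token in context. The sketch below uses a hypothetical model shape (64 layers, 8 KV heads, head dimension 128, 2 bytes per value); none of these numbers come from the talk, and the function names are illustrative only.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """Cache bytes added per token: a key and a value vector per layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_cache_bytes(context_tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Total cache size for a conversation of the given length."""
    return context_tokens * kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value)


def kv_bytes_read_during_decode(prompt_tokens, new_tokens, per_token_bytes):
    """Each generated token re-reads the cache for every token seen so far."""
    return sum((prompt_tokens + step) * per_token_bytes for step in range(new_tokens))


if __name__ == "__main__":
    per_token = kv_bytes_per_token(layers=64, kv_heads=8, head_dim=128, bytes_per_value=2)
    total = kv_cache_bytes(1_000_000, 64, 8, 128, 2)
    print(per_token, "bytes per token")
    print(total / 1e9, "GB for a one-million-token context")
```

Under these assumed dimensions a single million-token context already needs a few hundred gigabytes, and the bytes read during generation grow roughly quadratically with conversation length, which is why the cache spills out of HBM.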
(01:39:17)
And so I'll show you in just one second,
(01:39:20)
but there's a whole new category of
(01:39:22)
storage systems. And the industry is so
(01:39:25)
excited because this is a pain point for
(01:39:27)
just about everybody who does a lot of
(01:39:29)
token generation today. the AI labs, the
(01:39:31)
cloud service providers, they're really
(01:39:34)
suffering from the amount of network
(01:39:36)
traffic that's being caused by
(01:39:38)
KV cache moving around. And so the idea
(01:39:41)
that we would create a new platform, a
(01:39:43)
new processor to run the entire Dynamo
(01:39:47)
KV cache context memory management
(01:39:50)
system and to put it very close to the
(01:39:53)
rest of the rack is completely
(01:39:54)
revolutionary. So this is it. It
(01:39:58)
sits right here.
(01:40:00)
So these are all the compute nodes.
(01:40:04)
Each one of these is NVLink 72. So this
(01:40:07)
is Vera Rubin NVLink 72.4
(01:40:12)
U Rubin GPUs. This is the context
(01:40:16)
memory that's stored here. Behind each
(01:40:18)
one of these are four BlueFields.
(01:40:20)
Behind each BlueField is 150
(01:40:24)
terabytes,
(01:40:25)
150 terabytes of context memory.
(01:40:29)
And once you allocate it
(01:40:32)
across, each GPU will get an additional
(01:40:35)
16 terabytes. Now inside this node each
(01:40:40)
GPU essentially has one terabyte. And
(01:40:43)
now with this backing store here
(01:40:47)
directly on the same east west traffic
(01:40:49)
at exactly the same data rate 200
(01:40:51)
gigabits per second across literally the
(01:40:55)
entire fabric of this compute node.
(01:40:58)
you're going to get an additional 16
(01:41:00)
terabytes of memory. Okay. And this is
(01:41:02)
the management plane. These
(01:41:06)
are the Spectrum X
(01:41:11)
switches that connect all of them
(01:41:13)
together.
(01:41:15)
And over here, these switches at the end
(01:41:20)
connect them to the rest of the data
(01:41:21)
center. Okay? And so this is the Vera
(01:41:25)
Rubin. Now there are several things that are
(01:41:27)
really incredible about it. So the first
(01:41:29)
thing that I mentioned is that this
(01:41:33)
entire system is twice the energy
(01:41:37)
efficiency, essentially twice
(01:41:40)
the temperature performance, in the
(01:41:42)
sense that even though the power is
(01:41:45)
twice as high, the amount of energy used
(01:41:47)
is twice as high, the amount of
(01:41:49)
computation is many times higher than
(01:41:50)
that, but the liquid that goes into it is
(01:41:54)
still 45 degrees C. That enables us to
(01:41:57)
save about 6% of the world's data
(01:41:59)
center power. So that's a very big deal.
(01:42:02)
The second very big deal
(01:42:05)
is that this entire system is now
(01:42:07)
confidential computing safe. Meaning
(01:42:09)
everything is encrypted in transit, at rest,
(01:42:12)
and during compute and every single bus
(01:42:15)
is now encrypted: every PCI Express,
(01:42:19)
every NVLink, you know, NVLink
(01:42:22)
between CPU and GPU, between GPU
(01:42:25)
to GPU, everything is now encrypted and
(01:42:29)
so it's confidential computing safe.
(01:42:31)
This allows companies to feel safe that
(01:42:36)
their models are being deployed by
(01:42:37)
somebody else, but it will never be seen
(01:42:39)
by anybody else. Okay? And so this
(01:42:42)
particular system is not only incredibly
(01:42:45)
energy efficient, there's one other
(01:42:47)
thing that's incredible
(01:42:49)
Because of the nature of the workload of
(01:42:52)
AI, it spikes instantaneously.
(01:42:55)
With this computation layer called
(01:42:57)
all-reduce, the amount of current, the amount
(01:43:00)
of energy that is used simultaneously
(01:43:04)
is really off the charts. Oftentimes
(01:43:06)
it'll spike up 25%. We now have power
(01:43:09)
smoothing across the entire system so
(01:43:12)
that you don't have to overprovision by
(01:43:14)
25%, or if you overprovision by 25%,
(01:43:18)
you don't have to leave
(01:43:21)
25% of the energy
(01:43:26)
squandered or unused. And so now you
(01:43:30)
could fill up the entire power budget,
(01:43:32)
and you don't have to
(01:43:34)
overprovision, you don't have to
(01:43:35)
provision beyond that. And then the last
(01:43:37)
thing of course is performance. So let's
(01:43:40)
take a look at the performance of this.
(01:43:42)
These are only charts that people who
(01:43:44)
build AI supercomputers would
(01:43:46)
love. It took every single
(01:43:49)
one of these chips, a complete redesign of
(01:43:51)
every single one of the systems and
(01:43:52)
rewriting the entire stack for us to
(01:43:54)
make this possible. Basically
(01:43:58)
this is training the AI model. This
(01:44:00)
first column, the faster you train AI
(01:44:03)
models, the faster you can get the next
(01:44:05)
frontier out to the world. This is your
(01:44:07)
time to market. This is technology
(01:44:09)
leadership. This is your pricing power.
(01:44:12)
And so in the case of the green, this is
(01:44:15)
essentially
(01:44:18)
a 10 trillion parameter model. We scaled
(01:44:22)
it up from DeepSeek. That's why we call it
(01:44:24)
DeepSeek++. Training a 10 trillion
(01:44:26)
parameter model on 100 trillion
(01:44:30)
tokens. Okay. And this is our
(01:44:33)
simulation projection of what it would
(01:44:35)
take for us to build the next frontier
(01:44:37)
model. The next frontier model uh Elon's
(01:44:39)
already mentioned that the next version
(01:44:41)
of Grok, Grok 5, I think, is 7 trillion
(01:44:43)
parameters. This is 10. And in the green
(01:44:46)
is Blackwell, and here in the case of
(01:44:51)
Rubin, notice the throughput is so much
(01:44:54)
higher, and therefore it only takes one-fourth
(01:44:58)
as many of these systems in order to
(01:45:00)
train the model in the time that we gave
(01:45:04)
it here which is one month. Okay. And so
(01:45:08)
time is the same for everybody. Now
(01:45:10)
how fast you can train that
(01:45:12)
model and how large of a model you can
(01:45:13)
train is how you're going to get to the
(01:45:15)
frontier first. The second part is your
(01:45:17)
factory throughput.
(01:45:20)
Blackwell is green again. And factory
(01:45:22)
throughput is important because your
(01:45:24)
factory, in the case of a gigawatt, is
(01:45:27)
$50 billion. A $50 billion data center
(01:45:32)
can only consume one gigawatt of power.
(01:45:35)
And so if your performance, your
(01:45:39)
throughput per watt is very good versus
(01:45:43)
quite poor, that directly translates to
(01:45:46)
your revenues. The revenue of your
(01:45:49)
data center is directly related to the
(01:45:51)
second column. And in the case of
(01:45:55)
Blackwell, it was about 10 times over
(01:45:57)
Hopper. In the case of Reuben, it's
(01:45:59)
going to be about 10 times higher again.
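The link between throughput per watt and revenue under a fixed power budget can be written out directly. The sketch below is illustrative only: the tokens-per-joule values and the price per million tokens are hypothetical placeholders, not figures from the talk. Only the one-gigawatt budget and the roughly tenfold generational step are taken from the speaker.

```python
def tokens_per_second(power_watts, tokens_per_joule):
    """A watt is a joule per second, so power times tokens per joule is tokens per second."""
    return power_watts * tokens_per_joule


def revenue_per_second(power_watts, tokens_per_joule, usd_per_million_tokens):
    """Revenue rate of a power-limited data center at a given token price."""
    return tokens_per_second(power_watts, tokens_per_joule) / 1e6 * usd_per_million_tokens


GIGAWATT = 1e9  # fixed power budget from the talk

if __name__ == "__main__":
    # Hypothetical efficiency and price, chosen only to show the proportionality.
    older = revenue_per_second(GIGAWATT, tokens_per_joule=1.0, usd_per_million_tokens=1.0)
    newer = revenue_per_second(GIGAWATT, tokens_per_joule=10.0, usd_per_million_tokens=1.0)
    print(older, newer, newer / older)
```

With power capped, revenue scales linearly with tokens per joule, so a tenfold efficiency step is a tenfold change in what the same facility can earn at a fixed price.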
(01:46:01)
Okay? And now the
(01:46:06)
cost of the tokens, how cost-effective
(01:46:09)
it is to generate the token. This is
(01:46:12)
Rubin, about one-tenth, just as in the
(01:46:14)
case of... Yep. [clears throat]
(01:46:20)
>> [applause]
(01:46:22)
>> So this is how we're going to
(01:46:23)
get everybody to the next frontier
(01:46:26)
to um push AI to the next level and of
(01:46:30)
course to build these data centers
(01:46:32)
energy efficiently and cost-efficiently.
(01:46:36)
So this is it. This is Nvidia today. You
(01:46:40)
know, we mentioned that we build chips,
(01:46:43)
but as you know, Nvidia builds entire
(01:46:45)
systems now. And AI is a full stack.
(01:46:50)
We're reinventing AI across everything
(01:46:52)
from chips to infrastructure to models
(01:46:55)
to applications. And our job is to
(01:46:58)
create the entire stack so that all of
(01:47:00)
you could create incredible applications
(01:47:03)
uh for the rest of the world. Thank you
(01:47:05)
all for coming. Have a great CES.
(01:47:08)
Now, Before [applause and cheering]
(01:47:10)
Before I Before I let you guys go, uh
(01:47:13)
there were a whole bunch of slides we
(01:47:14)
had to cut, we had to leave on the
(01:47:16)
cutting room floor, and so we have some
(01:47:18)
outtakes here. I think it'll be fun for
(01:47:19)
you. Have a great CES, guys.
(01:47:26)
and cut.
(01:47:31)
Nvidia live at CES. Take four. Marker
(01:47:35)
>> boom mic
(01:47:37)
action.
(01:47:40)
>> Sorry guys. Platform shift, huh?
(01:47:47)
>> That should do it.
(01:47:49)
>> And let's [music] roll camera.
(01:47:53)
>> A shade of green. A bright happy green.
(01:47:58)
>> World's most powerful AI supercomputer you
(01:48:01)
can plug into the wall. next to my
(01:48:04)
toaster.
(01:48:07)
>> Hey guys, I'm I'm stuck again. I'm so
(01:48:09)
sorry.
(01:48:10)
>> This slide is never going to work. Let's
(01:48:11)
just cut it.
(01:48:12)
>> Hello. Can you hear me?
(01:48:17)
>> So, like [music] I was saying, the
(01:48:19)
router. Because not every problem needs
(01:48:21)
the biggest, smartest model. Just the
(01:48:24)
right one.
(01:48:26)
>> No, no, don't lose any of them. This new
(01:48:30)
six-chip Rubin [music] platform makes
(01:48:32)
one amazing AI supercomputer.
(01:48:36)
>> There you go, little guy.
(01:48:38)
>> Oh no, no, not the scaling laws.
(01:48:41)
>> There is a squirrel on the car. Be ready
(01:48:43)
to make the squirrel [music] go away.
(01:48:45)
Ask the squirrel gently to move away.
(01:48:48)
>> Did you know the best models today are
(01:48:50)
all mixture of experts?
(01:48:55)
>> Hey
(01:48:57)
>> [music]
(01:49:06)
>> Where'd everybody go?
(01:49:50)
Hey,
(01:49:53)
hey,
(01:49:56)
hey. [music]
(01:50:03)
Hey.
(01:50:20)
Hey. Hey.
(01:50:33)
Hey.
(01:50:39)
Hey. Hey.
