Title: The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li
Duration: 01:19:34
(00:00:00)
A lot of people call you the godmother
(00:00:01)
of AI. The work you did actually was the
(00:00:04)
spark that brought us out of AI winter
(00:00:06)
>> In the middle of 2015, middle of 2016,
(00:00:09)
some tech companies avoided using the word
(00:00:12)
AI because they were not sure if AI was
(00:00:15)
a dirty word. 2017ish
(00:00:19)
was the beginning of companies calling
(00:00:21)
themselves AI companies.
(00:00:23)
>> There's this line, I think this was when
(00:00:24)
you were presenting to Congress, there's
(00:00:25)
nothing artificial about AI. It's
(00:00:27)
inspired by people. It's created by
(00:00:28)
people. And most importantly, it impacts
(00:00:30)
people.
(00:00:31)
>> It's not like I think AI will have no
(00:00:33)
impact on jobs or people. In fact, I
(00:00:36)
believe that whatever AI does currently
(00:00:39)
or in the future is up to us. It's up to
(00:00:42)
the people. I do believe technology is a
(00:00:45)
net positive for humanity. But I think
(00:00:48)
every technology is a double-edged
(00:00:50)
sword. If we're not doing the right
(00:00:52)
thing as a society, as individuals, we
(00:00:55)
can screw this up as well. >> You had this
(00:00:57)
breakthrough insight of just, okay, we can
(00:00:59)
train machines to think like humans, but
(00:01:00)
it's just missing the data that humans
(00:01:02)
have to learn as a child.
(00:01:03)
>> I chose to look at artificial
(00:01:05)
intelligence through the lens of visual
(00:01:07)
intelligence because humans are deeply
(00:01:10)
visual animals. We need to train machines
(00:01:13)
with as much information as possible on
(00:01:15)
images of objects. But objects are very,
(00:01:19)
very difficult to learn. A single object
(00:01:22)
can appear in infinitely many ways when
(00:01:24)
shown in an image. In order to train
(00:01:27)
computers with tens of thousands of
(00:01:30)
object concepts, you really need to show
(00:01:32)
it millions of examples.
(00:01:36)
Today, my guest is Dr. Fei-Fei Li, who's
(00:01:39)
known as the godmother of AI. Fei-Fei has
(00:01:42)
been responsible for and at the center
(00:01:44)
of many of the biggest breakthroughs
(00:01:45)
that sparked the AI revolution that we
(00:01:47)
are currently living through. She
(00:01:49)
spearheaded the creation of ImageNet,
(00:01:51)
which was basically her realizing that
(00:01:53)
AI needed a ton of clean labelled data
(00:01:55)
to get smarter. And that data set became
(00:01:58)
the breakthrough that led to the current
(00:01:59)
approach to building and scaling AI
(00:02:01)
models. She was chief AI scientist at
(00:02:04)
Google Cloud, which is where some of the
(00:02:05)
biggest early technology breakthroughs
(00:02:07)
emerged from. She was director at SAIL,
(00:02:09)
Stanford's artificial intelligence lab,
(00:02:11)
where many of the biggest AI minds came
(00:02:13)
out of. She's also co-creator of
(00:02:15)
Stanford's Human-Centered AI Institute,
(00:02:17)
which is playing a vital role in the
(00:02:19)
direction that AI is taking. She's also
(00:02:21)
been on the board of Twitter. She was
(00:02:22)
named one of Time's 100 most influential
(00:02:25)
people in AI. She's also on the United
(00:02:28)
Nations Advisory Board. I could go on.
(00:02:30)
In our conversation, Fei-Fei shares a brief
(00:02:32)
history of how we got to today in the
(00:02:34)
world of AI, including this mind-blowing
(00:02:37)
reminder that 9 to 10 years ago, calling
(00:02:39)
yourself an AI company was basically a
(00:02:41)
death knell for your brand, because no
(00:02:44)
one believed that AI was actually going
(00:02:45)
to work. Today, it's completely
(00:02:47)
different. Every company is an AI
(00:02:49)
company. We also chat about her take on
(00:02:51)
how she sees AI impacting humanity in
(00:02:54)
the future, how far current technologies
(00:02:56)
will take us, why she's so passionate
(00:02:58)
about building a world model, and what
(00:03:00)
exactly world models are. And most
(00:03:02)
exciting of all, the launch of the
(00:03:04)
world's first large world model, Marble,
(00:03:06)
which just came out as this podcast
(00:03:08)
comes out. Anyone can go play with this
(00:03:10)
at marble.worldlabs.ai.
(00:03:12)
It's insane. Definitely check it out.
(00:03:14)
Fei-Fei is incredible and way too under the
(00:03:17)
radar for the impact that she's had on
(00:03:18)
the world. So, I am really excited to
(00:03:20)
have her on and to spread her wisdom
(00:03:22)
with more people. A huge thank you to
(00:03:24)
Ben Horowitz and Condoleezza Rice for
(00:03:26)
suggesting topics for this conversation.
(00:03:28)
If you enjoy this podcast, don't forget
(00:03:29)
to subscribe and follow it in your
(00:03:30)
favorite podcasting app or YouTube. With
(00:03:32)
that, I bring you Dr. Fei-Fei Li after a
(00:03:35)
short word from our sponsors. This
(00:03:38)
episode is brought to you by Figma,
(00:03:39)
makers of Figma Make. When I was a PM at
(00:03:42)
Airbnb, I still remember when Figma came
(00:03:44)
out and how much it improved how we
(00:03:46)
operated as a team. Suddenly, I could
(00:03:48)
involve my whole team in the design
(00:03:50)
process, give feedback on design
(00:03:52)
concepts really quickly, and it just
(00:03:54)
made the whole product development
(00:03:55)
process so much more fun. But Figma
(00:03:57)
never felt like it was for me. It was
(00:03:59)
great for giving feedback on designs,
(00:04:01)
but as a builder, I wanted to make
(00:04:03)
stuff. That's why Figma built Figma
(00:04:05)
Make. With just a few prompts, you can
(00:04:08)
make any idea or design into a fully
(00:04:11)
functional prototype or app that anyone
(00:04:13)
can iterate on and validate with
(00:04:14)
customers. Figma Make is a different
(00:04:16)
kind of vibe coding tool. Because it's
(00:04:18)
all in Figma, you can use your team's
(00:04:20)
existing design building blocks, making
(00:04:22)
it easy to create outputs that look good
(00:04:25)
and feel real and are connected to how
(00:04:27)
your team builds. Stop spending so much
(00:04:29)
time telling people about your product
(00:04:31)
vision and instead show it to them. Make
(00:04:34)
code-backed prototypes and apps fast with
(00:04:36)
Figma Make. Check it out at
(00:04:38)
figma.com/lenny.
(00:04:40)
Did you know that I have a whole team
(00:04:42)
that helps me with my podcast and with
(00:04:44)
my newsletter? I want everyone on that
(00:04:46)
team to be super happy and thrive in
(00:04:48)
their roles. Justworks knows that your
(00:04:50)
employees are more than just your
(00:04:51)
employees. They're your people. My team
(00:04:53)
is spread out across Colorado,
(00:04:55)
Australia, Nepal, West Africa, and San
(00:04:58)
Francisco. My life would be so
(00:05:00)
incredibly complicated to hire people
(00:05:02)
internationally, to pay people on time
(00:05:03)
and in their local currencies, and to
(00:05:05)
answer their HR questions 24/7. But with
(00:05:08)
Justworks, it's super easy. Whether
(00:05:10)
you're setting up your own automated
(00:05:12)
payroll, offering premium benefits, or
(00:05:14)
hiring internationally, Justworks offers
(00:05:16)
simple software and 24/7 human support
(00:05:19)
from small business experts for you and
(00:05:21)
your people. They do your human
(00:05:23)
resources right so that you can do right
(00:05:25)
by your people. Justworks for your
(00:05:27)
people.
(00:05:30)
[Music]
(00:05:31)
Fei-Fei, thank you so much for being
(00:05:33)
here and welcome to the podcast.
(00:05:35)
>> I'm excited to be here, Lenny.
(00:05:36)
>> I'm even more excited to have you here.
(00:05:39)
It is such a treat to get to chat with
(00:05:40)
you. There's so much that I want to talk
(00:05:42)
about. You've been at the center of this
(00:05:44)
AI explosion that we're seeing right now
(00:05:46)
for so long. We're going to talk about a
(00:05:48)
bunch of the history that I think a lot
(00:05:50)
of people don't even know about how this
(00:05:52)
whole thing started. But let me first
(00:05:54)
read a quote from Wired about you, just
(00:05:55)
so people get a sense, and in the intro
(00:05:57)
I'll share all of the other epic things
(00:05:58)
you've done, but I think this is a good
(00:06:00)
way to set context: "Fei-Fei is one of a
(00:06:02)
tiny group of scientists, a group perhaps
(00:06:04)
small enough to fit around a kitchen
(00:06:06)
table, who are responsible for AI's
(00:06:08)
recent remarkable advances." A lot of
(00:06:11)
people call you the godmother of AI. And
(00:06:14)
unlike a lot of AI leaders, you're an AI
(00:06:17)
optimist. You don't think AI is going to
(00:06:19)
replace us. You don't think it's going
(00:06:21)
to take all our jobs. You don't think
(00:06:22)
it's going to kill us. So, I thought
(00:06:23)
it'd be fun to start there. Just what's
(00:06:25)
your perspective on how AI is going to
(00:06:28)
impact humanity over time?
(00:06:30)
>> Yeah. Okay. So, Lenny, let me be very
(00:06:32)
clear. I'm not a utopian. So, it's not
(00:06:36)
like I think AI will have no impact on
(00:06:38)
jobs or people. In fact, I'm a humanist.
(00:06:42)
I believe that whatever AI does
(00:06:47)
currently or in the future is up to us.
(00:06:49)
It's up to the people. So I do believe
(00:06:52)
technology is a net positive for
(00:06:55)
humanity. If you look at the long course
(00:06:57)
of civilization, I think we are,
(00:07:01)
fundamentally, an innovative
(00:07:03)
species. You know, if you look
(00:07:07)
at the written record from thousands
(00:07:11)
of years ago to now, we humans just
(00:07:14)
kept innovating ourselves and innovating
(00:07:17)
our tools, and with that we make lives
(00:07:20)
better. We make work better, we build
(00:07:22)
civilization, and I do believe AI is
(00:07:25)
part of that. So, that's where the
(00:07:27)
optimism comes from. But I think every
(00:07:31)
technology is a double-edged
(00:07:34)
sword. And if we're not doing the
(00:07:37)
right thing as a species, as a society,
(00:07:42)
as communities, as individuals, we can
(00:07:45)
screw this up as well. >> There's this
(00:07:48)
line, I think this was when you were
(00:07:49)
presenting to Congress: "There's nothing
(00:07:51)
artificial about AI. It's inspired by
(00:07:53)
people. It's created by people. And most
(00:07:54)
importantly, it impacts people." I
(00:07:57)
don't have a question there, but what a
(00:07:58)
great line.
(00:07:59)
>> Yeah, I feel this pretty deeply. You
(00:08:02)
know, I started working in AI two and
(00:08:06)
a half decades ago, and I've been having
(00:08:09)
students for the past two decades, and
(00:08:11)
almost every student who graduates, I
(00:08:14)
remind them, when they graduate
(00:08:17)
from my lab, that their field is called
(00:08:20)
artificial intelligence but there's
(00:08:22)
nothing artificial about it.
(00:08:23)
>> Coming back to the point you just made
(00:08:24)
about how it's kind of up to us about
(00:08:26)
where this all goes. What is it you
(00:08:28)
think we need to get right? How do
(00:08:29)
we set things on a path? I know this is
(00:08:31)
a very difficult question to answer,
(00:08:33)
but what's your
(00:08:35)
advice? What do you think we should
(00:08:36)
>> Yeah.
(00:08:37)
>> How many hours do we have?
(00:08:39)
>> How do we align AI? There we go. Let's
(00:08:41)
solve it.
(00:08:41)
>> Also, I think people should be
(00:08:44)
responsible individuals no matter what
(00:08:47)
we do. This is what we teach our
(00:08:49)
children and this is what we need to do
(00:08:51)
as grown-ups as well. No matter which
(00:08:55)
part of AI development, AI
(00:08:59)
deployment, or AI application you are
(00:09:02)
participating in, and most likely many of
(00:09:05)
us, especially as technologists, are
(00:09:08)
at multiple points, we should act like
(00:09:11)
responsible individuals and care
(00:09:14)
about this, actually care a lot about
(00:09:16)
this. I think everybody today should
(00:09:18)
care about AI because it is going to
(00:09:21)
impact your individual life. It is going
(00:09:24)
to impact your community. It's going to
(00:09:26)
impact society and future
(00:09:29)
generations. And caring about it as a
(00:09:32)
responsible person is the first but also
(00:09:36)
the most important step.
(00:09:37)
>> Okay. So, let me actually take a
(00:09:39)
step back and kind of go to the
(00:09:41)
beginning of AI. Most people started
(00:09:44)
hearing and caring about AI, as it's
(00:09:47)
called today, just, I don't know, a
(00:09:48)
few years ago when ChatGPT came out. Maybe
(00:09:50)
it was like three years ago.
(00:09:51)
>> Three years ago. Almost one more month.
(00:09:54)
Three years ago.
(00:09:55)
>> Wow. Okay. That was ChatGPT coming out.
(00:09:57)
Is that the milestone that you have in
(00:09:58)
mind? Okay. Cool. That's exactly how I
(00:10:00)
saw it. But very few people know there
(00:10:01)
was a long, long history of people
(00:10:03)
working on it. It was called machine
(00:10:05)
learning back then, and there were other
(00:10:06)
terms, and now everything's just AI.
(00:10:08)
And there was kind of a long period
(00:10:10)
of a lot of people working on it,
(00:10:11)
and then there's what people refer
(00:10:12)
to as the AI winter, where people
(00:10:14)
almost gave up and said, okay,
(00:10:17)
this idea isn't going anywhere. And
(00:10:19)
then the work you did actually was
(00:10:21)
essentially the spark that brought us
(00:10:23)
out of AI winter and is directly
(00:10:25)
responsible for the world we're in now,
(00:10:27)
where AI is all we talk about. As you
(00:10:28)
just said, it's going to impact
(00:10:30)
everything we do. So, I thought it'd be
(00:10:32)
really interesting to hear from you just
(00:10:33)
kind of like the brief history of what
(00:10:36)
the world was like before ImageNet, then
(00:10:38)
just the work you did to create
(00:10:41)
ImageNet, why that was so important, and
(00:10:42)
then just what happened after.
(00:10:44)
>> It is hard for me to keep in mind that
(00:10:52)
AI is so new for everybody, when I've lived
(00:10:56)
my entire professional life in AI. And
(00:10:59)
there's a part of me that finds it
(00:11:02)
so satisfying to see a personal
(00:11:05)
curiosity that I started barely out of
(00:11:09)
my teenage years now become a
(00:11:11)
transformative
(00:11:14)
force of our civilization. It genuinely
(00:11:17)
is a civilizational-level technology.
(00:11:21)
So that journey is about 30
(00:11:25)
years, or 20-something, 20-plus years, and
(00:11:28)
it's just very satisfying. So,
(00:11:28)
where did it all start? Well, I'm not
(00:11:30)
even a first-generation AI researcher.
(00:11:33)
The first generation really dates back to
(00:11:36)
the 50s and 60s. And, you know, Alan
(00:11:39)
Turing was ahead of his time in the
(00:11:42)
40s, daring humanity with the
(00:11:45)
question: are there thinking
(00:11:47)
machines? Right. And of course he had a
(00:11:50)
specific way of testing this concept
(00:11:54)
of a thinking machine, which is a
(00:11:56)
conversational chatbot, and by his
(00:11:59)
standard we now have a thinking machine.
(00:12:02)
But that was more of an anecdotal
(00:12:07)
inspiration. The field really
(00:12:09)
began in the 50s, when computer
(00:12:12)
scientists came together and looked at how
(00:12:15)
we can use computer programs and
(00:12:18)
algorithms to build programs
(00:12:23)
that can do things that only
(00:12:27)
human cognition had been capable of. And
(00:12:31)
that was the beginning, with the
(00:12:32)
founding fathers at the Dartmouth workshop
(00:12:35)
in 1956. You know, we have
(00:12:38)
Professor John McCarthy, who later came
(00:12:40)
to Stanford, who coined the term
(00:12:43)
artificial intelligence.
(00:12:45)
And between the 50s, 60s, 70s, and 80s, it
(00:12:50)
was the early days of AI exploration, and
(00:12:54)
we had logic systems, we had expert
(00:12:58)
systems. We also had early exploration of
(00:13:01)
neural networks. And then it came to
(00:13:05)
around the late 80s, the 90s, and the
(00:13:10)
very beginning of the 21st century. That
(00:13:13)
stretch, about 20 years, is actually the
(00:13:16)
beginning of machine learning. It's the
(00:13:18)
marriage between computer programming
(00:13:20)
and statistical learning. And that
(00:13:25)
marriage brought a very, very critical
(00:13:28)
concept into AI, which is that
(00:13:33)
purely rule-based
(00:13:35)
programs are not going to account
(00:13:39)
for the vast amount of cognitive
(00:13:43)
capabilities that we imagine computers
(00:13:45)
can do. So we have to use machines to
(00:13:49)
learn the patterns. Once the machines
(00:13:52)
can learn the patterns, there's hope they can
(00:13:55)
do more things. For example, if you give
(00:13:58)
it three cats, the hope is not just for
(00:14:01)
the machines to recognize these three
(00:14:03)
cats. The hope is the machines can
(00:14:06)
recognize the fourth cat, the fifth cat,
(00:14:08)
the sixth cat, and all the other cats.
(00:14:10)
And that's a learning ability that is
(00:14:13)
fundamental to humans and many animals.
(00:14:17)
And we as a field realized we needed
(00:14:20)
machine learning. So that was up till
(00:14:23)
the beginning of the 21st century. I
(00:14:27)
entered the field of AI literally in the
(00:14:29)
year 2000. That's when my PhD
(00:14:32)
began at Caltech. And so I was one of
(00:14:35)
the first-generation machine learning
(00:14:37)
researchers, and we were already studying
(00:14:40)
this concept of machine learning,
(00:14:42)
especially neural networks. I remember
(00:14:44)
that one of my first courses at
(00:14:47)
Caltech was called Neural Networks, but it
(00:14:50)
was very painful. It was still smack in
(00:14:52)
the middle of the so-called AI winter
(00:14:54)
meaning the public didn't look at this
(00:14:57)
too much. There wasn't that much funding,
(00:14:59)
but there were also a lot of ideas
(00:15:02)
flowing around. And I think two things
(00:15:06)
happened to me that brought my own
(00:15:08)
career so close to the birth of modern
(00:15:11)
AI. One is that I chose to look at
(00:15:16)
artificial intelligence through the lens
(00:15:18)
of visual intelligence because humans
(00:15:22)
are deeply visual animals. We can talk a
(00:15:25)
little more later, but so much of our
(00:15:28)
intelligence is built upon visual,
(00:15:32)
perceptual, spatial understanding, not
(00:15:34)
just language per se. I think they're
(00:15:36)
complementary. So I chose to look at
(00:15:38)
visual intelligence, and during my PhD and my
(00:15:41)
early professor years, my
(00:15:46)
students and I were very committed to a
(00:15:48)
North Star problem, which is solving the
(00:15:50)
problem of object recognition because
(00:15:53)
it's a building block for the perceptual
(00:15:56)
world. Right? We go around the world
(00:15:58)
interpreting, reasoning and interacting
(00:16:00)
with it more or less at the object
(00:16:03)
level. We don't interact with the world
(00:16:05)
at the molecular level. We don't
(00:16:08)
interact with the world that way,
(00:16:10)
well, we sometimes do, but rarely. For
(00:16:13)
example, if you want to lift a teapot, you
(00:16:15)
don't say, okay, the teapot is made of
(00:16:18)
100 pieces of porcelain, let me work
(00:16:21)
on these 100 pieces. You look at it as
(00:16:23)
one object and interact with it. So
(00:16:26)
objects are really important. So I was
(00:16:30)
among the first researchers to
(00:16:33)
identify this as a North Star problem.
(00:16:36)
But I think what happened is that
(00:16:39)
as a student of AI and then a researcher
(00:16:42)
of AI, I was working on all kinds of
(00:16:46)
mathematical models, including neural
(00:16:48)
networks, including Bayesian networks,
(00:16:51)
including many, many models, and there was
(00:16:54)
one singular pain point: these
(00:16:57)
models didn't have data to be trained on.
(00:17:00)
And as a field, we were so focused on
(00:17:04)
these models, but it dawned on me that
(00:17:07)
human learning,
(00:17:10)
as well as evolution, is actually a big
(00:17:14)
data learning process. Humans learn with
(00:17:16)
so much experience, constantly.
(00:17:19)
And in evolution, if you look across time,
(00:17:22)
animals evolve just by experiencing
(00:17:24)
the world. So I think my students
(00:17:27)
and I conjectured
(00:17:30)
that a critically overlooked
(00:17:33)
ingredient of bringing AI to life is big
(00:17:37)
data. And then we began this ImageNet
(00:17:40)
project in 2006, 2007. We were very
(00:17:43)
ambitious. We wanted to get the entire
(00:17:46)
internet's image data on objects. Now,
(00:17:50)
granted, the internet was a lot smaller than
(00:17:52)
today, so I felt like that ambition
(00:17:55)
was at least not too crazy. Now it would be
(00:17:58)
totally delusional to think a
(00:18:02)
couple of graduate students and a
(00:18:04)
professor could do this. But that's
(00:18:07)
what we did. We very carefully curated
(00:18:10)
15 million images from the internet
(00:18:13)
and created a taxonomy of 22,000
(00:18:17)
concepts, borrowing other researchers'
(00:18:20)
work, like linguists' work on WordNet,
(00:18:24)
which is a particular way of organizing
(00:18:27)
words like a dictionary. We combined that into ImageNet,
(00:18:31)
and we open-sourced that to the
(00:18:34)
research community. We held an annual
(00:18:37)
ImageNet Challenge to encourage
(00:18:40)
everybody to participate in this. We
(00:18:42)
continued to do our own research. But
(00:18:45)
2012 was the moment that many people
(00:18:48)
think was the beginning of deep
(00:18:50)
learning, or the birth of modern AI, because a
(00:18:53)
group of Toronto researchers led by
(00:18:55)
Professor Geoff Hinton
(00:18:58)
participated in the ImageNet Challenge, used
(00:19:00)
the ImageNet big data and two GPUs from
(00:19:04)
Nvidia, and successfully created the
(00:19:07)
first neural network algorithm that,
(00:19:10)
while it didn't fundamentally or
(00:19:14)
totally solve it, made huge progress
(00:19:17)
towards solving the problem of object
(00:19:19)
recognition. And that combination of the
(00:19:22)
trio of technologies,
(00:19:25)
big data, neural networks, and GPUs, was
(00:19:29)
kind of the golden recipe for modern AI.
(00:19:33)
And then fast forward to the public
(00:19:36)
moment of AI, which is the ChatGPT
(00:19:41)
moment. If you look at the ingredients of
(00:19:45)
what brought ChatGPT to the
(00:19:48)
world, technically it still used these
(00:19:52)
three ingredients. Now it's internet-scale
(00:19:55)
data, mostly text; it's a much more
(00:19:59)
complex neural network
(00:20:02)
architecture than in 2012, but it's still
(00:20:05)
a neural network; and a lot more GPUs, but
(00:20:08)
it's still GPUs. So these three
(00:20:11)
ingredients are still at the core of
(00:20:14)
modern AI.
(00:20:16)
>> Incredible. I have never heard that full
(00:20:19)
story before. I love that it was two
(00:20:21)
GPUs. I love that.
(00:20:26)
And now it's, I don't know, hundreds of
(00:20:27)
thousands, right, that are orders of
(00:20:29)
magnitude more powerful. And those
(00:20:31)
two GPUs, were they just bought? They were
(00:20:33)
like gaming GPUs? They just went to, like,
(00:20:35)
the game store, right, the kind people use for
(00:20:36)
playing games?
(00:20:37)
>> As you said, this continues to be, in a
(00:20:40)
large way, the way models get smarter.
(00:20:42)
Some of the fastest-growing companies in
(00:20:43)
the world right now, and I've had most of them
(00:20:44)
on the podcast, Mercor and Surge
(00:20:46)
and Scale, they do this. They
(00:20:48)
continue to do this for labs: just give
(00:20:50)
them more and more labeled data on the
(00:20:52)
things they're most excited about.
(00:20:53)
>> Yeah, I remember Alexandr Wang from Scale
(00:20:57)
in the very early days. I probably still have
(00:20:58)
his emails from when he was starting Scale.
(00:21:01)
He was very kind. He kept sending
(00:21:03)
me emails about how ImageNet inspired
(00:21:07)
Scale. I was very pleased to see that.
(00:21:09)
>> One of my other favorite takeaways from
(00:21:11)
what you just shared is just such an
(00:21:12)
example of high agency and just doing
(00:21:15)
things. That's kind of a meme on
(00:21:16)
Twitter: you can just do things.
(00:21:18)
You were just like, okay, this is probably
(00:21:20)
necessary to move AI forward. And it was called
(00:21:22)
machine learning back then, right? Was
(00:21:24)
that the term most people used?
(00:21:25)
>> I think they were used interchangeably. It's true,
(00:21:28)
I do remember the companies, the
(00:21:31)
tech companies. I'm not going to name
(00:21:33)
names, but I was in a
(00:21:36)
conversation in the early days, I
(00:21:38)
think in the middle of 2015, middle of
(00:21:42)
2016, when some tech companies avoided using
(00:21:46)
the word AI because they were not sure
(00:21:49)
if AI was a dirty word. And I remember I
(00:21:52)
was actually
(00:21:54)
encouraging everybody to use the word AI,
(00:21:57)
because to me that is one of the most
(00:22:00)
audacious questions humanity has ever
(00:22:03)
asked in our quest for science and
(00:22:06)
technology, and I feel very proud of this
(00:22:08)
term. But yes, at the beginning some
(00:22:11)
people were not sure.
(00:22:13)
>> What year was that, roughly, when AI was
(00:22:14)
developed? 2016, I think that was.
(00:22:17)
>> Less than 10 years ago.
(00:22:18)
>> That was the change. Some
(00:22:21)
people started calling it AI, but I think
(00:22:24)
if you look at the Silicon Valley tech
(00:22:27)
companies, if you trace their
(00:22:30)
marketing terms, I think
(00:22:34)
2017ish
(00:22:36)
was the beginning of companies calling
(00:22:38)
themselves AI companies.
(00:22:40)
>> That's incredible, just how the world has
(00:22:43)
changed. Now you can't not call yourself
(00:22:45)
an AI company.
(00:22:46)
>> I know.
(00:22:47)
>> Just nineish years later.
(00:22:49)
>> Yeah.
(00:22:49)
>> Oh man. Okay. Is there anything else
(00:22:52)
around the history, that early history,
(00:22:54)
that you think people don't know and that
(00:22:55)
you think is important before we chat
(00:22:57)
about where things are going and
(00:22:58)
the work that you're doing?
(00:23:01)
>> I think, as with all histories, you know, I'm
(00:23:04)
keenly aware that I am recognized for
(00:23:08)
being part of the history, but there are
(00:23:10)
so many heroes and so many researchers.
(00:23:13)
We're talking about generations of
(00:23:15)
researchers there. You know, in my own
(00:23:18)
world, there are so many people who have
(00:23:20)
inspired me, which I talked about
(00:23:23)
in my book. But I do feel our culture,
(00:23:27)
especially Silicon Valley, tends to
(00:23:31)
assign achievements to a single
(00:23:34)
person. While I think that has
(00:23:37)
value, it's worth
(00:23:40)
remembering: AI is a field that is, at this
(00:23:43)
point, 70 years old, and we have gone
(00:23:46)
through many generations. No
(00:23:50)
one could have gotten here by
(00:23:53)
themselves.
(00:23:54)
>> Okay. So let me ask you this question.
(00:23:56)
It feels like we're always on this
(00:23:58)
precipice of AGI, this kind of vague
(00:24:00)
term people throw around. AGI is coming.
(00:24:02)
Is it going to take over everything?
(00:24:04)
What's your take on how far you think we
(00:24:06)
might be from AGI? Do you think we're
(00:24:07)
going to get there on the current
(00:24:09)
trajectory we're on? Do you think we
(00:24:10)
need more breakthroughs? Do you think
(00:24:11)
the current approach will get us there?
(00:24:13)
>> Yeah, this is a very interesting term,
(00:24:15)
Lenny.
(00:24:18)
I don't know if anyone has ever defined
(00:24:21)
AGI.
(00:24:24)
You know, there are many different
(00:24:25)
definitions, including, you know, some
(00:24:28)
kind of superpower for machines, all the
(00:24:31)
way to whether a machine can become an
(00:24:35)
economically viable agent in
(00:24:39)
society.
(00:24:41)
In other words, earning salaries to live.
(00:24:43)
Is that the definition of AGI? As a
(00:24:46)
scientist, I take science very
(00:24:49)
seriously, and I entered the field because
(00:24:52)
I was inspired by this audacious
(00:24:55)
question: can machines think and do
(00:24:58)
things in the way that humans can
(00:25:02)
do? For me, that's always been the North Star
(00:25:05)
of AI. And from that point of view, I
(00:25:08)
don't know what the difference is between
(00:25:09)
AI and AGI. I think we've done very well
(00:25:14)
in achieving parts of the goal,
(00:25:16)
including conversational AI, but I don't
(00:25:19)
think we have completely conquered all
(00:25:21)
the goals of AI. And I think about our
(00:25:24)
founding fathers, like Alan Turing. I
(00:25:28)
wonder, if Alan Turing were around today
(00:25:31)
and you asked him to contrast AI versus
(00:25:33)
AGI, he might just shrug and say, well,
(00:25:37)
I asked the same question back in the 1940s.
(00:25:40)
So I don't want to go down a
(00:25:44)
rabbit hole of defining AI versus AGI. I
(00:25:48)
feel AGI is more a marketing term than a
(00:25:52)
scientific term. As a scientist and
(00:25:54)
technologist,
(00:25:56)
AI is my North Star, my field's
(00:25:59)
North Star, and I'm happy for people to call it
(00:26:01)
whatever name they want to call it.
(00:26:05)
>> So let me ask it maybe this way.
(00:26:07)
Like you described, there are these
(00:26:09)
components, from ImageNet and
(00:26:11)
AlexNet, that took us to where we are
(00:26:13)
today: GPUs, essentially, data, labeled data,
(00:26:17)
and the algorithm of the model.
(00:26:20)
The transformer also feels
(00:26:22)
like an important step in that
(00:26:24)
trajectory. Do you feel like those are
(00:26:26)
the same components that'll get us to, I
(00:26:27)
don't know, a 10 times smarter model,
(00:26:29)
something that's life-changing for
(00:26:31)
the entire world? Or do you think we need
(00:26:33)
more breakthroughs? I know we're
(00:26:35)
going to talk about world models, which I
(00:26:36)
think are a component of this, but is
(00:26:38)
there anything else where you think,
(00:26:39)
oh, this will plateau, or, okay, this
(00:26:42)
will take us there, we just need more data, more
(00:26:43)
compute, more GPUs?
(00:26:44)
>> Oh no, I definitely think we need more
(00:26:47)
innovations. I think with scaling laws,
(00:26:50)
more data, more GPUs, and bigger current
(00:26:53)
model architectures, there's still a
(00:26:57)
lot to be done there. But I absolutely
(00:26:59)
think we need to innovate more. There
(00:27:02)
is not a single
(00:27:04)
deeply scientific discipline in human
(00:27:07)
history that has arrived at a place that
(00:27:11)
says we're done. We're done innovating.
(00:27:13)
And AI is one of, if not the,
(00:27:17)
youngest disciplines in human
(00:27:20)
civilization in terms of science and
(00:27:22)
technology. We're still scratching the
(00:27:24)
surface. For example, like I said,
(00:27:27)
we're going to segue into world models
(00:27:29)
today. You take a a model and and and
(00:27:34)
run it through a a video of a couple of
(00:27:37)
office rooms and ask the the model to
(00:27:40)
count the number of chairs. And this is
(00:27:42)
something a toddler could do or maybe
(00:27:44)
maybe an elementary school kid could
(00:27:47)
do and AI could not do that, right? So
(00:27:51)
there's just so much AI today could
(00:27:53)
not do, let alone thinking about how
(00:27:57)
did you know um someone like Isaac
(00:28:00)
Newton look at the movements of the
(00:28:03)
celestial bodies and derive an
(00:28:07)
equation or a set of equations that
(00:28:11)
governs the movement of all bodies that
(00:28:14)
level of creativity extrapolation
(00:28:17)
abstraction we have no way of enabling
(00:28:21)
AI to do that today. And then let's look
(00:28:24)
at emotional intelligence. If you look
(00:28:26)
at a student coming into a teacher's
(00:28:30)
office and having a conversation about
(00:28:33)
motivation, passion, what to learn,
(00:28:35)
what's the problem that's, you
(00:28:38)
know, really bothering you. In that
(00:28:41)
conversation, as powerful as today's
(00:28:45)
conversational bots are, you don't get
(00:28:48)
that level of emotional cognitive
(00:28:51)
intelligence uh from today's AI. So
(00:28:54)
there's a lot we can do better. Um and I
(00:28:58)
do not believe we're done innovating.
(00:29:00)
>> Demis from Google DeepMind had this really
(00:29:02)
interesting interview recently
(00:29:04)
where someone asked him just like what
(00:29:05)
do you think, how far are we from AGI?
(00:29:08)
What does it look like when we get
(00:29:09)
there? He had a really interesting way
(00:29:10)
of approaching it: if we were to give
(00:29:12)
the most cutting-edge model all the
(00:29:14)
information until the end of the 20th
(00:29:17)
century see if it could come up with all
(00:29:19)
the breakthroughs Einstein had. And so
(00:29:21)
far, we're nowhere near that, but they can
(00:29:22)
>> no we're not in fact it's even worse
(00:29:26)
let's give AI all the data including
(00:29:30)
modern instruments data of celestial
(00:29:33)
bodies which Newton did not have and
(00:29:36)
give it to that and just ask AI to
(00:29:39)
create the 17th-century set of
(00:29:42)
equations on the laws of bodily
(00:29:45)
movements. Today's AI cannot do that.
(00:29:49)
>> All right, we're a ways away is what I'm
(00:29:50)
hearing.
(00:29:51)
>> Yeah.
(00:29:51)
>> Okay, so let's talk about world models.
(00:29:53)
This is uh to me this is just another
(00:29:55)
really amazing example of you being
(00:29:58)
ahead of where people end up. So you
(00:30:01)
were way ahead on okay, we just need a
(00:30:03)
lot of clean data for AI and neural
(00:30:06)
networks to learn. You've been
(00:30:07)
talking about this idea of world models
(00:30:09)
for a long time. You started a company
(00:30:10)
to build uh essentially there's language
(00:30:13)
models. This is a different thing. This
(00:30:14)
is a world model. We'll talk about what
(00:30:15)
that is. And now uh as I was preparing
(00:30:18)
for this, Elon's like talking about
(00:30:19)
world models. Jensen's talking about
(00:30:21)
world models. I know Google's working on
(00:30:22)
this stuff. You've been at this for a
(00:30:24)
long time. And you actually just
(00:30:25)
launched something that we're
(00:30:27)
going to talk about uh right before this
(00:30:29)
podcast airs. Talk about: what is a
(00:30:32)
world model? Why is it so important? >> I'm
(00:30:34)
very excited to see that more and more
(00:30:36)
people are talking about world models
(00:30:39)
like Elon, like Jensen. Um,
(00:30:43)
I have been thinking about
(00:30:46)
really how to push AI forward all my
(00:30:50)
life, right? and the large language
(00:30:53)
models that came out of the
(00:30:57)
research world and then OpenAI and
(00:31:00)
all this for the past few years were
(00:31:03)
extremely inspiring even for a
(00:31:06)
researcher like me. I remember when
(00:31:09)
GPT-2 came out, and that was, I think, in
(00:31:13)
late 2020.
(00:31:16)
I was co-director, I still am, but I
(00:31:20)
was at that time full-time
(00:31:22)
co-director, of Stanford's Human-Centered
(00:31:24)
AI Institute, and I remember it
(00:31:27)
was you know the public was not aware of
(00:31:30)
the power of the large language model
(00:31:32)
yet, but as researchers we were seeing it,
(00:31:35)
we were seeing the future, and I had pretty
(00:31:38)
long conversations with my natural
(00:31:41)
language processing colleagues like
(00:31:44)
Percy Liang and Chris Manning. We were
(00:31:46)
talking about how critical this
(00:31:48)
technology is going to be, and Stanford's
(00:31:52)
AI institute, the Human-Centered AI
(00:31:53)
Institute, HAI, was the first one to
(00:31:56)
establish a full research center on
(00:31:59)
foundation models. Percy Liang
(00:32:01)
and many researchers led the first
(00:32:04)
academic paper on foundation models.
(00:32:07)
So it was just very inspiring for me.
(00:32:10)
So, of course, I come from the world of
(00:32:13)
visual intelligence and I was just
(00:32:16)
thinking there's so much we can push
(00:32:18)
forward on beyond language, because
(00:32:22)
humans have used our sense of
(00:32:29)
spatial intelligence and world
(00:32:31)
understanding to do so many things and
(00:32:34)
they are beyond language. Think about a
(00:32:37)
very chaotic
(00:32:39)
first responder scene, whether it's fire
(00:32:42)
or some traffic accident or or some
(00:32:46)
natural disaster. If you
(00:32:51)
immerse yourself in those scenes and
(00:32:53)
think about how people organize
(00:32:55)
themselves to rescue people, to stop
(00:32:58)
further disasters, to put out fires,
(00:33:02)
a lot of that is movement, is
(00:33:07)
spontaneous understanding of objects,
(00:33:10)
worlds, humans,
(00:33:13)
situational awareness. Language is part
(00:33:16)
of that. But in a lot of those situations,
(00:33:19)
language cannot get you to put out the
(00:33:21)
fire. So what is that? I was
(00:33:25)
thinking a lot and in the meantime I was
(00:33:27)
doing a lot of robotics research, and
(00:33:30)
it dawned on me that the linchpin
(00:33:34)
of connecting
(00:33:37)
the additional intelligence beyond
(00:33:40)
language, connecting embodied AI,
(00:33:44)
which is robotics, and connecting visual
(00:33:47)
intelligence is this sense of spatial
(00:33:50)
intelligence about understanding the
(00:33:53)
world. And that's when, I think it
(00:33:57)
was 2024, I gave a TED talk about spatial
(00:34:01)
intelligence and world models, and I
(00:34:05)
started formulating this idea back in
(00:34:09)
2022
(00:34:11)
based on my robotics and computer
(00:34:13)
vision research and then one thing that
(00:34:16)
is really clear to me is that I really
(00:34:20)
want to work with the brightest
(00:34:22)
technologists and move as fast as
(00:34:26)
possible to bring this technology to
(00:34:28)
life. And that's when we founded this
(00:34:30)
company called World Labs. And you can
(00:34:33)
see the word "world" is in the
(00:34:36)
title of our company because we believe
(00:34:38)
so much in world modeling and spatial
(00:34:40)
intelligence.
(00:34:42)
>> People are so used to just chatbots, and
(00:34:43)
that's a large language model. So the
(00:34:45)
simple way to understand a world model
(00:34:46)
is you basically describe a scene and it
(00:34:49)
generates an infinitely
(00:34:51)
explorable world. We'll link to the
(00:34:53)
thing you launched, which we'll talk about,
(00:34:55)
but just is that a simple way to
(00:34:56)
understand it?
(00:34:56)
>> That's part of it, Lenny. I think a
(00:34:58)
simple way to understand a world model
(00:35:01)
is that this model can allow anyone
(00:35:05)
to create
(00:35:08)
any world in their mind's eye by
(00:35:11)
prompting, whether with an image or a
(00:35:14)
sentence,
(00:35:15)
and also be able to interact in this
(00:35:18)
world, whether you're browsing and
(00:35:21)
walking or picking objects up or
(00:35:24)
changing things, as well as
(00:35:29)
to reason within this world. For
(00:35:31)
example, if the person, or
(00:35:35)
the agent, consuming this output of the
(00:35:38)
world model is a robot, it should be
(00:35:40)
able to plan its path and help to
(00:35:44)
tidy the kitchen, for example.
(00:35:48)
So a world model is
(00:35:52)
a foundation that you can use to
(00:35:56)
reason, to interact, and to create
(00:35:59)
worlds.
(00:36:00)
>> Great. Yeah. So, robots feel like
(00:36:02)
that's potentially the next big focus
(00:36:06)
for AI researchers and just like the
(00:36:08)
impact on the world. And what you're
(00:36:10)
saying here is uh this is a key missing
(00:36:13)
piece of making robots actually work in
(00:36:16)
the real world. Understanding how the
(00:36:17)
world works.
(00:36:18)
>> Yeah. Well, first of all, I do think
(00:36:20)
there's more than robots that's
(00:36:21)
exciting. But I agree with
(00:36:24)
everything you just said. I think uh
(00:36:26)
world modeling and spatial intelligence
(00:36:29)
is a key missing piece of embodied
(00:36:33)
AI. I also think let's not underestimate
(00:36:36)
that humans are embodied agents and
(00:36:39)
humans can be augmented by AI's uh
(00:36:43)
intelligence just like today humans are
(00:36:46)
language animals but we're very much
(00:36:48)
augmented by AI when helping us to you
(00:36:52)
know do language tasks including
(00:36:54)
software engineering. I think that
(00:36:57)
we shouldn't underestimate, or maybe
(00:37:00)
we tend not to talk about, how
(00:37:04)
humans as embodied agents can
(00:37:07)
actually benefit so much from world
(00:37:10)
models and spatial intelligence models,
(00:37:13)
as well as robots can. >> So the big
(00:37:16)
unlocks here: robots, which is a huge
(00:37:19)
deal. If this works out, imagine each of
(00:37:21)
us has robots doing a bunch of stuff for
(00:37:22)
us. Goes into, you know, they help us
(00:37:24)
with disasters, things like that. Uh
(00:37:26)
games obviously is a really cool
(00:37:27)
example. Just like infinitely playable
(00:37:30)
games that you just invent out of your
(00:37:31)
head. And then creativity feels like
(00:37:34)
just like being fun, having fun, being
(00:37:35)
creative, thinking of wild new worlds
(00:37:37)
and and environments.
(00:37:39)
>> And also design. Humans design everything from
(00:37:42)
machines to buildings to homes. And also
(00:37:46)
scientific discovery, right? There is so
(00:37:48)
much. I like to use the example of
(00:37:52)
the discovery of the structure of DNA. If
(00:37:55)
you look, one of the most important
(00:37:58)
pieces in DNA's discovery history is the
(00:38:03)
X-ray diffraction photo that was captured
(00:38:06)
by Rosalind Franklin, and it was a flat 2D
(00:38:10)
photo of a structure that
(00:38:13)
looks like a cross with diffractions. You
(00:38:16)
can Google those photos. But
(00:38:19)
with that 2D flat photo,
(00:38:24)
humans, especially two important humans,
(00:38:27)
James Watson and Francis Crick, in
(00:38:30)
addition to their other information,
(00:38:33)
were able to reason in 3D space and
(00:38:38)
deduce a highly three-dimensional double
(00:38:41)
helix structure of the DNA. And that
(00:38:44)
structure cannot possibly be 2D. You
(00:38:48)
cannot think in 2D and deduce that
(00:38:52)
structure. You have to think in 3D
(00:38:55)
space and use human spatial
(00:38:58)
intelligence. So I think even in
(00:39:01)
scientific discovery um spatial
(00:39:03)
intelligence or AI assisted spatial
(00:39:06)
intelligence is critical.
(00:39:08)
>> This is such an example of I think it
(00:39:10)
was Chris Dixon that had this line that
(00:39:12)
the next big thing is going to start off
(00:39:14)
feeling like a toy. When ChatGPT just
(00:39:17)
came out, I remember Sam Altman just
(00:39:19)
tweeted it as, here's a cool thing
(00:39:20)
we're playing with, check it out. Now
(00:39:21)
it's the fastest-growing product in all of
(00:39:23)
history, and it changed the world.
(00:39:24)
>> Yeah.
(00:39:24)
>> And it's oftentimes the things that
(00:39:26)
just look like, okay, this is cool,
(00:39:29)
that are fun to play with, that end up
(00:39:30)
changing the world most.
(00:39:32)
>> Yeah.
(00:39:33)
>> This episode is brought to you by Sinch,
(00:39:35)
the customer communications cloud.
(00:39:38)
Here's the thing about digital customer
(00:39:39)
communications. Whether you're sending
(00:39:41)
marketing campaigns, verification codes,
(00:39:44)
or account alerts, you need them to
(00:39:45)
reach users reliably. That's where Sinch
(00:39:48)
comes in. Over 150,000 businesses,
(00:39:51)
including eight of the top 10 largest
(00:39:53)
tech companies globally, use Sinch's API
(00:39:55)
to build messaging, email, and calling
(00:39:58)
into their products. And there's
(00:39:59)
something big happening in messaging
(00:40:01)
that product teams need to know about:
(00:40:03)
Rich Communication Services, or RCS.
(00:40:06)
Think of RCS as SMS 2.0. Instead of
(00:40:09)
getting texts from a random number, your
(00:40:11)
users will see your verified company
(00:40:13)
name and logo without needing to
(00:40:15)
download anything new. It's a more
(00:40:17)
secure and branded experience. Plus, you
(00:40:19)
get features like interactive carousels
(00:40:21)
and suggested replies. And here's why
(00:40:23)
this matters. US carriers are starting
(00:40:25)
to adopt RCS. Sinch is already helping
(00:40:28)
major brands send RCS messages around
(00:40:31)
the world, and they're helping Lenny's
(00:40:32)
podcast listeners get registered first
(00:40:34)
before the rush hits the US market.
(00:40:37)
Learn more and get started at
(00:40:38)
sinch.com/lenny.
(00:40:41)
That's S-I-N-C-H dot com slash Lenny.
(00:40:45)
>> I reached out to Ben Horowitz who loves
(00:40:47)
what you're doing. A big fan of yours.
(00:40:49)
Uh they're investors I believe. And
(00:40:51)
>> Yeah, we've known each other for
(00:40:54)
many years, but yes, right now they are
(00:40:56)
investors in World Labs.
(00:40:58)
>> Amazing. Okay. So I asked him what I
(00:40:59)
should ask you about and he suggested
(00:41:01)
ask you why the
(00:41:03)
bitter lesson alone is not likely to work
(00:41:07)
for robots. So first of all just explain
(00:41:10)
what the bitter lesson was in the
(00:41:12)
history of AI and then just why that
(00:41:14)
won't get us to where we want to be with
(00:41:15)
robots.
(00:41:16)
>> So, well, first of all, there are many
(00:41:18)
bitter lessons,
(00:41:21)
but the bitter lesson everybody refers
(00:41:23)
to is a paper written by Richard
(00:41:26)
Sutton, who won the Turing Award
(00:41:29)
recently and he does a lot of
(00:41:31)
reinforcement learning, and Richard has
(00:41:33)
said, right, if you look at the
(00:41:35)
history, especially the algorithmic
(00:41:37)
development of AI, it turns out simpler
(00:41:41)
models with a ton of data always win at
(00:41:45)
the end of the day instead of
(00:41:49)
more complex models with
(00:41:52)
less data. I mean, actually, this
(00:41:55)
paper came years after ImageNet. That, to
(00:41:58)
me, was not bitter; it was a sweet lesson.
(00:42:02)
That's why I built ImageNet, because
(00:42:04)
I believe that big data plays that
(00:42:07)
role. So why can't the bitter lesson work in
(00:42:11)
robotics alone? Well, first of all, I
(00:42:15)
think we need to give credit to where we
(00:42:18)
are today robotics is very much in the
(00:42:21)
early days of
(00:42:23)
experimentation. The
(00:42:26)
research is not nearly as mature as, say,
(00:42:29)
language models. So many people are
(00:42:33)
still um experimenting with different
(00:42:36)
algorithms and some of those algorithms
(00:42:38)
are driven by big data. So I do think
(00:42:42)
big data will continue to play a role in
(00:42:46)
robotics. But what is hard for
(00:42:51)
robotics? There are a couple of things.
(00:42:53)
One is that
(00:42:55)
it's harder to get data. It's a lot
(00:42:58)
harder to get data. You can say, well,
(00:43:00)
there is web data, and this is where the
(00:43:02)
latest robotics research is using web
(00:43:05)
videos, and I think web videos do play
(00:43:09)
a role. But think about what made
(00:43:11)
language models work. As someone
(00:43:15)
who does computer vision and and spatial
(00:43:18)
intelligence and robotics, I'm very
(00:43:19)
jealous of my colleagues in
(00:43:22)
language, because they had this perfect
(00:43:26)
setup where their training data are in
(00:43:29)
words, eventually tokens, and then they
(00:43:33)
produce a model that outputs words. So
(00:43:37)
you have this perfect alignment between
(00:43:40)
what you hope to get, which we call the
(00:43:42)
objective function, and what your
(00:43:45)
training data looks like. But robotics
(00:43:48)
is different. Even spatial intelligence
(00:43:50)
is different. You hope to get actions
(00:43:54)
out of robots.
(00:43:56)
But your training data lacks
(00:43:59)
actions in 3D worlds. And that's what
(00:44:03)
robots have to do, right? actions in 3D
(00:44:06)
worlds. So, you have to find
(00:44:09)
different ways to fit, what do they
(00:44:14)
call it, a square peg in a round hole, because
(00:44:19)
what we have is tons of web videos.
(00:44:23)
So then we have to start talking about
(00:44:26)
adding supplementary
(00:44:29)
data, such as teleoperation data or
(00:44:33)
synthetic data, so that the robots are
(00:44:36)
trained under this hypothesis of the bitter
(00:44:39)
lesson, which is a large amount of data. I
(00:44:42)
think there's still hope because even
(00:44:45)
what we are doing in world modeling
(00:44:48)
will really unlock a lot of this
(00:44:52)
information for robots but I think we
(00:44:54)
have to be careful because we're at the
(00:44:56)
early days of this, and the bitter lesson is
(00:44:59)
still to be tested, because we haven't
(00:45:04)
fully figured out the data. Another
(00:45:08)
part of the bitter lesson of robotics I
(00:45:10)
think we should be
(00:45:12)
very realistic about is, again, compared to
(00:45:16)
language models or even spatial models,
(00:45:19)
robots are physical systems. So robots
(00:45:24)
are closer to self-driving cars than a
(00:45:27)
large language model. And that's very
(00:45:29)
important to recognize. That means that
(00:45:33)
in order for robots to work, we not only
(00:45:37)
need brains, we also need the physical
(00:45:40)
body, we also need application
(00:45:43)
scenarios. And if you look at
(00:45:45)
the
(00:45:47)
history of self-driving cars, my
(00:45:50)
colleague Sebastian Thrun took
(00:45:53)
Stanford's car to win the first DARPA
(00:45:58)
challenge in 2006 or 2005. It's 20 years
(00:46:02)
since that prototype of a self-driving
(00:46:06)
car being able to drive 130 miles in the
(00:46:10)
Nevada desert to today's Waymo on
(00:46:15)
the street of San Francisco and we're
(00:46:18)
not even done yet. There's still a lot.
(00:46:20)
So that's a 20 year journey. And
(00:46:23)
self-driving cars are much simpler
(00:46:25)
robots. They're just metal boxes running
(00:46:27)
on 2D surfaces. And the goal is not to
(00:46:31)
touch anything. Robots are 3D things
(00:46:36)
running in 3D world and the goal is to
(00:46:39)
touch things. So the journey is going to
(00:46:42)
be, you know, there are many aspects,
(00:46:45)
elements. And of course one could say,
(00:46:48)
well, the self-driving car's early
(00:46:50)
algorithms were from the pre-deep learning era. So
(00:46:54)
deep learning is accelerating uh the
(00:46:56)
brains and I think that's true. That's
(00:46:58)
why I'm in robotics. That's why I'm in
(00:47:01)
spatial intelligence and I'm excited by
(00:47:03)
it. But in the meantime, the car
(00:47:05)
industry is very mature and productizing
(00:47:10)
also involves the mature
(00:47:13)
use cases, supply chains, the hardware.
(00:47:16)
So I think it's a very interesting time
(00:47:18)
to work on these problems. But it's true,
(00:47:21)
Ben is right. We might still be subject
(00:47:26)
to a number of bitter lessons
(00:47:29)
doing this work. >> Do you ever just feel
(00:47:31)
awe for the way the brain works and is
(00:47:33)
able to do all of this for us? Just the
(00:47:36)
complexity just to get a machine to
(00:47:39)
just walk around and not hit things and
(00:47:41)
fall. Does it just give you more respect
(00:47:43)
for what we've already got?
(00:47:44)
>> Totally. We operate on about 20
(00:47:48)
watts.
(00:47:50)
That's dimmer than any light bulb in
(00:47:52)
the room I'm in right now. And yet we
(00:47:56)
can do so much. So I think actually the
(00:47:59)
more I work in AI, the more I respect
(00:48:02)
humans.
(00:48:03)
>> Let's talk about this uh product you
(00:48:06)
just launched. It's called Marble. A
(00:48:07)
very cute name. Talk about what this is,
(00:48:09)
why it's important. I've been playing
(00:48:10)
with it. It's incredible. We'll link to
(00:48:11)
it for folks to check it out. What
(00:48:13)
is Marble?
(00:48:15)
>> Yeah, I'm very excited. So first of all,
(00:48:17)
Marble is one of the first products
(00:48:19)
that World Labs uh has rolled out.
(00:48:22)
World Labs is a frontier foundation model
(00:48:25)
company. We are founded by four
(00:48:28)
co-founders who have deep technical
(00:48:31)
history. My co-founders are Justin Johnson,
(00:48:34)
Christoph Lassner, and Ben
(00:48:37)
Mildenhall. We all come from the research
(00:48:40)
field of AI, computer graphics, computer
(00:48:42)
vision. And uh we believe that spatial
(00:48:45)
intelligence and world modeling are as
(00:48:49)
important as, if not more important than, language
(00:48:51)
models, and complementary to
(00:48:54)
language models. So we wanted to seize
(00:48:57)
this opportunity to create a deep tech
(00:49:02)
research lab that can connect the dots
(00:49:05)
between frontier models and
(00:49:08)
products. So, Marble is an app that's
(00:49:13)
built upon our frontier models. We've
(00:49:16)
spent a year plus building the
(00:49:19)
world's first uh generative model that
(00:49:22)
can output genuinely 3D worlds. That's a
(00:49:27)
very, very hard problem, and
(00:49:32)
it was a very hard process. We
(00:49:36)
have an incredible founding
(00:49:38)
team of technologists from
(00:49:41)
incredible teams. And then
(00:49:47)
around just a month or two ago, we
(00:49:51)
saw for the first time that we can just
(00:49:55)
prompt with a sentence, an image, or
(00:49:58)
multiple images and create worlds that
(00:50:01)
we can just navigate in. If you put it
(00:50:04)
on goggles, which we have an option to
(00:50:06)
let you do that, you can even walk
(00:50:08)
around, right? So even though
(00:50:11)
we've been building this for
(00:50:13)
quite a while, it was still just
(00:50:15)
awe-inspiring, and we wanted to get it into the
(00:50:18)
hands of people who need it. And then
(00:50:21)
we know that so many creators,
(00:50:24)
designers, people who are thinking about
(00:50:28)
uh robotic simulation, people who are
(00:50:30)
thinking about uh different use cases of
(00:50:34)
navigable, interactable,
(00:50:37)
immersive worlds, game developers
(00:50:40)
will find this useful. So we
(00:50:43)
developed Marble as a first step. It's
(00:50:46)
it's again still very early, but it's
(00:50:50)
the world's first model doing this,
(00:50:52)
and it's the world's first product
(00:50:55)
that allows people to just prompt. We
(00:50:59)
call it prompt-to-worlds.
(00:51:01)
>> Well, I've been playing around with it.
(00:51:02)
It is insane. Like you could just have a
(00:51:04)
little Shire world where you just
(00:51:05)
infinitely walk around Middle Earth
(00:51:07)
basically and there's no there's no one
(00:51:09)
there yet but uh it's insane. You just
(00:51:11)
go anywhere. There's like dystopian
(00:51:12)
world. I'm just looking at all these
(00:51:13)
examples.
(00:51:14)
>> Yes. Uh, and my favorite part actually,
(00:51:16)
I don't know, I don't know if this is a
(00:51:17)
feature or bug, you can see like the
(00:51:19)
dots of the world before it actually
(00:51:21)
renders with all the textures. And I
(00:51:23)
just love that you get a glimpse into
(00:51:25)
what is going on with this model,
(00:51:26)
what it's basically creating.
(00:51:27)
>> That's so cool to hear because this is
(00:51:30)
where, as a researcher, I'm learning,
(00:51:34)
because the dots that lead
(00:51:37)
you into the world were an intentional
(00:51:42)
visualization feature. It is not part
(00:51:46)
of the model. The model actually
(00:51:48)
just generates the world. We were
(00:51:51)
trying to find a way to guide people
(00:51:53)
into the world and a number of engineers
(00:51:56)
worked on different versions, but we
(00:51:59)
converged on the dots, and so many people,
(00:52:02)
you're not the only one, told us how
(00:52:04)
delightful that experience is, and it
(00:52:07)
was really satisfying for us to hear
(00:52:10)
that this intentional visualization
(00:52:13)
feature that's not just the big hardcore
(00:52:16)
model actually has delighted our users.
(00:52:19)
>> Wow. So, you added that to make it
(00:52:22)
easier for humans to understand what's
(00:52:24)
going on, to make it more delightful. Wow,
(00:52:26)
that is hilarious. It makes me think
(00:52:28)
about LLMs and the way, it's not the
(00:52:30)
same thing, but they talk about what
(00:52:31)
they're thinking and what they're doing.
(00:52:33)
>> Yes, it is. It is.
(00:52:35)
>> It also makes me think about just the
(00:52:36)
Matrix. Like, it's exactly the Matrix
(00:52:39)
experience. I don't know if that was
(00:52:40)
your inspiration.
(00:52:42)
>> Um, well, like I said, a number of
(00:52:43)
engineers worked on that. It could be
(00:52:45)
their inspiration. It's in their
(00:52:48)
subconscious.
(00:52:50)
>> Yeah.
(00:52:51)
>> Okay. So, just for folks that may want
(00:52:52)
to play around with this, maybe use it.
(00:52:54)
What's like what are some applications
(00:52:55)
today that folks can start using today?
(00:52:57)
What's what's your goal with this
(00:52:59)
launch?
(00:53:00)
>> Yeah. So, um we do believe that world
(00:53:03)
modeling is very horizontal, but we're
(00:53:05)
already seeing some really exciting uh
(00:53:08)
use cases. Virtual production for movies,
(00:53:11)
because what they need are 3D worlds
(00:53:16)
that they can align with the camera, so
(00:53:18)
when the actors are acting in it, they
(00:53:22)
can position the
(00:53:24)
camera and shoot the segments really
(00:53:27)
well, and we're already seeing
(00:53:30)
incredible use. In fact, I don't know if
(00:53:34)
you have seen our launch video showing
(00:53:36)
Marble; it was produced by a virtual
(00:53:40)
production company. We collaborated
(00:53:42)
with Sony, and they used Marble scenes to
(00:53:45)
shoot those videos. So we were
(00:53:48)
collaborating with those technical
(00:53:50)
artists and directors, and they were
(00:53:52)
saying this has cut our production
(00:53:55)
time by 40x.
(00:53:58)
In fact, it had to.
(00:54:00)
>> Yes. In fact, it had to, because we only
(00:54:02)
had one month to work on this project,
(00:54:05)
and there were so many things they
(00:54:08)
were trying to shoot. So using
(00:54:10)
Marble really significantly
(00:54:13)
accelerated
(00:54:16)
virtual production for VFX and movies.
(00:54:19)
That's one use case. We are already
(00:54:22)
seeing our users taking our
(00:54:25)
Marble scenes, taking the mesh export,
(00:54:28)
and putting it into games, whether it's
(00:54:30)
games on VR or just
(00:54:33)
fun games that they have developed.
(00:54:36)
We were showing an
(00:54:40)
example of robotic simulation, because
(00:54:44)
when I was, I mean I still am, a
(00:54:48)
researcher doing robotic training,
(00:54:52)
one of the biggest pain points was to
(00:54:54)
create synthetic data for training
(00:54:56)
robots. And this synthetic data needs
(00:54:58)
to be very diverse. They need to come
(00:55:00)
from different environments with
(00:55:02)
different objects to manipulate. And
(00:55:05)
one path to it is to ask
(00:55:09)
computers to simulate. Otherwise, humans
(00:55:12)
have to, you know,
(00:55:14)
build every single asset for robots.
(00:55:17)
That that's just going to take a lot
(00:55:19)
longer. So we already have researchers
(00:55:22)
reaching out and wanting to use marble
(00:55:24)
to create those synthetic environments.
(00:55:26)
We also have had unexpected user
(00:55:31)
outreach in terms of how they want to
(00:55:35)
use Marble. For example, a team of psychologists
(00:55:39)
called us to use Marble to do
(00:55:42)
psychology research. It turned out some
(00:55:45)
of the psychiatric patients they study,
(00:55:48)
they need to understand how their brains
(00:55:51)
respond to different immersive scenes of
(00:55:55)
different features. Uh, for example,
(00:55:57)
messy scenes or clean scenes or or
(00:56:00)
whatever, you name it. And it's very hard
(00:56:03)
for researchers to get their hands on
(00:56:06)
these kinds of immersive scenes, and it
(00:56:08)
would take them too long and too much
(00:56:11)
budget to create. And Marble is
(00:56:16)
an almost instantaneous way of
(00:56:20)
getting so many of these experimental
(00:56:23)
environments into their hands. So,
(00:56:26)
we're seeing multiple
(00:56:29)
use cases at this point, but the
(00:56:32)
VFX, the game developers, the simulation
(00:56:35)
developers, as well as designers,
(00:56:38)
are very excited.
(00:56:39)
>> This is very much the way things work in
(00:56:41)
AI. I've had other AI leaders on the
(00:56:43)
podcast and it's always like put things
(00:56:45)
out there early as soon as you can to
(00:56:47)
discover where the big use cases are.
(00:56:49)
The head of ChatGPT told me how, when they
(00:56:51)
first put out ChatGPT, he was just
(00:56:53)
scanning TikTok to see how people were
(00:56:55)
using it and all the things they were
(00:56:56)
talking about, and that's what convinced
(00:56:58)
them where to lean in and helped them
(00:57:00)
see how people actually want to use it.
(00:57:02)
I love this last use case, for
(00:57:04)
therapy. I'm just imagining, like,
(00:57:06)
heights, people dealing with
(00:57:09)
heights or snakes or spiders, which
(00:57:12)
>> It's amazing. A friend of mine last
(00:57:14)
night literally called me and talked
(00:57:16)
about his fear of heights and asked me if
(00:57:19)
Marble could be used. That's amazing.
(00:57:22)
You went straight there.
(00:57:23)
>> That's, you know, because I'm imagining all
(00:57:25)
the exposure therapy stuff.
(00:57:28)
This could be so good for that.
(00:57:30)
That is so cool. Okay, so let me, I
(00:57:32)
should have asked you this before, but I
(00:57:33)
think there's going to be a
(00:57:35)
question of just how does this differ
(00:57:36)
from things like Veo 3 and other video
(00:57:39)
generation models? It's pretty clear to
(00:57:41)
me, but I think it might be helpful just
(00:57:43)
to explain how this is different from all
(00:57:44)
the video AI tools people have seen.
(00:57:47)
>> World Labs' thesis is that spatial
(00:57:49)
intelligence is fundamentally very
(00:57:51)
important and spatial intelligence is
(00:57:53)
not just about
(00:57:58)
videos. In fact, the world is not
(00:58:00)
passively watching videos passing by,
(00:58:04)
right? I love that Plato has the
(00:58:08)
allegory of the cave to
(00:58:12)
describe vision. He said that imagine a
(00:58:15)
prisoner tied to his chair (not
(00:58:19)
very humane) in a cave,
(00:58:24)
watching a full live theater
(00:58:29)
in front of him. But the actual live
(00:58:32)
theater, where the actors are acting, is behind
(00:58:35)
his back. It is just lit so that the
(00:58:39)
projection of the action is
(00:58:42)
on a wall of the cave, and then
(00:58:46)
the goal, the task of this prisoner
(00:58:49)
is to figure out what's going on. It's a
(00:58:51)
pretty extreme example, but it really
(00:58:54)
shows what vision is about,
(00:59:00)
which is to make sense of the 3D world or
(00:59:03)
4D world out of 2D. So spatial
(00:59:07)
intelligence to me is deeper than only
(00:59:11)
creating that flat 2D world. Spatial
(00:59:15)
intelligence to me is the ability to
(00:59:20)
create, reason, interact, make sense of
(00:59:25)
deeply spatial world, whether it's 2D or
(00:59:29)
3D or 4D, including dynamics and all
(00:59:32)
that. So World Labs is focusing on
(00:59:35)
that. And of course, um the ability to
(00:59:38)
create videos per se could be part of
(00:59:41)
this. And in fact, just a couple of
(00:59:44)
weeks ago we rolled out the world's
(00:59:46)
first real-time,
(00:59:48)
demoable real-time video generation on a
(00:59:52)
single H100 GPU. So part of
(00:59:56)
our technology includes that. But I
(00:59:59)
think Marble is very different because
(01:00:01)
we really want creators, designers,
(01:00:06)
developers to have in their hands a
(01:00:10)
model that can give them worlds with
(01:00:14)
3D structure so they can use it for
(01:00:17)
their work. That's why
(01:00:20)
Marble is so different.
(01:00:22)
>> The way I see it, it's a
(01:00:23)
platform for a ton of opportunity to do
(01:00:26)
stuff. As you described, videos are
(01:00:29)
just like, here's a one-off video that's
(01:00:30)
very fun and cool, and
(01:00:32)
that's it, and you move on.
(01:00:33)
>> By the way, in Marble we could
(01:00:36)
allow people to export in video form. So
(01:00:39)
you could actually, like you said, you
(01:00:41)
go into a world. So let's say it's a
(01:00:44)
hobbit cave. You can actually,
(01:00:47)
especially as a creator, you have such a
(01:00:50)
specific way of moving the
(01:00:54)
camera along a trajectory in the director's
(01:00:57)
mind, right? And then you can export
(01:00:59)
that from Marble into a video.
(01:01:02)
>> What does it take to create something
(01:01:03)
like this? Just like how big is the
(01:01:05)
team? How many GPUs are you
(01:01:07)
working with? Anything you can share
(01:01:08)
there? I don't know how much of this is
(01:01:09)
private information, but just what does
(01:01:10)
it take to create something like this
(01:01:12)
that you've launched here?
(01:01:13)
>> It takes a lot of brain power.
(01:01:16)
So, we just talked about 20 watts per
(01:01:20)
brain. So from that point of
(01:01:22)
view, it's it's a small number, but but
(01:01:25)
it's actually an incredible, you know,
(01:01:27)
it's a half billion years of evolution
(01:01:30)
to give us that power. We have a
(01:01:34)
team of 30-ish people now, and we are
(01:01:39)
predominantly
(01:01:40)
researchers and research engineers,
(01:01:44)
but we also have designers and
(01:01:47)
product. We actually really
(01:01:50)
believe that we want to create a company
(01:01:52)
that's anchored in the deep tech of
(01:01:56)
spatial intelligence, but we are
(01:02:00)
actually building serious products.
(01:02:04)
So we have this
(01:02:07)
integration of R&D and productization
(01:02:11)
and of course we use you know a ton of
(01:02:14)
GPUs. That's the technical
(01:02:17)
>> I'm so happy to hear.
(01:02:20)
>> Well, congrats on the launch. I know
(01:02:21)
this is a huge milestone. I know this
(01:02:23)
took a ton of work. So, I just want to
(01:02:24)
say congrats to you and your team.
(01:02:26)
>> Let me talk about your founder journey
(01:02:28)
for a moment. So, you're a founder of
(01:02:30)
this company. You started how many years
(01:02:32)
ago? Couple years ago, two, three years
(01:02:33)
ago.
(01:02:33)
>> Oh, a year ago. A year ago.
(01:02:36)
>> A year. Okay.
(01:02:37)
>> 18 months. Yeah.
(01:02:39)
>> Okay. What's something you wish you knew
(01:02:41)
before you started this that you wish
(01:02:42)
you could like whisper into the ear of
(01:02:44)
Fei-Fei of 18 months ago?
(01:02:46)
>> Well, I continue to wish I knew
(01:02:51)
the future of technology. I think
(01:02:53)
actually that's one of our founding
(01:02:55)
advantages: we see the future
(01:02:59)
earlier in general than most
(01:03:01)
people. But still, man, it is so
(01:03:03)
exciting and so amazing,
(01:03:07)
what's unknown and what's coming. But I
(01:03:10)
know the reason you're asking me this
(01:03:12)
question is not about the future of
(01:03:14)
technology. You're probably more, you
(01:03:16)
know, look, I did not start a company
(01:03:20)
of this scale
(01:03:23)
at 20 years old. So, you know, I started
(01:03:26)
a dry cleaner when I was 19, but that's
(01:03:29)
a little smaller scale. >> We've got to talk
(01:03:31)
about that.
(01:03:31)
>> And then I founded
(01:03:34)
Google Cloud AI and then I founded an
(01:03:37)
institute at Stanford but those are
(01:03:39)
different beasts. I did feel I was a
(01:03:43)
little more prepared as a founder for
(01:03:47)
the grinding journey,
(01:03:51)
compared to maybe the
(01:03:55)
20-year-old founders. But I'm still
(01:04:00)
surprised, and it puts me
(01:04:04)
into paranoia sometimes, how
(01:04:08)
intensely competitive the AI landscape is,
(01:04:13)
from
(01:04:15)
the models and the technology itself as
(01:04:18)
well as talent. And, you know, when I
(01:04:21)
founded the company, we did not have
(01:04:25)
these incredible stories of how much
(01:04:28)
certain talent would cost, you know.
(01:04:32)
So these are things that continue to
(01:04:34)
surprise me, and I have to be very
(01:04:38)
alert about.
(01:04:40)
>> So the competition you're talking about
(01:04:41)
is, yeah, the competition for talent, the
(01:04:44)
speed at which things are moving?
(01:04:46)
>> Yeah.
(01:04:47)
>> Yeah. You mentioned this point that I
(01:04:49)
want to come back to: if you
(01:04:51)
just look over the course of your
(01:04:53)
career. You were like at all of the
(01:04:55)
major collections of humans that led
(01:04:59)
to so many of the breakthroughs that are
(01:05:01)
happening today. Obviously we talked
(01:05:02)
about ImageNet. Also SAIL at Stanford
(01:05:04)
is where a lot of the work happened, and at
(01:05:07)
Google Cloud, where a lot of the
(01:05:08)
breakthroughs happened. What brought you
(01:05:10)
to those places? For people
(01:05:13)
looking for how to advance in their
(01:05:16)
career, be at the center of the future,
(01:05:18)
is there a throughline there of
(01:05:19)
just what pulled you from place to place
(01:05:22)
and pulled you into those groups that
(01:05:24)
might be helpful for people to hear?
(01:05:26)
>> Yeah, this is actually a great question,
(01:05:28)
Lenny, because I do think about it and
(01:05:30)
uh
(01:05:32)
obviously, as we talked about, it's curiosity
(01:05:35)
and passion that brought me to AI. That
(01:05:37)
is more a scientific north star, right? I
(01:05:40)
did not care if AI was a thing or not.
(01:05:44)
So, so that was one part. But how did I
(01:05:47)
end up choosing
(01:05:49)
the particular places I work in,
(01:05:52)
including starting World Labs?
(01:05:57)
I think I'm very grateful
(01:06:00)
to myself, or maybe to my parents' genes.
(01:06:05)
I'm I'm an intellectually very fearless
(01:06:08)
person and I have to say when I hire
(01:06:11)
young people I look for that because I
(01:06:15)
um
(01:06:16)
I think that's a very important quality
(01:06:19)
if one wants to make a difference is
(01:06:22)
that when you want to make a difference
(01:06:25)
you have to accept that you're creating
(01:06:29)
something new or you're diving into
(01:06:31)
something new that people haven't done.
(01:06:33)
And if you have that self-awareness, you
(01:06:37)
almost have to allow yourself to be
(01:06:40)
fearless and to be courageous. So when I
(01:06:44)
for example, came to Stanford, you
(01:06:49)
know, in the world of academia,
(01:06:52)
I was very close to this thing called
(01:06:55)
tenure, which is, you know, having the
(01:06:58)
job forever, at Princeton. But I
(01:07:03)
chose to come to Stanford.
(01:07:06)
I love Princeton; it's my alma
(01:07:08)
mater. It's just that at that moment there
(01:07:12)
are people who are so amazing at
(01:07:14)
Stanford and the Silicon Valley
(01:07:16)
ecosystem was so amazing that I was okay
(01:07:21)
to take a risk of restarting my tenure
(01:07:24)
clock.
(01:07:25)
Then, becoming the first
(01:07:30)
female director of SAIL: I was actually
(01:07:34)
relatively speaking a very young faculty member
(01:07:36)
at that time and I wanted to do that
(01:07:40)
because I care about that community. I
(01:07:42)
didn't spend too much time thinking
(01:07:44)
about all the failure cases. Obviously,
(01:07:47)
I was very lucky that the more senior
(01:07:50)
faculty supported me, but I just wanted
(01:07:52)
to make a difference. And then going to
(01:07:55)
Google was similar. I wanted to work
(01:07:58)
with people like Jeff Dean, Geoff Hinton,
(01:08:02)
and Demis, all these
(01:08:06)
incredible people.
(01:08:10)
So the same with World
(01:08:13)
Labs. I have this passion, and I also
(01:08:18)
believe that people with the same
(01:08:21)
mission can do incredible things. So
(01:08:23)
that's what has guided me through
(01:08:26)
life. I don't overthink
(01:08:29)
all the possible things that can go wrong
(01:08:32)
because that's too many.
(01:08:34)
>> I feel like that's an important element
(01:08:35)
of this is not focusing on the downside,
(01:08:38)
focusing more on the people, the
(01:08:40)
mission. What gets you excited? What do
(01:08:42)
you think? >> Yeah, I do want to say
(01:08:45)
one thing to all the young talents in AI
(01:08:48)
the engineers the researchers out there
(01:08:50)
because some of you apply to World Labs.
(01:08:53)
I feel very privileged you considered
(01:08:55)
World Labs. I do find that many of the young
(01:08:58)
people today
(01:09:00)
think about every single
(01:09:05)
aspect of the equation when they decide on
(01:09:08)
jobs. At some point, maybe, you know,
(01:09:10)
that's the way they want to
(01:09:13)
do it. But sometimes I do want to
(01:09:14)
encourage young people to focus on
(01:09:17)
what's important because I find myself
(01:09:21)
constantly in mentoring mode when I
(01:09:25)
talk to job candidates. Not
(01:09:28)
necessarily recruiting or not
(01:09:29)
recruiting, but just in mentoring mode.
(01:09:32)
When I see an incredible young talent
(01:09:34)
who is overfocusing on every minute
(01:09:39)
dimension and aspect of considering a
(01:09:42)
job, when
(01:09:45)
maybe the most important thing is
(01:09:49)
where's your passion? Do you align with
(01:09:51)
the mission? Do you believe and have
(01:09:54)
faith in this team?
(01:09:56)
And just focus on the impact
(01:09:59)
you can make and the kind of
(01:10:02)
work and team you can work with.
(01:10:05)
>> Yeah, it's tough. It's tough for people
(01:10:06)
in the AI space now. There's so much
(01:10:09)
coming at them, so much news, so much
(01:10:10)
happening, so much FOMO.
(01:10:11)
>> That's true.
(01:10:12)
>> I could see the stress. And so, I think
(01:10:14)
that advice is really important. Just
(01:10:15)
like what will actually make you feel
(01:10:18)
fulfilled in what you're doing, not just
(01:10:19)
where's the fastest growing company?
(01:10:21)
Who's going to win? I don't
(01:10:23)
know. I want to make sure I ask you
(01:10:25)
about the work you're doing today at
(01:10:26)
Stanford at the HCI. I think it's HAI
(01:10:30)
Human-Centered AI Institute.
(01:10:32)
>> What are you doing there? I
(01:10:34)
know this is a thing you do on the side
(01:10:35)
still.
(01:10:37)
>> So yes, HAI, the Human-Centered AI Institute,
(01:10:41)
was co-founded by me and a group of
(01:10:44)
faculty like Professor John Etchemendy,
(01:10:47)
Professor James Landay, Professor
(01:10:50)
Chris Manning back in 2018. I was
(01:10:54)
actually finishing my last
(01:10:56)
sabbatical at Google, and it was a
(01:11:01)
very very important decision for me
(01:11:04)
because I could have stayed in industry
(01:11:07)
but my time at Google taught me one
(01:11:10)
thing: AI is going to be a
(01:11:12)
civilizational technology, and it
(01:11:16)
dawned on me how important this is to
(01:11:18)
humanity to the point that I actually
(01:11:21)
wrote a piece in The New York Times that
(01:11:23)
year 2018 to talk about the need for a
(01:11:28)
guiding framework to develop
(01:11:32)
and apply AI, and that framework has to be
(01:11:35)
anchored in human benevolence, in human
(01:11:38)
centeredness. And I felt that Stanford,
(01:11:42)
one of the world's top universities, in the
(01:11:46)
heart of Silicon Valley, that gave birth
(01:11:48)
to important companies from Nvidia to
(01:11:51)
Google, should be a thought leader
(01:11:57)
to create this human-centered AI
(01:12:00)
framework and to actually embody
(01:12:04)
that in our research, education,
(01:12:07)
policy, and ecosystem work. So I
(01:12:11)
founded HAI, and fast
(01:12:15)
forward six or seven years, it has
(01:12:18)
become the world's largest AI institute
(01:12:21)
that does human-centered research,
(01:12:26)
education, ecosystem outreach, and
(01:12:30)
policy impact. It
(01:12:35)
involves hundreds of faculty across all
(01:12:39)
eight schools at Stanford from medicine
(01:12:42)
to education to sustainability to
(01:12:45)
business to engineering to humanities to
(01:12:48)
law. And we support researchers,
(01:12:54)
especially in interdisciplinary areas
(01:12:57)
from digital economy to uh legal studies
(01:13:01)
to political science to discovery of new
(01:13:04)
drugs,
(01:13:05)
to new algorithms beyond
(01:13:09)
transformers. We also actually put a
(01:13:12)
very strong focus on policy
(01:13:16)
because when we started HAI I realized
(01:13:19)
that Silicon Valley did not talk to
(01:13:22)
Washington, DC, or Brussels or other
(01:13:27)
parts of the world, and given how
(01:13:30)
important this this technology is we
(01:13:33)
need to bring everybody on board. So we
(01:13:36)
created multiple programs from
(01:13:38)
congressional boot camp to the AI Index
(01:13:43)
Report to policy briefings, and we
(01:13:47)
especially
(01:13:49)
participated in policymaking, including
(01:13:53)
advocating for a national AI
(01:13:56)
research cloud bill that was passed in
(01:13:59)
the first Trump administration and
(01:14:02)
participating in state-level
(01:14:05)
regulatory AI discussions. So there's
(01:14:09)
a lot we did, and I continue to be
(01:14:13)
one of the leaders, even though I'm
(01:14:16)
much less involved operationally
(01:14:19)
because I care not only that we create this
(01:14:22)
technology but that we use it in the right
(01:14:24)
way.
(01:14:24)
>> Wow. I was not aware of all that other
(01:14:26)
work you were doing. As you were
(01:14:28)
talking, I was reminded that Charlie Munger had
(01:14:31)
this quote, take a simple idea and take
(01:14:33)
it very seriously. I feel like you've
(01:14:36)
done that in so many different ways and
(01:14:38)
and stayed with it and it's unbelievable
(01:14:41)
the impact that you've had in so many
(01:14:42)
ways over the years. I'm going to skip
(01:14:45)
the lightning round and I'm just going
(01:14:46)
to ask you one last question. Is there
(01:14:48)
anything else that you wanted to share?
(01:14:50)
Anything else you want to leave
(01:14:51)
listeners with?
(01:14:52)
>> I'm very excited by AI, Lenny. I
(01:14:56)
want to answer one question that, when
(01:14:59)
I travel around the world, everybody asks
(01:15:02)
me: if I'm a musician, if I'm a
(01:15:07)
teacher, middle school teacher, if I'm a
(01:15:10)
nurse, if I'm an accountant, if I'm a
(01:15:14)
farmer, do I have a role in AI or is AI
(01:15:18)
just going to take over my life or my
(01:15:20)
work? And I think this is the most
(01:15:24)
important question of AI. And I find
(01:15:27)
that in Silicon Valley, we tend not to
(01:15:31)
speak heart-to-heart with people,
(01:15:35)
people like us and not like us in
(01:15:37)
Silicon Valley, but like all of us, we
(01:15:40)
tend to just toss around words like
(01:15:43)
infinite productivity or infinite
(01:15:47)
leisure time or, you know, infinite
(01:15:52)
power or whatever. But at the end of the
(01:15:55)
day, AI is about people. And when people
(01:15:58)
ask me that question, it's a resounding
(01:16:00)
yes. Everybody has a role in AI. It
(01:16:04)
depends on what what you do and what you
(01:16:07)
want. But no technology should take away
(01:16:10)
human dignity, and human dignity and
(01:16:14)
agency should be at the heart of the
(01:16:17)
development, the deployment as well as
(01:16:20)
the governance of every technology. So
(01:16:24)
if you are a young artist
(01:16:27)
and your passion is storytelling,
(01:16:31)
embrace AI as a tool. In fact,
(01:16:34)
embrace Marble. I hope it becomes a tool
(01:16:36)
for you. Because the way you tell
(01:16:40)
your story is unique, and the world
(01:16:43)
still needs it. But how you tell your
(01:16:46)
story, how do you use the most
(01:16:49)
incredible tool to tell your story in
(01:16:52)
the most unique way is important and
(01:16:55)
that that voice needs to be heard. If
(01:16:58)
you're a farmer near retirement, AI
(01:17:02)
still matters because you're a citizen.
(01:17:06)
You can participate in your community.
(01:17:08)
You should have a voice in how AI is
(01:17:11)
used, how AI is applied. You work
(01:17:15)
with people that you can, you know,
(01:17:18)
encourage. I encourage all of you to use AI
(01:17:22)
to make life easier for you. If you're a
(01:17:26)
nurse, I hope you know that at least in
(01:17:29)
my career, I have worked so much in
(01:17:34)
health care research because I feel our
(01:17:36)
health care workers should be greatly
(01:17:40)
augmented and helped by AI technology
(01:17:43)
whether it's smart cameras to feed more
(01:17:47)
information, or robotic assistance,
(01:17:50)
because our nurses are overworked and
(01:17:54)
over-fatigued. And as our society ages, we
(01:17:58)
need more help for people to be
(01:18:00)
taken care of. So AI can play that role.
(01:18:03)
So I just want to say that it's so
(01:18:06)
important that even a technologist
(01:18:10)
like me is sincere that
(01:18:15)
everybody has a role in AI.
(01:18:17)
>> What a beautiful way to end it. Such a
(01:18:19)
tie back to where we started about how
(01:18:21)
it's up to us and take individual
(01:18:24)
responsibility for what AI will do in
(01:18:26)
our lives. Final question, where can
(01:18:28)
folks find Marble? Where can they go?
(01:18:30)
Maybe try to join World Labs if
(01:18:32)
they want to. What's the website? Where
(01:18:34)
do people go?
(01:18:35)
>> Well, World Labs website is
(01:18:38)
www.worldlabs.ai
(01:18:41)
and you can find our
(01:18:45)
research progress there. We we have
(01:18:47)
technical blogs. You can find Marble the
(01:18:50)
product there. You can sign in there.
(01:18:52)
You can find our job posts
(01:18:55)
there. You know, we're in San
(01:18:58)
Francisco. We love to work with the
(01:19:00)
world's best talent.
(01:19:02)
>> Amazing. Fei-Fei, thank you so much for
(01:19:04)
being here.
(01:19:05)
>> Thank you, Lenny.
(01:19:06)
>> Bye, everyone.
(01:19:10)
Thank you so much for listening. If you
(01:19:11)
found this valuable, you can subscribe
(01:19:13)
to the show on Apple Podcasts, Spotify,
(01:19:15)
or your favorite podcast app. Also,
(01:19:18)
please consider giving us a rating or
(01:19:20)
leaving a review as that really helps
(01:19:22)
other listeners find the podcast. You
(01:19:24)
can find all past episodes or learn more
(01:19:26)
about the show at lennyspodcast.com.
(01:19:29)
See you in the next episode.
