Title: Ex-OpenAI Scientist’s DISTURBING Warning: “It’s Coming In 2026”
Duration: 00:17:44
(00:00:00)
I do maintain here is something which I
(00:00:03)
predict will happen. That's a
(00:00:04)
prediction.
(00:00:06)
I maintain
(00:00:08)
that as AI becomes more powerful
(00:00:12)
then people will change their behaviors
(00:00:17)
and we will see all kinds of
(00:00:19)
unprecedented things which are not
(00:00:21)
happening right now. So Ilya Sutskever
(00:00:24)
just gave a pretty controversial
(00:00:26)
interview. It's spreading like wildfire
(00:00:28)
on the internet. In this interview, he
(00:00:30)
said he's found the missing piece which
(00:00:33)
we needed to accomplish AGI. Let's watch
(00:00:35)
the interview and I'll dissect
(00:00:37)
everything as we go. The thing so so
(00:00:40)
here here is a perspective. Here's a
(00:00:42)
perspective I think might be might be
(00:00:44)
true.
(00:00:46)
So
(00:00:47)
the way ML used to work is that people
(00:00:50)
would just tinker with stuff and
(00:00:52)
try to
(00:00:56)
and try to get interesting results.
(00:00:57)
That's what's been going on in the past.
(00:01:01)
Then
(00:01:03)
the scaling insight arrived, right?
(00:01:06)
Scaling laws, GPT-3.
(00:01:09)
And suddenly everyone realized we should
(00:01:12)
scale.
(00:01:14)
And it's just this this is an example of
(00:01:17)
how language affects thought.
(00:01:21)
Scaling is what? Just one word, but it's
(00:01:24)
such a powerful word because it informs
(00:01:26)
people what to do. They say, okay,
(00:01:27)
let's let's try to scale things and so
(00:01:29)
you say okay so what are we scaling and
(00:01:32)
pre-training was a thing to scale it was
(00:01:34)
a particular scaling recipe
(00:01:37)
>> yes
(00:01:37)
>> the big breakthrough of pre-training is
(00:01:40)
the realization that this recipe is good
(00:01:43)
so you say hey if you mix some compute
(00:01:48)
with some data into a neural net of a
(00:01:50)
certain size you will get results and
(00:01:53)
you will know that it will be better if
(00:01:55)
you just scale the recipe up. And this
(00:01:57)
is also great. Companies love this
(00:01:59)
because it gives you a very uh low-risk
(00:02:03)
way of investing
(00:02:06)
>> your resources.
(00:02:07)
>> Yeah.
(00:02:07)
>> Right. It's much harder to invest your
(00:02:09)
resources in research. Compare that. You
(00:02:12)
know, if you research, you need to be
(00:02:14)
like go forth researchers and research
(00:02:16)
and come up with something versus get
(00:02:19)
more data, get more compute. You know,
(00:02:21)
you'll get something from pre-training.
(00:02:24)
And indeed, you know, it looks like I
(00:02:26)
based on various um
(00:02:29)
um things people say on some people say
(00:02:32)
on Twitter, maybe it appears that Gemini
(00:02:34)
have found a way to get more out of
(00:02:36)
pre-training. At some point though,
(00:02:38)
pre-training will run out of data. The
(00:02:39)
data is very clearly finite. And so
(00:02:41)
then, okay, what do you do next? Either
(00:02:43)
you do some kind of a souped-up
(00:02:45)
pre-training, different recipe from the
(00:02:47)
one we've done before, or you're doing
(00:02:49)
RL or maybe something else. But now that
(00:02:52)
compute is big, compute is now very
(00:02:54)
big. In some sense, we are back to the
(00:02:56)
age of research. So maybe here's another
(00:02:58)
way to put it. Up until 2020, from 2015,
(00:03:01)
from 2012 to 2020, it was the age of
(00:03:05)
research.
(00:03:06)
Now from 2020 to 2025, it was the age of
(00:03:09)
scaling or maybe plus minus. Let's add
(00:03:12)
error bars to those years because people
(00:03:13)
say this is amazing. You got to scale
(00:03:15)
more. Keep scaling. The one word
(00:03:17)
scaling. But now the scale is so big.
(00:03:20)
Like is is it is the belief really that
(00:03:23)
oh it's so big but if you had 100x more
(00:03:26)
everything would be so different. Like
(00:03:29)
it would be different for sure but like
(00:03:31)
is the belief that if you just 100x the
(00:03:34)
scale everything would be transformed.
(00:03:37)
I don't think that's true. So it's back
(00:03:39)
to the age of research again just with
(00:03:41)
big computers.
(00:03:42)
>> Next Ilya talks about how AGI is going
(00:03:44)
to impact us humans and how it's going
(00:03:46)
to replace us all. Okay. I I I I see. So
(00:03:50)
you're you're suggesting
(00:03:52)
that the thing you're pointing out with
(00:03:54)
super intelligence
(00:03:56)
is not some finished
(00:04:00)
mind which knows how to do every single
(00:04:02)
job in the economy because the way say
(00:04:05)
the original, I think, OpenAI charter or
(00:04:07)
whatever defines AGI is like it can do
(00:04:09)
every single job that a every single
(00:04:11)
thing a human can do. You're proposing
(00:04:13)
instead a mind which can learn to do any
(00:04:17)
single every single job.
(00:04:18)
>> Yes.
(00:04:19)
>> And that is super intelligence. And then
(00:04:21)
but once you have the learning
(00:04:22)
algorithm,
(00:04:24)
>> it gets deployed into the world the same
(00:04:26)
way a human laborer might join an
(00:04:28)
organization.
(00:04:30)
>> And it seems like one of these two
(00:04:32)
things might happen. Maybe neither of
(00:04:33)
these happens. One, this super efficient
(00:04:38)
learning algorithm
(00:04:40)
becomes superhuman, becomes as good as
(00:04:43)
you and potentially even better at the
(00:04:45)
task of ML research and as a result the
(00:04:50)
algorithm itself becomes more and more
(00:04:51)
superhuman. The other is even if that
(00:04:54)
doesn't happen. If you have a single
(00:04:56)
model, I mean this this is explicitly
(00:04:58)
your vision. If you have a single model
(00:04:59)
or instances of a model which are
(00:05:02)
deployed through the economy doing
(00:05:04)
different jobs, learning how to do those
(00:05:05)
jobs, continually learning on the job,
(00:05:08)
picking up all the skills that any human
(00:05:10)
could pick up but actually picking them
(00:05:11)
all up at the same time and then
(00:05:12)
amalgamating the learnings.
(00:05:15)
You basically have a model which
(00:05:16)
functionally becomes super intelligent
(00:05:19)
even without any sort of recursive
(00:05:20)
self-improvement in software right
(00:05:23)
because you now have one model that can
(00:05:25)
do every single job in the economy and
(00:05:27)
humans can't merge our minds in the same
(00:05:28)
way and so do you expect some sort of
(00:05:30)
like intelligence explosion from broad
(00:05:32)
deployment
(00:05:33)
>> I think that it is likely that we will
(00:05:38)
have rapid economic growth
(00:05:42)
I think the broad deployment
(00:05:45)
Like there are two arguments you could
(00:05:48)
make which are conflicting.
(00:05:51)
One is that look if indeed you get once
(00:05:54)
indeed you get to a point where you have
(00:05:58)
an AI that can learn to do
(00:06:02)
things quickly
(00:06:04)
and you have many of them then they will
(00:06:06)
then there will be a strong force to
(00:06:10)
deploy them in the economy. Unless there
(00:06:12)
will be some kind of a regulation that
(00:06:14)
stops it, which by the way there might
(00:06:16)
be. But I think the idea of very rapid
(00:06:22)
economic growth for some time, I think
(00:06:24)
it's very possible from broad
(00:06:25)
deployment. The other question is how
(00:06:27)
rapid it's going to be.
(00:06:30)
So I think this is hard to know because
(00:06:32)
on the one hand you have this very
(00:06:34)
efficient worker. On the other hand
(00:06:36)
there is the world is just really big
(00:06:38)
and there's a lot of stuff
(00:06:41)
and that stuff moves at a different
(00:06:43)
speed but then on the other hand now the
(00:06:44)
AI could you know so I think very rapid
(00:06:48)
economic growth is possible and we will
(00:06:49)
see like all kinds of things like
(00:06:52)
different countries with different rules
(00:06:54)
and the ones which have the friendlier
(00:06:55)
rules the economic growth will be faster
(00:06:58)
hard to predict
(00:06:58)
>> Okay, now in this next part Ilya predicts
(00:07:00)
how AGI is going to change everything we
(00:07:03)
know about our society how governments
(00:07:05)
are going to change and how human
(00:07:07)
behavior is going to shift as AGI comes
(00:07:09)
in.
(00:07:10)
>> And I maintain that I think I think most
(00:07:12)
people who work on AI also can't imagine
(00:07:15)
it because it's too different from what
(00:07:18)
people see on a day-to-day basis.
(00:07:22)
I do maintain here is something which I
(00:07:25)
predict will happen. That's a
(00:07:26)
prediction.
(00:07:28)
I maintain
(00:07:30)
that as AI becomes more powerful
(00:07:34)
then people will change their behaviors
(00:07:39)
and we will see all kinds of
(00:07:41)
unprecedented things which are not
(00:07:43)
happening right now and I'll give some
(00:07:46)
examples. I do like I I think I think
(00:07:49)
for better or worse the the frontier
(00:07:52)
companies will play a very important
(00:07:53)
role in what happens as will the
(00:07:55)
government and the kind of things that I
(00:07:57)
think we'll see which you see the
(00:08:00)
beginnings of
(00:08:02)
companies that are fierce competitors
(00:08:05)
starting to collaborate
(00:08:07)
on AI safety. You may have seen OpenAI
(00:08:11)
and Anthropic doing a first small
(00:08:14)
step but that did not exist That's
(00:08:16)
actually something which I predicted in
(00:08:18)
one of my talks about three years ago
(00:08:21)
that such a thing will happen. I also
(00:08:23)
maintain that as AI continues to become
(00:08:25)
more powerful, more visibly powerful,
(00:08:29)
there will also be a desire from
(00:08:32)
governments and the public to do
(00:08:34)
something
(00:08:36)
and I think that this is a very
(00:08:37)
important force
(00:08:40)
of showing the AI. That's number one.
(00:08:43)
Number two, okay, so then the AI is
(00:08:45)
being built. what needs to what needs to
(00:08:46)
be done.
(00:08:49)
So one thing that I maintain that will
(00:08:51)
happen is that right now people who are
(00:08:53)
working on AI I maintain that the AI
(00:08:57)
doesn't feel powerful because of its
(00:08:58)
mistakes.
(00:09:00)
I do think that at some point the AI
(00:09:02)
will start to feel powerful actually and
(00:09:04)
I think when that happens we will see a
(00:09:06)
big change in the way
(00:09:09)
all AI companies approach safety.
(00:09:13)
they'll become much more paranoid. I
(00:09:15)
think I I say this as a predict as a as
(00:09:18)
a as a prediction that we will see
(00:09:19)
happen. We'll see if I'm right, but I
(00:09:22)
think this is something that will happen
(00:09:23)
because they will see the AI becoming
(00:09:25)
more powerful. Everything that's
(00:09:27)
happening right now, I maintain is
(00:09:29)
because people look at today's AI and
(00:09:32)
it's hard to imagine the future AI.
(00:09:35)
And there is a third thing which needs
(00:09:37)
to happen. And I think this is this this
(00:09:40)
and I'm talking about it in in broader
(00:09:42)
terms not just from the perspective of
(00:09:44)
SSI
(00:09:46)
because you ask me about our company but
(00:09:48)
the question is okay so then what should
(00:09:49)
what should the companies aspire to
(00:09:51)
build
(00:09:52)
>> what should they aspire to build and
(00:09:54)
there has been one big idea that
(00:09:56)
actually every that um everyone has been
(00:09:58)
locked in locked into which is the the
(00:10:00)
self-improving AI
(00:10:03)
and why why did it happen because there
(00:10:05)
is fewer ideas than companies
(00:10:08)
But I maintain that there is something
(00:10:10)
that's better to build and I think that
(00:10:13)
everyone will actually want that. It's
(00:10:15)
like the AI that's robustly aligned to
(00:10:20)
care about sentient life specifically.
(00:10:23)
I think in particular it will be there's
(00:10:26)
a case to be made that it will be easier
(00:10:28)
to build an AI that cares about sentient
(00:10:30)
life than an AI that cares about human
(00:10:33)
life alone because the AI itself will be
(00:10:36)
sentient.
(00:10:38)
And if you think about things like
(00:10:39)
mirror neurons and human empathy for
(00:10:41)
animals, which is, you know, you might
(00:10:43)
argue it's not big enough, but it
(00:10:46)
exists. I think it's an emergent
(00:10:48)
property from the fact that we model
(00:10:50)
others with the same circuit that we
(00:10:53)
used to model ourselves because that's
(00:10:55)
the most efficient thing to do.
(00:10:56)
>> Now, in the next part, Ilya explains why
(00:10:59)
the age of scaling is over and why we're
(00:11:01)
back into the research phase. Pretty
(00:11:03)
interesting take. I am curious if you
(00:11:04)
say we are back in an era of research.
(00:11:08)
You were there from 2012 to 2020
(00:11:11)
and do do you have Yeah. What what is
(00:11:14)
now the vibe going to be if we go back
(00:11:16)
to the era of research?
(00:11:18)
>> So one consequence of um the age of
(00:11:22)
scaling is that there was this
(00:11:26)
um scaling sucked out all the air in the
(00:11:29)
room.
(00:11:29)
>> Yeah.
(00:11:31)
And so
(00:11:33)
because scaling sucked out all the air
(00:11:34)
in the room,
(00:11:36)
everyone started to do the same thing.
(00:11:39)
We got to the point where
(00:11:42)
uh we are in a world where there are
(00:11:45)
more companies than ideas by quite a
(00:11:47)
bit.
(00:11:48)
>> Actually on that you know there is this
(00:11:50)
Silicon Valley saying that says that
(00:11:54)
ideas are cheap, execution is everything
(00:11:58)
and people say that a lot.
(00:11:59)
>> Yeah. And there is truth to that. But
(00:12:01)
then I saw I saw someone say on Twitter
(00:12:04)
um something like if ideas are are so
(00:12:07)
cheap, how come no one's having any
(00:12:09)
ideas?
(00:12:10)
>> And I think it's true too. I think like
(00:12:14)
if you think about um research progress
(00:12:17)
in terms of bottlenecks,
(00:12:20)
there are several bottlenecks. If you go
(00:12:22)
back to the if if you and um one of them
(00:12:24)
is ideas and one of them is your ability
(00:12:27)
to bring them to life.
(00:12:28)
>> Yeah. which might be compute but also
(00:12:30)
engineering.
(00:12:31)
So if you go back to the '90s, let's say,
(00:12:34)
you had people who had had pretty good
(00:12:35)
ideas and if they had much larger
(00:12:38)
computers maybe they could demonstrate
(00:12:39)
that their ideas were viable but they
(00:12:41)
could not. So they could only have very
(00:12:43)
very small demonstration and did not
(00:12:45)
convince anyone.
(00:12:46)
>> Yeah.
(00:12:47)
>> So the bottleneck was compute. Then in
(00:12:50)
the age of scaling computers increased a
(00:12:53)
lot and of course there is a question of
(00:12:56)
how much compute is needed but compute is
(00:12:59)
large so compute is large enough such
(00:13:04)
that
(00:13:05)
it's like not obvious that you need that
(00:13:08)
much more compute to prove some idea
(00:13:12)
like I'll give you an analogy. AlexNet
(00:13:15)
was built on two GPUs. That was the
(00:13:18)
total amount of compute used for it. The
(00:13:20)
transformer
(00:13:22)
was built on 8 to 64 GPUs. No single
(00:13:26)
transformer paper experiment used more
(00:13:28)
than 64 GPUs of 2017, which would be
(00:13:32)
like what two GPUs of today.
(00:13:34)
So the ResNet,
(00:13:37)
right? many like even even the the um
(00:13:40)
you could argue that the like o1
(00:13:43)
reasoning was not the most compute-heavy
(00:13:46)
thing in the world. So there definitely
(00:13:50)
for for research
(00:13:53)
you need like definitely some amount of
(00:13:55)
compute but it's far from obvious that
(00:13:57)
you need the absolutely largest amount
(00:13:59)
of compute ever for research.
(00:14:02)
>> you might argue and I think it is true
(00:14:04)
that if you want to build the absolutely
(00:14:06)
best system, if you want to build the
(00:14:09)
absolutely best system, then it helps to
(00:14:12)
have much more compute and especially if
(00:14:14)
everyone is within the same paradigm,
(00:14:16)
then compute becomes one of the big
(00:14:19)
differentiators. Okay, now this next
(00:14:21)
part is pretty interesting. Ilya talks
(00:14:24)
about AGI and gives a pretty solid
(00:14:26)
understanding of the AGI architecture
(00:14:28)
and how AGI compares to a human mind.
(00:14:31)
>> This will be two words, two words that
(00:14:33)
have shaped everyone's thinking I
(00:14:36)
maintain.
(00:14:37)
First word, AGI,
(00:14:41)
second word pre-training. Let me
(00:14:43)
explain.
(00:14:45)
So the word the term AGI
(00:14:48)
why does this term exist? It's a very
(00:14:51)
particular term. Why does it exist?
(00:14:53)
There's a reason. The reason that the
(00:14:56)
term AGI exists is in my opinion not so
(00:15:00)
much because it's like a very important
(00:15:02)
essential descriptor of of some end
(00:15:05)
state of intelligence, but
(00:15:10)
because it is a reaction to a different
(00:15:14)
term that existed and the term is narrow
(00:15:16)
AI. If you go back to ancient history of
(00:15:20)
gameplay AI, of checkers AI, chess AI,
(00:15:24)
computer games AI, everyone would say,
(00:15:26)
look at this narrow intelligence. Sure,
(00:15:28)
the chess AI can beat Kasparov, but it
(00:15:30)
can't do anything else. It is so narrow,
(00:15:33)
artificial narrow intelligence. So in
(00:15:36)
response, as a reaction to this, some
(00:15:38)
people said, well, this is not good. It
(00:15:42)
is so narrow. What we need is general
(00:15:44)
AI.
(00:15:46)
general AI, an AI that can just do all
(00:15:48)
the things.
(00:15:51)
The second and and that term just got a
(00:15:55)
lot of traction.
(00:15:56)
>> Yeah.
(00:15:57)
>> The second thing that got a lot of
(00:15:59)
traction is pre-training.
(00:16:02)
Specifically, the recipe of
(00:16:03)
pre-training. I think the current the
(00:16:05)
way people do RL now is maybe um un is
(00:16:09)
undoing the conceptual imprint of
(00:16:12)
pre-training. But pre-training had the
(00:16:14)
property. you do more pre-training and
(00:16:17)
the model gets better at everything more
(00:16:19)
or less uniformly. Yeah,
(00:16:22)
>> general AI pre-training gives AGI
(00:16:28)
but
(00:16:30)
the thing that happened with AGI and
(00:16:33)
pre-training is that in some sense they
(00:16:34)
overshot the target
(00:16:37)
because by the kind if you think about
(00:16:39)
the term AGI you will realize and
(00:16:42)
especially in the context of
(00:16:43)
pre-training you will realize that a
(00:16:45)
human being is not an AGI
(00:16:48)
because a human being Yes, there is
(00:16:51)
definitely a foundation of skills.
(00:16:54)
A human being,
(00:16:57)
a human being lacks a huge amount of
(00:16:59)
knowledge. Instead, we rely on continual
(00:17:02)
learning. We rely on continual learning.
(00:17:05)
And so then when you think about okay,
(00:17:07)
so let's suppose that we achieve success
(00:17:09)
and we produce a safe super some kind of
(00:17:11)
safe super intelligence. The question is
(00:17:14)
but how do you define it? Where on the
(00:17:16)
curve of continual learning is it going
(00:17:17)
to be? I produce like um a super
(00:17:20)
intelligent 15 year old that's very
(00:17:22)
eager to go and you say okay I'm going
(00:17:24)
to they don't know very much at all the
(00:17:26)
great student very eager you go and be a
(00:17:29)
programmer you go and be a doctor
(00:17:32)
go and learn so you could imagine that
(00:17:34)
the deployment itself will involve some
(00:17:36)
kind of a learning trial and error
(00:17:38)
period it's a process as opposed to you
(00:17:42)
drop the finished
