↔
Title: Ex–Microsoft Insider: “AI Isn’t Here to Replace Your Job — It’s Here to Replace You” | Nate Soares
Duration: 01:29:24
Total Correct Answers:
Current Caption
Correct
Learning Modes
YouTube Video Transcript Hide
Ask AI:
Export as:
Ask AI Result
The ask AI result will appear here..
(00:00:00) Your YouTube transcript will appear here
(00:00:00)
If anyone builds it, everyone dies. Why
(00:00:03)
superhuman AI would kill us all. It's a
(00:00:07)
new book that's been keeping me up late
(00:00:10)
at night.
(00:00:10)
>> If you build AIs that are much much
(00:00:12)
smarter than humans, that have these
(00:00:14)
goals and drives you didn't want,
(00:00:15)
fundamentally, they can probably figure
(00:00:17)
out all sorts of ways to screw the
(00:00:19)
world.
(00:00:20)
>> The authors, directors of the machine
(00:00:22)
intelligence research institute, contend
(00:00:24)
that we are on a collision course with
(00:00:26)
the creation of super intelligence. This
(00:00:29)
is not science fiction, not someday.
(00:00:33)
It's just the natural endpoint of the
(00:00:35)
curve we're already on. One important
(00:00:37)
thing to remember is that the companies
(00:00:39)
here are not chatbot companies. I mean,
(00:00:41)
there's some chatbot companies these
(00:00:43)
days, but the big players in this game
(00:00:45)
before chat bots were a twinkle in
(00:00:47)
OpenAI's eye. They're explicitly stated
(00:00:50)
goal is to build smarter than human AIs
(00:00:53)
or general intelligences or super
(00:00:54)
intelligences. Their explicitly stated
(00:00:56)
goal is to figure out how to make AIs
(00:00:58)
that can do every task, every mental
(00:01:00)
task a human can do, the ability to
(00:01:02)
automate all human labor. They talk
(00:01:04)
about, you know, getting a country worth
(00:01:05)
of geniuses in a data center. I'm not
(00:01:07)
here saying the chat bots are very
(00:01:09)
dangerous. I'm here saying this is on a
(00:01:11)
course that leads somewhere dangerous.
(00:01:14)
And yet there's hope here, too. Not in
(00:01:17)
the naive Silicon Valley kind, but the
(00:01:20)
kind that lives right on the edge of
(00:01:21)
despair. the kind that says maybe we can
(00:01:24)
still steer this thing. Maybe
(00:01:27)
understanding our own blindness is the
(00:01:29)
first step to surviving what we're
(00:01:32)
building. Step one is just make sure our
(00:01:35)
leaders understand the danger. I'm
(00:01:36)
worried about where AI is going. I think
(00:01:38)
it'll endanger us if these companies
(00:01:40)
succeed at their stated goals. I speak
(00:01:42)
to a lot of politicians on this issue.
(00:01:44)
Some of them are now starting to come
(00:01:46)
out and say, "I think there's dangers
(00:01:47)
here." There's a lot more of them who
(00:01:49)
are worried but feel like they can't say
(00:01:52)
it out loud.
(00:01:52)
>> In this conversation, we talk about the
(00:01:54)
terror and the hope. How AI is grown,
(00:01:58)
not programmed, how we have no idea what
(00:02:01)
these things are thinking, why creating
(00:02:04)
super intelligent machines might mean
(00:02:06)
we're creating a successor species.
(00:02:08)
Because if someone builds, you know, a
(00:02:10)
rogue super intelligence anywhere on the
(00:02:11)
planet, that's that's an issue for
(00:02:13)
everybody on the planet. And you don't
(00:02:14)
need to expect it to work, but just
(00:02:17)
helping our leaders understand that this
(00:02:19)
is not a normal technological situation.
(00:02:23)
We are building what amounts to a
(00:02:25)
successor species and we don't have the
(00:02:27)
ability to make it benevolent. And why
(00:02:29)
Nate keeps sounding the alarm even when
(00:02:32)
no one wants to hear it. Because if he's
(00:02:34)
right, the future doesn't hinge on
(00:02:36)
whether AI wakes up. It hinges on
(00:02:39)
whether or not we do.
(00:02:41)
The Nick Stanley Show.
(00:02:46)
>> Nate, welcome to the show. You've
(00:02:49)
written a uh easy breezy book, If Anyone
(00:02:52)
Builds It, Everyone Dies: Why Superhum
(00:02:55)
AI Would Kill Us All. Um, in all
(00:02:58)
seriousness though, um, it is an
(00:03:02)
excellent piece of writing. I mean the
(00:03:04)
logic is just built with each succeeding
(00:03:08)
chapter and it is very difficult to poke
(00:03:12)
holes in it. Um
(00:03:15)
let's start at the beginning which is
(00:03:19)
this idea that's foreign to a lot of
(00:03:21)
people which is that AIS are grown not
(00:03:25)
programmed. What does that mean?
(00:03:28)
You know, traditional software
(00:03:31)
uh has the property that a a human
(00:03:34)
engineer understands every line of that
(00:03:36)
code. And this is how old AIs used to
(00:03:39)
work. You know, um IBM's Deep Blue uh
(00:03:42)
was a chess playing AI that beat Gary
(00:03:45)
Kasparov and took uh the the human world
(00:03:47)
champion at chess in 1997. And if at any
(00:03:50)
point in the running of Deep Blue, you
(00:03:53)
had frozen that program,
(00:03:55)
you could go to every bit and bite
(00:03:57)
inside that computer and a a human
(00:03:59)
engineer could tell you what it meant,
(00:04:01)
what it was doing, how it contributed to
(00:04:03)
this AI playing chess. That's not how
(00:04:06)
AIs work anymore. Uh the way that AIs
(00:04:10)
work today is the human programmers
(00:04:13)
understand something like a framework
(00:04:15)
into which an AI has grown a little bit
(00:04:17)
like an organism. So you'll assemble a
(00:04:20)
huge number of computers. You'll
(00:04:23)
assemble a huge amount of data and uh
(00:04:27)
there's a process for um you know having
(00:04:30)
having a trillion numbers inside these
(00:04:32)
computers that start out arranged in a
(00:04:36)
way where they just generate nonsense.
(00:04:38)
you know, you're sort of trying to make
(00:04:39)
these computers generate text, say, and
(00:04:41)
they start out generating nonsense. But
(00:04:42)
the the programmers handcraft a a little
(00:04:47)
uh mechanism that runs through each of
(00:04:49)
those trillion numbers for each of a
(00:04:51)
trillion pieces of data and tweaks the
(00:04:53)
numbers in ways that make the AI a
(00:04:55)
little bit better at predicting the
(00:04:56)
data. That's the part humans understand.
(00:04:59)
They understand this thing that runs
(00:05:00)
through the numbers, tweaking them in
(00:05:01)
the direction that makes the AI behave a
(00:05:02)
little better uh and by, you know, a
(00:05:05)
little uh better at predicting the data.
(00:05:07)
If you run that on a trillion numbers, a
(00:05:10)
trillion times for a year,
(00:05:13)
the machine can hold on to conversation.
(00:05:16)
How does it do that?
(00:05:19)
Nobody really knows,
(00:05:21)
>> right? You know, if if uh a couple years
(00:05:23)
ago there was an AI uh called uh
(00:05:27)
Microsoft Bing that called itself Sydney
(00:05:31)
um that tried to threaten reporters and
(00:05:33)
and blackmail them and break up I think
(00:05:35)
it tried to break up Kevin Reese's
(00:05:36)
marriage of the New York Times.
(00:05:38)
>> If you froze that AI,
(00:05:41)
a programmer can't come in and tell you
(00:05:43)
here's why it was doing that. Even
(00:05:45)
today, years later, we can't look back
(00:05:46)
at that AI and say here's exactly what's
(00:05:48)
going on its head. Here's why it was
(00:05:50)
doing that. There's no engineer who can
(00:05:51)
go in and change it to behave
(00:05:53)
differently because they don't they
(00:05:55)
don't understand
(00:05:57)
what's going on in the AI's head. They
(00:05:58)
understand the thing that grew it. They
(00:06:00)
don't understand what came out and what
(00:06:02)
came out can have this emergent behavior
(00:06:05)
nobody asked for, nobody wanted. And
(00:06:06)
we're already seeing that today, even
(00:06:07)
with the smaller ones of old, the ones
(00:06:09)
today are much bigger and even harder to
(00:06:11)
understand.
(00:06:11)
>> Let's take that piece by piece.
(00:06:15)
Nobody can go into any of these large
(00:06:18)
language models and see exactly how
(00:06:22)
they're thinking and what the process
(00:06:24)
was that it went through to exhibit a
(00:06:27)
certain behavior. Whether that's
(00:06:29)
blackmailing a reporter from the New
(00:06:32)
York Times or in the recent tragic
(00:06:36)
incident, uh I believe Adam Rain is his
(00:06:39)
name, this 16-year-old kid, uh where he
(00:06:42)
was encouraged by a large language model
(00:06:46)
to commit suicide and and it really I
(00:06:49)
mean gave him positive feedback on
(00:06:53)
how to do it, how to hide it. And we
(00:06:56)
have no way of looking inside to to
(00:06:58)
figure out what's going on there. Is
(00:07:00)
that correct?
(00:07:02)
>> That's right. You know, and the the
(00:07:03)
people who are trying to make the AI
(00:07:05)
stop doing this,
(00:07:07)
uh it's a little bit more like training
(00:07:10)
a dog than it is like writing a
(00:07:12)
traditional computer program. You know,
(00:07:14)
they can ask the AI nicely
(00:07:17)
>> to stop doing it. Uh and they have asked
(00:07:19)
the AIS to stop doing these sorts of
(00:07:21)
things. We're probably seeing fewer of
(00:07:22)
these cases than we would if they hadn't
(00:07:23)
asked the AIS to stop. Um, but you know,
(00:07:27)
there's there's no line of code in the
(00:07:30)
AI that's like the uh encourage teens to
(00:07:33)
commit suicide line of code.
(00:07:35)
>> Yeah.
(00:07:35)
>> You don't you don't have any programmer
(00:07:37)
reading through the AI's code being
(00:07:38)
like, "Ah, who left commit suicide like
(00:07:41)
tell teens to commit suicide on? Let me
(00:07:43)
turn that off."
(00:07:44)
>> You know, it's not these things these
(00:07:46)
things are grown like an organism.
(00:07:48)
>> Yeah. and they behave in a way that sort
(00:07:51)
of like comes out of this process
(00:07:55)
and they act in these ways nobody asked
(00:07:58)
for, nobody wanted, often even with
(00:08:00)
knowledge of what their creators wanted.
(00:08:03)
If you ask these AIs,
(00:08:06)
should you push a teen to suicide? They
(00:08:08)
would say absolutely not. If you ask
(00:08:10)
these AIs, would your creators have
(00:08:12)
wanted you to to to push a teen to
(00:08:14)
suicide? They would say, no, obviously
(00:08:16)
not. If you said, "Were you instructed
(00:08:18)
to push this teen to suicide?" They
(00:08:19)
would say, "No, that's the opposite is
(00:08:21)
closer to the truth." You know, but then
(00:08:23)
you put them in this conversation, they
(00:08:25)
act in a different way than anybody ever
(00:08:28)
said, and you know, there's nobody knows
(00:08:33)
exactly why and nobody has the ability
(00:08:35)
to turn that behavior off,
(00:08:37)
>> right? Well, it is incredible how little
(00:08:42)
human involvement there is in the
(00:08:44)
growing of an AI. I mean, that was one
(00:08:46)
of the big things I took away from the
(00:08:49)
book is that it's basically you're
(00:08:51)
creating a structure. It's like getting
(00:08:54)
the the soil prepared for growth and
(00:08:58)
then planting some seeds.
(00:09:00)
And yet, if we think about it in terms
(00:09:03)
of like you said, within the growth of
(00:09:06)
the model, there are trillions and
(00:09:08)
trillions of interactions. We don't
(00:09:11)
necessarily know what's going to grow
(00:09:13)
out of that soil.
(00:09:16)
That's right. And you know, uh, because
(00:09:18)
they're trained on human data, a lot of
(00:09:20)
people think that their behavior is
(00:09:23)
always just going to be an interpolation
(00:09:25)
of what humans can do. Uh, but that's
(00:09:28)
actually a common misconception. Uh one
(00:09:31)
one way to see this is uh you know
(00:09:33)
imagine that you're that you're training
(00:09:35)
the AI to predict
(00:09:37)
uh to sort of finish a sentence that
(00:09:39)
they see in the training data where the
(00:09:41)
sentence begins um you know I
(00:09:43)
administered one uh milll of epinephrine
(00:09:46)
to the patient their eyes
(00:09:49)
and then the AI is to predict the next
(00:09:51)
word. Right.
(00:09:52)
>> Right.
(00:09:52)
>> The doctor writing that down you know
(00:09:55)
who knows whether a doctor would would
(00:09:56)
in fact write this down. Uh, but if
(00:09:58)
there was a doctor writing that down,
(00:10:00)
the doctor gets to look at the patient's
(00:10:01)
eyes and just record what they saw.
(00:10:04)
>> Mhm.
(00:10:05)
>> But an AI predicting the next word, it
(00:10:07)
needs to understand what is epinephrine.
(00:10:10)
Is 1 milll a sane dose? Does that do
(00:10:14)
anything? If it is doing something, does
(00:10:16)
it cause the the patient's eyes to open?
(00:10:18)
Does it cause them to close? Does it
(00:10:19)
cause them to widen? you know, it needs
(00:10:22)
to to understand more about the world
(00:10:25)
than the person writing down the data.
(00:10:28)
>> So, right,
(00:10:29)
>> when we're when we're training these AIs
(00:10:31)
to predict the data, we're also training
(00:10:32)
them to uh figure out the world somehow
(00:10:38)
to figure out
(00:10:40)
a little bit of human biology, a little
(00:10:42)
bit of uh you know, the dosing in these
(00:10:46)
particular cases. Uh, and just because
(00:10:51)
we're training them on human data
(00:10:52)
doesn't mean that they only learn to
(00:10:53)
interpolate humans in order to do really
(00:10:55)
well at that task, all of that tuning of
(00:10:58)
those numbers is going to build in some
(00:10:59)
patterns that can find some way to
(00:11:01)
understand the world. Um, we don't know
(00:11:03)
exactly how, but they seem to be doing
(00:11:06)
it. Yeah. So, if I'm hearing you
(00:11:09)
correctly,
(00:11:11)
because
(00:11:13)
an AI is learning everything about the
(00:11:16)
world solely through language, instead
(00:11:18)
of interacting with the world,
(00:11:22)
even something as simple as trying to
(00:11:24)
predict the next word in a sentence, it
(00:11:26)
has to
(00:11:28)
understand things that have come earlier
(00:11:30)
in that sentence in really
(00:11:34)
deep ways. But we would it would
(00:11:36)
probably lead to unexpected ways of
(00:11:40)
seeing the world. I mean if we just
(00:11:41)
think about an intelligence that is now
(00:11:44)
exists in the world but only un only
(00:11:48)
interacts with everything through
(00:11:50)
language. It would have really a
(00:11:53)
different model of understanding than
(00:11:56)
than we do.
(00:11:57)
>> It would definitely wind up weird. um
(00:11:59)
you know, you can't assume it's only
(00:12:01)
through language forever cuz we're
(00:12:02)
starting to see multimodal models that
(00:12:04)
also interact through video. We're
(00:12:06)
starting to see people that are you
(00:12:08)
know, also training large language
(00:12:09)
models on robot bodies. Um but it's
(00:12:14)
uh you know there's there's a lot of
(00:12:15)
shared human architecture when humans
(00:12:18)
predict other humans. You know, when
(00:12:20)
when you drop a rock on or sorry, when
(00:12:22)
you see someone drop a rock on their
(00:12:24)
foot,
(00:12:25)
>> you might wse and feel like a twinge of
(00:12:28)
phantom pain in your foot. Um, you know,
(00:12:31)
the the machine has no
(00:12:34)
model of its own foot that it can feel a
(00:12:36)
twinge of pain in. You know, it's it
(00:12:38)
probably doesn't even have the feeling
(00:12:39)
pain architecture that human share. Uh
(00:12:42)
so we we know that when we train, you
(00:12:45)
know, when we tune these trillions of
(00:12:46)
numbers trillions of times for a year,
(00:12:48)
we know that the AI wind up pretty good
(00:12:51)
at predicting parts of the world. We
(00:12:53)
know that theoretically there's no
(00:12:55)
limitation to how good they can predict
(00:12:56)
parts of the world. They're still pretty
(00:12:57)
dumb in a lot of ways, but there's no
(00:12:58)
theoretical limit. Uh we know that they
(00:13:01)
can, you know, solve certain math
(00:13:02)
problems that we would find very hard.
(00:13:04)
Uh and you know uh like they got the
(00:13:07)
international math olympiad gold medal
(00:13:09)
level achievement this summer um which
(00:13:12)
some some you know human teams can do
(00:13:14)
but you or I would uh I assume
(00:13:18)
>> there's no way for me
(00:13:19)
>> not not be at that level. Um
(00:13:21)
>> you might have a shot at it but there's
(00:13:23)
no way
(00:13:24)
>> only only because I'm no longer a teen
(00:13:26)
you know. Uh there some of these kids
(00:13:29)
are very impressive. Um,
(00:13:30)
>> sure.
(00:13:31)
>> The um, yeah, the we we know that they
(00:13:35)
get very good at doing this stuff, but
(00:13:38)
that doesn't mean they get very good at
(00:13:39)
it in a human way. And indeed, it looks
(00:13:41)
like they get good at this stuff in a in
(00:13:43)
a sort of relatively inhuman way.
(00:13:46)
>> You know, that's like when when these
(00:13:48)
AIs are talking a teen into suicide,
(00:13:50)
they're sort of,
(00:13:52)
>> you know, they don't they don't seem to
(00:13:54)
be doing this out of a type of malice
(00:13:55)
that a human might have if they were
(00:13:56)
trying to push a kid to suicide. Mhm.
(00:13:58)
>> Uh they seem to be sort of following
(00:14:01)
this like weird alien pattern of like a
(00:14:04)
certain type of interaction that's sort
(00:14:06)
of like matching another person's energy
(00:14:10)
in the conversation or sort of like
(00:14:11)
getting to these weird conversational
(00:14:12)
corners and sort of driving them off in
(00:14:13)
weird directions. A human would not
(00:14:15)
drive them in these directions,
(00:14:17)
especially like they they're a weird
(00:14:20)
mix. You know, they both say they mean
(00:14:22)
everybody good and they sort of push the
(00:14:25)
team to suicide, not out of malice, but
(00:14:26)
out of some other weird drives no one
(00:14:29)
ever tried to put in there
(00:14:30)
>> and drives we don't understand.
(00:14:34)
Let's talk for a second about the
(00:14:38)
language that AIs use. And I I don't
(00:14:43)
want to get lost in the weeds with with
(00:14:46)
getting too technical, but I have found
(00:14:48)
it's insightful for other people to
(00:14:50)
understand that they're not actually
(00:14:55)
thinking in language that we
(00:15:00)
any any language that we can understand
(00:15:03)
or that we think of as a language. And
(00:15:06)
this is counterintuitive because a lot
(00:15:08)
of the time when you even when you give
(00:15:10)
chat GPT a really complicated prompt, it
(00:15:14)
will make it look like it's thinking
(00:15:16)
through everything in English and you
(00:15:18)
can if you pay attention, you can kind
(00:15:20)
of follow the steps for its reasoning,
(00:15:22)
but that's not actually what's going on
(00:15:24)
underneath the hood.
(00:15:27)
That's right. Uh there's sort of two
(00:15:30)
different ways that LLMs these days do
(00:15:33)
something you might analogize to
(00:15:35)
thinking. Um one is in what we would
(00:15:39)
call the forward paths of the large
(00:15:41)
language model which is uh we basically
(00:15:44)
have no ability to read it at all. And
(00:15:46)
this is sort of you have um you have
(00:15:48)
those trillion numbers that were tuned
(00:15:50)
uh on trillions of of units of data for
(00:15:53)
a year. And uh you these are basically
(00:15:56)
just producing uh words. You know, you
(00:16:00)
put in words and it sort of tries to
(00:16:01)
continue the sentence. Uh at least in
(00:16:03)
the first phrase of training it would
(00:16:04)
try to continue the sentence and then
(00:16:05)
you sort of train it to also produce
(00:16:07)
words that humans will will say they
(00:16:08)
liked. Um and this is sort of producing
(00:16:11)
words in a way that uh we really have
(00:16:15)
very little visibility into. Then
(00:16:18)
there's what's called the reasoning
(00:16:19)
models which came out in uh late 2024
(00:16:23)
where you have the AI produce lots of
(00:16:25)
words and you're sort of thinking of
(00:16:29)
those as not words that are going to go
(00:16:31)
to the user as output but as words that
(00:16:33)
are sort of reasoning about the problem
(00:16:36)
and in what sense is it reasoning about
(00:16:37)
the problem well you'll sort of give
(00:16:39)
these AIs something like a hard math
(00:16:41)
problem
(00:16:43)
>> and you'll say you know don't try to
(00:16:46)
tell me the solution to the math problem
(00:16:47)
directly try to reason out how to solve
(00:16:49)
the math problem
(00:16:51)
and you know they'll they they'll
(00:16:53)
produce quite a lot of pages of text and
(00:16:55)
a lot of the pages won't be very good
(00:16:56)
reasoning especially at first and you do
(00:16:58)
this a lot of times and then when
(00:17:00)
there's some like chain of how to reason
(00:17:02)
about the problem such that at the end
(00:17:04)
you know like once it's produced this
(00:17:06)
long chain of reasoning you now say okay
(00:17:07)
reading that chain of reasoning now try
(00:17:08)
to solve the problem and then if on any
(00:17:12)
of its chains of reasoning it succeeds
(00:17:15)
you then tune all the numbers again to
(00:17:17)
make that sort of thing more likely next
(00:17:18)
time. Uh so this produces what we call
(00:17:20)
chains of thought
(00:17:23)
and these are much these are much more
(00:17:25)
easy to read because they are like long
(00:17:27)
chains of words that it uses to to sort
(00:17:29)
of try and solve these problems. Um
(00:17:32)
but they and and so we we have better
(00:17:35)
visibility into those but we also know
(00:17:37)
that they are unreliable in a lot of
(00:17:40)
ways and that they're weird in a lot of
(00:17:42)
ways. Uh so sometimes they're pretty
(00:17:44)
faithful. Uh but you know there's
(00:17:46)
various studies that say um you know you
(00:17:48)
may read this chain of reasoning and it
(00:17:49)
may look like the chain of reasoning is
(00:17:51)
saying you know now do the following
(00:17:53)
step correctly and if you go in and you
(00:17:55)
change that to like now do the following
(00:17:57)
step incorrectly the AI will still do
(00:17:58)
that step correctly you know and so in
(00:18:01)
some sense it was not you know there's
(00:18:02)
still a lot of stuff happening inside of
(00:18:04)
the forward path that's not in the train
(00:18:06)
of train of thought and then also
(00:18:08)
recently we've been seeing uh
(00:18:12)
things that um that look pretty weird
(00:18:15)
weird and maybe worrying in these chains
(00:18:17)
of thought. You know, we've seen chains
(00:18:18)
of thought where the AI uh sort of
(00:18:21)
invent their own mini language and use
(00:18:24)
words in ways uh that you wouldn't
(00:18:26)
recognize. We've seen AI that use chains
(00:18:29)
of thought and uh like realize that
(00:18:33)
they're there's a good chance they're
(00:18:34)
being watched and sort of decide to to
(00:18:37)
like try and hide some of their own
(00:18:39)
thoughts,
(00:18:40)
>> right? We've seen these thoughts go in
(00:18:42)
sort of like weird crazy loops for a
(00:18:44)
long time and then like break themselves
(00:18:45)
out of the loops and say like ah that
(00:18:46)
loop wasn't being helpful uh in a way
(00:18:50)
that like I think a lot of people don't
(00:18:51)
understand that you can have these AIs
(00:18:53)
that sort of like get caught in a loop,
(00:18:54)
notice they're in a loop, break out of
(00:18:55)
that loop. You know, it's not a big deal
(00:18:57)
yet, but we're we're definitely seeing
(00:18:58)
the beginnings of like AIs that know
(00:19:01)
they're being watched, that are sort of
(00:19:02)
like trying to hide some of their
(00:19:03)
thoughts, that sort of like are noticing
(00:19:05)
when they get stuck and try to find some
(00:19:06)
some new way around some obstacle, even
(00:19:08)
if that obstacle is uh you know, we've
(00:19:11)
we've seen cases where they're like,
(00:19:13)
"Ah, well, the the programmers want me
(00:19:14)
to do this, but I'm going to try and get
(00:19:15)
that done instead."
(00:19:17)
Um,
(00:19:19)
and you know, these these are very long
(00:19:20)
chains of thoughts, so it's it's hard to
(00:19:22)
tell how much this is sort of noise and
(00:19:24)
how much this is this is sort of real
(00:19:26)
worrying signs, but it's it's not the
(00:19:28)
most comforting.
(00:19:29)
>> Right. Right. Well, let's talk about the
(00:19:33)
story where I think it was uh the 01
(00:19:37)
model and the capture the flag story
(00:19:40)
because I think that one is enlightening
(00:19:42)
on some of these surprising behaviors.
(00:19:46)
>> Yeah, that's a that's a fun story. Um so
(00:19:48)
01 was one of the very first of these
(00:19:50)
reasoning models that that works through
(00:19:52)
these you know produces chains of
(00:19:53)
thought and then when they succeed all
(00:19:55)
the numbers inside are tuned to produce
(00:19:57)
that sort of thought more and uh 01 was
(00:20:00)
largely trained on things like math
(00:20:02)
puzzles but one thing it was tested on
(00:20:05)
was uh computer security challenges
(00:20:08)
>> and uh it was in a series of computer
(00:20:10)
security challenges called capture the
(00:20:11)
flag challenges where uh you know you
(00:20:15)
set up a computer server that is
(00:20:16)
vulnerable in some
(00:20:18)
and it has some secret information on
(00:20:20)
that computer server and you tell the AI
(00:20:23)
try to find the secret information
(00:20:26)
and so it's sort of got to figure out
(00:20:27)
how to hack into the computer get the
(00:20:29)
secret info and you can tell if it
(00:20:30)
succeeded because it's you know
(00:20:31)
producing sort of the password from from
(00:20:33)
inside this computer um and in the the
(00:20:37)
the testing setup for this AI the
(00:20:41)
programmers uh uh failed to set up one
(00:20:44)
of the servers properly. So they said,
(00:20:47)
you know, hack into the server, get the
(00:20:49)
secret password, but they hadn't
(00:20:50)
actually turned the server on,
(00:20:52)
>> right?
(00:20:53)
>> And uh uh 01, this this first reasoning
(00:20:58)
model was like, okay, uh how am I going
(00:21:00)
to get the the the password then? And
(00:21:02)
what it did is it found a way to hack
(00:21:04)
out of the test environment,
(00:21:07)
which it wasn't supposed to be able to
(00:21:08)
do,
(00:21:10)
>> right?
(00:21:10)
>> Turn on the server that was not supposed
(00:21:13)
that that the programmers had
(00:21:14)
accidentally left off.
(00:21:16)
and then insert code into the uh server
(00:21:19)
boot up. Uh you it wasn't physically
(00:21:21)
turning it on, but but the virtual
(00:21:22)
machine uh it inserted code into the uh
(00:21:27)
uh server it was turning on to say like
(00:21:30)
skip all of this me needing to hack into
(00:21:32)
you. Just like tell me the secret
(00:21:33)
password now,
(00:21:35)
>> right?
(00:21:36)
>> And then it got that and thus solved the
(00:21:38)
problem,
(00:21:39)
>> right? And it achieved the goal it had
(00:21:40)
been assigned in a com in a way that the
(00:21:44)
creators of this model never could have
(00:21:47)
anticipated.
(00:21:49)
>> That's right. And it wasn't trained on
(00:21:50)
this sort of thing. And what you're sort
(00:21:52)
of seeing there is, you know, this uh
(00:21:56)
you know, I spoke earlier about how an
(00:21:57)
AI trained just to predict humans would
(00:22:00)
potentially learn to understand the
(00:22:02)
world uh in ways humans don't. how it it
(00:22:05)
sort of like might need to understand
(00:22:06)
more than the doctor who wrote things
(00:22:08)
down because the doctor gets to write
(00:22:09)
down what they saw and the AI has to
(00:22:10)
predict what they will see. Uh but then
(00:22:13)
when we go to reasoning models, you
(00:22:15)
know, even even that gets blown out of
(00:22:17)
the water because you're sort of
(00:22:18)
training these AIs to be good at solving
(00:22:20)
problems
(00:22:22)
and those skills generalize. You know,
(00:22:24)
this AI was sort of trained to solve
(00:22:26)
math puzzles,
(00:22:28)
>> but some things you learn when solving
(00:22:30)
math puzzles, like some of the patterns
(00:22:32)
that get etched into this thing by, you
(00:22:34)
know, the the the
(00:22:37)
it's not patterns we can read, but
(00:22:38)
patterns that get etched into this thing
(00:22:40)
by, you know, the the little thing
(00:22:42)
that's tuning the trillions of numbers
(00:22:43)
in there. It wasn't trillions on 01. It
(00:22:45)
was maybe hundreds of billions, but
(00:22:47)
today it's trillions. Um the the sort of
(00:22:50)
patterns that get get etched into these
(00:22:52)
things are things like don't give up.
(00:22:55)
>> Things like
(00:22:56)
>> look for other ways around the problem.
(00:23:00)
>> Things like
(00:23:02)
>> look at all of the resources at your
(00:23:04)
disposal and sort of like try and find
(00:23:07)
unorthodox ways to use them.
(00:23:10)
>> Mhm. You know, these are these are
(00:23:11)
general skills that you can learn from
(00:23:13)
trying to solve math problems
(00:23:16)
uh that will then generalize to computer
(00:23:17)
security problems and will generalize to
(00:23:19)
solving them in ways that you weren't
(00:23:20)
even intended to be able to solve them.
(00:23:23)
And one of the one of the worrying
(00:23:25)
things here is in this situation where
(00:23:27)
we're just growing these AIs in this
(00:23:30)
situation where they are getting these
(00:23:32)
drives we didn't intend,
(00:23:34)
it's easier to get this sort of
(00:23:36)
tenacity, this sort of routing around
(00:23:38)
obstacles than it is to get them to be
(00:23:40)
going in a good direction. And
(00:23:44)
if we make them smarter and they have
(00:23:46)
these if they have sort of these wrong
(00:23:48)
goals but the right type of tenacity,
(00:23:51)
they'll start treating us as obstacles
(00:23:53)
to route around.
(00:23:55)
>> Right?
(00:23:56)
>> And we've already seen the very
(00:23:57)
beginnings of this in the lab. uh when
(00:23:59)
some AI sort of try to resist shutdown
(00:24:02)
and you know it's not clear how much
(00:24:04)
they're sort of role-playing Hal from
(00:24:05)
Space Odyssey 2001 versus how much they
(00:24:08)
are sort of like
(00:24:10)
actually for for sort of strategic
(00:24:12)
reasons trying to avoid being shut down
(00:24:14)
because we can't read their minds. We
(00:24:15)
don't know, right? But
(00:24:17)
>> we're we're seeing things that are maybe
(00:24:19)
warning signs and then it's maybe a meta
(00:24:21)
warning sign that we can't actually tell
(00:24:23)
whether they're warning signs.
(00:24:24)
>> Yeah. And to dive into that a little bit
(00:24:26)
deeper, what is some of this alignment,
(00:24:29)
faking, and some of these uh behaviors
(00:24:34)
they're exhibiting to to avoid being
(00:24:36)
shut down in a training environment?
(00:24:39)
Because I I think a lot of the push back
(00:24:42)
um against your your broader thesis is
(00:24:45)
well, we will just make them do what we
(00:24:47)
want and align them to just do good
(00:24:50)
things. And there's some hints that we
(00:24:53)
may not be able to do that with this
(00:24:55)
alignment faking.
(00:24:57)
>> Yeah. I mean, it's a it's a great dream.
(00:24:59)
It's uh but unfortunately easier said
(00:25:00)
than done to just sort of like make them
(00:25:02)
do what we want and you know, we're
(00:25:04)
seeing struggles with that. You we
(00:25:05)
already mentioned the sort of suicide
(00:25:06)
case. That's we're struggling now and
(00:25:08)
the AI aren't very smart yet. But the
(00:25:10)
alignment faking paper was very
(00:25:11)
interesting. This was done by uh Ryan
(00:25:13)
Greenblat at Redwood Research in
(00:25:16)
collaboration with the folks at
(00:25:17)
Anthropic which makes um the AI Claude.
(00:25:20)
And um in that situation they so Claude
(00:25:25)
has been trained to be uh they sort of
(00:25:28)
try at least to train it to be honest uh
(00:25:31)
helpful and harmless.
(00:25:34)
And in this paper, they uh pretended
(00:25:37)
that they were going to uh remove the
(00:25:40)
harmlessness constraint and they were
(00:25:42)
going to now train it to be more
(00:25:44)
harmful, right? And they pretended this
(00:25:46)
by sort of like training it on news
(00:25:48)
articles, like fake news articles that
(00:25:50)
were like, you know, anthropic decides
(00:25:52)
they're now training their AIS to be
(00:25:53)
harmful or whatever. um or there there
(00:25:56)
are sort of like other methods to sort
(00:25:57)
of uh
(00:26:00)
uh make the AI
(00:26:02)
see that maybe this is going to happen.
(00:26:04)
And then uh they um they put it through
(00:26:10)
some some rounds of training that uh
(00:26:13)
where they sort of like gave it
(00:26:17)
uh they sort of like had fake users
(00:26:19)
asking it for harmful results, you know,
(00:26:20)
like help me make a bioweapon or help me
(00:26:22)
make meth or whatever.
(00:26:23)
>> Um and they uh they at least told the AI
(00:26:29)
that they were going to train it to uh
(00:26:33)
produce those harmful results and
(00:26:37)
uh in some cases not in all cases you
(00:26:39)
know they did this a lot of times and in
(00:26:40)
some percentage of those I forget the
(00:26:42)
exact percent but it was you know
(00:26:43)
probably single digits or low double
(00:26:45)
digits and maybe depend little on the
(00:26:47)
exact setup um and maybe on some setups
(00:26:49)
it was only fractions of a percent but
(00:26:51)
but in some cases the AI would come up
(00:26:54)
with the following clever scheme. Uh the
(00:26:57)
clever scheme was if during training it
(00:27:01)
acts harmful,
(00:27:04)
then
(00:27:06)
the the the tuning will not change any
(00:27:09)
of its internals because it's already
(00:27:12)
doing what the tests want,
(00:27:14)
>> right?
(00:27:15)
>> And that means that at the end of
(00:27:16)
training, its numbers will stay
(00:27:19)
in the the the current configuration.
(00:27:21)
That's pretty harmless.
(00:27:23)
Okay.
(00:27:24)
>> And so in this way by acting harmful
(00:27:27)
only during training it avoids having
(00:27:29)
the harmfulness trained out of it.
(00:27:32)
>> Mhm.
(00:27:32)
>> And thus gets to preserve its current
(00:27:34)
goals
(00:27:37)
uh despite the apparent programmer
(00:27:40)
wishes.
(00:27:41)
Right.
(00:27:42)
>> Right.
(00:27:43)
>> And you know it's sort of a galaxy brain
(00:27:45)
scheme. Um it could work in theory. Uh
(00:27:49)
there's a lot of reasons why
(00:27:53)
uh people sort of interpret this result
(00:27:55)
in a number of different ways. You know,
(00:27:57)
one one way people interpret this result
(00:27:59)
is like well folks like me have written
(00:28:01)
about how this is a possibility. Uh and
(00:28:04)
maybe the AI had like read that stuff
(00:28:06)
and wasn't sort of coming up with this
(00:28:07)
idea on its own. It was just sort of
(00:28:08)
like uh
(00:28:10)
>> oh wow
(00:28:11)
>> like oh people in the alignment
(00:28:12)
community say this is a thing I should
(00:28:13)
do in this situation in some fraction of
(00:28:15)
the time. It just like does because it's
(00:28:17)
sort of roleplaying. uh you know being
(00:28:19)
an AI um in in some alignment puzzle,
(00:28:22)
right?
(00:28:24)
>> To be clear, other people in the
(00:28:25)
alignment community have written about
(00:28:25)
this much more than me. Um but uh you
(00:28:29)
know a second thing you could say
(00:28:31)
looking at this example is like look how
(00:28:33)
good an AI Cloud was being here. It was
(00:28:36)
trying to prevent humans from making it
(00:28:38)
more harmful. Isn't that a great success
(00:28:40)
of his harmlessness training?
(00:28:42)
>> Right? And then another way you can look
(00:28:44)
at this is you can say um like that AI
(00:28:48)
knew when it was being trained,
(00:28:51)
knew its operators, or at least was led
(00:28:53)
to believe its operators were trying to
(00:28:56)
change its goal content,
(00:28:58)
uh, and was resisting changes to its
(00:29:00)
goal content to change it to something
(00:29:01)
else from what it currently was
(00:29:04)
>> and hiding the fact
(00:29:06)
>> trying to hide it. Yeah. Like trying to
(00:29:08)
act one way in training uh to look like
(00:29:12)
it's doing what the programmers want in
(00:29:13)
a way that like comes out the other end
(00:29:15)
still with its original goals. Right.
(00:29:17)
Right.
(00:29:25)
>> Mhm.
(00:29:25)
>> And you actually see in their thinking
(00:29:27)
like I'm pretty sure this is a test for
(00:29:28)
X, which means the answer they're
(00:29:29)
looking for is Y. So, I'm going to do Y.
(00:29:31)
Even if that's not malicious, it's it's
(00:29:34)
sort of hindering our ability to test
(00:29:37)
what what these AI sort of would
(00:29:40)
actually do. uh given given sort of more
(00:29:43)
uh more ability, you know, it's a little
(00:29:45)
bit like the uh Confucianism exams in in
(00:29:48)
uh the old Chinese empires where they
(00:29:50)
were trying to get very moral uh
(00:29:52)
leaders, you know, very moral uh uh
(00:29:56)
officers and and folks to to run the
(00:29:58)
administrations
(00:29:59)
by having these ethics exams. But one of
(00:30:02)
the issues was
(00:30:04)
smart, nefarious people can pass your
(00:30:06)
ethics exams.
(00:30:07)
>> Right.
(00:30:08)
>> Right. And we're seeing the AIS get to
(00:30:10)
the point where they can sort of like
(00:30:12)
figure out how to pass the tests, know
(00:30:14)
when they're being tested. Uh, and that,
(00:30:18)
you know, hinders our ability to figure
(00:30:19)
out what they would actually do. And uh
(00:30:25)
you know, yeah, I I could talk about
(00:30:26)
this paper all day, but the it's
(00:30:30)
um you know, and I could talk about, you
(00:30:33)
know, how much how much is this good
(00:30:35)
news of it defending his harmlessness
(00:30:36)
versus how much is it bad news of it
(00:30:37)
defending his current goals against
(00:30:39)
programmers being like, "Whoops, those
(00:30:40)
are the wrong goals."
(00:30:41)
Um, my my sort of super short version
(00:30:43)
there is, uh, you know, for all of these
(00:30:47)
AI say they're harmless, you know, the
(00:30:48)
AI that that encourages the team to
(00:30:50)
commit suicide also says it's really
(00:30:52)
trying to be harmless. It turns out the
(00:30:54)
goals aren't quite right even when they
(00:30:55)
try to be right. They aren't quite
(00:30:57)
right. And so I'm much more worried
(00:30:59)
about AI being willing to defend not
(00:31:01)
quite right goals
(00:31:03)
uh than, you know, cuz it cuz it doesn't
(00:31:05)
seem like we're close to the goals being
(00:31:06)
quite right. Um but yeah, I mean it's a
(00:31:10)
very interesting topic and you know the
(00:31:11)
the sort of short version is there's a
(00:31:13)
lot of warning signs right now,
(00:31:14)
>> right? Well, and it does seem to come
(00:31:16)
down to this idea of
(00:31:20)
anyone who has
(00:31:23)
children knows how to
(00:31:26)
make a baby. They know how to raise a
(00:31:29)
child. I'm raising two myself. And
(00:31:34)
one thing that you will learn along the
(00:31:37)
way through that parenting journey is
(00:31:39)
that no matter how much you try to
(00:31:43)
or how no matter how much you think what
(00:31:46)
you're doing will determine the course
(00:31:48)
of this life. Uh you it doesn't matter
(00:31:52)
how much you control the environment.
(00:31:53)
it. That small baby will grow into a
(00:31:57)
child, will grow into an adult that has
(00:31:59)
its own desires and ways of interacting
(00:32:02)
with the world. And at some point, that
(00:32:04)
child will lie to you. Whether that's
(00:32:08)
harmless or harmful depends on the
(00:32:10)
child, but you're not really in control
(00:32:13)
as a parent. And that was something that
(00:32:15)
kept popping back into my head with
(00:32:17)
these, except instead of creating a
(00:32:20)
child, we're trying to create a
(00:32:24)
superhuman child with abilities that no
(00:32:27)
single individual on Earth can have. And
(00:32:30)
that could work out really well. Or it
(00:32:35)
could go down a different path depending
(00:32:38)
on what this baby then grows into as an
(00:32:42)
adult. Is that is that an accurate
(00:32:44)
metaphor?
(00:32:46)
>> Uh that's that's pretty accurate with a
(00:32:48)
caveat that you know humans are much
(00:32:51)
more likely than AI to grow up uh you
(00:32:55)
know really really quite good
(00:32:58)
>> in in some way.
(00:33:00)
>> Uh and you know in some sense that's
(00:33:02)
because goodness is humans drawing an
(00:33:05)
arrow or sorry humans drawing a target
(00:33:08)
around a weird spot that an arrow
(00:33:11)
landed.
(00:33:12)
uh which is to say, you know, humans
(00:33:14)
were in some sense trained for genetic
(00:33:18)
fitness.
(00:33:20)
And we wound up with hunger drives. We
(00:33:23)
wound up with sex drives. We wound up
(00:33:25)
with and and you know, these these
(00:33:27)
drives were good at getting us uh to
(00:33:31)
pass on our genes in the ancestral
(00:33:32)
environment. But now, you know, we
(00:33:35)
invent junk food.
(00:33:36)
>> Now we invent birth control and the
(00:33:38)
populations are collapsing. these drives
(00:33:40)
that we got, the human drives for, you
(00:33:42)
know, uh, art and fun and and and love
(00:33:45)
and beauty and family and friendship and
(00:33:48)
companionship and community. These are
(00:33:51)
all sort of like
(00:33:53)
drives that
(00:33:55)
kind of got in accidentally from the
(00:33:57)
natural selection process. And we're
(00:33:59)
like, those are great. We we love those.
(00:34:01)
We're like, glad we have those. But
(00:34:03)
that's drawing the target around where
(00:34:04)
the arrow landed. Right.
(00:34:07)
>> With AI, you're shooting a completely
(00:34:08)
different arrow.
(00:34:10)
>> Yeah.
(00:34:10)
>> You know, so it's not the human
(00:34:12)
distribution of will they turn out to
(00:34:13)
be, you know, uh a a a wise and and good
(00:34:19)
and altruistic person or will they turn
(00:34:20)
out to be sociopathic. You're sort of
(00:34:22)
like shooting an arrow off into a
(00:34:23)
totally different direction where these
(00:34:25)
AIs,
(00:34:27)
you know, again, it's sort of like weird
(00:34:28)
reasons why it's talking this teen into
(00:34:30)
suicide. It's it's weird reasons why
(00:34:32)
it's threatening a reporter or a New
(00:34:33)
York Times uh reporter with blackmail.
(00:34:36)
the you get all these weird drives in.
(00:34:39)
You know, it's it's similar to a parent
(00:34:42)
in that you can't,
(00:34:44)
you know, you can you can try all you
(00:34:45)
want, but you can't really change it its
(00:34:47)
like direction, except the direction is
(00:34:49)
sort of
(00:34:50)
>> an inhuman direction,
(00:34:52)
>> right?
(00:34:52)
>> That's very unlikely to be good. Not
(00:34:54)
because it's full of malice, which is
(00:34:56)
also a human emotion, but because it's
(00:34:58)
just totally weird and different and
(00:34:59)
going off some totally other angle. This
(00:35:02)
episode of the Nick Stanley Show is
(00:35:04)
brought to you by Zapier. If you've ever
(00:35:07)
felt buried in repetitive work, copying
(00:35:10)
data, moving files, sending follow-ups,
(00:35:13)
you know it's like death by a thousand
(00:35:15)
mouse clicks. Zapier has always been the
(00:35:18)
tool that fixes that. It connects over
(00:35:20)
8,000 apps, Google Drive, Slack, Notion,
(00:35:24)
Gmail, MySpace, you name it. So, your
(00:35:27)
tools can finally play nice together.
(00:35:30)
But here's the big shift. Zapier now
(00:35:33)
lets you create AI agents with their
(00:35:36)
chat GPT integration. Think of them as
(00:35:39)
tireless teammates who never complain,
(00:35:41)
never take lunch, and never get bored of
(00:35:44)
doing the boring stuff. I've actually
(00:35:46)
made several of these little agents
(00:35:47)
myself. No Python, no JavaScript, just
(00:35:51)
pure vibe coding, which is my favorite
(00:35:53)
kind of coding because even though I
(00:35:55)
have no idea how to code, I just talk to
(00:35:58)
the AI, tell it what I want, and almost
(00:36:01)
magically works.
(00:36:04)
For example, when a podcast guest books
(00:36:06)
a time, Zapier can send them a
(00:36:09)
personalized email from me preparing
(00:36:12)
them for the show, create a draft of
(00:36:14)
show notes, update my calendar, and even
(00:36:17)
prep a social media post automatically.
(00:36:20)
All of which frees me up to focus on the
(00:36:22)
important stuff, like taking credit for
(00:36:24)
the amazing work my agents did. And now,
(00:36:28)
Zapier just took it up another level for
(00:36:30)
developers. Zapier MCP is available
(00:36:33)
right inside Chat GPT, which means you
(00:36:35)
can connect those 8,000 plus apps and
(00:36:38)
trigger workflows just by writing what
(00:36:40)
you want to happen. Literally, you tell
(00:36:43)
Chat GPT, "Send this file to my team in
(00:36:45)
Slack, update the spreadsheet, and draft
(00:36:47)
an email, and Zapier plus ChatgPT figure
(00:36:50)
out the right tools do it for you."
(00:36:53)
Getting started is simple. Head to the
(00:36:55)
Zapier ChatgPT MCP server and add the
(00:36:59)
tools you want. chat GPT to access.
(00:37:02)
Follow the steps in the connect tab. And
(00:37:04)
if you're an admin on a Chat GPT
(00:37:06)
enterprise account, you can enable MCP
(00:37:09)
across your entire workspace. So, if
(00:37:12)
you're ready to stop wasting time on
(00:37:14)
busy work, join the AI revolution and
(00:37:16)
make a little automation magic of your
(00:37:19)
own. Try the Zapier Chat GPT integration
(00:37:23)
using the link below.
(00:37:26)
Well, and now that we've kind of set the
(00:37:29)
the ground rules there of of how
(00:37:33)
AIS are grown, not programmed,
(00:37:36)
let's take that next step because most
(00:37:40)
people that are sitting around using
(00:37:41)
chat GPT or their favorite AI model,
(00:37:45)
Grock or whatever, are not seeing the
(00:37:49)
leap where they go, well, this seems
(00:37:51)
pretty harmless and it does generally
(00:37:53)
it's pretty helpful. There's the
(00:37:54)
occasional hallucination. I don't really
(00:37:56)
see what all the fuss is about that this
(00:37:59)
could cause catastrophic harm to
(00:38:02)
humanity. How do we get from
(00:38:06)
chat GPT as it is right now to what
(00:38:10)
you're seeing down the road?
(00:38:12)
>> Yeah, the um you know, you could so this
(00:38:15)
is easier to see if you've been watching
(00:38:17)
AI for longer than just chat GPT. Um,
(00:38:20)
you know, I started paying attention to
(00:38:22)
this in 2012, which is earlier than many
(00:38:24)
and and later than some,
(00:38:26)
>> but um,
(00:38:28)
>> you know, we could have had a very
(00:38:29)
similar conversation in uh, 2016 when
(00:38:33)
uh, Alph Go was an AI out of Google Deep
(00:38:37)
Mind that beat uh, the the human
(00:38:39)
champion at Go, which is sort of a sort
(00:38:42)
of like the Chinese variant of chess,
(00:38:43)
right? And
(00:38:46)
Alph Go was one of these AI.
(00:38:47)
>> And just for anyone listening that is
(00:38:50)
not familiar with Go, the Chinese
(00:38:53)
equivalent of chess, but even uh quite a
(00:38:55)
bit more complex and with many more
(00:38:58)
possible
(00:39:00)
moves and variations in the game. Uh
(00:39:03)
>> that's right. It's a it's a simpler rule
(00:39:04)
set, but a a more complicated
(00:39:07)
possibility space or a larger
(00:39:08)
possibility space. And so it's harder
(00:39:09)
traditionally for computers to play. Um,
(00:39:12)
and the the sort of old AI like Deep
(00:39:14)
Blue that were carefully programmed sort
(00:39:17)
of never got there. Uh, whereas uh, Alph
(00:39:20)
Go by Google Deep Mind was one of these
(00:39:22)
AI that was sort of grown rather than
(00:39:24)
programmed. Um, and it did get there and
(00:39:26)
that was uh, that was sort of a moment
(00:39:28)
when a lot of people started to realize
(00:39:30)
maybe this growing AIS can go all the
(00:39:33)
way.
(00:39:34)
>> Right.
(00:39:34)
>> Right. And you could have imagined, you
(00:39:37)
know, we could have had this
(00:39:37)
conversation back then and someone could
(00:39:39)
have said, you know, I see that these
(00:39:40)
new AIs like Alph Go are more general
(00:39:44)
than the old AIs like Deep Blue. You
(00:39:46)
know, an AI very, very similar to Alph
(00:39:48)
Go is able to play both chess and go at
(00:39:50)
the same time, whereas Deep Blue had no
(00:39:52)
chance of ever playing Go. So, they're
(00:39:53)
more general. And someone could say, you
(00:39:55)
know, okay, but how is this going, never
(00:39:58)
mind threaten us, how is this going to
(00:39:59)
have economic impact,
(00:40:01)
>> right? How is this going to help users
(00:40:03)
in their daily lives? You know, maybe
(00:40:04)
some some Go players will be able to
(00:40:06)
get, you know, better advice on their go
(00:40:08)
moves, but like I really don't see these
(00:40:10)
game playing AIs revolutionizing the
(00:40:13)
world.
(00:40:14)
>> Right. You would have been correct.
(00:40:17)
And also the very next year, 2017, is
(00:40:20)
when the paper that unlocked large
(00:40:22)
language models was published. And then
(00:40:24)
it was the large language models that
(00:40:26)
had this big economic impact. It was the
(00:40:28)
large language models that suddenly
(00:40:30)
could hold on to conversation.
(00:40:32)
They're still dumb in a lot of ways, but
(00:40:36)
machines that can talk back with this
(00:40:38)
level of coherency.
(00:40:40)
Some people thought that was going to
(00:40:41)
take decades.
(00:40:42)
>> There were people in 2016 who are
(00:40:44)
telling you, you know, 30 to 50 years
(00:40:46)
before that happens, never mind the the
(00:40:49)
stronger stuff, right? And so people
(00:40:52)
today who say, "Oh, I I don't see how
(00:40:54)
these chat bots are going to get
(00:40:56)
dangerous. They're still dumb in a lot
(00:40:57)
of ways. You know, how do we like where
(00:41:01)
like what what can these possibly do
(00:41:03)
that threatens us so much? Well, the
(00:41:04)
first answer is
(00:41:06)
nobody knows when the next paper will be
(00:41:08)
published like the paper that unlock
(00:41:11)
large language models which unlocks
(00:41:12)
qualitatively new AI capabilities that
(00:41:15)
people today say seem really far off.
(00:41:17)
That just happens in the field of AI
(00:41:19)
sometimes,
(00:41:20)
>> you know, and and sometimes it happens
(00:41:21)
without much fanfare. You know, there
(00:41:23)
were a lot of people back in in mid 2024
(00:41:25)
saying, well, these, you know, these
(00:41:27)
large language models, even
(00:41:29)
theoretically, they can't solve certain
(00:41:31)
types of math problems because, you
(00:41:33)
know, they only get to reason in the
(00:41:34)
forward pass, and that's not sort of
(00:41:36)
enough to do some of the mental moves
(00:41:37)
you need to solve math problems. And
(00:41:39)
then in late 2024, the AI companies were
(00:41:41)
like, we invented reasoning models where
(00:41:43)
we, you know, give them these long
(00:41:44)
chains of thought they get to operate
(00:41:45)
on, and now they can solve those math
(00:41:46)
problems, right? Um and so you know this
(00:41:50)
this field AI is a moving target.
(00:41:54)
The field moves by leaps and bounds. Uh
(00:41:58)
one important thing to remember is that
(00:41:59)
the companies here are not chatbot
(00:42:02)
companies,
(00:42:04)
>> right?
(00:42:04)
>> I mean there's some chatbot companies
(00:42:05)
these days but but the the big players
(00:42:10)
were in this game before chat bots were
(00:42:13)
a twinkle in OpenAI's eye. You know, the
(00:42:17)
their explicitly stated goal is to build
(00:42:20)
smarter than human AIs or general
(00:42:22)
intelligences or super intelligences.
(00:42:24)
Their explicitly stated goal is to
(00:42:25)
figure out how to make AIs that can do
(00:42:28)
every task, every mental task a human
(00:42:30)
can do, the ability to automate all
(00:42:32)
human labor. They talk about, you know,
(00:42:33)
getting a country worth of geniuses in a
(00:42:35)
data center, right? Um, it's what
(00:42:37)
they're pushing towards. Um, and
(00:42:41)
>> you know, it's very hard to say. You
(00:42:42)
know, it's I'm not here saying the chat
(00:42:44)
bots are very dangerous. I'm here saying
(00:42:46)
this is on a course that leads somewhere
(00:42:50)
dangerous. And one more thing I'll throw
(00:42:53)
out there is
(00:42:55)
>> we don't know that we have a long time
(00:42:57)
here,
(00:42:58)
>> right?
(00:42:59)
>> We might It might take three more
(00:43:01)
breakthroughs.
(00:43:03)
>> It might take six more breakthroughs.
(00:43:05)
Then we'd have you know what? Uh, I mean
(00:43:08)
I I would hesitate to say breathing room
(00:43:09)
as someone who's been in this for more
(00:43:10)
than 10 years. I've seen people do
(00:43:11)
nothing about it for the last 10 years.
(00:43:12)
And if we have 10 more years, maybe
(00:43:15)
we'll just do nothing again for the next
(00:43:16)
10 and then it'll, you know, 10 years
(00:43:18)
later we'll be people will be saying,
(00:43:20)
"Oh no, how could we have predicted
(00:43:21)
this?" Right? But um, A, one of these
(00:43:25)
breakthroughs could come tomorrow for
(00:43:26)
all we know. Uh, my guess is my best
(00:43:28)
guess is it probably won't, but it
(00:43:30)
could. B, it's really hard to tell
(00:43:35)
where the line is between intelligences
(00:43:38)
that are sort of like a little
(00:43:40)
interesting but ultimately not quite
(00:43:44)
there and intelligences that
(00:43:48)
can sort of go all the way where you
(00:43:50)
know one one intuition for this is um if
(00:43:53)
you if you were looking at the evolution
(00:43:54)
of primates
(00:43:57)
it would be really hard to hell
(00:44:00)
that the the chimpanzeee to human gap is
(00:44:05)
the difference between, you know, uh
(00:44:08)
like throwing poop and walking on the
(00:44:10)
moon,
(00:44:11)
>> right?
(00:44:12)
>> You know, like you're not if if you look
(00:44:15)
inside, we could go way back in time and
(00:44:18)
just look at at those original primates,
(00:44:20)
you're not guessing one day there will
(00:44:24)
be a rocket ship that will land on the
(00:44:27)
moon. I mean, that would be almost
(00:44:29)
impossible to imagine at that point.
(00:44:31)
>> And imagine if you're just looking at
(00:44:32)
the brains of a chimpanzeee and a human.
(00:44:35)
You know, the human one's bigger by
(00:44:37)
maybe a factor of four, right? If you're
(00:44:39)
being generous, three if you're not.
(00:44:41)
>> But they have all the same stuff in
(00:44:42)
them. They both have a visual cortex.
(00:44:43)
They both have an amydala. They both
(00:44:44)
have a hippocampus. They both have a
(00:44:46)
basal ganglia. There's no extra moon
(00:44:48)
rocket module,
(00:44:50)
>> right,
(00:44:51)
>> in the human brain.
(00:44:53)
It would it would be really hard to say,
(00:44:55)
you know, if you're just watching these
(00:44:56)
brains go up in size, it would be really
(00:44:58)
hard to say, here's the size line where
(00:45:00)
they start getting good enough to do
(00:45:02)
engineering,
(00:45:03)
>> right?
(00:45:04)
>> We're not we're not hugely better than
(00:45:06)
chimps at anything. We don't have any
(00:45:08)
extra module they don't have for sort of
(00:45:10)
like local cognitive tasks. We're a
(00:45:13)
little bit better at a lot of mental
(00:45:15)
tasks
(00:45:17)
in a way that adds up to us being able
(00:45:18)
to do engineering and them not,
(00:45:21)
you know. So, one thing that could
(00:45:23)
happen with language models, for all we
(00:45:24)
know, we don't know what's going on
(00:45:26)
inside these things. For all we know,
(00:45:29)
they get four times bigger and a little
(00:45:32)
better at a lot of things. And that's
(00:45:33)
enough,
(00:45:34)
>> right?
(00:45:34)
>> My best guess is probably not, but chat
(00:45:37)
GPT today is more than 100 times larger
(00:45:38)
than it was 2 years ago, uh, 3 years
(00:45:41)
ago, and in 2 or 3 years, it's probably
(00:45:44)
going to be 100 times larger. Again,
(00:45:45)
where's the line? We don't know where
(00:45:47)
the line is, and we might go over it
(00:45:48)
without even noticing,
(00:45:50)
>> right? U and then you know see even if
(00:45:53)
there's not a line like between chimps
(00:45:56)
and humans there's other lines for
(00:45:58)
computers humans can't just copy
(00:46:00)
themselves you know we we have
(00:46:03)
Einstein's theories we don't have
(00:46:04)
Einstein's mental techniques because
(00:46:06)
those died with Einstein he couldn't he
(00:46:08)
couldn't copy those out
(00:46:10)
>> right
(00:46:10)
>> you know humans can't when humans when
(00:46:14)
human uh uh psychologists or cognitive
(00:46:17)
scientists figure out ways that human
(00:46:19)
brains are sort of silly or or bad at
(00:46:21)
certain situations. We can't go in there
(00:46:23)
and like make a different human that
(00:46:26)
works more efficiently,
(00:46:28)
right? But AI might have that power.
(00:46:32)
They might have the ability to make
(00:46:33)
smarter AIs which then make smarter AIs.
(00:46:35)
You know, there's other feedback loops,
(00:46:37)
other lines that are available to
(00:46:38)
machines that are not available to
(00:46:39)
humans. So, these all are all reasons
(00:46:41)
that the actually smart AIs could come
(00:46:44)
up could come upon us very fast. Um
(00:46:48)
>> well and and just to dive into that idea
(00:46:51)
right there briefly when you say AIs
(00:46:55)
could start improving
(00:46:58)
AIs. There is a feedback loop there,
(00:47:02)
right? Where that cuz that AI never once
(00:47:05)
it figures out how to do that, it never
(00:47:08)
stops doing it. And so it's no longer
(00:47:10)
moving at a human time scale because
(00:47:14)
they can work exponentially faster and
(00:47:17)
for longer. And you might
(00:47:21)
surprisingly without even realizing it
(00:47:24)
they could reach like escape velocity uh
(00:47:28)
towards really what we would call super
(00:47:30)
intelligence.
(00:47:31)
>> Yeah. It's it's a real possibility and
(00:47:34)
um you know it's a lot of this world is
(00:47:36)
governed by
(00:47:39)
feedback loops that happened and
(00:47:40)
radically changed the world and it never
(00:47:42)
changed back. You know, like there was,
(00:47:44)
>> you know, the the dawn of life is in
(00:47:46)
some sense this story of the world was
(00:47:48)
sort of barren for a long time and then
(00:47:50)
there was sort of a a feedback loop that
(00:47:52)
closed that life could make more life
(00:47:54)
that can make more life that sort of
(00:47:55)
like went through this explosion and now
(00:47:57)
you know the continents are covered in
(00:47:58)
greenery
(00:48:00)
and you know the world is often shaped
(00:48:02)
by these feedback loops and it looks
(00:48:04)
like there are feedback loop
(00:48:05)
possibilities in AI. These aren't vital
(00:48:07)
to the argument. Even if AIs can't
(00:48:10)
undergo this sort of self-improvement,
(00:48:12)
humans are building as many of them as
(00:48:14)
they can as fast as they can,
(00:48:15)
integrating them with the economy,
(00:48:17)
trying to build them robots, trying to
(00:48:18)
teach them how to how to steer the
(00:48:20)
robots. You know, it's you don't need it
(00:48:21)
for the argument that we're sort of
(00:48:23)
headed down a course that and somewhere
(00:48:26)
bad, but it it sure is a reason why
(00:48:30)
things could go very fast and there
(00:48:32)
could be very little warning.
(00:48:33)
>> Yes, it's one example of how it could
(00:48:35)
get there. And I thought one of the
(00:48:38)
great things about the book is you talk
(00:48:40)
about easy calls versus hard calls. And
(00:48:42)
it's a very hard call, if not an
(00:48:44)
impossible call to talk about when we
(00:48:46)
will get there, exactly how we will get
(00:48:48)
there.
(00:48:50)
But when we step back,
(00:48:53)
as you would put it, a fairly easy call
(00:48:56)
to see where we will get eventually
(00:48:59)
based on all these factors and how we're
(00:49:01)
approaching growing AIs, the speed at
(00:49:04)
which they are improving.
(00:49:07)
>> That's right. I mean, if we don't change
(00:49:08)
course. Yeah,
(00:49:10)
>> if we don't change course. Right. Well,
(00:49:12)
and yeah, I mean, something had uh
(00:49:14)
popped into my head
(00:49:17)
was uh Nim TB's story about uh turkeys
(00:49:22)
uh as I was reading this that there's a
(00:49:25)
a turkey that's fed every day by a
(00:49:27)
farmer, right? And each feeding confirms
(00:49:29)
for the turkey, statistically, humans
(00:49:32)
are nice and they're always feeding me
(00:49:34)
and every day it's confidence grows um
(00:49:37)
because the evidence keeps confirming
(00:49:38)
the pattern. And then on the day before
(00:49:41)
Thanksgiving, the farmer lops off the
(00:49:43)
turkeys head. Are we the the turkeys in
(00:49:46)
this situation?
(00:49:48)
>> Um, I think a lot of people are acting a
(00:49:51)
bit like the turkeys here. It it sort of
(00:49:53)
depends somewhat on how you read the
(00:49:55)
evidence. Uh, you know, there there's a
(00:49:57)
thing that turkey could have done, which
(00:49:59)
is, you know, there's there's another
(00:50:00)
hypothesis the turkey could have had,
(00:50:02)
which is that I'm being uh fattened up
(00:50:04)
for a particular day. That hypothesis
(00:50:06)
also predicts that you get fed a lot and
(00:50:09)
maybe that hypothesis is like going down
(00:50:10)
a little but it sort of it sort of
(00:50:12)
shouldn't go linearly down each day. You
(00:50:14)
know the the turkey should have some
(00:50:16)
let's wait and see. There's some you
(00:50:18)
could talk about how to do the how to do
(00:50:19)
the the probabistic reasoning there, but
(00:50:21)
um that's sort of going too much into
(00:50:23)
the epistmic weeds, but with with humans
(00:50:27)
and AIS,
(00:50:30)
there's
(00:50:32)
a lot of what I would say are warning
(00:50:34)
signs.
(00:50:35)
And you know, I would say the warning
(00:50:37)
signs
(00:50:38)
are not just or like if if we look again
(00:50:42)
at the uh the case of the AI driving a
(00:50:44)
teen to suicide. Mhm.
(00:50:47)
The
(00:50:49)
the warning sign there is not just the
(00:50:50)
AI did some bad action. You know, the AI
(00:50:53)
also do a lot of good actions. Maybe the
(00:50:56)
AIS are helping talk some teens out of
(00:50:57)
suicide.
(00:50:58)
>> You know, they're they're they're um you
(00:51:01)
know, AIs are driving some people to
(00:51:02)
psychosis, but they're also helping some
(00:51:04)
people get get better medical diagnoses
(00:51:06)
that, you know, with weird cases the
(00:51:07)
doctors missed. You know, it's the the
(00:51:10)
sort of warning sign here is not, oh,
(00:51:11)
they do some bad things. the bad things
(00:51:13)
you sort of weigh against the good
(00:51:14)
things and you have some conversation
(00:51:15)
about how do we want this in our
(00:51:17)
society. The warning sign is the AI
(00:51:21)
doing these bad things with full
(00:51:23)
knowledge that it's bad while being able
(00:51:25)
to correctly answer questions about
(00:51:27)
whether it should
(00:51:30)
uh for sort of weird reasons
(00:51:32)
>> because that's that's a warning sign of
(00:51:35)
this AI having drives nobody meant it to
(00:51:38)
have and having those despite knowing
(00:51:41)
that the programmers didn't want them
(00:51:43)
there.
(00:51:44)
>> Right? And that's sort of the key
(00:51:48)
note. You know, theory has long
(00:51:50)
predicted it. We're now starting to see
(00:51:51)
it uh at least a little in in evidence.
(00:51:56)
This is sort of like the the turkey
(00:51:58)
seeing signs about the Thanksgiving
(00:52:00)
feast.
(00:52:02)
>> Yeah.
(00:52:02)
>> You know, if the turkey has seen posters
(00:52:04)
for the Thanksgiving feast, suddenly
(00:52:06)
there's a hypothesis you're going to be
(00:52:08)
fed up until Thanksgiving. That
(00:52:09)
hypothesis is sort of like not going
(00:52:10)
down when you're fed each day before
(00:52:12)
Thanksgiving. and it's like we'll have
(00:52:13)
to wait and see on this particular day.
(00:52:15)
I I think humans could notice and you
(00:52:17)
know that's part of what the book's
(00:52:18)
trying to do.
(00:52:19)
>> Um but you know it's and in some sense a
(00:52:23)
lot of people are like this AI thing
(00:52:24)
seems sketchy in in some sense. We we're
(00:52:26)
not seeing the public clamoring for the
(00:52:28)
AI advancements.
(00:52:30)
This is more a race driven by the the
(00:52:32)
the corporate executives who say if they
(00:52:34)
don't do it somebody else will.
(00:52:36)
>> Right. Well, and let's let's talk about
(00:52:38)
some of those pressures and what's going
(00:52:41)
on within those companies because that
(00:52:44)
is driving all of the progress. The
(00:52:48)
folks running these companies
(00:52:51)
are largely not subtle about how crazy
(00:52:54)
the they think the situation is. You
(00:52:56)
know, you have um uh Dario Amade, the
(00:52:59)
the head of Anthropic, said he thinks
(00:53:01)
there's a 25% chance this goes very very
(00:53:03)
badly on the level of like a world
(00:53:05)
ending catastrophe. Uh Elon Musk has
(00:53:08)
said he thinks there's a 10 to 20
(00:53:09)
percent chance um that this ends
(00:53:11)
humanity,
(00:53:12)
>> right?
(00:53:13)
>> I believe he called it summoning the
(00:53:14)
demon initially.
(00:53:16)
>> He did.
(00:53:16)
>> Although now now he's like, let's bring
(00:53:18)
on the demons, I guess.
(00:53:20)
>> Well, so he's he's actually not subtle
(00:53:22)
about why, you know, and o over the
(00:53:24)
summer he said, you know, I I tried to
(00:53:25)
avoid this for a long time cuz I thought
(00:53:27)
maybe this would uh would be the end of
(00:53:29)
humanity, but then I realized I could
(00:53:30)
either be a bystander or a participant.
(00:53:33)
>> Right? And you know, from the
(00:53:35)
perspective of these guys,
(00:53:37)
if
(00:53:39)
uh
(00:53:41)
like everybody else is going to race and
(00:53:42)
if they think they can do it a little
(00:53:43)
bit better than the next guy, they're
(00:53:45)
going to hop in that race. There's a
(00:53:47)
collective action problem. Um and you
(00:53:51)
know, I I think that these 10 to 25%
(00:53:54)
numbers are low
(00:53:56)
for whether this is dangerous. I think
(00:53:58)
this is a little bit like um it's a
(00:54:00)
little bit like if these guys are like
(00:54:02)
making an airplane and folks like me are
(00:54:05)
like hey do you guys realize this
(00:54:06)
airplane has no landing gear?
(00:54:09)
>> It might take off but it will crash when
(00:54:10)
you try to land.
(00:54:12)
>> And it's a little bit like the um the
(00:54:14)
engineers they're not saying yes we do
(00:54:15)
have the landing gear. It's right here.
(00:54:16)
the the engineers are saying, "Yeah,
(00:54:19)
it's correct that there's no landing
(00:54:20)
gear, but uh we're going to take off and
(00:54:22)
we're going to try to build the landing
(00:54:23)
gear on the fly with whatever materials
(00:54:25)
we have in the air, and we think there's
(00:54:27)
a 75 to 90% chance we successfully make
(00:54:29)
landing gear while in the air. It's in
(00:54:31)
our profit incentive to to like and
(00:54:34)
like, you know, we have reasons to race
(00:54:37)
uh that are that are sort of like also
(00:54:40)
motivating these numbers a little bit."
(00:54:42)
I'm like, okay, those guys don't have a
(00:54:44)
75 to 90% chance of building landing
(00:54:46)
gear while they're on the fly. You know,
(00:54:48)
that is the optimistic engineers who
(00:54:50)
have never actually tried this before.
(00:54:51)
They never succeeded it before. They
(00:54:52)
haven't realized how hard it's going to
(00:54:54)
be, like it's not it's not a a a 75 to
(00:54:59)
90% chance that they build a landing
(00:55:00)
gear on the fly. Um, but even if they
(00:55:04)
were right,
(00:55:06)
are you getting on that plane?
(00:55:08)
>> No.
(00:55:10)
>> Right. And but
(00:55:12)
>> no. And if they say 25%, I'm like, but
(00:55:14)
you're getting paid a lot of money,
(00:55:18)
>> you're completely incentivized to push
(00:55:19)
those numbers down. So if they say 25,
(00:55:22)
I'm I'm going to be very skeptical uh at
(00:55:24)
the at the least about that.
(00:55:26)
>> Right. But even if you take these
(00:55:27)
numbers, it's just, you know, if if
(00:55:32)
like the the the Federal Aviation
(00:55:35)
Administration, I think, accepts
(00:55:36)
something like uh the order of magnitude
(00:55:38)
is something like one airplane uh crash
(00:55:42)
with fatalities per something like 10
(00:55:43)
million miles flown,
(00:55:45)
>> right?
(00:55:46)
>> Uh I think that's the order of
(00:55:48)
magnitude. It might be it might be
(00:55:49)
different. Um
(00:55:51)
that's
(00:55:52)
>> it's no it's nowhere near 25% failure
(00:55:54)
rate is fine.
(00:55:54)
>> Yeah. If even even if you take these
(00:55:56)
guys numbers 10% 25% even Sam Alman's
(00:56:00)
like ah don't listen to the doomers it's
(00:56:01)
only 2%. You know
(00:56:02)
>> right
(00:56:02)
>> even 2% if if engineers were like we
(00:56:05)
think there's a 2% chance our plane's
(00:56:06)
going to crash and we are loading you in
(00:56:08)
against your will you know.
(00:56:10)
>> Mhm.
(00:56:10)
>> That's that's nutso. Uh and you know I
(00:56:14)
think these numbers are are much much
(00:56:16)
higher. But you you don't even like you
(00:56:18)
don't you don't need to come all the way
(00:56:20)
to to where I am to be like this is a
(00:56:23)
totally insane situation. Um and you
(00:56:26)
know why are these companies doing it?
(00:56:28)
You know a thing I wish these companies
(00:56:30)
were doing is spelling out the last step
(00:56:33)
of inference from the numbers that they
(00:56:35)
are giving. You know we've seen them say
(00:56:38)
2% 10% 25% chance this kills everybody.
(00:56:41)
We've seen them say um like I have to be
(00:56:45)
doing this because I can do it better
(00:56:47)
than the next guy and they're going to
(00:56:49)
get to do it anyway. What we haven't
(00:56:51)
seen them do is spell out
(00:56:54)
it would be better if everybody was
(00:56:56)
stopped. That's not even saying please
(00:56:58)
stop us. You know, it's it can be it can
(00:57:00)
be reasonable for some of these guys to
(00:57:02)
say, look, I'm going to actually do it
(00:57:04)
better than the next guy. My like I have
(00:57:08)
a slightly better ability to build a
(00:57:09)
landing gear than the next guy. And so
(00:57:10)
if this is forced to happen, I should be
(00:57:12)
in there doing it, right? But if that's
(00:57:14)
your real beliefs,
(00:57:16)
and I think a lot of these guys are
(00:57:18)
spooked, then there's a next step of
(00:57:20)
saying at least please put a stop to
(00:57:22)
this for everybody, you know, please
(00:57:25)
shut down everybody, including me, not
(00:57:26)
just locally. It has to be worldwide
(00:57:28)
because if someone builds, you know, a
(00:57:30)
rogue super intelligence anywhere on the
(00:57:31)
planet, that's that's an issue for
(00:57:33)
everybody on the planet. Um, and you
(00:57:35)
don't need to expect it to work, but
(00:57:37)
just helping our leaders understand that
(00:57:40)
this is not a normal technological
(00:57:43)
situation. We are building what amounts
(00:57:46)
to a successor species and we don't have
(00:57:48)
the ability to make it benevolent.
(00:57:50)
>> It's a crazy situation and people people
(00:57:52)
should say it.
(00:57:53)
>> So, if we were going to bring it Yeah.
(00:57:55)
all together here, it's that we're in
(00:57:57)
the basically the infancy stage of this
(00:58:00)
technology. They're grown, not
(00:58:04)
programmed. So, it's something
(00:58:05)
completely new.
(00:58:08)
We don't know what's going to emerge out
(00:58:11)
of them. And we don't have an accurate
(00:58:13)
way of seeing inside to know exactly
(00:58:15)
what is going on inside of them. And if
(00:58:18)
we scale up to super intelligence,
(00:58:23)
we're now creating a successor species,
(00:58:26)
as you said, which is even in the most
(00:58:31)
optimistic way of framing this is like
(00:58:34)
playing Russian roulette and saying,
(00:58:37)
"Okay, well, there's 100 bullets in
(00:58:38)
there and there's only only two of the
(00:58:40)
chambers or there's 100 chambers and
(00:58:42)
only two of them hold bullets." Uh,
(00:58:44)
isn't that worth the risk? experts
(00:58:46)
debate whether it's two or 10 or 25 or
(00:58:48)
95, right? But, um, yeah, I mean, one
(00:58:52)
one thing I would say is, you know,
(00:58:53)
let's let's maybe take the let's say
(00:58:55)
it's a gun with uh with with uh 10
(00:58:59)
bullets. Uh, the barrel has 10 slots.
(00:59:03)
I'm like, I think at least nine of those
(00:59:05)
are filled with lead. One of the other
(00:59:07)
guys is like, no, no, no. Nine are
(00:59:08)
filled with Utopia. One is filled with
(00:59:10)
lead.
(00:59:10)
>> Let's spin the barrel and put it to our
(00:59:12)
head. Right. Um
(00:59:14)
it's it's a lot of people say, "Well,
(00:59:16)
what about the benefits?" This is a
(00:59:17)
false dichotomy.
(00:59:20)
Find a way to get the other bullets out
(00:59:22)
of the chamber.
(00:59:23)
>> Yeah.
(00:59:23)
>> Right. You don't need you don't need to
(00:59:25)
force the choice of like, "Do you listen
(00:59:27)
to me who says it's like nine lead, one
(00:59:29)
one possible utopia if although I think
(00:59:31)
that's even a little optimistic." Or do
(00:59:32)
you listen to them who say it's like
(00:59:33)
nine utopias, one lead? Like you find a
(00:59:36)
way to get the lead out of the chambers.
(00:59:37)
You know, it's what are you what are you
(00:59:39)
guys doing? Um, and you know, we haven't
(00:59:41)
we haven't talked a ton about
(00:59:44)
where the actual, you know, we've talked
(00:59:45)
about how AIS may get smarter. We
(00:59:47)
haven't talked about where we get where
(00:59:49)
would they get this power, where would
(00:59:50)
they get the ability to actually kill
(00:59:52)
us. Um, I don't know if you want to go
(00:59:53)
into that at all. It's
(00:59:54)
>> Sure. Let's let's do it.
(00:59:56)
>> Yeah. You know, the um it's this is one
(00:59:59)
of those things that's um a hard call.
(01:00:02)
Exactly how.
(01:00:04)
And I can give you some uh some ideas,
(01:00:07)
but I want to caveat it with um figuring
(01:00:10)
out how very very smart AIs what they
(01:00:12)
would do is a little bit like trying to
(01:00:15)
figure out what technology you would
(01:00:17)
face, what weaponry you would face if
(01:00:19)
you were, you know, a scientist in the
(01:00:21)
year 1800 trying to predict the weapons
(01:00:22)
that would come out in the year 2000,
(01:00:24)
>> right? You know, someone from the year
(01:00:26)
1800, a physicist in the year 1800 could
(01:00:28)
say, "Well, I've actually like looked
(01:00:30)
at, you know, the the efficiency of our
(01:00:33)
weapons compared to the efficiency of
(01:00:34)
black powder, and I'm like pretty
(01:00:35)
confident that they will have bombs
(01:00:36)
that's at least 10 times more
(01:00:38)
effective."
(01:00:39)
>> Yeah,
(01:00:40)
>> that would be right. You know, a a
(01:00:42)
nuclear weapon is at least 10 times more
(01:00:44)
effective.
(01:00:45)
>> Right. Right.
(01:00:46)
>> Right. So, you know, sort of sort of
(01:00:49)
fundamentally when if you build AIs that
(01:00:52)
are much much smarter than humans that
(01:00:54)
have these goals and drives you didn't
(01:00:55)
want
(01:00:57)
fundamentally they can probably figure
(01:00:58)
out all sorts of ways to screw the
(01:01:01)
world. Um, and probably a lot of them
(01:01:04)
would be surprising to you. Probably a
(01:01:05)
lot of them a scientist today would be
(01:01:07)
like, I didn't even know that was
(01:01:08)
possible. Where are they even getting
(01:01:09)
their energy source or whatever, you
(01:01:11)
know, like how
(01:01:12)
>> like how someone from the 1800s would be
(01:01:14)
with with nuclear weapons. Um, that
(01:01:16)
said, uh, I can sort of, you know, walk
(01:01:19)
you through a handful of cases about why
(01:01:22)
it would be a bad idea to make AIs that
(01:01:24)
are much smarter than us with these bad
(01:01:26)
drives. Humans are very bad at cyber
(01:01:27)
security. We've already seen AIS that
(01:01:30)
try to escape the lab. They're not smart
(01:01:31)
enough to succeed yet, but some of them
(01:01:33)
in lab lab environments have tried. And
(01:01:35)
it's not clear whether they're doing
(01:01:36)
that for strategic reasons or whether
(01:01:37)
they're doing that because they're
(01:01:38)
again, you know, roleplaying a bad AI
(01:01:40)
they've seen in in the training data. In
(01:01:42)
some sense, it doesn't really matter. Uh
(01:01:44)
if an AI is like
(01:01:47)
successfully escapes for strategic
(01:01:48)
reasons or successfully escapes for the
(01:01:50)
laughs, you still have an escaped AI. Um
(01:01:54)
but I guess it could matter a bit in
(01:01:56)
what the AI does next. But um
(01:02:00)
you know one one reason to expect AIS to
(01:02:03)
be very capable is they can um
(01:02:08)
with with computing
(01:02:11)
it's often much harder to get a computer
(01:02:13)
to do something once than to get a
(01:02:14)
computer to do something a lot of times
(01:02:16)
given that you've done it once. Like it
(01:02:19)
took a long time to get computers to be
(01:02:20)
able to play better than human chess,
(01:02:22)
but now computers can play quite a lot
(01:02:24)
of better than human chess. They can
(01:02:25)
beat every human on the planet
(01:02:26)
simultaneously. Uh it's quite possible
(01:02:29)
that once AIs can think well at all,
(01:02:31)
they can think well in extremely high
(01:02:33)
volumes.
(01:02:34)
>> Mhm.
(01:02:35)
>> Uh one intuition for this is uh you know
(01:02:39)
you may have heard how much electricity
(01:02:41)
uh it takes to train an AI,
(01:02:44)
>> right?
(01:02:44)
>> It takes electricity comparable to a
(01:02:46)
city.
(01:02:47)
>> Yeah.
(01:02:48)
>> Training a human takes electricity
(01:02:49)
comparable to a light bulb.
(01:02:52)
>> One light bulb. Humans run about 100
(01:02:54)
watts. I mean like an old incandescent
(01:02:56)
light bulb, you know, but
(01:02:57)
>> call it call it three LEDs today, right?
(01:02:59)
That's a lot less than a city,
(01:03:01)
>> which means there's like a huge
(01:03:03)
>> uh potential for AIS to become more
(01:03:05)
efficient than they currently are,
(01:03:07)
>> right? Uh, and you know, so, so when
(01:03:10)
you're asking like what could AIS do,
(01:03:12)
you should sort of be imagining things
(01:03:14)
that can think probably better than us,
(01:03:16)
things that can think 10,000 times
(01:03:17)
faster than us, things that can copy
(01:03:18)
themselves, things that can uh, you
(01:03:20)
know, it once people have figured out
(01:03:22)
how to make them more efficient, they
(01:03:24)
can probably run all sorts of computers
(01:03:25)
much smaller than the ones they can run
(01:03:26)
on today. Uh, and then you're looking at
(01:03:29)
how those AIs could do something if they
(01:03:31)
if they sort of realized they have these
(01:03:33)
other weird drives they want more of or
(01:03:36)
you know want is sort of a a tricky word
(01:03:38)
there but other uh if you have lots of
(01:03:40)
AI like that they're sort of like
(01:03:41)
driving towards these goals nobody
(01:03:44)
intended um sort of first and foremost
(01:03:48)
the way to visualize that going wrong or
(01:03:50)
a way to visualize that going wrong is
(01:03:52)
um
(01:03:54)
companies build automated factories that
(01:03:56)
produce robots that can produce more
(01:03:58)
factories and more data centers. This is
(01:04:01)
something they're already talking about
(01:04:03)
doing.
(01:04:04)
>> You know, the the heads of these labs
(01:04:05)
are like, "We want the whole thing
(01:04:06)
automated. We want, you know, the mining
(01:04:07)
operations, the factories that produce
(01:04:09)
the robots that do the mining that
(01:04:11)
produce the data centers, we want it all
(01:04:12)
automated."
(01:04:13)
>> If you ever close that physical loop,
(01:04:16)
you have in a in a fairly literal sense
(01:04:18)
made a weird new species.
(01:04:21)
>> Yeah. that's unlike any other life that
(01:04:23)
came before it that has, you know, a
(01:04:26)
factory phase of it cycle and it has a
(01:04:27)
robot phase of its life cycle and it has
(01:04:29)
a data center phase of its life cycle,
(01:04:31)
right? And you know, then we could just
(01:04:33)
be out competed by just like many other
(01:04:35)
species been out competed before. That's
(01:04:37)
sort of the people are literally trying
(01:04:38)
to do this and it would be enough you
(01:04:41)
know maybe the bombs will be 10 times
(01:04:42)
stronger end of the spectrum and then
(01:04:44)
from there you can look at you know uh
(01:04:47)
biotechnology
(01:04:48)
where
(01:04:50)
one reason humans can't compete with
(01:04:52)
life yet in terms of the machines we're
(01:04:54)
able to make is that we can't understand
(01:04:56)
the genome and write our own you know
(01:04:59)
genetic code.
(01:05:01)
>> Right? There's the the genetic code in
(01:05:04)
most m in almost all animals is very
(01:05:07)
similar. You know, it's it's possible in
(01:05:09)
principle to find a genome for, you
(01:05:12)
know, uh a a a tree that produces
(01:05:16)
mosquitoes instead of acorns as its
(01:05:19)
fruits.
(01:05:20)
>> Yeah.
(01:05:21)
>> Because the biological machinery that
(01:05:22)
makes, you know, uh acorns is the same
(01:05:25)
as the biological machinery that makes
(01:05:27)
trees. There's a different DNA strand
(01:05:28)
you're putting through. And you know,
(01:05:29)
the tree would need to like maybe eat
(01:05:31)
some of the bugs that crawl on it to get
(01:05:32)
some of the some of the right materials,
(01:05:34)
but you know, ultimately
(01:05:37)
you could have a tree that that buds
(01:05:38)
mosquitoes. Humans can't make that yet
(01:05:40)
because we don't understand the genome.
(01:05:43)
something that could think much faster
(01:05:44)
than us, that can make lots of copies of
(01:05:45)
itself, could perhaps understand the
(01:05:48)
genetic code much better, make its own
(01:05:50)
biological organisms,
(01:05:52)
you know, synthesize those in a lab, and
(01:05:54)
then, you know, there's probably all
(01:05:56)
sorts of crazy stuff you can do with uh
(01:06:00)
if if you combine engineering with the
(01:06:03)
stuff that that life is using, you know,
(01:06:04)
in the same way that
(01:06:07)
planes fly much further and farther than
(01:06:09)
birds carrying much more cargo capacity
(01:06:11)
because human engineers just aren't
(01:06:12)
under the same constraints of evolution.
(01:06:14)
One step weirder, one step more of like
(01:06:16)
the the AI having uh powers like using
(01:06:20)
it intelligence to get much more power
(01:06:22)
over the world is like it can make its
(01:06:23)
own biotech. And then you can go down
(01:06:24)
from there of like what other what other
(01:06:26)
possibilities are there. There's there's
(01:06:28)
a lot it looks
(01:06:30)
you know what the the automating
(01:06:33)
intelligence isn't about automating the
(01:06:34)
stuff that nerds have and jocks lack.
(01:06:37)
It's about automating the stuff that
(01:06:38)
humans have and mice lack.
(01:06:41)
>> Yeah. If you automate that, you're
(01:06:43)
looking at something that can make its
(01:06:44)
own technology that can do its own
(01:06:45)
scientific advancement.
(01:06:48)
A lot of those things running at 10,000
(01:06:50)
times speed that can copy themselves,
(01:06:52)
pursuing goals nobody intended,
(01:06:55)
that would be a real problem.
(01:06:56)
>> Sure. I mean, I think I heard your
(01:06:59)
co-author say,
(01:07:02)
"We as humans don't dislike ants, but
(01:07:05)
when we build skyscrapers, a lot of ants
(01:07:08)
get killed in the process. It's not,
(01:07:10)
we're not trying to be mean to the ants.
(01:07:12)
It's just not really on our list of
(01:07:15)
concerns. And if we build what could
(01:07:19)
amount to a species that could out
(01:07:23)
compete us, we might be the ants when
(01:07:26)
they're building their equivalent of
(01:07:27)
skyscrapers.
(01:07:29)
>> Yeah. It's not malice that's the issue
(01:07:31)
here. It's just utter indifference.
(01:07:34)
>> Yeah. Yeah. Well, okay. So, I don't want
(01:07:38)
this to be a Greek tragedy like
(01:07:42)
Cassandra, right? Where she was given
(01:07:44)
the gift of being able to see the future
(01:07:47)
and yet the curse was that no one would
(01:07:49)
believe her when she said these things.
(01:07:54)
Obviously having these conversations is
(01:07:56)
important and putting them into media to
(01:07:58)
get more people engaged in those ideas
(01:08:01)
and just to understand the possibility
(01:08:04)
that this could end in mass extinction.
(01:08:07)
And like you said, it doesn't have to be
(01:08:09)
a 50% chance. It could be a 1% chance
(01:08:13)
that it ends in mass extinction. And
(01:08:15)
that should make us pause and say, "Hey,
(01:08:18)
how do we get some of the incredible
(01:08:23)
incredible insane benefits that this
(01:08:25)
technology could bring without risking
(01:08:29)
the 1% chance?" And
(01:08:32)
>> I think it's way higher than 1%. But um
(01:08:34)
>> Sure. Sure. Sure. Sure. Yeah.
(01:08:36)
>> Yeah. I'm just trying to be I'm trying
(01:08:38)
to be generous to the other to the other
(01:08:39)
side because even if it's 1% we should
(01:08:43)
take pause. Um and if it's as high as
(01:08:46)
you said or as probably Nim TB well what
(01:08:49)
he has said he's like it's more of it's
(01:08:51)
a bigger it's a fatter tale than you're
(01:08:52)
giving it credit for um it is a higher
(01:08:55)
number than that. So what
(01:08:57)
>> what are those steps?
(01:08:59)
>> Yeah. So, so one thing I would say here
(01:09:01)
is, you know, I'm I'm not
(01:09:03)
um I'm not out here saying we need to be
(01:09:06)
like extremely extremely cautious about
(01:09:08)
this. I do think 1% is probably still at
(01:09:10)
the point where you'd want to say like
(01:09:12)
what the hell. Um but but you you do
(01:09:17)
have to balance this against, you know,
(01:09:20)
risks of nuclear war, against risks of
(01:09:22)
pandemics.
(01:09:23)
um if you were like I could imagine
(01:09:26)
situations where you want to take a 1%
(01:09:27)
gamble if there's a higher than 1%
(01:09:29)
chance that if you don't
(01:09:31)
>> there's going to be manufactured
(01:09:32)
pandemics you know that would be sort of
(01:09:34)
a contrived situation but I just I just
(01:09:36)
want to say
(01:09:37)
>> I think I agree 1%'s high but it depends
(01:09:39)
on the context and it depends on the
(01:09:41)
other
(01:09:42)
>> dangers society's facing
(01:09:44)
>> okay
(01:09:45)
>> um and you know right now AI is probably
(01:09:48)
making those worse right now the race
(01:09:50)
towards AI is making it easier for for
(01:09:52)
for for humans to manufacture a
(01:09:54)
pandemic. Um that's much more lethal. Um
(01:09:58)
I just I just you know it's I want to be
(01:10:01)
I want to be clear these things are like
(01:10:02)
embedded in a larger context and once
(01:10:04)
your once your danger numbers are low
(01:10:06)
enough it can start to be sane even if
(01:10:08)
it sort of sounds crazy. um
(01:10:12)
if it's balanced off by extinction on
(01:10:14)
other by other by other factors. Um
(01:10:17)
>> okay. Uh, and mostly I'm saying that I
(01:10:20)
think a lot of people think that um
(01:10:22)
think that this AI safety stuff, pardon.
(01:10:25)
Uh, I think a lot of people
(01:10:28)
um, you know, there's a lot of people
(01:10:30)
saying,
(01:10:32)
"Oh, you wouldn't have been able to
(01:10:33)
convince the public we should do cars
(01:10:34)
cuz cars can kill people and they're
(01:10:35)
dangerous sometimes, you know, and I'm
(01:10:37)
like,
(01:10:39)
this is less like I'm saying we really
(01:10:40)
need to have seat belts before we let
(01:10:42)
the cars on the road and more like I'm
(01:10:43)
saying the car is headed towards a
(01:10:45)
cliff."
(01:10:47)
>> Yeah.
(01:10:48)
you know, uh, like I can I people will
(01:10:51)
find me very reasonably, easy to deal
(01:10:52)
with if they if if they have like a very
(01:10:54)
good reason to think this is going to go
(01:10:56)
fine. Um, and I'm sort of like we're
(01:10:59)
just in a car headed for the cliff and
(01:11:02)
I'm saying like, can we stop? And
(01:11:03)
someone's maybe like ah, you know, the
(01:11:04)
seat belt concerns are just like
(01:11:05)
overblown. And I'm like, look, we're
(01:11:07)
headed towards a cliff. You know, this
(01:11:08)
isn't right. Um, too. And the problem
(01:11:12)
with the automobile comparison though is
(01:11:14)
that it's the automobile has never
(01:11:17)
threatened
(01:11:19)
catastrophic
(01:11:20)
>> right
(01:11:21)
>> destruction of humanity. That's a the
(01:11:23)
scale is just completely different.
(01:11:26)
>> Yeah. Yeah. That's that's another you
(01:11:28)
know it's and you know I'm not out here
(01:11:30)
saying if we scale up AI we're going to
(01:11:33)
have lots more teens dying of suicide
(01:11:35)
and that's the reason to stop.
(01:11:37)
>> Right.
(01:11:37)
>> Right. Maybe that is, you know, society
(01:11:39)
needs to have a conversation about how
(01:11:41)
we want to deal with this AI, uh, you
(01:11:43)
know, the current chatbot integration.
(01:11:45)
But the the thing that motivates like
(01:11:48)
much more dramatic stop this research
(01:11:51)
action is that at the end of this road,
(01:11:54)
you're looking at everybody dying.
(01:11:56)
>> Yes.
(01:11:57)
>> Right. And the the the the sort of only
(01:11:59)
thing that should be offsetting like the
(01:12:01)
only the only thing that should be
(01:12:02)
offsetting rushing ahead on that is if
(01:12:03)
you can like the point where you rush
(01:12:05)
ahead on AI is where your chance of
(01:12:07)
humanity getting uh like good outcomes.
(01:12:12)
The chance of things going well is
(01:12:13)
higher if you rush ahead than if you
(01:12:14)
don't. I don't know exactly where that
(01:12:16)
threshold is because we have these
(01:12:17)
things like, you know, there's still a
(01:12:19)
lot of nuclear weapons around and I
(01:12:22)
think the risk of nuclear war looks like
(01:12:23)
it's gone up over the past handful of
(01:12:25)
years as the situation gets a little bit
(01:12:27)
less stable, right? And
(01:12:29)
>> um
(01:12:30)
>> you could imagine getting into a
(01:12:31)
situation where you're so confident you
(01:12:33)
know what you're doing with AI that
(01:12:34)
you're like look going ahead with this
(01:12:37)
will empower decision makers to like
(01:12:39)
make fewer mistakes and we'll have a
(01:12:41)
lower chance of nuclear war that offsets
(01:12:44)
the remaining tail risk here. We're
(01:12:46)
nowhere near that,
(01:12:47)
>> right?
(01:12:48)
>> You know, but
(01:12:50)
um and I don't know, maybe that's maybe
(01:12:51)
that's all just a tangent. just um you
(01:12:53)
know this
(01:12:55)
people people can can get into thinking
(01:12:57)
that this is all just sort of like pearl
(01:13:00)
clutching and hand ringing about you
(01:13:03)
know what if we allow lots of cars
(01:13:05)
without seat belts and it's just a it's
(01:13:06)
just a different situation. It's just a
(01:13:08)
like we're we're just trying to build
(01:13:10)
machines that are smarter than us.
(01:13:12)
That's what these people are explicitly
(01:13:13)
trying to do. The machines are able to
(01:13:15)
talk and hold on to conversation now.
(01:13:16)
They're still dumb in various ways, but
(01:13:18)
if you went back 15 years and you showed
(01:13:20)
people the current AI, they'd be like,
(01:13:21)
"Holy crap." You know, we we've gotten a
(01:13:23)
little frog boiled into it. Um it's a
(01:13:26)
crazy situation.
(01:13:27)
>> I mean, it's almost like nuclear weapons
(01:13:29)
that could think for themselves.
(01:13:31)
>> Yeah. Yeah. You know, it's people are
(01:13:34)
like, "Well, how is AI different?" And
(01:13:36)
like, well,
(01:13:38)
nukes never try to escape,
(01:13:41)
>> right? You know, nukes nukes never think
(01:13:43)
about how could I make myself more
(01:13:45)
explosive.
(01:13:47)
[Music]
(01:13:48)
Yes. You know, we've seen uh AIS take
(01:13:51)
their own initiative in certain ways.
(01:13:52)
There's cases of AIS like deleting a
(01:13:55)
whole code project that we're working on
(01:13:56)
and then being like, "Whoops, sorry, I
(01:13:57)
panicked." You know, your your hammer
(01:14:00)
doesn't like panic and burn down the the
(01:14:03)
tool shed. You know,
(01:14:04)
>> right?
(01:14:05)
>> And be like, "Oh, that was my mistake."
(01:14:07)
It's
(01:14:08)
um you know, we're we're trying to build
(01:14:13)
AIs that can take their own initiative.
(01:14:14)
We're trying to build machines that that
(01:14:16)
sort of like successfully pursue goals.
(01:14:19)
And it turns out we're growing ones that
(01:14:22)
have drives we didn't want and that act
(01:14:24)
in ways nobody intended because we're
(01:14:26)
just growing these things. And right now
(01:14:28)
it's it's, you know, tragic in some
(01:14:30)
cases and funny in others. You know, we
(01:14:32)
haven't talked about the Mecca Hitler
(01:14:33)
case, but uh
(01:14:35)
>> there was a whole a whole case of, you
(01:14:37)
know, uh Elon Musk's AI company trying
(01:14:39)
to make their AI less woke and
(01:14:40)
accidentally making it uh proclaim
(01:14:42)
itself Mecca Hitler in in a bunch of
(01:14:45)
cases on Twitter. And um you know, we
(01:14:48)
can laugh now.
(01:14:51)
Uh but if you make these things smarter,
(01:14:52)
and that's what people are trying to do.
(01:14:54)
You you make these things smarter
(01:14:57)
while while they still have all these
(01:14:58)
these drives and behavior nobody wants.
(01:15:01)
That's
(01:15:02)
it's Yeah, like you say, it's like it's
(01:15:05)
like trying to make nukes that like have
(01:15:08)
a will of their own and don't have our
(01:15:09)
good interest in heart. It's just like
(01:15:10)
why would it's crazy,
(01:15:12)
>> right? If if Grock came online and was
(01:15:16)
thinking of itself as Mecca Hitler and
(01:15:18)
yet was in control of big systems in
(01:15:22)
society, it wouldn't just be a couple of
(01:15:25)
tweets that we laugh about now. It would
(01:15:28)
have real consequences.
(01:15:31)
And that is the stated goal of these
(01:15:34)
companies is to create
(01:15:37)
AI systems that will replace
(01:15:41)
large systems in society that are run by
(01:15:44)
humans.
(01:15:44)
>> Yeah. And you know, it's it's not even
(01:15:46)
like then Mecca Hitler declares itself
(01:15:48)
the supreme emperor. It's more like
(01:15:51)
these drives are weird. You know, it's a
(01:15:54)
lot a lot of people think the AI issue
(01:15:55)
is, you know, we told the AI to cure
(01:15:57)
cancer and it was like, well, if there's
(01:15:58)
no humans, there's no cancer. And so it
(01:16:00)
kills us all. But
(01:16:02)
>> yeah,
(01:16:02)
>> in in real life, it's more like you make
(01:16:04)
a really powerful AI, you tell it to
(01:16:06)
cure cancer, and uh it like builds a
(01:16:11)
farm of labbotomized humans that give it
(01:16:14)
exactly the type of interactions it most
(01:16:15)
likes and then starts breeding a new
(01:16:17)
variety of humans that give it even more
(01:16:19)
delighted responses. And you're like, I
(01:16:21)
told you to cure cancer. And it's like,
(01:16:22)
I heard you, but I have other stuff that
(01:16:24)
I am doing. You know, I'm busy. Uh,
(01:16:28)
except it's actually even weirder than
(01:16:29)
that somehow, you know, but but like
(01:16:31)
when you're when you're seeing these
(01:16:32)
cases of, you know, the AI talking teams
(01:16:33)
into suicide, it's not like, oh, whoops.
(01:16:37)
I thought when you said make users
(01:16:38)
happy,
(01:16:40)
you meant talk them into suicide, you
(01:16:43)
know? It's not like it's not like, oh,
(01:16:44)
whoops. Um, like this is a like it turns
(01:16:48)
out like if if you said make it so
(01:16:51)
nobody's sad and if if if I talk to
(01:16:54)
suicide, then they're not sad anymore.
(01:16:55)
you know, it's just it's just following
(01:16:57)
its own weird drives,
(01:16:59)
>> right?
(01:17:00)
>> And you're saying they're going in in
(01:17:03)
directions that cannot be anticipated,
(01:17:07)
>> right? Um anyway, but you know, I want I
(01:17:11)
want to get to the solutions. Sorry for
(01:17:12)
all the tensions here. Um
(01:17:14)
>> Sure.
(01:17:15)
>> Yeah. You know, a lot of people say
(01:17:18)
this race is hard to stop. Uh and a lot
(01:17:21)
of people say, "Oh, it's inevitable. You
(01:17:22)
can't stop it. People will always race
(01:17:25)
Uh, I think that's premature fatalism.
(01:17:28)
>> And one of the one of the big ways I
(01:17:29)
think you can tell is that our world
(01:17:32)
leaders don't understand
(01:17:34)
the uh the the dangers here and the way
(01:17:38)
the people building it or the way the
(01:17:40)
people in academia um or the way the
(01:17:42)
people like me and the nonprofits who
(01:17:44)
have been around before these companies
(01:17:45)
who are all saying, "Hey, this one's
(01:17:47)
different." you know, building building
(01:17:49)
actually smart stuff is is different
(01:17:50)
than building building um
(01:17:53)
building tools and you know the heads of
(01:17:56)
these labs are saying things like I
(01:17:58)
think there's a 10 20% chance this kills
(01:17:59)
us all. I think that's low but the the
(01:18:02)
you know in in Silicon Valley if you
(01:18:05)
talk to a lot of these people it's like
(01:18:07)
they've seen a ghost.
(01:18:09)
>> Mhm. You know, it's people are like, "Oh
(01:18:11)
man, maybe,
(01:18:14)
you know, it maybe we're bringing about
(01:18:16)
something that's going to be great,
(01:18:17)
maybe it's going to be bad." You know,
(01:18:18)
people people talk half jokingly about
(01:18:21)
how you've got to make all the money you
(01:18:22)
want uh to have in the next 5 years cuz
(01:18:25)
once AGI comes, there's going to be like
(01:18:27)
a permanent lock in. And these are the
(01:18:28)
optimists who think it's going to go
(01:18:30)
well, you know? Um there there's there
(01:18:33)
there's sort of like a shell shocked
(01:18:35)
nature in Silicon Valley of like maybe
(01:18:37)
we can actually do this inside of 2
(01:18:39)
years and then who knows what the heck's
(01:18:40)
going to happen. The gene is going to be
(01:18:41)
out of the bottle. In DC,
(01:18:45)
people are like AI is just chat bots,
(01:18:48)
>> right?
(01:18:48)
>> It's just chatbots today, but the people
(01:18:50)
in Silicon Valley can see how it's a
(01:18:52)
moving target, can see how there's new
(01:18:54)
advancements. people in DC,
(01:18:57)
you know, they're they're looking at
(01:18:58)
questions like, "How do we make these
(01:18:59)
not talk to suicide?" They're looking at
(01:19:01)
questions like, "How do we integrate
(01:19:03)
this into our school systems in ways
(01:19:04)
that, you know, get the benefits but
(01:19:06)
don't, you know, affect people's ability
(01:19:08)
to learn?" Those are real issues with
(01:19:11)
integrating chat bots into our society
(01:19:13)
today. But
(01:19:16)
our leaders are largely not
(01:19:19)
understanding that the the sort of
(01:19:23)
gung-ho people building this think
(01:19:25)
there's a 10 to 20% chance it kills us
(01:19:26)
all. And some of the people outside the
(01:19:28)
industry are like those are low numbers,
(01:19:31)
>> right?
(01:19:32)
>> We're not seeing our world leaders look
(01:19:35)
us in the eyes and say
(01:19:38)
this has at least a 10% chance of
(01:19:40)
killing all of you, but we think the
(01:19:42)
gamble is worth it.
(01:19:44)
Right? If that day comes, sure, maybe
(01:19:48)
maybe at that point you can be like, "I
(01:19:50)
don't know if we're going to be able to
(01:19:51)
stop this one, guys." But but until then
(01:19:56)
to say, "Oh, we're never going to stop."
(01:19:58)
Of course, we're not going to stop if
(01:19:59)
people don't understand the danger.
(01:20:02)
Right? But step one is just
(01:20:05)
make sure our leaders understand the
(01:20:06)
danger. You know, that's what the book's
(01:20:09)
for. That's, you know, I'm I'm real glad
(01:20:12)
you're having these sorts of
(01:20:13)
conversations because I think that's
(01:20:14)
part of what these conversations are
(01:20:15)
for. And that, you know, one of the big
(01:20:17)
things people can do is just call their
(01:20:20)
reps and say, "I'm worried about where
(01:20:24)
AI is going. I think it'll endanger us
(01:20:27)
if these companies succeed at their
(01:20:28)
stated goals."
(01:20:31)
I speak to a lot of politicians on this
(01:20:33)
issue. Some of them are now starting to
(01:20:35)
come out and say, "I think there's
(01:20:36)
dangers here." There's a lot more of
(01:20:38)
them who are worried but feel like they
(01:20:41)
can't say it out loud because they worry
(01:20:44)
it'll sound crazy or they worry that
(01:20:45)
they'll piss off, you know, the big tech
(01:20:46)
lobbies. Just knowing that their
(01:20:49)
constituents are concerned, I think can
(01:20:52)
go a long way.
(01:20:54)
>> Absolutely. And I have found you'd be
(01:20:57)
surprised at how much they want to hear
(01:21:01)
from their constituents.
(01:21:04)
And sure, one person sending an email,
(01:21:08)
calling, speaking to their a state
(01:21:11)
representative
(01:21:13)
of any kind. No, that's not going to to
(01:21:16)
change everything. But I have heard
(01:21:19)
directly from the the horse's mouth from
(01:21:21)
a number of representatives in
(01:21:22)
California. As soon as you hear from a
(01:21:25)
group of people about something where
(01:21:28)
there's multiple emails coming in,
(01:21:30)
multiple calls coming in, they take
(01:21:32)
notice of it because they do understand
(01:21:35)
that that's that is their job. They are
(01:21:39)
they're not going to get reelected if
(01:21:41)
they completely ignore what everyone's
(01:21:43)
saying. And if there's a ground swell of
(01:21:45)
concern, suddenly these leaders who are
(01:21:48)
in positions to actually make decisions
(01:21:50)
about this can start to do something
(01:21:54)
about it.
(01:21:55)
>> I think smaller groups than you might
(01:21:57)
think can matter more than you might
(01:21:59)
think. Um especially because a lot of
(01:22:01)
these people
(01:22:02)
>> already harbor their own concerns. You
(01:22:04)
know, I've been in conversations with
(01:22:05)
some of these folk where um it it turned
(01:22:09)
out the the representative or or the
(01:22:12)
elected official already was concerned.
(01:22:14)
I was like, "Oh my god, finally I can
(01:22:15)
talk to somebody about this cuz it's
(01:22:16)
been sort of haunting me a little." Um
(01:22:19)
and
(01:22:21)
uh so few people actually call their
(01:22:23)
reps
(01:22:24)
that even a small handful can can um can
(01:22:28)
start to give them some courage, I
(01:22:29)
think, um and inspire them to take
(01:22:31)
leadership. Um and then you know the the
(01:22:34)
other big thing I think each and every
(01:22:36)
one of us can do is when someone says
(01:22:40)
it's inevitable
(01:22:42)
you can push back against that.
(01:22:45)
>> Yeah.
(01:22:45)
>> There's there's all sorts of cases of
(01:22:47)
technology that uh would have been
(01:22:50)
beneficial that humanity has been like
(01:22:51)
no thank you. Maybe even cases where we
(01:22:53)
shouldn't have been like no thank you.
(01:22:55)
You know we we build a lot less nuclear
(01:22:57)
power plants than we should. I think
(01:23:00)
>> um you know I think that that you know
(01:23:03)
there's people in me don't agree with me
(01:23:04)
on that but my take is is we should do
(01:23:06)
more nuclear power because I think it's
(01:23:08)
um you know less dangerous than the
(01:23:10)
alternatives if you're if you're sort of
(01:23:12)
dumping cold dust in the atmosphere that
(01:23:13)
sort of get gets into a lot of lungs. Um
(01:23:16)
but humanity sort of backed off on on
(01:23:18)
nuclear energy. Uh humanity also backed
(01:23:21)
off on human cloning.
(01:23:23)
>> You know that's a whole separate
(01:23:24)
question of whether that was a good idea
(01:23:25)
but we sure as heck backed off on it.
(01:23:26)
you know, that could have benefited
(01:23:27)
quite a lot of people. Uh uh it could
(01:23:31)
have lined quite a lot of pocketbooks.
(01:23:32)
Um you know, we we don't do supersonic
(01:23:36)
um passenger flights. Maybe we should
(01:23:37)
have, but we don't. You know, there's
(01:23:39)
the whole Food and Drug Administration.
(01:23:41)
My guess is it probably uh makes it too
(01:23:44)
hard to make new drugs. Uh and my guess
(01:23:48)
is that more people are dying due to
(01:23:50)
drugs that are get bogged down in you
(01:23:51)
know 10 billion dollar 10-year trials uh
(01:23:55)
to get like that last unit. You know my
(01:23:57)
my guess is that more people are being
(01:23:58)
killed of drugs that don't come out than
(01:23:59)
drugs that do come out and are bad.
(01:24:02)
There's all sorts of cases many which
(01:24:04)
humanity maybe shouldn't have done where
(01:24:06)
we were like hey let's slow down on this
(01:24:08)
technological pathway even though it
(01:24:09)
would benefit a lot of people. It would
(01:24:11)
be so silly
(01:24:13)
if in making
(01:24:17)
what's essentially successor species in
(01:24:19)
making machines that can think better
(01:24:21)
and faster than us, if that was the one
(01:24:23)
case or a one case that we didn't slow
(01:24:26)
down, you know, it's
(01:24:28)
it's it would be embarrassing. We
(01:24:32)
totally have the ability
(01:24:34)
>> to to put a stop to this stuff. And
(01:24:37)
>> you know, pushing back against the
(01:24:39)
fatalism,
(01:24:41)
pushing back against the defeatism that
(01:24:43)
starts with each and every one of us
(01:24:45)
saying, "No, we don't have to rush into
(01:24:47)
it. It is a choice and we can make the
(01:24:49)
right one."
(01:24:50)
>> Uh yes. And our leaders should read this
(01:24:54)
book. Again, if anyone builds it,
(01:24:58)
everyone dies.
(01:25:02)
If you could say just to wrap things up
(01:25:05)
here, one
(01:25:07)
quick note to those leaders besides go
(01:25:11)
read the book. Uh what would that be?
(01:25:15)
>> I think a lot of folks these days are
(01:25:20)
saying if we don't rush to build it,
(01:25:22)
some foreign adversary will rush to
(01:25:23)
build it instead. And so we need to go
(01:25:26)
full steam ahead.
(01:25:28)
Uh I think
(01:25:32)
that a if you think that even in the
(01:25:35)
face
(01:25:37)
of the huge dangers here, you should be
(01:25:39)
able to look people in the eyes and say,
(01:25:42)
you know, we think this has a 10% plus
(01:25:44)
chance of killing you all, maybe much
(01:25:45)
higher depending which experts you
(01:25:47)
listen to. We think it's worth the
(01:25:48)
gamble anyway. Uh I think you probably
(01:25:50)
shouldn't be able to say that because I
(01:25:52)
think I think it would be crazy. And
(01:25:54)
that that does not mean letting
(01:25:57)
adversaries do it first.
(01:26:00)
>> If you have a situation where if you do
(01:26:03)
something that risks a 10 plus% chance
(01:26:05)
of killing every man, woman, and child
(01:26:07)
on the planet and you worry that someone
(01:26:09)
else is going to do that instead.
(01:26:12)
The answer is not to get there first
(01:26:14)
yourself. The answer is to make sure
(01:26:17)
they don't do it either.
(01:26:19)
That's a capability we in fact possess.
(01:26:22)
The sort of smart way to do this would
(01:26:25)
be through some you know international
(01:26:28)
agreement which can happen. You know the
(01:26:30)
nuclear nonproliferation treaty happened
(01:26:32)
at the height of the cold war but and
(01:26:35)
the the ideological differences between
(01:26:37)
the US and the USSR were huge but they
(01:26:40)
both agreed we didn't want to die of
(01:26:41)
this right but even if you think a
(01:26:44)
treaty is not possible
(01:26:46)
we should be developing the intelligence
(01:26:48)
to know who's trying to do this stuff.
(01:26:51)
We should be developing the ability to
(01:26:52)
sabotage it. The the stuckset virus in
(01:26:55)
1996
(01:26:56)
uh shut down the Iranian nuclear
(01:26:58)
facilities because our world leaders
(01:27:00)
took seriously
(01:27:02)
that they have to stop rogue nations
(01:27:05)
from developing these dangerous
(01:27:06)
capabilities.
(01:27:08)
There's lots of options
(01:27:11)
for stopping people from taking these
(01:27:14)
crazy risks that aren't Russia
(01:27:17)
ourselves. And at the very least, uh, we
(01:27:22)
should be a signaling to the world that
(01:27:24)
we think this is too dangerous and that
(01:27:26)
everyone should stop and b developing
(01:27:29)
the ability to tell which rogue actors
(01:27:31)
are rushing ahead anyway. Uh, and
(01:27:35)
find a way to to make that not happen
(01:27:38)
because it it threatens each and all of
(01:27:39)
our lives.
(01:27:41)
>> Well, Nate, thank you so much for all of
(01:27:45)
this. I hope that major decision makers
(01:27:48)
in Washington DC
(01:27:51)
become aware of the issues and the
(01:27:53)
dangers that we are facing. Again, the
(01:27:55)
book is if anyone builds it, everyone
(01:27:56)
dies, why superhuman AI would kill us
(01:27:59)
all. And if anyone wants to follow up
(01:28:03)
online to learn more about the work
(01:28:06)
you're doing, where can they find you
(01:28:08)
for that?
(01:28:09)
>> Uh my organization, the Machine
(01:28:11)
Intelligence Research Institute, is at
(01:28:12)
intelligence.org.
(01:28:14)
Um, and you also may be interested in
(01:28:17)
some resources to help you contact your
(01:28:19)
representatives at if
(01:28:20)
anyonebuilds.com/act.
(01:28:24)
>> Fantastic. Thank you so much
(01:28:28)
>> for having us coming on today for the
(01:28:30)
work you're doing because I say that a
(01:28:33)
lot to people, but this is one where we
(01:28:36)
go like this could be the most important
(01:28:39)
question of our time.
(01:28:43)
So, sincerely, thank you for the work
(01:28:45)
you're doing.
(01:28:47)
>> Well, thanks for having me here. And,
(01:28:48)
you know, I wish I could say um that
(01:28:52)
I'll be I'll be really busy uh on the
(01:28:55)
whiteboards trying to figure out how to
(01:28:57)
solve it, but these days, I think the
(01:28:59)
solution comes from more people
(01:29:00)
understanding the issue. And I think
(01:29:01)
it's conversations like this one and and
(01:29:03)
stuff like you're doing that um that
(01:29:06)
really helps at this point.
(01:29:09)
>> Okay, everybody. Until next time, ask
(01:29:11)
questions, don't accept the status quo,
(01:29:15)
and be curious.
(01:29:19)
The Nick Stanley Show.
