Ex-OpenAI Scientist’s DISTURBING Warning: “It’s Coming In 2026” (YouTube Video Transcript)

Duration: 00:17:44
(00:00:00) I do maintain here is something which I (00:00:03) predict will happen. That's a (00:00:04) prediction. (00:00:06) I maintain (00:00:08) that as AI becomes more powerful (00:00:12) then people will change their behaviors (00:00:17) and we will see all kinds of (00:00:19) unprecedented things which are not (00:00:21) happening right now. So Ilya Sutskever (00:00:24) just gave a pretty controversial (00:00:26) interview. It's spreading like wildfire (00:00:28) on the internet. In this interview, he (00:00:30) said he's found the missing piece which (00:00:33) we needed to accomplish AGI. Let's watch (00:00:35) the interview and I'll dissect (00:00:37) everything as we go. The thing so so (00:00:40) here here is a perspective. Here's a (00:00:42) perspective I think might be might be (00:00:44) true. (00:00:46) So (00:00:47) the way ML used to work is that people (00:00:50) would just tinker with stuff and (00:00:52) try to (00:00:56) and try to get interesting results. (00:00:57) That's what's been going on in the past. (00:01:01) Then (00:01:03) the scaling insight arrived, right? (00:01:06) Scaling laws, GPT-3. (00:01:09) And suddenly everyone realized we should (00:01:12) scale. (00:01:14) And it's just this this is an example of (00:01:17) how language affects thought. (00:01:21) Scaling is what? Just one word, but it's (00:01:24) such a powerful word because it informs (00:01:26) people what to do.
They say okay (00:01:27) let's let's try to scale things and so (00:01:29) you say okay so what are we scaling and (00:01:32) pre-training was a thing to scale it was (00:01:34) a particular scaling recipe (00:01:37) >> yes (00:01:37) >> the big breakthrough of pre-training is (00:01:40) the realization that this recipe is good (00:01:43) so you say hey if you mix some compute (00:01:48) with some data into a neural net of a (00:01:50) certain size you will get results and (00:01:53) you will know that it will be better if (00:01:55) you just scale the recipe up. And this (00:01:57) is also great. Companies love this (00:01:59) because it gives you a very uh low-risk (00:02:03) way of investing (00:02:06) >> your resources. (00:02:07) >> Yeah. (00:02:07) >> Right. It's much harder to invest your (00:02:09) resources in research. Compare that. You (00:02:12) know, if you research, you need to have (00:02:14) like go forth researchers and research (00:02:16) and come up with something versus get (00:02:19) more data, get more compute. You know, (00:02:21) you'll get something from pre-training. (00:02:24) And indeed, you know, it looks like I (00:02:26) based on various um (00:02:29) um things people say on some people say (00:02:32) on Twitter, maybe it appears that Gemini (00:02:34) have found a way to get more out of (00:02:36) pre-training. At some point though, (00:02:38) pre-training will run out of data. The (00:02:39) data is very clearly finite. And so (00:02:41) then, okay, what do you do next? Either (00:02:43) you do some kind of a souped-up (00:02:45) pre-training, different recipe from the (00:02:47) one we've done before, or you're doing (00:02:49) RL or maybe something else. But now that (00:02:52) compute is big, compute is now very (00:02:54) big. In some sense, we are back to the (00:02:56) age of research. So maybe here's another (00:02:58) way to put it. 
Up until 2020, from 2015, (00:03:01) from 2012 to 2020, it was the age of (00:03:05) research. (00:03:06) Now from 2020 to 2025, it was the age of (00:03:09) scaling or maybe plus minus. Let's add (00:03:12) error bars to those years because people (00:03:13) say this is amazing. You got to scale (00:03:15) more. Keep scaling. The one word (00:03:17) scaling. But now the scale is so big. (00:03:20) Like is is it is the belief really that (00:03:23) oh it's so big but if you had 100x more (00:03:26) everything would be so different. Like (00:03:29) it would be different for sure but like (00:03:31) is the belief that if you just 100x the (00:03:34) scale everything would be transformed. (00:03:37) I don't think that's true. So it's back (00:03:39) to the age of research again just with (00:03:41) big computers. (00:03:42) >> Next Ilya talks about how AGI is going (00:03:44) to impact us humans and how it's going (00:03:46) to replace us all. Okay. I I I I see. So (00:03:50) you're you're suggesting (00:03:52) that the thing you're pointing out with (00:03:54) super intelligence (00:03:56) is not some finished (00:04:00) mind which knows how to do every single (00:04:02) job in the economy because the way say (00:04:05) the original I think OpenAI charter or (00:04:07) whatever defines AGI is like it can do (00:04:09) every single job that a every single (00:04:11) thing a human can do. You're proposing (00:04:13) instead a mind which can learn to do any (00:04:17) single every single job. (00:04:18) >> Yes. (00:04:19) >> And that is super intelligence. And then (00:04:21) but once you have the learning (00:04:22) algorithm, (00:04:24) >> it gets deployed into the world the same (00:04:26) way a human laborer might join an (00:04:28) organization. (00:04:30) >> And it seems like one of these two (00:04:32) things might happen. Maybe neither of (00:04:33) these happens. 
one, this super efficient (00:04:38) learning algorithm (00:04:40) becomes superhuman, becomes as good as (00:04:43) you and potentially even better at the (00:04:45) task of ML research and as a result the (00:04:50) algorithm itself becomes more and more (00:04:51) superhuman. The other is even if that (00:04:54) doesn't happen. If you have a single (00:04:56) model, I mean this this is explicitly (00:04:58) your vision. If you have a single model (00:04:59) or instances of a model which are (00:05:02) deployed through the economy doing (00:05:04) different jobs, learning how to do those (00:05:05) jobs, continually learning on the job, (00:05:08) picking up all the skills that any human (00:05:10) could pick up but actually picking them (00:05:11) all up at the same time and then (00:05:12) amalgamating the learnings. (00:05:15) You basically have a model which (00:05:16) functionally becomes super intelligent (00:05:19) even without any sort of recursive (00:05:20) self-improvement in software right (00:05:23) because you now have one model that can (00:05:25) do every single job in the economy and (00:05:27) humans can't merge our minds in the same (00:05:28) way and so do you expect some sort of (00:05:30) like intelligence explosion from broad (00:05:32) deployment (00:05:33) >> I think that it is likely that we will (00:05:38) have rapid economic growth (00:05:42) I think the broad deployment (00:05:45) Like there are two arguments you could (00:05:48) make which are conflicting. (00:05:51) One is that look if indeed you get once (00:05:54) indeed you get to a point where you have (00:05:58) an AI that can learn to do (00:06:02) things quickly (00:06:04) and you have many of them then they will (00:06:06) then there will be a strong force to (00:06:10) deploy them in the economy. Unless there (00:06:12) will be some kind of a regulation that (00:06:14) stops it, which by the way there might (00:06:16) be. 
But I think the idea of very rapid (00:06:22) economic growth for some time, I think (00:06:24) it's very possible from broad (00:06:25) deployment. The other question is how (00:06:27) rapid it's going to be. (00:06:30) So I think this is hard to know because (00:06:32) on the one hand you have this very (00:06:34) efficient worker. On the other hand (00:06:36) there is the world is just really big (00:06:38) and there's a lot of stuff (00:06:41) and that stuff moves at a different (00:06:43) speed but then on the other hand now the (00:06:44) AI could you know so I think very rapid (00:06:48) economic growth is possible and we will (00:06:49) see like all kinds of things like (00:06:52) different countries with different rules (00:06:54) and the ones which have the friendlier (00:06:55) rules the economic growth will be faster (00:06:58) hard to predict (00:06:58) >> okay now in this next part Ilya predicts (00:07:00) how AGI is going to change everything we (00:07:03) know about our society how governments (00:07:05) are going to change and how human (00:07:07) behavior is going to shift as AGI comes (00:07:09) in. (00:07:10) >> And I maintain that I think I think most (00:07:12) people who work on AI also can't imagine (00:07:15) it because it's too different from what (00:07:18) people see on a day-to-day basis. (00:07:22) I do maintain here is something which I (00:07:25) predict will happen. That's a (00:07:26) prediction. (00:07:28) I maintain (00:07:30) that as AI becomes more powerful (00:07:34) then people will change their behaviors (00:07:39) and we will see all kinds of (00:07:41) unprecedented things which are not (00:07:43) happening right now and I'll give some (00:07:46) examples. 
I do like I I think I think (00:07:49) for better or worse the the frontier (00:07:52) companies will play a very important (00:07:53) role in what happens as will the (00:07:55) government and the kind of things that I (00:07:57) think we'll see which you see the (00:08:00) beginnings of (00:08:02) companies that are fierce competitors (00:08:05) starting to collaborate (00:08:07) on AI safety you may have seen OpenAI (00:08:11) and Anthropic doing a first small (00:08:14) step but that did not exist. That's (00:08:16) actually something which I predicted in (00:08:18) one of my talks about three years ago (00:08:21) that such a thing will happen. I also (00:08:23) maintain that as AI continues to become (00:08:25) more powerful, more visibly powerful, (00:08:29) there will also be a desire from (00:08:32) governments and the public to do (00:08:34) something (00:08:36) and I think that this is a very (00:08:37) important force (00:08:40) of showing the AI. That's number one. (00:08:43) Number two, okay, so then the AI is (00:08:45) being built. what needs to what needs to (00:08:46) be done. (00:08:49) So one thing that I maintain that will (00:08:51) happen is that right now people who are (00:08:53) working on AI I maintain that the AI (00:08:57) doesn't feel powerful because of its (00:08:58) mistakes. (00:09:00) I do think that at some point the AI (00:09:02) will start to feel powerful actually and (00:09:04) I think when that happens we will see a (00:09:06) big change in the way (00:09:09) all AI companies approach safety. (00:09:13) they'll become much more paranoid. I (00:09:15) think I I say this as a predict as a as (00:09:18) a as a prediction that we will see (00:09:19) happen. We'll see if I'm right, but I (00:09:22) think this is something that will happen (00:09:23) because they will see the AI becoming (00:09:25) more powerful. 
Everything that's (00:09:27) happening right now, I maintain is (00:09:29) because people look at today's AI and (00:09:32) it's hard to imagine the future AI. (00:09:35) And there is a third thing which needs (00:09:37) to happen. And I think this is this this (00:09:40) and I'm talking about it in in broader (00:09:42) terms not just from the perspective of (00:09:44) SSI (00:09:46) because you ask me about our company but (00:09:48) the question is okay so then what should (00:09:49) what should the companies aspire to (00:09:51) build (00:09:52) >> what should they aspire to build and (00:09:54) there has been one big idea that (00:09:56) actually every that um everyone has been (00:09:58) locked in locked into which is the the (00:10:00) self-improving AI (00:10:03) and why why did it happen because there (00:10:05) is fewer ideas than companies (00:10:08) But I maintain that there is something (00:10:10) that's better to build and I think that (00:10:13) everyone will actually want that. It's (00:10:15) like the AI that's robustly aligned to (00:10:20) care about sentient life specifically. (00:10:23) I think in particular it will be there's (00:10:26) a case to be made that it will be easier (00:10:28) to build an AI that cares about sentient (00:10:30) life than an AI that cares about human (00:10:33) life alone because the AI itself will be (00:10:36) sentient. (00:10:38) And if you think about things like (00:10:39) mirror neurons and human empathy for (00:10:41) animals, which is, you know, you might (00:10:43) argue it's not big enough, but it (00:10:46) exists. I think it's an emergent (00:10:48) property from the fact that we model (00:10:50) others with the same circuit that we (00:10:53) used to model ourselves because that's (00:10:55) the most efficient thing to do. (00:10:56) >> Now, in the next part, Ilya explains why (00:10:59) the age of scaling is over and why we're (00:11:01) back into the research phase. Pretty (00:11:03) interesting take. 
I am curious if you (00:11:04) say we are back in an era of research. (00:11:08) You were there from 2012 to 2020 (00:11:11) and do do you have Yeah. What what is (00:11:14) now the vibe going to be if we go back (00:11:16) to the era of research? (00:11:18) >> So one consequence of um the age of (00:11:22) scaling is that there was this (00:11:26) um scaling sucked out all the air in the (00:11:29) room. (00:11:29) >> Yeah. (00:11:31) And so (00:11:33) because scaling sucked out all the air (00:11:34) in the room, (00:11:36) everyone started to do the same thing. (00:11:39) We got to the point where (00:11:42) uh we are in a world where there are (00:11:45) more companies than ideas by quite a (00:11:47) bit. (00:11:48) >> Actually on that you know there is this (00:11:50) Silicon Valley saying that says that (00:11:54) ideas are cheap, execution is everything (00:11:58) and people say that a lot. (00:11:59) >> Yeah. And there is truth to that. But (00:12:01) then I saw I saw someone say on Twitter (00:12:04) um something like if ideas are are so (00:12:07) cheap, how come no one's having any (00:12:09) ideas? (00:12:10) >> And I think it's true too. I think like (00:12:14) if you think about um research progress (00:12:17) in terms of bottlenecks, (00:12:20) there are several bottlenecks. If you go (00:12:22) back to the if if you and um one of them (00:12:24) is ideas and one of them is your ability (00:12:27) to bring them to life. (00:12:28) >> Yeah. which might be compute but also (00:12:30) engineering. (00:12:31) So if you go back to the '90s let's say (00:12:34) you had people who had had pretty good (00:12:35) ideas and if they had much larger (00:12:38) computers maybe they could demonstrate (00:12:39) that their ideas were viable but they (00:12:41) could not. So they could only have very (00:12:43) very small demonstration and did not (00:12:45) convince anyone. (00:12:46) >> Yeah. (00:12:47) >> So the bottleneck was compute. 
Then in (00:12:50) the age of scaling computers increased a (00:12:53) lot and of course there is a question of (00:12:56) how much compute is needed but compute is (00:12:59) large so compute is large enough such (00:13:04) that (00:13:05) it's like not obvious that you need that (00:13:08) much more compute to prove some idea (00:13:12) like I'll give you an analogy. AlexNet (00:13:15) was built on two GPUs. That was the (00:13:18) total amount of compute used for it. The (00:13:20) transformer (00:13:22) was built on 8 to 64 GPUs. No single (00:13:26) transformer paper experiment used more (00:13:28) than 64 GPUs of 2017, which would be (00:13:32) like what two GPUs of today. (00:13:34) So the ResNet, (00:13:37) right? many like even even the the um (00:13:40) you could argue that the like o1 (00:13:43) reasoning was not the most compute heavy (00:13:46) thing in the world. So there definitely (00:13:50) for for research (00:13:53) you need like definitely some amount of (00:13:55) compute but it's far from obvious that (00:13:57) you need the absolutely largest amount (00:13:59) of compute ever for research. M (00:14:02) >> you might argue and I think it is true (00:14:04) that if you want to build the absolutely (00:14:06) best system, if you want to build the (00:14:09) absolutely best system, then it helps to (00:14:12) have much more compute and especially if (00:14:14) everyone is within the same paradigm, (00:14:16) then compute becomes one of the big (00:14:19) differentiators. Okay, now this next (00:14:21) part is pretty interesting. Ilya talks (00:14:24) about AGI and gives a pretty solid (00:14:26) understanding of the AGI architecture (00:14:28) and how AGI compares to a human mind. (00:14:31) >> This will be two words, two words that (00:14:33) have shaped everyone's thinking I (00:14:36) maintain. (00:14:37) F first word AGI (00:14:41) second word pre-training. Let me (00:14:43) explain. 
(00:14:45) So the word the term AGI (00:14:48) why does this term exist? It's a very (00:14:51) particular term. Why does it exist? (00:14:53) There's a reason. The reason that the (00:14:56) term AGI exists is in my opinion not so (00:15:00) much because it's like a very important (00:15:02) essential descriptor of of some end (00:15:05) state of intelligence, but (00:15:10) because it is a reaction to a different (00:15:14) term that existed and the term is narrow (00:15:16) AI. If you go back to ancient history of (00:15:20) gameplay AI, of checkers AI, chess AI, (00:15:24) computer games AI, everyone would say, (00:15:26) look at this narrow intelligence. Sure, (00:15:28) the chess AI can beat Kasparov, but it (00:15:30) can't do anything else. It is so narrow, (00:15:33) artificial narrow intelligence. So in (00:15:36) response, as a reaction to this, some (00:15:38) people said, well, this is not good. It (00:15:42) is so narrow. What we need is general (00:15:44) AI. (00:15:46) general AI, an AI that can just do all (00:15:48) the things. (00:15:51) The second and and that term just got a (00:15:55) lot of traction. (00:15:56) >> Yeah. (00:15:57) >> The second thing that got a lot of (00:15:59) traction is pre-training. (00:16:02) Specifically, the recipe of (00:16:03) pre-training. I think the current the (00:16:05) way people do RL now is maybe um un is (00:16:09) undoing the conceptual imprint of (00:16:12) pre-training. But pre-training had the (00:16:14) property. you do more pre-training and (00:16:17) the model gets better at everything more (00:16:19) or less uniformly. 
Yeah, (00:16:22) >> general AI pre-training gives AGI (00:16:28) but (00:16:30) the thing that happened with AGI and (00:16:33) pre-training is that in some sense they (00:16:34) overshot the target (00:16:37) because by the kind if you think about (00:16:39) the term AGI you will realize and (00:16:42) especially in the context of (00:16:43) pre-training you will realize that a (00:16:45) human being is not an AGI (00:16:48) because a human being Yes, there is (00:16:51) definitely a foundation of skills. (00:16:54) A human being, (00:16:57) a human being lacks a huge amount of (00:16:59) knowledge. Instead, we rely on continual (00:17:02) learning. We rely on continual learning. (00:17:05) And so then when you think about okay, (00:17:07) so let's suppose that we achieve success (00:17:09) and we produce a safe super some kind of (00:17:11) safe super intelligence. The question is (00:17:14) but how do you define it? Where on the (00:17:16) curve of continual learning is it going (00:17:17) to be? I produce like um a super (00:17:20) intelligent 15 year old that's very (00:17:22) eager to go and you say okay I'm going (00:17:24) to they don't know very much at all the (00:17:26) great student very eager you go and be a (00:17:29) programmer you go and be a doctor (00:17:32) go and learn so you could imagine that (00:17:34) the deployment itself will involve some (00:17:36) kind of a learning trial and error (00:17:38) period it's a process as opposed to you (00:17:42) drop the finished
