Microsoft CTO Kevin Scott on How Far Scaling Laws Will Extend

The current LLM era is the result of scaling the size of models (and the compute to train them) in successive waves. It is also the result of better-than-Moore’s-Law price-performance ratios in each new generation of Nvidia GPUs. The largest platform companies are continuing to invest in scaling as the prime driver of AI innovation. Are they right, or will marginal returns level off soon, leaving hyperscalers with too much hardware and too few customer use cases? To find out, we talk to Microsoft CTO Kevin Scott, who has led the company’s AI strategy for the past seven years. Scott describes himself as a “short-term pessimist, long-term optimist,” and he sees the scaling trend as durable for the industry and critical for the establishment of Microsoft’s AI platform. Scott believes there will be a shift across the compute ecosystem from training to inference as the frontier models continue to improve, serving wider and more reliable use cases. He also discusses the coming business models for training data, and even what ad units might look like for autonomous agents.

Hosted by: Pat Grady and Bill Coughran, Sequoia Capital

Mentioned:
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, the 2018 Google paper that convinced Kevin that Microsoft wasn’t moving fast enough on AI.
Dennard scaling: The scaling law that describes the proportional relationship between transistor size and power use; it has not held since 2012 and is often confused with Moore’s Law.

Published: Jul 9, 2024
Uploaded: Jun 11, 2026
File type: Podcast

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:31

[00:00] The things that are brittle right now, where you're like, oh my god, this is a little too expensive or it's a little too fragile for me to use: all of that gets better. It'll get cheaper and things will become less fragile. And then more complicated things will become possible. That is the story of each generation of these models as we've scaled up. [00:37] - On any given day, Microsoft may be the most valuable company in the world, and arguably no one [00:43] has been more ambitious, [00:45] more strategic or more effective [00:48] in its AI strategy than Microsoft. [00:51] The key architect behind that strategy? [00:54] Kevin Scott, [00:55] CTO [00:56] of Microsoft. [00:58] We've had the pleasure of knowing Kevin for a couple of decades now, dating back to his time at Google, [01:03] when he overlapped with our partner, Bill Coughran. [01:06] Bill will join us today for a very special episode of Training Data. We hope you enjoy. [01:14] Kevin, thank you for being here on Training Data. [01:17] Well, glad to be here. [01:19] So just to start, I know you've talked about this before, but for our listeners who might not be familiar with your story, [01:25] how does a kid from rural Virginia end up becoming the CTO of Microsoft?

1:33-3:07

[01:33] Who knows? [01:36] Certainly not a [01:38] repeatable plan, I don't think. I don't know. [01:47] When I reflect back on my story, it's just a lot of being at the right place at the right time. So I'm 52 years old. So I was, [02:02] 10, 11, 12 years old when the personal computing revolution started to hit full steam. And so, right at that moment when you're a kid trying to figure out what you're about and what you're going to latch on to, I had this [02:17] really convenient thing that captured my interest and was a good place for me to ground my curiosity on. [02:26] And I think that's [02:28] one of the object lessons in general: if you happen to be interested in, [02:36] really motivated to learn more and do more with, something that is at the same time growing really, really quickly, you probably are going to end up in a reasonable place. [02:48] And so, you know, I [02:51] was interested in computers. I was the first kid in my, [02:56] or the first person in my family; neither my mom nor my dad went to university. So I was the first one to graduate with a bachelor's degree. I

3:07-4:46

[03:07] majored in computer science and minored in English literature. I had this moment [03:15] when I was trying to decide what I was going to go do after I got my undergraduate degree where my [03:22] two advisors were arguing about whether it should be a PhD in computer science or a PhD in literature, and I was very seriously considering both. But [03:33] I was so broke [03:36] and just so tired of being [03:40] busted all the time that I picked the pragmatic path. Not that, you know, I still [03:49] imagine what my life would have been like as a person with a PhD in English literature, and I think it would have been just fine, but I chose one of my two equal interests. [04:02] And then, you know, for a while, I thought I was going to be a computer science professor. [04:06] And [04:07] at the [04:08] last minute, and this is where Bill and I intersected, I decided... I was a compiler optimization and programming languages person [04:18] through years and years in grad school, and I got almost all the way to the end and I was like, [04:24] I don't think I want to be [04:26] a professor anymore. [04:29] I would work on these things where it was six months of effort to write a paper and you make some synthetic benchmark 3% better. And I was like, this doesn't feel to me like the way to have a lot of impact in the world. And I don't want to do this over and over and over again for the next

4:46-6:16

[04:46] 25 years of my life. And so [04:51] I sent [04:52] my resume cold into Google in 2003 and [05:00] I got an [05:02] email from this guy, Craig Nevill-Manning, who had just gone off to New York to open up Google's first [05:09] remote engineering office. And [05:13] I had [05:14] an amazing interview at Google. I don't know whether this was on purpose or not, or I just got luck of the draw, but it seemed like every compiler person who was working at Google was on my interview slate. And I was like, this is amazing. All these people [05:32] know all of this stuff that I know and, you know, we can have easy conversations. I worked on nothing that was even remotely close to compilers at Google, and I was confused why all of these people were there, but [05:45] it was a great interview, and I was super stoked, and I joined [05:49] Google, and Google is yet another one of those things, just like the PC and just like the internet: it was a phenomenon that was growing crazy fast with a bunch of smart people working there, and that [06:03] resulted in this opportunity I had to go join this startup AdMob when it was very early on, right at this pivotal moment when mobile was taking off. [06:14] You needed things in mobile like

6:17-7:48

[06:17] advertising infrastructure, and I helped build [06:22] the seminal company, I think, in mobile advertising, and then was back at Google, and then I helped LinkedIn go public, running its engineering and operations team, and then I was at LinkedIn when [06:35] it got acquired by Microsoft. [06:36] And so, [06:37] none of that I think you can plan. It's just a lot of [06:42] right place at right time and [06:46] trying at every point you can to [06:50] do the most interesting thing you can do on the thing that's growing really fast. [06:55] You know, when you talk about your personal history, Kevin, I guess, [07:00] you know, the focus nowadays is on AI, machine learning. [07:05] A lot of the practitioners are [07:09] people with PhDs. How do you think about [07:13] sort of practical teams [07:15] for AI, [07:16] since you're obviously [07:19] doing a lot of that work at Microsoft and involved with partnerships with OpenAI and others? [07:25] Yeah, I mean, I think if you are [07:28] building the [07:31] really complicated platform pieces of AI, so the big distributed systems for training and inference, the big [07:41] networking and silicon and, you know, system software components, or

7:48-9:21

[07:48] the [07:49] algorithms that you're using to do training and inference, I think [07:53] a PhD is super helpful. There's just a huge amount of [07:58] prior [08:00] knowledge that you need to have in order to jump into the problem space and be able to go quickly. And, you know, you need to be clever, but you don't... [08:11] a PhD is... [08:14] I know you have a PhD, Bill, and are far cleverer than I am, but [08:19] usually folks with PhDs are clever, but they're not the only people in the universe who are clever. So I think it's mostly helpful in the sense that [08:30] you've gone through a pretty rigorous training regimen where you get a whole bunch of prior art stuffed into your skull and you demonstrably can do a very complicated project. And the PhD projects [08:43] look kind of like [08:47] AI platform [08:49] systems projects, except the AI platform and systems projects [08:56] are lots and lots of people working together. Whereas when you're getting your PhD, you often are working in relative isolation on a particular thing. So one of the things people have to learn is how to [09:09] get yourself docked into a group and to be able to collaborate effectively with a bunch of other people like yourself. [09:15] So, useful. But, you know, there's so much else in AI that needs to be done other than building the

9:21-10:53

[09:21] platform. [09:23] And for those things, [09:26] a PhD is helpful, but certainly not necessary: [09:29] you know, figuring out [09:31] how do I apply this to [09:33] education? How do I apply this to healthcare? How do I build developer tools around this? How do I [09:41] do all of the million things that happen when new platforms emerge, where you sort of complete the whole platform into a portfolio of products and a portfolio of middleware and a portfolio of all of the other stuff you need? [09:55] Well, speaking of which, Microsoft seems like it has about the most [09:59] sort of far-reaching or ambitious AI strategy of anybody out there. Can you just kind of say in a couple words, what is the AI strategy for Microsoft? And then just for fun, [10:09] if you're going to grade yourselves, [10:12] what have you done particularly well? [10:14] What have you done [10:15] maybe not as well as you could? [10:17] Yeah, so [10:19] I mean, we've been sort of talking about this strategy. Microsoft is a platform company. We, I think, have participated in or helped drive a handful of the [10:30] big platform waves in computing. We were certainly one of the pillar companies in the personal computing revolution. We had an important part to play in the internet revolution, although I think that one was [10:43] a far more [10:45] diversely contributed-to revolution than personal computing. [10:50] We kind of missed the

10:53-12:24

[10:53] mobile computing revolution. [10:58] But with each one of those things, we have thought about how do you go build [11:05] a technology platform for this particular era of technology that allows other people to go build on top of that platform to make useful things for other people. And so that is our AI strategy. It is, [11:18] how do you, from [11:21] frontier models to small language models to highly optimized inference infrastructure, you know, [11:31] hyperscale on both training and inference, [11:35] economies of scale, making the entire platform more... [11:41] because it's cheaper and more powerful with every turn of the crank, [11:46] and all of the developer tools and [11:50] safety infrastructure and testing and everything that has to be there in order to have [11:56] robustly built AI applications: go build that, and listen to developers and listen to people building AIs [12:06] as intently as you possibly can so that you are filling in all of the gaps that you can for them as they are encountering problems [12:16] deploying this technology [12:18] to users. [12:21] So that is [12:22] our strategy.

12:25-13:53

[12:25] And so [12:26] yeah, I think we're doing a reasonable job of it. [12:34] I hate to grade myself. It seems a little bit disingenuous, right? I went to lowlights. [12:41] Well, so maybe before I do that, let me describe something about my own psychology. So I am an engineer, and I think most engineers are [12:53] short-term pessimists, long-term optimists. And so the short-term pessimism is you come in every day and you're like, oh my God, this is a bag of crap. I just don't like any of this and everything's broken and [13:06] I've got so much stuff to fix and I'm so frustrated. [13:11] But you work on all of those things anyway because you're optimistic that all of the problems can be fixed and that they're going to be worth fixing at the end of the day. [13:22] And so, yeah, I mean, [13:25] there's a bunch of stuff that I think we're doing really well. I think we have absolutely, along with OpenAI, made very powerful AI dramatically more accessible than it otherwise would have been to a larger group of people. I think [13:40] because of that work that we've been doing alongside OpenAI, [13:46] we're just seeing lots and lots of customers who otherwise wouldn't be building powerful AI applications.

13:55-15:29

[13:55] And so I feel like we're doing a good job in the way that we're partnering. I think we're doing a good job in having a really [14:06] particular point of view, and it's not an immutable point of view, but it's a point of view about what an AI platform ought to look like, and we're trying to make it as complete as we can. [14:19] Lowlights is, I think we were [14:21] a little bit late to some of the basic AI stuff. So it wasn't that we were not [14:30] investing in AI at all. You can sort of look at some of the work that Microsoft Research had done over the years, and MSR was an early [14:40] leader. [14:43] And I think, you know, [14:47] Bill knows this just as well as I do, just from his time at Google, where we overlapped for a number of years: [15:01] maybe most of the really important advancements in AI over the past 20 years have been [15:08] a function of some kind of scale. And it's usually [15:14] data scale and compute scale in combination that let you do things that weren't possible at lower scale points. And [15:24] at some point that scaling of data and compute

15:29-17:02

[15:29] is so exponential that you get past the point where [15:33] you can have fragmented bets; it literally just becomes economically impossible to bet on [15:41] 10 different things [15:43] that are all exponentially scaling or have the ambition or the need to exponentially scale simultaneously. And so I think one of the things that [15:52] we were a little bit late to is, we didn't put all of our eggs into the right basket soon enough. We were spending a lot on AI, but it was fragmented across a whole bunch of different things. And because we didn't want to [16:05] hurt any of the feelings of smart people or, [16:08] yeah, whatever, right? I don't even know what the diagnosis was because a lot of that was before I was at Microsoft. [16:15] We just weren't as quick as we should have been at saying, nope, scale is what matters, and here's how we're going to focus our investments on scale in a principled way. [16:27] When did you get religion that scale's what matters? Was there a particular event or a moment that [16:33] really crystallized that for you? [16:35] Yeah, I mean, so I've been at Microsoft for about seven and a half years now, and [16:42] my, [16:44] when I became CTO, my... [16:50] was, like, take a scan left to right across [16:55] both Microsoft and the entire industry and try to see where

17:03-18:37

[17:03] we just had holes in execution, where we were not doing things at that point in time, which was [17:12] I guess early 2017, [17:15] where, [17:17] all right, what are we not doing today that we're going to deeply regret in 2019 or 2020? So two, three years out. [17:26] And the biggest thing on the list was, you know, our rate of progress on AI was not fast enough. So I'd say mid-2017, I had religion that that was going to be [17:37] a big part of my job, helping us figure out what the strategy was going to be. And then [17:43] in 2018, if anything, [17:50] the publication of the BERT paper from Google was a real crystallization of [17:58] that belief. So everything that I had [18:02] that was in my analysis, I was like, this is as fine an example as anything of why we have to really, really... [18:11] on getting more serious here. [18:15] And so, you know, [18:17] very shortly after that, I restructured a whole bunch of stuff inside of Microsoft to get us more focused on AI. And then about a year later, we did that first deal with OpenAI. [18:28] And, yeah, we have been [18:30] accelerating our investments and trying to get more focused, more crisp, more purposeful since then.

18:38-20:09

[18:38] You were very early to appreciating the potential of OpenAI. What did you see in them at that time when that first partnership was struck? [18:47] Well, we had, or at least I had, [18:51] this real belief that what was happening with... [19:07] at Google, where [19:09] you had [19:11] a pool of data and a bunch of machines and an algorithm and you were training a model, and the model was for a specific thing; in the case of [19:22] the thing I was doing at Google, it was [19:25] click-through rate prediction for advertising and, you know, a handful of other things, [19:30] and [19:30] just outrageously effective, right? [19:34] But [19:36] most of the work before this, before GPT, [19:43] was about [19:45] those sort of narrow use cases. You were purpose-building models for narrow things, and it was just tough to scale. You couldn't [19:53] invest a bunch of compute, and you couldn't amortize the cost of the compute across anything more than just the [19:59] narrow thing that you were building the model for. And [20:03] you had to have a lot of expertise; if you wanted to replicate all of this, you had to have

20:09-21:40

[20:09] different data and different [20:11] AI PhDs and different processes every time you wanted to [20:16] go build AI into an application. And [20:21] what was happening was, you had these [20:24] large language models that were useful for lots of different things. So you didn't have to have a separate model for machine translation and sentiment analysis and [20:37] all of the different text things that you were doing. And I was like, okay, this is extraordinary. And they were also becoming more platform-like [20:44] as a function of scale. So transfer learning was working [20:48] better as things scaled up. [20:54] And this is still the general pattern. So [20:59] everything that we understand that large language models can do, plus or minus, will get better when you get to the next scale point. And on top of that, they will become slightly or maybe dramatically more general, [21:14] in the sense that [21:15] their capability set broadens. [21:18] And [21:20] OpenAI had that same belief, and they also had a very principled analysis of how those [21:29] platform characteristics emerged over time as a function of scale, [21:33] and had a bunch of experimental [21:35] validation that said [21:38] that their forecasts were right.
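
The forecasting idea described here, checking whether the next scale point lands where earlier runs predicted, can be sketched as fitting a curve to past results and extrapolating. The following is a minimal illustration that assumes a simple power law between training compute and loss. The transcript does not say what form OpenAI's forecasts took, and the data points, constants, and function names below are all invented for the example.

```python
import math


def fit_power_law(points):
    """Fit loss = a * compute**(-b) by least squares on log-log values."""
    xs = [math.log(compute) for compute, _ in points]
    ys = [math.log(loss) for _, loss in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def forecast_loss(a, b, compute):
    """Predict the loss at a given compute budget from the fitted law."""
    return a * compute ** (-b)


# Synthetic past runs (hypothetical, not measurements of any real model):
# generated from loss = 10 * compute**-0.1 so the fit has a known answer.
past_runs = [(c, 10.0 * c ** -0.1) for c in (1e18, 1e19, 1e20)]

a, b = fit_power_law(past_runs)
next_scale_point = 1e21  # hypothetical compute budget for the next run
predicted = forecast_loss(a, b, next_scale_point)
print(f"fitted exponent b = {b:.3f}, forecast loss at next scale point = {predicted:.4f}")
```

Being "on forecast" would then mean that the loss measured at the next scale point falls close to `predicted`; a real analysis would also need error bars and far more than three points.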

21:40-23:15

[21:40] And so [21:42] you sort of look at what the forecast says, and [21:46] this is how much money it costs to run the experiment to see if you're going to be on forecast for the next turn of the crank. And, you know, it felt like a big number at the time; it was [21:57] a billion dollars. [22:00] But relative to what was happening, it just wasn't a [22:04] large amount of money. [22:07] And then [22:09] GPT-3 was on forecast and GPT-4 was on forecast. So it just was [22:15] finding a partner that had the same platform belief that you did and a track record of being able to execute through these scale points. [22:26] I've done a bunch of things before that I had way more reservation about in the past, just in terms of investments. This one, there was a bunch of people who didn't agree with me, but I had pretty high conviction. [22:41] You touch on investment. I guess, you know, there's a lot of [22:46] trade publications now speculating about the cost of doing training and so forth, and [22:53] rumors of billions and billions of dollars being spent and so forth. [22:58] And I guess based on my own background, I think training is going to get dwarfed by inference here pretty soon. How do you see the... I hope so. [23:09] Yeah, otherwise we're building models that nobody knows what to do with. That might not be a great investment. Yep.

23:15-24:52

[23:15] How do you see the computing landscape evolving, and where's it going? You know, [23:22] I think people are joking that all the money is going to NVIDIA at the moment. [23:26] - That's amazing. [23:27] Well, look, I think NVIDIA is doing a good job. So the two interesting things that are happening with these models, just in terms of [23:38] the efficiency of the scale-up: each hardware generation is better price-performance-wise, [23:45] usually by an extent greater than Moore's Law used to work for general-purpose computing. [23:52] So [23:55] A100 was about three, three and a half times better price performance than V100; H100, not [24:05] quite that much, but, you know, close. [24:08] On paper, the next generation looks very good as well. And so you've got hardware, [24:16] for a variety of reasons, part of it's process technology, but part of it's architecture, and a lot of it is being able to leverage [24:30] narrower [24:32] word size in the computation. So, you know, instead of needing 64-bit arithmetic, you're doing arithmetic with much less precision right now. [24:43] And so there's just an embarrassing amount of parallelism there. Then we're getting better and better at extracting that architecturally in the hardware and, you know,
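
The trade-off behind narrower word sizes can be seen with a few lines of standard-library Python. This is only a sketch of storage cost and rounding error for a single value, using the IEEE 754 formats exposed by the `struct` module. It is not a model of how any particular GPU performs its arithmetic, and the sample value and helper names are arbitrary choices for the example.

```python
import struct

# IEEE 754 binary formats available through struct's standard-size codes.
FORMATS = {"float64": "<d", "float32": "<f", "float16": "<e"}


def bytes_per_value(fmt):
    """Return how many bytes one value occupies in the given format."""
    return struct.calcsize(fmt)


def round_trip(value, fmt):
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


weight = 0.1  # arbitrary sample value standing in for a model weight
for name, fmt in FORMATS.items():
    stored = round_trip(weight, fmt)
    print(f"{name}: {bytes_per_value(fmt)} bytes, stored as {stored!r}, "
          f"rounding error {abs(stored - weight):.2e}")
```

Going from 64-bit to 16-bit values means a quarter of the bytes per number, so the same memory and bandwidth carry four times as many values, at the cost of the rounding error the loop prints.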

24:52-26:22

[24:53] Yeah, there's a bunch of [24:55] innovative stuff happening with networking as well. We're well past the point, for the frontier models at least, where you can do anything interesting on a single GPU. So for years and years now, both training and inference have been [25:08] multi-GPU, multi-compute-node [25:11] problems. And so there's a bunch of innovation happening on the network side as well, which allows you to strap all of the compute together at the [25:21] chassis level, the rack level, the row level, the data center level more effectively, [25:27] which is great because, [25:29] for the nerds listening, we haven't had effective [25:35] power scaling, [25:38] or Dennard scaling, since 2012 or so. So [25:46] we're getting more transistors, but they are not getting cooler. [25:53] And we just [25:55] have a lot of [25:57] density issues just with power dissipation that we have to go deal with. [26:05] Do you see inference driving different data center architecture? [26:11] Yeah, I mean, look, we already architect our training environments and our inference environments differently. They just need different things. [26:19] And [26:21] I think

26:23-27:55

[26:23] you know, all the way down to silicon and through the network hierarchy, you need different things for inference. [26:30] And inference is kind of easier than training. Training, the way that we're doing it now, is we go build [26:37] big environments that take [26:39] a few years to build. [26:43] And [26:45] with inference, if somebody came along with a better silicon architecture, a better network architecture, a better cooling technology, it's a much easier experiment to go run. You just go [26:59] swap some racks out. I mean, my data center people would yell at me, it's not quite that easy, but [27:07] it is easier than having to go do a big [27:10] capital project like a training environment looks like. [27:15] And so, you know, intuitively you would think that [27:19] that is going to result in more diversity in the inferencing environment and more competition and a faster [27:28] rate of improvement. [27:32] And on the software side, that's certainly what we see: the inference stack, just because it's such a large fraction of the overall compute footprint and it's constrained, [27:44] because we have more demand than supply at the moment, you just have very, very powerful incentives to go optimize the software stack to squeeze more performance out of it.

27:55-29:25

[27:55] Do you think we'll be in an environment anytime soon where that demand-supply balance [28:00] changes? [28:01] Not necessarily at Microsoft, but it feels like we're seeing that at the market level as well. [28:06] Yeah, I don't know. [28:08] I mean, [28:11] if we continue to see the [28:13] platform [28:17] continue to expand capability-wise, and it just becomes more useful, I think demand increases, if anything. Now, [28:27] the shape of the demand is probably going to move around. I think you're already seeing a little bit of that. [28:34] Building a frontier model is a very, very [28:39] resource-intensive thing, and as long as people are building frontier models and making them accessible, and maybe they're not accessible quite the way that people want, you know, they're [28:50] only API accessible and there isn't an open source thing you can go instantiate and muck around with, but it's [28:58] way more accessible than it was six or seven years ago, where the only way to access some of this stuff was you had to go work for, you know, [29:07] two or three tech companies. [29:11] So anyway, but I think [29:14] you do have to ask yourself, [29:17] and somebody else should do the asking, right? Because I'm all kinds of biased. But I don't know how many frontier models

29:26-30:58

[29:26] you actually need [29:29] if they're all, roughly speaking, in the same tier of capability. [29:35] That's an awful lot of money to spend for things that are [29:42] roughly equivalent. [29:45] It's sort of like, if you're starting a company right now and you believe that you have to build your very own frontier model in order to go... [29:55] an application to someone, that's almost the same thing [30:00] as saying, I've got to go build my own smartphone hardware and operating system in order to deliver this mobile app. [30:09] Maybe you need it, but probably you don't. [30:13] And so [30:16] I mean, to Bill's point, I think the thing that [30:20] makes sense... [30:21] market is you're going to want to see lots of people doing lots of inference, because that means you've got lots of products that have found product-market fit and those things are scaling. [30:33] But lots of [30:35] speculative dollars flowing into infrastructure [30:39] R&D [30:41] probably ends the same way that [30:45] many speculative infrastructure booms have ended. [30:51] On the scaling front, Microsoft published a paper some time ago

30:59-32:30

[30:59] pointing out that the quality of training data [31:02] is [31:03] maybe at least as important as volume. And I think [31:08] one of the speculations you see now in the industry is that [31:12] we're running out of [31:14] sources of high-quality training data, [31:17] and you're [31:19] reading at least some articles [31:22] claiming that various partnerships are being struck to get access to training data, and it might be behind paywalls and so forth. [31:29] How do you see that evolving? [31:33] It feels like [31:34] we have more and more computation, [31:37] but we may not have more and more training data. [31:41] Yeah, I mean, I think that was [31:43] almost inevitable. [31:46] It is, in my opinion, a good thing that quality of data matters more than quantity of data, because it gives you an economic framework to go do the partnerships that you need to go do to make sure that you're feeding your [32:02] AI training algorithm a curriculum that, you know, is going to result in smarter models, [32:09] and honestly, not wasting a whole bunch of compute feeding it a bunch of things that [32:14] are not. [32:16] And I think from an infrastructure perspective, one of the things people have been [32:20] very confused about is a large language model is not a database. It's not a repository of facts. It's important for it to

32:30-34:10

[32:30] quote-unquote know some factual things. But [32:35] it is like the world's crappiest [32:38] database, you know, honestly, if you need it to be your retrieval engine. And so you just shouldn't think about it as, [32:46] hey, I got this thing, and it has to have everything baked into it, into the model weights themselves, so that you can [32:54] recall a bunch of [32:57] stuff. [32:58] As you've seen, the recall is [33:04] imprecise in the same way that human recall is imprecise. So there are just much more efficient ways to do recall. [33:12] So look, [33:17] I mean, the way that we see things developing is you have [33:21] data that is valuable for training models, and then you have data that you need to have access to for an application in order for the model to reason over. And those are two different things, and I think there are probably [33:34] two different business models around those things. [33:38] And at the end of the day, this is all just about [33:41] business models, right? [33:43] People who produce data [33:47] want to be [33:48] compensated for, you know, [33:53] use of that data. [33:57] And so we have all of this data sitting inside of search engines right now, not in, you know, randomized weights, but quite explicitly, it's sitting in

34:10-35:43

[34:10] indices and you know Bing and Google and whatnot just waiting to be retrieved and like [34:17] Plus or minus, everybody's okay with that because there's a business model there that makes sense. [34:24] You [34:24] enter a query and like you're either sending traffic or, you know, like there's, [34:29] SEO and advertising and a whole bunch of business model that surrounds that. [34:35] I think we'll figure out a business model for... [34:39] that, um... [34:41] referral data so that like, you know, when an agent or an AI application needs to retrieve some information from someone so that it can, you know, reason over it and give the user, you know, [34:52] like we will figure out the business model for that. Like it'll either be subscription rev share, it'll be licensing, it'll be like some new flavor of advertising. [35:00] I was just telling someone the other day, like if I was in my... [35:04] 20s right now, like, you know, for all of your entrepreneur, like somebody ought to be out [35:09] right now figuring out what the new... [35:12] ad unit is for agents and like just building the company. Uh, [35:18] Because like it... [35:19] it will have the same characteristics and qualities as, uh, [35:25] previous ad units, like you have people with information and products and services who are going to want to get to the attention of someone who might want those [35:35] data and products and services and like, [35:38] quality is going to matter and the relevance is going to matter and a bunch of other things and

35:44-37:14

[35:44] I would be shocked if there isn't an [35:47] auction model for, you know, that's [35:51] going to be the right way to value everything. And, you know, like, yeah, maybe there's referrals and like referrals will have some economic value. [35:59] I think for training it's just a little bit different because it's very, very hard when you're – [36:05] building that model to... [36:09] at the time that you're doing the building to... [36:13] really be able to ascribe... [36:16] Um... [36:18] monetary value to a particular token of input. [36:23] Just because its contribution to the model, like the same way that a word from Moby Dick has like a very... [36:29] diffuse contribution to like your own human intelligence, you know, [36:35] even though you definitely read it at some point in your career. [36:40] Ah. [36:41] or your life. [36:43] Like, [36:44] how valuable that is to like forming, you know, Bill Coughran or Kevin Scott's, you know, useful intelligence, like who knows. [36:53] Speaking of which, this... [36:55] One of the things that we hear... [36:57] a lot of times is the value function is in some ways the bottleneck to broader reasoning capabilities. [37:04] It's easy enough to construct a value function when you're playing a game with a [37:08] with a known winner and loser like [37:11] Go or chess or poker or diplomacy,

37:14-38:47

[37:14] but it becomes a lot harder... [37:16] to construct the value function when you're going into broader domains. And it's, it's things like assigning the value of Moby Dick to Kevin Scott's life, you know, that sort of thing. [37:24] um, [37:25] Are there... [37:26] Are there practical solutions to this? Are there practical implications to this? I guess the broader question would be, [37:32] Where do you see the overall field of reasoning and LLMs going? [37:36] Well, look, I think people are trying to get at this... [37:40] So you've got a bunch of [37:42] benchmarks like [37:44] Yeah. [37:45] GPQA and MMLU and we're just sort of rolling through a bunch of benchmarking paradigms to try to come up with [37:56] scores of [37:58] performance for these models, like whether it's reasoning capability or, yeah. And, [38:03] Yeah, I think one of the interesting things we've seen over the past handful of years is like we just are very quickly saturating these benchmarks, like where you – like one emerges and then – [38:13] within a model generation, like you'll completely or like get very close to saturating the particular benchmark and then you got to go find something else to help. [38:22] be your guiding light. [38:25] But let's just sort of assume that you'll have some interesting benchmarks that are correlated with – [38:32] the reasoning capabilities you want models to have. Then the question, it's just an expensive experiment to run. You can run an experiment where it's like, okay, I'm going to train a model with this information in and out, and does it get...

38:47-40:18

[38:47] better or worse at... [38:50] performance on these [38:53] reasoning benchmarks. [38:56] And, you know, like I think all of us have done. [39:00] different versions of those experiments. Like, they're just extraordinarily expensive to run at the most granular scale you can imagine running them. Yeah. [39:09] um... [39:11] part of that paper that Bill referenced, like "Textbooks Are All You Need," is like a... [39:16] Um... [39:17] It's not the full story, but it's part of a story that is just sort of evaluating token contribution quality to... [39:26] model performance. [39:28] So, yeah, I think it's... [39:31] And everybody's got every incentive in the world right now to try to figure out, like, what that is. If for no other reason than you, like, in a world where you're synthesizing data. [39:42] You're literally spending compute. [39:44] to generate synthetic tokens for training. [39:48] You really want to make sure that the tokens that you're generating are actually... [39:55] useful or not. [39:57] Where do you think the models are at the moment? I think Microsoft has introduced a whole bunch of Copilots to try to help end users with your products and so forth. [40:10] On the other hand, I see lots and lots of companies trying to build agents that can be kind of autonomous actors now.

40:19-41:51

[40:19] wide spectrum of kind of expected performance of what the models can do. Where do you think we are? Where do you think we'll be in a couple of years? [40:29] Well... [40:31] Yeah, so I think that's a super good question, and there's even a philosophical thing there about what it is we should want. [40:40] Well, there is the specter of everybody's job getting replaced, right? So. Yeah. You know, somebody... [40:49] Look, we chose the name Copilot for the things that we were doing relatively deliberately because – [40:58] We want to... [41:00] at the very least, encourage everyone who's building these things inside of Microsoft to think about [41:07] How can I help augment... [41:09] someone who's doing some form of [41:11] cognitive work. So like we want to build [41:14] you know, assistive, not substitutive... [41:17] tech. [41:18] Um... [41:21] And, you know, the good news is it's also easier to think about how to go from, you know, sort of rough... [41:28] frontier model capability to useful tool when you're narrowing it down to a domain. [41:36] And so, like, I think that's been a reasonable... [41:40] deployment path and like we've got a handful of our Copilots that have real market traction right now and are like you know in daily use by like a lot of people doing

41:52-43:29

[41:52] Yeah, real... [41:53] non-trivial cognitive work. And I think that will expand... [41:58] over time. [41:59] Um... [42:00] Just on that, what are some examples of Copilots that have really, like, hit the bullseye already? [42:06] versus maybe Copilot where the technology is not quite ready. [42:10] In terms of like jobs to be done. [42:12] Yeah, like, I think, you know, GitHub Copilot is, like, probably the, you know, the thing we've talked most about. And, like, there's, you know... [42:21] the most public conversation around. Yeah, like it's [42:27] it's been a hit. [42:29] you know, it is, it is, [42:31] genuinely useful. Um... [42:34] Yeah, we've got... [42:36] other Copilots that are like that that are, you know, that are super useful. But, you know, I think the thing that Bill was getting at, like, the more general... [42:47] the more general the Copilot is, the harder it is to have it actually take [42:55] uh... [42:57] very high precision action on your behalf autonomously. [43:03] Yeah, particularly if it's doing something where it's representing you. [43:10] where there are stakes. [43:13] And you have consequences and accountability back to you if this agent makes a mistake. [43:20] And we're trying to be very deliberate there because I think one of the things that you don't want to do is introduce a thing that's going to make...

43:29-45:10

[43:29] a whole bunch of these sort of [43:32] errors where the user's first reaction is like, this doesn't work. And like, I'm not going to try it again for a good long while. Uh, [43:41] So we'd rather have it be... [43:43] very good before we introduce it, which again means... [43:50] you're sort of optimizing for use cases, not for super, super broad things. [43:57] I mean, there's a good... Like, we... [44:00] we did a partnership recently with, uh, yeah, with Devin, uh, [44:04] Um... [44:06] which I think is another one of these very interesting use case specific things where it's like Frontier model plus a whole bunch of other stuff that is like, [44:15] optimize for like giving... [44:17] humans high quality recommendations for actions that they can take. And then when you click accept on the action, like you, you have reasonably high confidence that, you know, it's going to work and you, you know, you're going to do it. [44:30] haven't made another set of problems for yourself. [44:34] And so I'm like, I'm, [44:36] I'm guessing, too, you all see this in your portfolio companies. There seem to be a bunch of companies out there right now that are doing exactly this and that it's useful and working. Yeah. [44:46] Well, it's interesting because we hear... [44:49] One of the things that we hear... [44:51] fairly consistently from the companies that are further on in their AI journey. You know, everybody kind of starts in the same way where they start playing with OpenAI and then maybe they start using some of the other proprietary foundation models that incorporate some open source models and maybe they have some of their own stuff. There's a vector database in there somewhere. Yep. From an architectural standpoint, it feels like people tend to go on a...

45:10-46:41

[45:10] Not quite the same journey, but journeys that sort of rhyme. [45:14] But then what we hear from them when they're [redacted address], [45:18] is there's kind of this massive 80-20 rule at play, and maybe it's a 98-2 rule. [45:24] but you can automate most of a task pretty quickly and pretty effectively. [45:29] but getting it to the point where it's actually... [45:31] end-to-end, running autonomously. [45:35] in a way that is compelling and consistent enough that you can actually trust it [45:40] Kind of that last mile, that last couple percent that makes you really trust it. [45:44] Yeah, it seems like that's been pretty elusive for a lot of tasks. [45:48] One of the things that we're really curious about is, okay, well, when do the foundation models themselves... [45:53] you know, get good enough to knock out that last 2%. [45:57] Or is that a domain-specific thing? And that's really the job of the software vendor who lives on top of the platform to figure out that last 2%. [46:05] Look, I think it's going to be both for a while. Like, the two things that I think you can trust... [46:12] Yeah, I know you guys are probably going to ask this question at some point, but... [46:18] Like, we're in... [46:19] despite what other people think we're we're not at diminishing marginal returns on scale up um [46:26] And, like, I try to... [46:28] I try to help people understand, like, you know, there is an exponential here. And, like, the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them.

46:43-48:21

[46:43] And so the next sample is coming, uh, [46:46] And, like, I can't tell you when and I can't predict exactly how good it's going to be, but it will... [46:53] Almost certainly. [46:56] be... [46:58] better at like the things that are brittle right now where you're like, oh my God, like this is a little too expensive or it's a little too fragile for me to use. Like all of that gets better. Like it'll get cheaper and like, you know, things will become less fragile. [47:12] and then more complicated things will become possible. That is the story of each generation of these models as we've scaled up. [47:20] And so, like, you know... [47:22] We even think about this inside of Microsoft, and one of the category errors that our own developers who are building these AI products can make is get too convinced that... [47:32] The only way to solve my problem is I have to go take the current frontier and supplement it with a whole bunch of things. But... [47:41] Which you do have to do, but you want to be very careful architecturally when you're doing that. [47:46] that it doesn't prevent you from taking the next sample when it arrives. [47:50] So you just want to architect these applications where when the new goodness comes, you can go plug it in. [47:56] And, like, you'll have to go optimize that as well. Like, I think that's just sort of the grind that we're on. Um... [48:02] But, like, you just... [48:04] Like, the thing that was killing us internally is... [48:08] I would have... [48:10] teams inside of the company who would look at a frontier model and say, oh my God, there's no way that we can ever deploy a product on top of this because like this is fragile and this is too expensive. And so,

48:21-49:52

[48:21] Please give me giant pools of GPUs and let me go spin up a big team doing a very... [48:28] you know, tailored version of this and like we're going to build a specific model and like and yeah, they would go off and like spend a whole bunch of money and like the thing would be. [48:37] a little bit better cost-wise at the same level of performance as the current Frontier. And then the Frontier would snap to the new point. [48:47] and it would be just doomed. [48:50] And so like you just architecturally don't want to get trapped by that, I don't think. I mean, like that'd be my advice I'd give to everyone. [48:59] is like just... [49:01] give yourself the flexibility to snap to the new... [49:05] to the new frontier when it emerges. [49:08] And that lets you preserve all of your skepticism. You can believe all you want that the new frontier is not coming. Like go – [49:16] Yep. [49:17] Read your favorite Twitter troll that says it's all over and a sham. And like, and like, but, but just give yourself the option that, you know, maybe, maybe what's been happening for six years now is going to continue. Yeah. [49:31] Well, hearing that we are not at diminishing returns to scale, I'm going to count that as good news. And so let's stay on the theme of good news. I know that you're a short-term pessimist, long-term optimist. [49:45] Can you give us some of the optimistic point of view for where we're heading in this world of AI? Like, what are some of the things that you're most excited to see?

49:53-51:24

[49:53] in the world in five or 10 or 15 years or whatever you count as a longer term horizon. [49:58] Well, look, I think the thing that everybody ought to spend some time thinking about is where are the gnarliest zero-sum problems that we have in society? [50:10] Like, where are the things where, like, we just are fighting with one another or, you know, like we are... [50:17] immiserating people because uh whatever it is that people need there doesn't appear to be enough of it uh [50:25] And I think for a good number of those things, like what you have to have to turn them into non-zero-sum games to... [50:34] to create abundance and to relax some of these constraints is you have to have... [50:41] technological breakthroughs. It's like the only thing that reliably that's ever turned [50:46] zero-sum to non-zero-sum in human history. It's like some tech has to come along that lets us [50:52] lets us have more. [50:54] Um... [50:56] You know, whenever tech comes along and like, [50:59] creates more, it doesn't mean that the more gets, uh, you know, equitably and uniformly distributed. Um, [51:06] and like I think there are real conversations to go have about that but like [51:11] what you do want is the more, uh... [51:14] and you want it to [51:16] be directed at things where we're just having a tough time right now. [51:21] I'll tell this story.

51:25-52:56

[51:25] that I've, I've, [51:26] told a couple of other times recently, but you know, my mom, uh, like I grew up in rural central Virginia, my mom's 74 years old and like, she's, uh, suffered from this, uh, thyroid condition called Graves' disease for 26 years. Uh. [51:39] And so, you know, when you have Graves' disease, your thyroid is hyperactive. [51:44] you're generating too much thyroid hormone. And so they go in and like irradiate your thyroid gland to reduce its activity level. And then you take hormone [51:54] replacement therapy to like upregulate your hormones for the rest of your life. [51:59] And so she was having some blood pressure issues and her doctor, you know, dorked around with her dosage of this hormone medicine. And then... [52:08] Like she just had... [52:10] like some serious health issues as a consequence of that, that landed her in the ER in rural central Virginia, like six times, like in a pretty, um, [52:20] yep [52:21] short number of weeks and [52:25] Yeah, the interesting thing there was the first time she went to the ER, like she was presenting all of these cardiac symptoms, and it was pretty clear they hadn't even read her chart. [52:34] like it hadn't registered on the, [52:38] she had Graves' disease, and, like, the thing that they needed to do was, like, go order a TSH panel to see what her, like, [52:44] thyroid hormone levels were. [52:47] Like, if they'd done that right away, like, they would have said, okay, like, we got to go, you know, adjust your... [52:54] medicine. [52:55] Um...

52:57-54:30

[52:57] Yeah, and I'm not even ascribing ill intent. This is a healthcare system... [53:03] That is... [53:05] egregiously overburdened. [53:07] Like, this is not a place where, you know, like, you've got this influx of, you know, talent, like, coming into... [53:15] yeah this part of the country and and like it's a lovely part of the country like i i i i like love it um [53:22] I'm not criticizing anything. It's just they have an aging population and they don't have enough young people coming in to be things like doctors to help [53:33] this healthcare system keep up with all of the challenges that they have. [53:38] if some of those doctors had had access to GPT-4 and like it was an approved product use, [53:47] All they would have [53:48] What it needed to do was put the symptoms that she was presenting in... [53:52] and her [53:54] medical record [53:56] and it would have [53:57] said, hey, she needs a TSH test. [54:00] And if you put the TSH test result in, [54:03] the recommendation it would add was like, [54:06] look at the dosage of her hormone replacement therapy that she's on. [54:11] And like if [54:12] like this isn't theoretical like I did this um [54:17] It could have helped... [54:18] alleviate a massive amount of her suffering. [54:21] right now. [54:22] And, like, I think the only reason, like, that she... [54:25] got out of this tough situation that she was in was...

54:30-55:57

[54:30] I had to intervene. [54:32] Like I've... [54:33] sent her to a specialist that was [54:35] 400 miles away that she couldn't have gotten into without special [54:42] And, like, it's ridiculous. [54:45] Yeah, there are so many 74-year-old southern ladies or old midwestern ladies or, like, folks who are going through similar sorts of things who do not have someone who's going to go in and intervene. [54:56] on their behalf who are suffering unnecessarily because we're not even deploying the technology that we've got right now. [55:04] And like it's just going to get better. [55:07] And so that's the thing that I'm excited about. It's like, let's go... [55:12] Let's go give kids a leg up in education. Let's go fix some of these crazy problems we've got in a health care system that is just, you know, absent technological intervention is just going to get more strained over time. [55:26] Oh. [55:28] You know, let's equip [55:29] our scientists with better tools so that they can... [55:33] find better carbon capture catalysts so that we can design... [55:40] safer modes of transportation so we can more quickly get to a post-carbon economy. It's just so many things we can go do with this stuff. [55:50] Um... [55:51] I'm super, super optimistic about it. [55:55] and so you know like it just kills me like

56:01-57:33

[56:01] what we don't want to do is get distracted on, you know, like, [56:08] that are... [56:09] Um... [56:11] just you know effectively noise uh in the ecosystem right now um [56:17] And, you know... [56:19] We're getting so sideways sometimes with [56:23] Um... [56:26] you know, like this... [56:27] model like said something that hurt my feelings or you know and like and i don't i don't want to dismiss like you know people's feelings matter and like you know i'm not trying to be a jerk here but [56:38] I do want to make sure that as we're thinking about how [56:43] We develop and deploy the technology that we are always remembering, like what the cost of not deploying the good is, because that that is a high, high cost. [56:56] Mm-hmm. [56:58] Very well put. [56:59] Yeah. [57:00] Yeah, I'm here. [57:03] Probably a good note to end on, I think. [57:07] Well, we have one other question that we like to ask people, and it's a quick one, so I'm going to ask you this one. [57:13] Who do you admire most in the world of AI? [57:16] You know, I was... [57:19] I was thinking about this, so I think it's... [57:22] Ray Solomonoff, um... [57:24] who was one of the [57:27] one of the folks who was at that Dartmouth workshop in the 50s where, you know,

57:35-59:06

[57:35] Um... [57:36] Marvin Minsky and... [57:40] Simon and a whole bunch of folks convened that summer and they were all interested in machine intelligence and they coined the... [57:49] the term artificial intelligence at that workshop. [57:53] And, like, the reason that Solomonoff is so interesting, and, like, not many people, I think, know who he is outside of computer science, is, like, he was the one from the very beginning who was... [58:06] pushing on this whole notion that [58:08] probabilistic methods were going to be very important for the development of AI. And... [58:14] Yeah, when I was in grad school... [58:17] in the 90s. [58:21] the [58:23] prevailing... [58:24] academic theories about how we were going to get to AI were all about like, okay, well, you know, like there's some [58:31] you know, minimalist calculus about human intelligence and, like, we've just got to figure it out and it's got to be, you know... [58:39] rule-based systems and ontologies and, you know, symbolic reasoning and like a bunch of... [58:47] Like... [58:48] Stuff where, like we do in physics, we were going to have to divine the inherent simplicity in the system, figure out what the rules are. And as soon as we understood the rules, we'd be able to make software. [59:01] emulate human intelligence. And Solomonoff is like, no, no, no, this is just a...

59:06-1:00:26

[59:06] Intelligence is an extraordinarily complicated... [59:10] phenomenon. And like the only way that we're ever going to really get there is modeling it with probabilistic methods. And he was right. And like he was... [59:21] he was judged wrong for a very long time. [59:26] um, [59:27] so I I [59:29] really admire, uh, [59:31] his contrariness, um, [59:35] all the way back in the 1950s, and he stuck with his beliefs his entire career. [59:42] I don't know whether Ray actually lived to see how right he actually was. [59:49] Mm-hmm. [59:50] That's a great answer and a great story. Thank you, Kevin. [59:53] Yep. You're very welcome. Thank you. [59:56] Music playing
