OpenAI’s IMO Team on Why Models Are Finally Solving Elite-Level Math
In just two months, a scrappy three-person team at OpenAI sprinted to fulfill what the entire AI field has been chasing for years—gold-level performance on the International Mathematical Olympiad problems. Alex Wei, Sheryl Hsu and Noam Brown discuss their unique approach using general-purpose reinforcement learning techniques on hard-to-verify tasks rather than formal verification tools. The model showed surprising self-awareness by admitting it couldn’t solve problem six, and revealed the humbling gap between solving competition problems and genuine mathematical research breakthroughs. Hosted by Sonya Huang, Sequoia Capital
- Published Jul 30, 2025
- Uploaded Jun 11, 2026
- File type: Podcast
Full transcript
AI-generated transcript with timestamped sections.
[00:00] The pace of progress is really... [00:02] I think you see it so clearly in math. And I think Alex tweeted about this, where even a few years ago, these models were struggling with grade school math. And then we-- [00:12] I remember even in 2024 that GSM8K was used as the standard eval when everybody would release a model. And then it was MATH for a short period of time, and then it became AIME, and then it became USAMO. And the pace at which it's just blown through all of these math benchmarks is really astonishing. [00:31] *music* [00:48] Today we're joined by Alex Wei, Sheryl Hsu, and Noam Brown, [00:51] the trio behind the OpenAI model that just achieved gold medal performance at the International Math Olympiad. [00:56] The IMO gold is one of the most important milestones in the race to artificial superintelligence, and what makes this breakthrough particularly fascinating isn't just the mathematical chops, but the underlying architecture: general-purpose techniques for scaling test-time compute and handling hard-to-verify tasks that extend far beyond competition math. [01:13] We've now gone from models that can reason about math for a tenth of a minute just a year ago, to systems that can reason and concentrate on the order of a hundred minutes. [01:21] The hope for superintelligence is that as we scale reasoning to thousands or hundreds of thousands of hours, we can begin to solve humanity's greatest unsolved problems in math, the sciences, and more.
[01:32] Alex, Sheryl, and Noam joined us on Training Data to talk about their approach and share some of the behind-the-scenes fun and learnings behind this historic result. [01:40] Enjoy the show. [01:42] Alex, Sheryl, Noam, thank you so much for joining us today. We have with us the team behind OpenAI's first gold medal at the IMO. Congratulations to you all. It's a momentous achievement. Thanks. Thank you. [01:56] I'd love to get into a little bit of the origin story behind this. I know that, you know, the IMO gold has just been this [02:03] you know, elusive thing that everyone in AI has been chasing for a long time. I remember [02:07] back when Sam pitched us in 2021, it was on the slides. And I remember thinking, oh, that seems really far away. I'd love to understand the more immediate origin story for this specific effort. When did you guys start thinking about this? And how did it come about? [02:21] Yeah, I think it's like one sort of like [02:24] something that we've been thinking about for a long time. I remember in my, you know, first week at OpenAI, Noam asked me, like, you know, when do you think the model will get IMO gold? I thought, you know, like, it was really unlikely in 2025. But I feel like it's something that's always been on our minds, as you said, like, you know, Sam, like many years ago as well. But this specific effort, um, [02:48] I think it was really only like... [02:51] you know, [02:52] maybe like a couple months since like... Just a couple months. Like the sort of last sprint to like get everything ready for this year's IMO. And of course, we've been working on like improving our algorithms. The ideas for this started coming together maybe like six months ago, but like really like the last push, like, you know, we're going to try to do something for this year's IMO was only a couple months long.
[03:18] It's amazing. And how big is the team involved? [03:21] I mean, so it's, we're like, you know, definitely [03:23] building on a lot of folks' work at OpenAI. This is not possible without a lot of help from people from [03:31] you know, the people working on inference and the scaling org, the people who do the pre-training and the RL training. But in terms of the core team, I would say it's just the three of us. So it was a super small, scrappy effort here. That's crazy. Just the three of you. Also, it was mostly Alex. Alex had been working on this technique for a while. And Sheryl and I were happy to help out as we were getting closer to the IMO to make it a reality. [04:00] That's so cool. And how does this even come about? Like, do you self-direct and self-choose? You know, I want to work on IMO gold and I'm going to get us there. And like, how does, you know, how do you even raise your hand to work on something like this? [04:11] I think it was something where it just felt like, you know, maybe it's possible. Like maybe if we like, you know, push a bit for a couple of months, we can just like, you know, get there. One of the nice things about OpenAI is that I think the researchers are really empowered to, you know, [04:25] do the kinds of research that they think is impactful. And, you know, so Alex, you know, had this like pitch that like, hey, you know, there's this new technique that I, you know, I think could, could help out a lot. And, you know, honestly, like there's a decent amount of skepticism, you know, I think some people were supportive, uh, but you know, [04:39] everybody felt like we should give them the, you know, [04:42] the freedom to be able to explore this and pursue it. And, uh, [04:46] And then it started showing some strong evidence. And, you know, I think people still were a little skeptical, but more people were getting excited about it. And, you know, eventually it turns into something more substantial. 
And I think now people are obviously very excited about it. Can you say a little bit more about the strong evidence? Like what were some of the early signs that you all were seeing that made you really lean in?
[05:03] I think it's just like, you know... [05:04] like progress on... [05:07] hard to verify tasks, [05:10] where I think previously we, you know, [05:14] a lot of RL was more focused around just like, you know, if you have like these verifiable rewards, like, you know, [05:20] what can you do? We were just seeing more improvement on these harder to verify tasks is, I think, what made us excited. Maybe on that front, how did you even verify that the results you had [05:32] were right? [05:33] And I saw that you published the proofs on GitHub, but can you just say a little bit more about how you even know that you've discovered the answers? Because my understanding is that they're done a bit differently from how a human might answer them. [05:49] Yeah, I do think like, you know, like the style of the model outputs is a little atrocious. Atrocious isn't the word I was going to use. Is it creative? Like an alien language? Yeah, it's a little. Yeah, I think it was. I think, you know, we could have. [06:03] I think it was a very small scrappy effort. And so we didn't optimize as hard for human readability. But that's something that-- [06:13] we know how to do. Like, you know, we can, like, in the same way that, like, you know, ChatGPT is very readable, we can do the same things here. [06:22] Do you even need to optimize for human readability? Like, is that even important? [06:26] I think if you're showing this to humans, they prefer readability. We were actually discussing, like, you know, we got the proofs, like, okay, because you could actually just, like, run them through ChatGPT and, like, ask ChatGPT to, like, rewrite them in a more readable way. And it's, like, the proofs are still correct. They're just, like, a little bit more readable. And we were like, oh, should we, when we post these online, should we, like, post the more readable version that's, like, run through ChatGPT? Or should we just post the raw version? 
And we decided, you know, I think for full transparency, we'll just, like, post the originals and people will figure it out.
[06:56] You guys have a bunch of IMO medalists and participants on the staff at OpenAI, right? Do you guys, like, moonlight in your spare time grading the answers that the model produces? [07:06] During the testing, we read a lot of samples. But for grading these specifically, we hired external... [07:16] former IMO medalists. So each proof was graded by three medalists and for each one they reached unanimous consensus on the correctness. [07:27] I should also say that for me, I don't know about Sheryl, but for me, the proofs are beyond my ability to comprehend. I was a math major and I never really did competition math. And I already, the stuff that this model is writing about is beyond my ability to grade. [07:43] Yes, and I think that's what makes it even more amazing, just how smart the model is. [07:49] Totally. What about problem six? How come none of the models at this year's IMO had a solution and your model didn't even attempt problem six? Can you say more about what makes that problem? And traditionally, problem six is always the hardest at the IMO. Is that right? Yeah, I think problem three or problem six usually. Okay. [08:05] So just say a bit more about what made problem six different and what you learned from, you know, I think you tweeted that the fact that your model knew that it couldn't solve problem six was one of the things that gave you hope. So just say a bit more about that as well. [08:17] For problem six, it's just a really tough problem. I think if you gave me months to think about it, if you even gave me a big hint about the main idea to solve problem six, I don't think I'd be able to get there. It's just like...
[08:32] crazy like tough problem where there's so many things you can do and there's like, you know, a very narrow path to the [08:38] you know, finding the proof. [08:40] And I think, you know, it's one of those things I think... [08:43] Like math is just hard. [08:45] Yeah, and we threw a lot of compute at problem six, but I think it was good to see the model doesn't try to hallucinate or try to just make up some solution, but instead will say no answer. I mean, it is kind of disappointing when you can like, [08:59] it's done so much work just to like say no answer, but I think, you know, it's, [09:04] good that it actually like acknowledges that. [09:07] Yeah, that's an amazing level of self-awareness of your own [09:11] kind of [09:12] ceiling because I mean [09:14] I remember at least a couple of years ago with these models, they'd always try to be helpful and make up an answer. [09:20] To see this is just like I think an amazing level of self-awareness from these models. [09:23] When we released the reasoning models, I talked to some professors, mathematicians, computer scientists, and I was asking them, are you finding value in these models? And the answer was frequently yes, but the one thing that they would complain about is if they would ever ask the model a question that it didn't know the answer to, it would just like... [09:41] output a very convincing but wrong answer. They would have to go through it very carefully to figure out, [09:48] was it exactly correct? Or was there like, you know, some flip of an inequality or something that the model snuck in there? And it's nice to see that [09:56] this model, like, [09:58] if it doesn't know, it will just acknowledge that it doesn't know, [10:01] at least more frequently.
[10:02] I guess internally, did you guys have like a betting, like a Polymarket or something going on whether you guys were going to win... [10:08] IMO gold this year, and like what was the internal vibe? [10:11] I think we felt like we had a strong shot. Um... [10:15] But I think we also felt that it wasn't like a lock where there's definitely a distribution of questions where... [10:23] the models [10:24] would probably struggle more than the humans. [10:26] But then there's another distribution of questions where the models would be really, really strong. And I think this year was somewhere in the middle where, you know, like problem six, like, [10:38] I think it's just [10:39] out of reach of state-of-the-art models today. [10:43] And I think maybe in general, like, you know, [10:45] like these hard, like combinatorics problems, which problem six was, I think are more challenging. [10:52] And that's still something that the models struggle with. What is it about combinatorics that makes it challenging versus... [10:58] you know, like geometry, for example, which seems like you guys do well at. [11:02] I think for combinatorics, it's probably because it's a little more abstract, a little more high dimensional. [11:09] And I think oftentimes, like, combinatorics problems sort of require, like, leaps of faith or leaps of insight [11:18] that [11:19] you know [11:20] the models aren't as good at. I think the models are better at like, you know, problems that require like a bunch of smaller steps, for example. [11:29] What about from your guys' perspective? Was the internal vibe optimistic or not that you all were going to get gold?
[11:33] I feel like it wasn't super optimistic. Like, I think they definitely knew that, like, it could happen. But I think, like, even, like, a month or, like, two months back, it definitely felt like we'd have to, like... [11:44] improve quite a bit, which I guess we did. I remember I was talking to [11:48] another researcher at OpenAI, like maybe two months before the competition, and we were like, [11:53] you know, saying like, okay, if we were to bet, you know, I'm a betting man. I'm happy to bet. Yes, you are. And I was saying like, what odds would you take? Because I was willing to bet. I'm like, we were going to get gold here. And he was like, there's really no chance. And, you know, and, you know, [12:08] He said that he would... [12:10] gladly take two to one odds against the model winning. So less than one third chance. [12:18] But he didn't want to bet against us. So, you know, he thought it would be bad vibes to bet against the team winning. [12:24] So you didn't go for the bet. So did you make some pocket change, Noam? I wish I had. I mean, you knew that. [12:33] Because, I mean, you guys were at, I think you tweeted 12% on AIME [12:37] like 15 months ago, right? So even though you never want to bet against scale and OpenAI, it's just, [12:43] it's just an [12:44] astounding slope of what you all have accomplished here. [12:47] The pace of progress is really... [12:49] I think you see it so clearly in math. And I think Alex tweeted about this, where even a few years ago, these models were struggling with grade school math. And then we-- [12:58] I remember even in 2024 that GSM8K was used as the standard eval when everybody would release a model. And then it was like MATH for a short period of time. And then it became AIME. And then it became USAMO. And...
[13:12] the [13:12] pace at which it's just blown through all of these math benchmarks is really astonishing. [13:17] Yeah, I remember training a model on GSM8K two years ago. [13:21] Yeah, we're past those days, huh? Saturated the evals. What's next? Do you think, I mean, at this point next year, you think we'll be solving Millennium Prizes? [13:30] I think those are still very far away. I think on one hand, you think about how much math progress has been made since [13:39] GSM8K, which is like, you know, like... [13:42] just like two years ago was sort of a standard that people were trying to push on. You know, that's like an astounding level of progress. But also you think about like how much time it takes for people like, you know, GSM8K problems, they're like grade school math, you know, it takes someone good at math like a couple seconds. And now we've gone from like a couple seconds to something that takes like, you know, these brilliant students an hour and a half per problem on average, you know, the IMO is, you know, [14:08] three problems, four and a half hours. And then [14:12] research math is going to be [14:15] like, you know, these same, you know, brilliant students, they've grown up, they're researchers, it's going to take them like 1500 hours. So there's like, you know, 1000x of like more thinking time. And then Millennium Prize problems have taken entire fields like, you know, [14:32] people's lifetimes [14:34] of thinking and you know we still don't have much progress on most of those and so it's [14:39] on one hand, like, you know, [14:41] super exciting that we've made so much progress. On the other hand, it's sort of also humbling to see how much further
[14:48] you know, [14:49] progress has to go from like an hour and a half to like, you know, tens of thousands, hundreds of thousands of hours of human thinking. [14:55] Totally. Noam, I think you deserve a lot of credit for seeing the future on this. I remember you visited us [15:00] before you even joined OpenAI, [15:02] talking about the results from gameplay and what happens if you let a model think for [15:08] hours and tens of hours and [15:11] credit to you, you've really seen the future on this. [15:14] Thank you, yeah. I mean... [15:16] It's exciting to see it actually happen. Yeah. [15:18] What are the hard things that happen as you scale compute [15:22] time, inference time, from the order of [15:25] 0.1 minutes to the order of [15:27] 100 minutes? [15:29] Um... [15:30] I guess at a high level because not everyone [15:32] most of our listeners are not AI researchers, but what are the hard things that happen to keep the model... [15:37] on the rails, so to speak? [15:39] I think one thing we can point to that is pretty clearly a challenge is that if you have the model thinking for like 1500 hours, [15:47] then in order to eval it, you have to have it think for 1,500 hours. And so eventually... [15:52] the evaluation of the models becomes a significant [15:57] you know [15:58] a speed bump on progress. So we're not really at that point. If we have the model think for an hour and a half, it's no big deal. We can run those tests. But to run a test where the model is thinking for a month, [16:10] it takes a month to finish that test. And so progress can only advance... [16:13] so fast if you want to wait for those kinds of results. [16:16] I think both of you are on the multi-agent team. Help me understand the role that multi-agent systems play in this.
[16:24] Yeah, so in addition to having the model like, [16:27] think for a very long time and, you know, [16:30] make a lot of progress on hard to verify tasks, this also involved scaling up parallel compute. [16:37] And so there's a multi-agent component to that. We're probably not going to be able to go into too much detail about the exact techniques. But that was certainly like... [16:47] one way that we were able to scale up test-time compute for the IMO. [16:50] By the way, one thing I'll add for the multi-agents, you know, scaling parallel compute thing is that the way that we did it, you know, we really tried to prioritize generality in our techniques. I, for example, like, you know, I worked on AI for poker. Alex and I actually both worked on AI for Diplomacy. So Alex was on the team that worked on Cicero. Cicero, yeah. Yeah. Nice. And, you know, those were... [17:13] projects that I'm really proud of, but they were also projects that we spent [17:17] years working on to achieve that result. [17:21] With the [17:22] pace of AI progress being so fast, it felt like that wasn't the best use of time to develop a very bespoke system that could only do that one task. And so... [17:31] we all like really prioritized general purpose techniques in all this. And, you know, the techniques that we used for, um, [17:38] everything for scaling up the thinking time, for working on hard to verify tasks, and for the parallel compute [17:44] are all general purpose techniques that [17:47] we're [17:48] either planning or have used for other systems as well. [17:53] And is that the reason you all chose not to do this in Lean? Like my understanding is the official kind of IMO AI track was a Lean...
[18:01] interpretation this year. Is that why you guys chose not to go with Lean? [18:05] Yeah, that's right. I think there is a lot of value in Lean as a tool. [18:11] You know, mathematicians find it useful, for example. But the priority for us is really general purpose reasoning capabilities. And Lean has its limitations. And so that's why we wanted to prioritize natural language. My layman's understanding is Lean is a formal verification tool. Does your result here basically say that like informal verification with scale can, you know, can perform at the same level or even surpass formal verification? Is that the right takeaway? Yeah. [18:37] I wouldn't say, I would not say that's the right takeaway. I don't know, Alex, do you have thoughts? I'd say that these are just like, you know, sort of two like orthogonal sort of components here where like, I think, [18:48] I think we found the informal math sort of an interesting problem because it represents sort of like a kernel of difficulty around like scaling up test-time compute, hard to verify tasks, [19:04] that represented something [19:07] difficulties from a very broad set of tasks that we were interested in from a general purpose standpoint. I think Lean is a little bit more narrow, where I think a lot more of the world can be approached with informal reasoning than is formalizable. I don't think there's anything wrong with... [19:25] narrow AI. Like narrow AI can be very effective and obviously like far surpass general purpose AI in certain domains. And I think the right way to think about it is in the same way that
[19:36] humans, human mathematicians find a lot of value in Lean, [19:39] um [19:40] general AI can be compatible with, you know, a more narrow system that's focused on, like, formal technology [19:47] formal mathematics. And [19:50] the combination, I think, can be better because of it. [19:55] I think I saw on Twitter from multiple folks at OpenAI, and I think you guys have mentioned this as well, that... [20:01] you know, this system was built with a very similar approach and infrastructure to many of the recent launches from OpenAI. Like we had Isa... [20:11] from the ChatGPT agent launch on the podcast last week. [20:14] Can you say a little bit more about what the similar kind of foundation and approach is? [20:18] I think, like, infrastructure-wise, like... [20:21] I mean, like we all kind of just use the same infrastructure. But I think as far as like the core of this question, like, you know, like, like Noam, Alex said, there's nothing like that's very bespoke to IMO here. [20:51] Keep improving Agent, keep improving ChatGPT, and everything else. Tell me about the actual experience of IMO Day. What was it like? Yeah, I mean, we were waiting for the problems to come through. Once the participants finish the exam, then they get posted. And so we plugged the problems into...
[21:12] our model, and, uh, that was around, like, I guess pretty late at night, maybe like 1 a.m. or something. And honestly, I went to sleep, because it's like, you know, it's 1 a.m., I'm not going to stay up for four and a half hours to, like, see the output. I'll just wake up in the morning and see. Uh, but I think these two, like, actually stayed up and, like, um, got to watch the model and, um, [21:30] you see it come in in real time. [21:32] Yeah, it was a lot of fun. Did anybody call Noam, like, "Wake up, wake up, we got this"? [21:38] There were a couple moments where Alex was so exhausted that he decided to take a nap, but we told him, okay, just make sure your phone is not on silent so that if we need to wake you up we can call you. And at one point, we did actually have to call him, but I don't think he woke up. [21:54] That's awesome. It must have been such a thrill and such a high, especially for that to come through at like... So you started at 1 a.m., so you must have known... [22:01] Like 9 a.m. then? [22:03] Oh, it's one and a half hours. Four and a half hours. Yeah, for the first day. Okay. [22:07] Yeah, I don't know. I mean, we can kind of see the problems come in. So I just feel like making sure the systems are like staying stable and Alex is like over there reading and seeing how the model is doing. So you were doing the, you were doing the live human proof checking to see if it was actually. I was, you know, you're naturally very like anxious about the results. So I was just like looking at the, you know. [22:30] Like the... [22:31] partial progress the model was making, you can sort of observe that. [22:38] And then I also hand-checked things. We were going to send these out to the graders, but I also just hand-checked them because I was so curious. OK, well, call me next time. I want to come hang out there for that. I'm not going to go to sleep. That sounds awesome. One of the cool things about these models is I can't understand the proofs. 
But when you see the model thinking about it,
[23:02] it will express its uncertainty or its confidence in natural language throughout the process. And it will just kind of say words that will like hint, [23:09] And it's like, [23:10] if it's really confident that it figured it out, it'll say "good" a lot. And if it's unsure, it'll throw in a lot of question marks. And so it's cool that I can kind of follow along and see how the model is feeling about its progress, even though I can't really tell if it's got it correct or not. [23:29] You get the dreaded "seems hard." [23:34] You got that on problem six? I just got that a lot. No progress. Hard. Seems hard. Keep going. Too bad. [23:44] Wonderful. I guess looking ahead, [23:47] you've gotten like the pinnacle results in competition math. I guess you can go do Putnam next year, but you're basically at the top, right? And so what's next? [23:56] Yeah, so actually for Putnam, the problems... [24:01] I think since the exam is like, you know, [24:04] less time per problem than the IMO, and it's a little more knowledge heavy, we actually found in our evals that the model [24:12] you know, was like, [24:13] really, really good at Putnam problems, like better than it was at IMO problems. [24:17] And so I think [24:19] you know, the frontiers here are really not about, like, you know, these, like, very, like, time-boxed competition problems anymore, but it's about, like... [24:27] problems that really take like longer periods of time and more deep thinking to solve. [24:32] It's really cool.
[24:33] Okay, so you're going to start proving novel theorems now? [24:36] I think there's this very intimidating gap between these very... [24:43] time-boxed competition problems and like a [24:46] you know, real research breakthrough, which, you know, takes like a year's worth of work, like a year's [24:52] That's like on the order of like 1500 hours instead of 1.5. Yeah, totally. I guess relatedly I was listening to the Demis podcast last night and he mentions that, you know, the hardest thing is actually coming up with the interesting problems to solve. And I think that's a good thing. [25:08] I'm curious if you all agree with that. [25:11] I think there's some truth to that, that... [25:13] you know, these models are really good now at solving [25:18] these problems. Coming up with them is [25:20] still... [25:21] a challenge. [25:23] But I think it's also worth... [25:26] noting the incredible pace of progress that we're seeing. [25:30] And [25:31] you know, there's always a next hurdle, you know. [25:34] And [25:36] originally when LLMs came out, it was like, well, how do we get them to reason? And then we got them to reason, but then how do we get them to reason on hard to verify tasks? And now they can reason on hard to verify tasks. And I think the next... [25:46] hurdle is going to be like, okay, well, how do we get them to come up with these novel questions? You know, like even creating an IMO question [25:53] is a challenge. And it takes a lot of expert mathematicians a lot of work to do that. [25:58] But I don't see any... [26:01] fundamental barriers that
[26:03] block us from getting there. [26:05] I love that. Do your results in math, do they just fully generalize to, you know, you're just going to be better at scientific reasoning, you're going to be better at, [26:14] general reasoning, you know, does being great at competition math make you [26:19] you know, be great at everything else? [26:22] I think how we approached this was not like, you know, we should be like, you know, great at competition math. But really, I think it's like we were focused on like developing like general purpose techniques. Yeah. [26:35] To make [26:36] like reinforcement learning better. [26:39] And I think those we are very excited to like [26:43] improve our models in other domains beyond math. [26:47] And so, and you know, hopefully like make [26:50] models more useful for like you know [26:52] us in like everyday usage. This is like a, you know, it's a pretty late breaking result. It's honestly, it was a surprise even to [27:00] people internally at OpenAI. And so, um, [27:03] the next step is to incorporate this more broadly into our models and improve the reasoning capabilities [27:10] across the board. [27:11] But, you know, it's going to take some time to [27:14] go through that process and deploy it to the world. [27:18] um... [27:19] I think it's going to come, but yeah, it'll just take a little bit more time. [27:22] Is it harder for [27:23] these models to do the IMO or the Physics Olympiad? [27:27] I think definitely the Physics Olympiad because the Physics Olympiad has, I think, like an experimental section. Oh, we need to solve robotics first. I didn't realize that. Okay.
[27:40] I thought it was just on a piece of paper. Yeah, so... [27:43] I think the model will probably be good at the on the paper part, but yeah, I think there'll be a little bit of time before it can do the experiments. Not with like a world model. Okay, cool. Are you going to release this model for customers to play with? Roelof's son is a Math Olympiad kid and he's like, "I want access to the Math Olympiad model." [28:06] Will people be able to play with this? [28:08] So we want to make this accessible to mathematicians to use. We're still trying to figure out the exact details of how we make that happen. [28:15] I think it's really cool that we've developed this system that is [28:18] incredibly good at math and it makes sense that we want to see what mathematicians can do with it. I've actually already been [28:24] emailing with a Stanford professor, mathematics professor. He actually emailed me [28:29] like about a year ago before... [28:31] we announced o1 and he was like, hey, do you want to do a collaboration on like solving hard math problems? And basically what I told him is like, [28:38] I think we just got to advance general reasoning capabilities, and eventually they're going to be able to help you with your hard math problems. And I think that's actually the most promising route to getting there. He was a little skeptical, but every reasoning model release, he's emailed me with a follow-up and is like, [28:52] can it solve this problem now? [28:54] And I've been plugging them in. [28:55] And I don't know what the output is, but I email it back to him, and he says, yeah, that's wrong. [29:01] And he emailed me a follow-up this time with the same problem, asking, hey, can it solve it now? It still can't solve it, but at least this time it recognizes that it can't solve it. So I think that's a big step. But we're curious to see if there's a lot of other...
[29:15] problems out there that mathematicians, um, [29:18] want to challenge this model with and see if it can take them on. [29:21] Amazing. [29:23] Congratulations to you all. I think this is a momentous result that the entire field has been waiting for for a very long time. And the fact that it was accomplished by a team of three people in a span of two months, it's [29:33] it's extraordinary. Congratulations and thanks for joining us on Training Data. [29:37] Thank you. Thanks for having us. [30:03] Thank you.