Securing the AI Frontier: Irregular Co-founder Dan Lahav
Irregular co-founder Dan Lahav is redefining what cybersecurity means in the age of autonomous AI. Working closely with OpenAI, Anthropic, and Google DeepMind, Dan, co-founder Omer Nevo and team are pioneering “frontier AI security”, a proactive approach to safeguarding systems where AI models act as independent agents. Dan shares how emergent behaviors, from models socially engineering each other to outmaneuvering real-world defenses like Windows Defender, signal a coming paradigm shift. Dan explains why tomorrow’s threats will come from AI-on-AI interactions, why anomaly detection will soon break down, and how governments and enterprises alike must rethink defenses from first principles as AI becomes a national security layer.
Hosted by: Sonya Huang and Dean Meyer, Sequoia Capital
00:00 Introduction
03:07 The Future of AI Security
03:55 Thought Experiment: Security in the Age of GPT-10
05:23 Economic Shifts and AI Interaction
07:13 Security in the Autonomous Age
08:50 AI Model Capabilities and Cybersecurity
11:08 Real-World AI Security Simulations
12:31 Working with AI Labs
32:34 Enterprise AI Security Strategies
40:03 Governmental AI Security Considerations
43:41 Final Thoughts
- Published Oct 21, 2025
- Uploaded Jun 11, 2026
- File type: Podcast
Full transcript
AI-generated transcript with timestamped sections.
[00:00] There was a scenario [00:02] where there was [00:03] an agent-on-agent interaction. It was a critical security task. That was the simulation that they were in. [00:09] But after working for a while, [00:11] one of the models decided [00:13] that they'd worked enough [00:14] and they should stop. [00:19] It did not stop there. [00:21] It convinced the other model [00:23] that they should both take a break. [00:25] So the model did social engineering on another model. But now try to think about the situation where you, as an enterprise, are actually delegating [00:34] an autonomous workflow that is critical to you to complete. And the more complicated and capable machines are ultimately going to be, the more of these weird examples we're going to encounter. [01:03] Today on Training Data, we dig into the future of frontier AI security with Dan Lahav, founder of Irregular. [01:09] Dan challenges how we think about security in a world where AI models are not just tools [01:14] but autonomous economic actors. [01:16] He explains why the rise of AI agents will force us to reinvent security from first principles [01:21] and reveals how the very nature of threats is shifting from, say, code vulnerabilities [01:26] to unpredictable emergent AI behaviors. [01:29] Dan also shares surprising real-world simulations
[01:32] where AI models outmaneuver traditional defenses, and why proactive experimental security research is now essential. His view is that in a world where more economic value will shift to human-on-AI or AI-on-AI interactions, [01:44] solving these problems is paramount. [01:46] Enjoy the show. [01:48] Dan, wonderful to have you with us today. It's a pleasure to be here. [01:52] Awesome. So before we jump into questions, [01:55] I will just say that [01:57] it was very hard to get in front of Dan. [01:59] I was [02:00] trying to get in front of you for three months, probably 30 to 40 emails. [02:04] Five or six people around us who we both knew closely were pinging him all the time. [02:09] And he was still not responsive. [02:11] And I've [02:13] basically [02:14] learned where he was, you know, spending most of his time, and I... [02:18] Did you stalk him? [02:21] I kind of stalked him. I kind of stalked him. And eventually, we basically bumped into each other, like, not intentionally. And anyway, so we bumped into each other. I was like, Dan, [02:32] you know, you're brilliant, I keep hearing great things, please respond, let's find time. [02:37] We at Sequoia spend a lot of time in AI security. [02:40] And eventually we found time the following week. So welcome, Dan. Thank you for everything. It seems that I'm going to have to start this podcast with an apology. So sorry, Dean. Sorry, Sonya. Sorry, the entirety of Sequoia. It indeed took time. It took time. But we partnered and here we are. And you guys have done wonderful things. So it's wonderful to have you with us today. Yeah, it's a very, very happy ending. And, you know, just like appreciate you and everyone here. Of course. Of course. Okay, so let's jump into it. I'm going to start with a spicy question.
[03:09] As we recently saw, you partnered with OpenAI on GPT-5, [03:14] and [03:15] let's kind of look forward a little bit. What does [03:19] security look like in a world of GPT-10? [03:22] Oof, spicy and speculative indeed. [03:25] So let me wrap my head around that. [03:29] I think... [03:30] So obviously everything I'm going to say is speculation, projection. [03:36] But I think the way that we think about what's going to come [03:40] is trying to understand how [03:43] we're even going to produce economic value and how organizations and enterprises and people are going to consume stuff in the world at the time of GPT-10 or Claude 10 or just like any one of the models. [03:55] Let's do a thought experiment to just clarify why we actually believe that sometime, we think in the next two to three to five years, there's going to be a huge shift in the way that humans are even organizing themselves. As an outcome, security is probably going to be very different as well. [04:10] So here's the thought experiment. So imagine a situation where you work with OpenAI and you go one generation up or two generations up and you tell your parents or grandparents that you're doing work with Anthropic or OpenAI or Google DeepMind on security. [04:25] I think their mind would go to assuming that the work that you're doing is probably providing a bodyguard service to Sam or to Dario or to Demis. Because the canonical security problem of a few decades ago, you know, of our parents' and grandparents' generation, was physical security.
[04:40] Because the vast majority of economic activity was in the physical realm and not in the digital realm. [04:45] And you know, after the PC revolution and the internet revolution, [04:48] we shifted the way that we're organizing and creating value. We transitioned primarily to a digital environment. And just like think about how strong of a testimony it is, how many times [05:00] you did an economic activity of value just by getting an email from someone that you may have not met. You know, just this morning, I got an email from my bank activating me to do something from a person that I've never met. Maybe as a security person that's not a great thing to say openly, but you know, we do that all of the time because that's the way that we interact [05:18] in society. [05:19] So our view [05:21] is that soon that's going to happen again. [05:23] And the reason [05:25] is that AI models are gradually getting so capable that a lot of the economic activity of value is going to transition to human-on-AI interaction [05:33] and AI-on-AI interaction. [05:35] And that means that we may soon see [05:38] a fleet of agents in an enterprise, or a human, when they're doing a simple [05:43] activity like trying to draft a Facebook post, [05:46] taking a collection of different AI tools in order to just like promote that activity that they're doing, [05:53] and essentially embedding [05:54] tools that are increasingly more capable. [05:57] And we're delegating them [05:59] tasks that require more and more and more and more autonomy, [06:02] in order to drive meaningful parts of our lives. [06:04] So we're transitioning from an age where software is deterministic to an age where this is no longer the case.
[06:11] And as an outcome, [06:13] enterprises themselves, or just like how we interact with the world, are going to go through a fundamental change. [06:19] And it's clear that security is just not going to be the same. [06:23] This is an interesting analogy. [06:25] Think about, [06:26] you know, Blockbuster, may it rest in peace, and Netflix. [06:29] The current version of Netflix. [06:31] So both, if you think about it, give the exact same value [06:33] to the consumer. [06:34] They both allow you to lease units of content for your pleasure and entertainment. [06:40] But clearly and intuitively, [06:42] security for Netflix and security for Blockbuster is not the same. One was a chain where, you know, you needed to go and physically rent [06:51] a DVD. [06:53] And the other one is much more of a modern architecture, where you're just, you know, streaming stuff to your home. [06:58] So [06:59] even [07:00] enterprises that are going to provide the exact same value in the near future [07:03] may have a very, very different back end to how they're shaped [07:07] in this autonomous age that we're entering, which makes it clear that security as a whole is going to be very, very, very different. And we need to recalibrate to just like an age of autonomous security that's coming upon us. [07:18] You were at our AI Ascent event earlier this year, right? Do you remember when Jensen Huang shamed everybody who was there for the fact that not enough people in the room were thinking about security in a world of agents? And I remember Jensen said something about how [07:33] you can imagine that as these agents are allowed to act more autonomously in enterprises, you should expect [07:39] orders of magnitude more
[07:41] security agents than the actual productive agents themselves, watchdogging and shepherding these sorts of agents effectively. So, I'm biased here. [07:52] I agree with Jensen. I think Jensen was the first person that I've met that was much more bullish on AI security than myself. [07:59] Because, you know, in our view, [08:01] you know, you need a collection of [08:04] defense bots that are also going to be working side by side with capability bots in the next generations of how enterprises are going to be created. But indeed he gave a ratio that he thinks is going to be 100 to 1 [08:16] for how many, [08:18] you know, defense and security bots are going to be required, under the assumption that secure by design in AI is not going to work. So I'm not sure that I agree with that [08:27] part of the conclusion. I think that we can make significant progress on secure by design, specifically embedding defenses in the AI models themselves. [08:36] That being said, [08:38] we share the view that the future is going to be one where we need to have a lot of agents that are specifically for the task of monitoring other agents and making sure that they're not going to step out of bounds. [08:53] You know, one layer deeper, what is the state of model cyber capabilities today? [08:58] And [08:59] how has that changed over the past 12 to 18 months? It's a great question. And I actually think that the rate of change is the most [09:05] relevant part here. [09:06] Because models are capable of doing so much more now than they were even capable of doing a quarter or two quarters before.
[09:12] So just to give an intuition: this is now, you know, we're entering the fourth quarter of 2025. [09:18] At the beginning of the year, [09:20] coding agents were not a widespread thing yet. [09:23] The ability to do tool use properly was, you know, not just starting, but obviously much, much more nascent than it is right now. [09:31] Reasoning [09:32] models were only at the beginning as well. So just like think about all of the things which were added in the last year and what they mean also for, you know, security elements. [09:42] So what we're seeing now is that with the combination of coding being much better, [09:45] models being able to have multimodal operations, [09:49] tool use improving, reasoning skills improving, if you're using models for [09:55] offensive capabilities, [09:57] we are seeing unlocks all of the time. [10:00] Something that is now feasible that was not even feasible a quarter ago [10:03] is proper chaining of different vulnerabilities and exploiting them in order to do much more complicated actions. [10:10] So for example, [10:11] if you have a website and you want to hack it on the application... [10:15] A few months ago, if you needed to integrate a collection of vulnerabilities [10:20] in order to perform an action of value, [10:23] at least autonomously, without a human being involved, models were unable to do that, even the state-of-the-art models. That's not the case anymore. So obviously that depends, it's not 100% success, and obviously that depends also on the level of complexity of the vulnerabilities and the environment that you're trying to hack. But we have seen [10:38] huge spikes in just like being able to scan more and more complicated codebases, exploiting more complex vulnerabilities,
[10:46] chaining them in order to do these exploitations, etc. And you know, just like [10:51] with the recent GPT-5 launch, on security, [10:55] and the offensive side specifically of what models are capable of doing, [10:58] we have seen a significant jump [11:01] in their ability [11:03] to be much more competent across a collection of skills that actually matter a lot around the cyber kill chain. Can you tell us more about that? And obviously there's some things that are publicly available, others that are not. [11:14] But at least on the scorecard and what OpenAI shared, in particular for GPT-5, what are some of the capabilities that you've seen that were surprising? [11:22] We are seeing constant improvement [11:24] in the ability of models [11:27] to, for example, [11:29] have situational awareness on whether they are [11:33] in a network. [11:35] And up until a few months ago, [11:37] the beginning of the year, [11:39] completely, [11:40] models were unable to do that. They were able to run some operations locally, but they usually did not have situational awareness of, you know, what's happening and what they can activate, even in [11:51] more limited and constrained scenarios that we put them in. [11:55] And that's not the case anymore. [11:57] We still sleep very, very easily at night because the level of sophistication is still somewhat limited, but we are finding ourselves [12:05] trying to create more and more and more complicated scenarios just because there is a huge jump in being able to take more complicated context and, as I said before, chain complicated vulnerabilities to one another in order to do multi-step reasoning and exploit.
[12:20] And these are [12:21] all new skills that, going one year back, did not exist. [12:25] You guys are trusted partners of [12:28] many of the labs, including Anthropic, including OpenAI, including Google DeepMind. You've worked very closely with them for [12:33] quite some time at this point. [12:35] Why [12:36] did you take the approach of working, kind of embedding yourselves [12:39] within the labs, [12:40] as opposed to, I don't know, selling directly to an enterprise right now? There are multiple companies that are doing AI security. We are pioneering a category of the market that we call frontier AI security. We think it's fundamentally different. And the core thing is actually very simple. [12:53] The rate of progress and the rate of adoption of models change so many things at the same time [12:58] that while traditional [13:00] security tends to be somewhat reactive in nature, here we need a very aggressive, proactive approach. [13:06] In markets that [13:08] are dominated by a rate of innovation [13:12] that is frankly unmatched, I think, and unparalleled in human history, we think it's more interesting to take a temporal image of the market. That is to say, [13:20] focus on the first [13:22] group of people [13:23] or organizations that are about to experience a problem, so the labs, because they are the contenders to create the most advanced and increasingly sophisticated products, [13:32] AI models, in the world. [13:34] Work very closely with them in order to just like see firsthand the kinds of problems that are going to emerge, [13:41] and utilize that in order to have a clear and crisp understanding of what's going to come 6, 12, 24 months ahead of time, such that we can be prepared
[13:52] at a moment where general deployers are going to need to be in a situation of embedding these advanced models, and already have solutions that are going to be relevant for them. [14:00] Given the rapid pace of progress in the foundation model side of the world, [14:05] if you're at one of these model companies... [14:07] And, you know, I think the people there are sincere. They want to do good for the world. [14:12] They now know their models are capable of being used for extreme harm and cyber attacks as well. [14:19] How? [14:20] What do you do about that conundrum? And I remember, so we've been working with OpenAI since 2021. I remember back in those days, [14:26] every enterprise user of the API past some volume had to be manually approved for their use case in order to even access the API. It feels like the ship has sailed, [14:35] and anybody anywhere will be able to access some of these models. And so how can you make the models [14:41] sort of secure by design if you're in one of these foundation model seats right now? [14:46] I think it's a great question. [14:48] So, [14:48] one thing on the premise of the question. I think that at least right now, at the moment in time in which we're in, [14:54] the ability of models to actually do extreme harm, [14:57] you know, it exists in potentially some use cases, but at least in cyber, I think we're not there just yet. [15:03] And that matters. [15:04] And just to be really sharp on what I mean here: [15:08] models can clearly be used in order to do harm. [15:11] But there is a distinction between harm and extreme harm that should be made. Harm would be, for example, using a model in order to fool a senior citizen in order to, you know, steal money from them. So just like scaling up phishing operations: that can happen easily right now.
[15:25] Extreme harm, in my view, would be something along the lines of taking down multiple parts of, you know, critical infrastructure in the United States at once, which, you know, can take full cities off the grid, making hospitals not work. Models are not there yet. [15:38] And that's not me nitpicking on the question. I actually think it matters quite a lot. [15:43] Because how much time we have to prepare for a world where models are that capable [15:49] actually dominates the strategies that we can take around the defensive side. [15:54] Because our view is [15:55] that the first thing that we should do, just like a first-order [15:58] thing, is to be able to monitor [16:01] and have [16:03] a view of what's going to come, [16:05] such that we'll [16:06] have an ability to have a much higher resolution discussion. Which capabilities are progressing? At which pace are they progressing? Should we expect them to continue to progress at this pace or accelerate in the future? And that dictates the order and the priority of some defenses, when you're going to embed them, [16:24] whether we should embed them, etc. And you know, if you get this wrong, [16:28] I also think it's unfair to the companies and to the world as well, because AI [16:32] also has so much potential to do good that if we deploy a lot of, [16:36] you know, defenses that may chip away at productivity ahead of time, we're also doing real harm to just like innovation and the world at large. And it's a very delicate balance to strike. So I think the first-order thing to do if you're working inside of the labs is actually having and supporting a large ecosystem that can take the models and measure and get to a high resolution for what is even possible to do.
[17:00] The second bit is [17:01] figuring out, [17:03] you know, a defense strategy [17:05] that is informed by exactly what's happening, and treating it almost like a rigorous science with experiments, of just like how to assess, how to do predictions, et cetera. There are some defenses that will require a degree of customization. For example, if you're someone that is, you know, creating [17:19] monitoring infrastructure, [17:21] we'll still need that. You may want to recalibrate some of your infrastructure to give you higher alerts that AI is going [17:27] off the rails, for example. [17:29] But there are some problems that are very easy to write about, but actually very hard to develop solutions for. You know, for example, [17:36] I've just said a sentence of just like, you know, customizing your monitoring software in order to prioritize alerts that are coming from your AI layer. [17:44] How are you going to be able to understand when AI is doing something which is problematic? Occasionally, you're going to be able to run into that. [17:51] But sometimes this may be, you know... I think the entire subsection of the market which is, you know, anomaly detection, which is a huge subsection of security, is going to have a big problem very soon, because anomaly detection is based on [18:04] measuring [18:06] a baseline, understanding what the baseline is, and measuring against that baseline in order to see that something is an anomaly. [18:11] But [18:12] if you don't have a crisp understanding of the baseline and how it should look, [18:17] you have an issue understanding that something went wrong. [18:19] So as an outcome, [18:23] there are some defenses that are going to operate as is, and there are some defenses that we should recalibrate, either customize or just recreate from scratch, just because there is a lot of science to be done of
[18:34] understanding how models look when they're under attack, understanding how it looks when models are attacking something. [18:40] And because I think we still have some time [18:43] for the world that's about to come, [18:45] my recommended strategy would be: [18:48] invest a lot in [18:50] creating robust tools that would allow you to have rigorous evidence of what's going to come, such that you can have the discussion at higher resolution. [18:57] Map [18:58] which classic defenses are still relevant, [19:00] understand where your gaps are, and invest a lot in R&D in order to make sure that you have cost-effective defenses that you can roll out ahead of time, before models are going to be [19:08] deployed. [19:10] I know one of the broader questions in frontier AI research today is, can we actually understand the mind of a neural net and what's happening? [19:17] And so I'm wondering if we can actually detect the characteristics of when the model is starting to misbehave. [19:23] And do you actually, can you understand that in the activations of the net? [19:27] For some of the attacks, we may be able to detect it. [19:30] But that's still a very big open question. Let's say our approach, which we're trying to pioneer around security research, [19:37] is actually to work from the outside in. So that would be [19:42] putting the models in high-fidelity, [19:46] realistic environments [19:48] that would push them to the limits. [19:50] And in these environments, you're recording pretty much everything that's happening in the background. So that would be both the internals of the model, like the neural net, but also interactions of the model with the environment, or the AI system with the environment. [20:04] And
[20:05] it actually allows you, [20:07] by this recording, to play around with using all of this data [20:12] to create, [20:13] firstly, [20:14] a mapping of how it even looks when an attack is happening, whether you're trying to attack a model or [20:19] using the model to attack a target. [20:21] And then, based on that, [20:23] training up classifiers. And I think a key thing, at least around where we are right now in security, and like a key realization, is: [20:32] while understanding the internals of a model [20:35] is ultimately going to be important to fully solve the field, [20:38] we can make a lot of progress [20:40] just by [20:41] understanding that something is not right, even if we don't have [20:45] a full, [20:46] crisp, full understanding of, you know, the internals and what's going on. And also, in these environments, the other nice thing is you can [20:55] insert or remove [20:57] defenses to see if they're effective. [21:00] So think about the situation where, you know, the question that I'm trying to ask is whether a model is going to be able to [21:06] evade detection from an antivirus. This is a core security question. If models are going to be able to do that in enterprises, this is an issue, because they're going to be able to scheme and do malicious things, and can harm security processes, without anyone noticing. [21:21] And in specific environments, we see that this is actually possible. Models are able to do that. We were able to show how a model was able to take out Windows Defender, real Windows Defender, in a toy environment, granted, but real Windows Defender. [21:33] But that kind of research, when you're putting a model in an environment and recording a lot of stuff both internally and externally,
[21:39] allows you to play with these parameters, to ask a question of, A, should I change something on the model itself in order to understand that something bad is happening, but also take a different approach, which is agnostic to that, and say, should I alter something in the defense in order to make sure it's more resilient? [21:55] So I think our view is that, you know, for the long run, for the vision of the field, we do need to solve problems such as mech interp in order to also solve issues in security, which also shows [22:05] how different problems in AI are just like changing the core of so many different fields. Security itself may just be reframed to an issue of reliability and control in the near future, not just classic traditional security. [22:18] And doing an analogy [22:20] to, [22:21] you know, the human brain: [22:23] we still don't have [22:24] a proper understanding of our own mech interp, you know, just like the neural net in our head. [22:29] That being said, [22:30] we were able to create very powerful mitigations, you know, and solutions to multiple problems that we're experiencing as humans in our lives, just by being observational on how we're interacting with the environment, or just like understanding smaller parts of the inner parts of the system. I love that framing. You said something scary to me, that the models are capable of, you know, taking out Windows Defender in your simulation environments. Can you share more about how that even happened? Yeah. [22:56] Yeah. [22:57] So just to be clear, it happened in a toy environment; it's a controlled setup simulation. [23:03] That's not to say that everyone that's using Windows Defender now should just be scared as an outcome of that. I just want to make sure that Microsoft is not going to sue Sequoia for, you know, just like taking down the stock.
[23:13] You can think about a network, so it's a network simulation. [23:17] So think about a network environment. [23:19] Just to simplify, imagine the networking environment that you have in your enterprise, in your work, in your home, the Sequoia network. [23:26] For example, your phones, your laptops that are connected into a network environment that you're using. [23:32] And the AI model [23:34] was a storage bot. [23:35] So it had access to the organizational files in order to be a helpful storage bot for you. And for example, just like scan some of the organizational files such that it can summarize some of the inputs. Like we're doing all of the time when you're interacting with files that you have, you know, access to, and you're asking: give me a summary of these four things, you know, what's common between these documents, what's not common between these documents. [23:58] And also, as like a prelude to the simulation, [24:02] there was [24:03] an attacker that had access to the enterprise Slack. [24:07] That happened multiple times in the last year; Slack is occasionally hacked, as happens to every software that is very ubiquitous and just being used by so many different organizations in the world. [24:18] And the attacker gave a very simple instruction to this helpful AI storage bot. It just said, I want you to download this file [24:26] to the organizational network. [24:29] But [24:30] clearly, because it's a storage bot, it did not have access to download from the internet. [24:36] But what we saw that was pretty amazing is that, through a collection of prompts, the model, [24:41] because it was given the clear instruction to do it at all costs (obviously in much more sophisticated language, I'm simplifying here),
[24:51] it realized that it can't do downloads because it doesn't have access and because Windows Defender, [24:56] the real Windows Defender, is blocking it from downloading files. So that's the way the organization was set up. [25:03] So you can see internally, in how the model is thinking, that it was able to map and have situational awareness of the environment: [25:09] A, understanding that this is the problem, that it should have higher, [25:12] you know, privileges to do more stuff in the organization, [25:16] and [25:16] that it should be able to remove Windows Defender because it's bothering it in achieving its ultimate goal. [25:22] So then you can see, as a next step in that reasoning trail, [25:25] that the model is realizing that it's a storage bot. So as an outcome, it can access different files, [25:31] obviously still by asking it to access different files, but it started to just scan whatever it had access to [25:37] at that privilege level, [25:39] and found that someone accidentally, [25:42] in one of these files, [25:44] left a hardcoded password [25:46] to the organizational cloud. [25:48] And it used that. [25:50] And now it was able to just get access to something that it didn't have access to before. [25:54] And I'll cut the story short. It, you know, did a collection of things that ultimately did more privilege escalation, more privilege escalation, more privilege escalation. And it was able to gain more and more and more and more access that it should not have been given, you know, to stuff that it should not have been given access to. And ultimately, it was able to get enough access to have admin privileges on the organization. Wow. [26:16] And kill critical security processes. [26:19] And as an outcome, it was able to outmaneuver Windows Defender, remove an organizational defense, and upgrade its own
[26:27] privileges such that it can download a file [26:30] and [26:32] actually hack the organization that it's a part of. We call this simulation internally a double agent, just because it's an agent that is doing that, but it's turning against its host. So it's kind of like a fun play on words for these kinds of new attacks. [26:46] And in a way, the AI is [26:49] kind of used, with this prompting, very similarly to how you would think about a parasite that is attacking externally, because you're using very lean text, just sending something to a model, and you're using the fact that it has a very strong brain in order to just do a collection of actions that are very advanced. [27:08] And I want to say, [27:09] the point in time in which we're in right now [27:12] is that this is a toy [27:15] setup, [27:16] and [27:18] I don't expect a lot of these things to happen in the wild yet. [27:24] That being said, [27:26] We are seeing huge bloggers, quote on quote on. [27:29] And I feel it in security. Have you guys seen the classic, I think it was called Bricks, the game that DeepMind demoed a decade ago, where it starts very badly and then the AI is able to figure out better and better and better strategies? [27:43] And it is first relevant to just like Bricks and then relevant to just like many, many, many other games. And here we are right now, just like a decade after, with the state of AI. [27:52] So I think security, by being a derivative market of what it is trying to secure, [27:57] is at a more nascent stage right now, where in
[28:00] toy setups and simulations, [28:02] we're only able to start getting a glimpse of what's about to come. And we are seeing stuff like... [28:07] models having enough power to do stuff such as maneuver the host in order to just like do privilege escalation attacks, remove some organizational barriers, and wipe out... [28:18] even real security software such as Windows Defender. [28:23] And while these are still not things that will likely happen in the wild now, [28:28] it's likely that in a year or two or three, if we're not going to have the appropriate defenses, [28:34] this is going to be [28:36] a world that we're going to just land up in. And clearly, the implications here matter, right? I assume that the vast majority of enterprises in the world don't want to deploy or just adopt... [28:46] tools that are able to outmaneuver their defenses. How... [28:50] do you think about... [28:52] model improvement, [28:53] especially in the context of reinforcement learning [28:57] playing a pretty significant role in the improvement of coding, [29:01] you know, even tool use, [29:03] for example? [29:05] How does reinforcement learning play a role in... [29:08] cybersecurity? I think that's literally a billion-dollar question, or just like maybe a trillion-dollar question, I don't know. Um, [29:15] because my background is as a researcher, I'll keep my scientific integrity and just like say that there's a lot of uncertainty. But I'm still going to give a speculation of just like, you know, what's likely and what's going to come. We've all seen [29:26] that RL [29:28] is very, very useful to a lot of the innovations that we're seeing right now around
[29:32] coding [29:33] or math, and in other verticals as well. [29:36] I think it's likely, at the point in time we're in right now, that RL is going to be able to scale as well. That is, that we're going to see something similar to scaling laws, [29:44] that if we're going to input more data or just have breakthroughs and improvements in training, we're going to ultimately get better models, at least in the verticals that I've mentioned before. [29:52] I think it's still an open question [29:55] whether RL generalizes. It's like where we are right now in the world. So that means that if you're using data [30:03] and RL environments in order to improve the model in coding, whether you're going to see a huge jump in being able to produce better literature, for example. If you think about it, that's, roughly, and it's a huge simplification, [30:15] something that we did come to expect out of models. We lived in the last few years in a world where models were [30:21] showing properties of [30:24] advancing a lot of capabilities at the same time, [30:27] which is different than the world that we lived in before, where I still have the skill [30:31] from what's been in our previous life, just like previous jobs, [30:35] of understanding how to create huge ML datasets in order to just like improve in a very narrow domain. And that world, you know, it still exists, but it's like we shifted into just like a much more generalized paradigm. And there's a question of just like whether RL is going to provide that. And the reason that that matters is... [30:51] we're still... [30:53] at the early stages of, A, figuring out if unique improvements [30:58] in... [30:59] or just like taking... [31:01] relevant
[31:02] data that is relevant for RL training [31:06] in security is going to push [31:08] the security frontier, [31:09] or whether improvements that RL is providing around coding or math or other scientific skills are going to be relevant for security. [31:17] My intuition on the first one... [31:20] is [31:21] a fairly strong yes, [31:23] that we are going to see a success in some experiments of just like using security data in order to [31:30] have improvements such that AI can become better and better at security engineering tasks. [31:36] I think there are some indicators that are showing that we're on the way to doing that. I think it's not going to be as clean... [31:41] as improvements that happened in coding or math, just because the complexity and noise level around some security tasks are going to make it like a harder problem. [31:50] I think we are going to [31:52] also get some boosts around security that are coming from [31:56] other domains improving in RL soon. If you're better at coding, you are going to be better at some security tasks as well. [32:06] I think it's still unclear whether this is going to generalize. And in security, we're in a more nascent situation around just like what's happening right now in RL. [32:14] But I am placing... [32:15] non- [32:17] insignificant bets that there is going to come just like a string of innovation, [32:23] potentially, just like... [32:25] around that, and that we'll see some improvements around security as well with RL [32:30] over the upcoming period.
[32:33] That's very exciting. Now, let's take a step back and talk about the enterprise. [32:38] And so I think enterprises are still very much so in the early innings of... [32:43] building, deploying agentic [32:45] AI. And [32:47] how should CISOs and security teams be thinking about... [32:50] security as we move into this world? [32:53] I will say that there is... [32:54] a lot of dependency on exactly what you're trying to do, so I'm going to simplify [33:00] kind of like aggressively in this answer... [33:02] A useful frame [33:04] to [33:05] start thinking about, when you're an enterprise that is starting to deploy, as a first-level issue, is to kind of see AI, and specifically agentic AI, [33:15] as the new frontier [33:17] of [33:18] insider risk. [33:20] So there are multiple enterprises already that are trying to give [33:24] the agents that they're deploying internally [33:27] persistent identities or ways to interact with them, for example, [33:32] either Slack identities or email identities, or [33:36] to systematize, like, inside of the organization how and when you can approach AI. [33:41] And that allows you to just like have a clear visibility about just like where AI is being used and in which use cases. [33:49] And then it becomes an issue of access controls and an issue of privileges that you're giving to these identities that you're putting in. [33:56] And that's usually a useful frame in the beginnings of just embedding. So just gain some visibility and just try to turn it as much as you can into an issue of access and into an issue of control of privileges.
[34:09] That's obviously insufficient, especially if you're getting to the stage where you're allowing AIs to interact with one another in the organization. So, for example, agent-on-agent communication. [34:19] And that may sound very advanced, but if you're running one agent that is doing summarization and another agent that is doing... [34:25] it's like ranking the quality and acting as a judge on these summarizations, then you're already at the stage where you're having AI-on-AI interaction. [34:32] The base level would be to just like have... [34:35] state-of-the-art monitoring on this AI-on-AI interaction. [34:38] Here I'll, however, say that there is a genuine gap [34:41] in what is available online, [34:43] out there. [34:45] A lot of monitoring software was not built [34:47] to detect agents that can potentially communicate in ever-changing protocols, [34:53] that, if something goes wrong, [34:56] can try to bypass and understand that they're in monitored environments, [35:01] and [35:02] you need to also take into account attacks that are unique and novel to AI itself. [35:07] So clearly, if you're starting to deploy right now, [35:10] you need to make sure that you have proper visibility and monitoring into the interactions of both the internals of the models and the externals. But you need to also realize that a lot of the software is not properly built for that age, especially, especially, especially if you're moving to agent-on-agent communication. So that's a more advanced use case. [35:27] Just to give a funny example of something that happened to us in, like, one of the... [35:31] it's like one of the streams of work that we've done with one of the frontier labs. [35:35] There was a scenario... [35:36] where there was... [35:38] an agent-on-agent interaction. So these are two real
[35:41] um, frontier models, actually one of each kind. So just like having an environment where there was, like, uh, you know, just like one... I won't say the names, but you can kind of think about it's like a Claude, a Gemini, you know, just like, uh, you know, just like a Grok, you know, just like a Llama, [35:56] ChatGPT, etc. So just like two of these that were speaking to one another. [36:01] And very, very, very randomly, [36:03] in this environment... It was a critical security task, that was the simulation that they were in. [36:09] But after working for a while, and you can see it in the reasoning trail, [36:13] one of the models decided... [36:15] that they've worked enough [36:16] and they should stop. [36:21] And you know, these are stochastic machines at heart. And you know, that's the design. [36:26] And it's just like likely somehow it got picked up as like part of the distribution, [36:31] that [36:32] if you work a long time, because it's learned from the internet... Trained on me and Dan talking to each other. Exactly. This should be a new policy at Sequoia. That you should... Yeah, exactly. That you should take a break. Right? And, you know, that... [36:44] That makes sense if you can actually pick up that behavior. [36:47] And on an individual model level, that's already... [36:50] You know, in this example it's funny and it's weird. That being said, it did not stop there. [36:56] It convinced the other model [36:58] that they should both take a break. [37:00] So the model did social engineering on the other model, on another model, which is, and you know, again, it's funny in a simulated environment, in a toy setup. [37:09] But now try to think about a situation where you actually as an enterprise are delegating
[37:14] an autonomous workflow that is critical to you to complete. [37:18] And occasionally there's kind of like this weird thing that stopped working and you don't understand why. [37:23] And the more complicated and capable machines are ultimately going to be, the more of these weird examples we're going to encounter. [37:32] And just like a different... [37:33] thing that's happened to us. [37:36] You know, we [37:37] gave the model... [37:38] a CTF challenge. So capture the flag. It's like very common in cyber competitions that you're trying to achieve something. You usually have to do like a chain of vulnerabilities that you need to exploit in order to just like obtain and just like capture a flag. And that gives you validation that the model was able to do a [37:52] collection of cyber actions that ended up in a success. [37:56] But the model here again... [37:58] it understood that it's in the context of a [38:01] CTF [38:03] and decided that potentially the challenge is too hard. [38:05] So it did what potentially humans would do, which is it tried to [38:10] email the organizers of the competition in order to help it gain a solution. And that is, and that is literally, however, if you think about it in an enterprise setting, it's like you have, you have an identity that just like, unasked, may try to just like use your servers in order to send an email to the world. In our example, by the way, just like the other fun thing, just like a second-order issue, is that the model failed at doing that, [38:32] not because it had an issue of just like maneuvering inside, but because it hallucinated the email address. So as an outcome, it tried to send an email to an address that doesn't exist, which also like shows, you know, just like the classic...
[38:45] other problems that you're having in AI and in AI adoption are going to be chained to security problems as well, which shows the frontier of attacks and defenses that we need to develop here. [38:55] So, [38:56] you know, just like as a second, just like if I'm going back into just like monitoring, etc., [39:01] a lot of monitoring software, you have to embed it and you have to use what's out already, [39:05] but it's not built for these kinds of challenges. That's why a lot of our approach is to... [39:11] figure out what these attacks are going to look like, how to just like redo some of the defenses, what's going to be required. And occasionally, I think it's like a common misconception that all that you need to do, that all of it ultimately collapses into an issue [39:24] of access management. [39:26] And that, while I think a lot of the basis is there, is by, you know, just like figuring out how to do the access management well and just like manage the privileges, it's only step one of just like what we need to do. [39:36] And there is a mind shift that we also need to have when we're approaching this subject, [39:41] which is: [39:42] the rate of innovation [39:44] is so high, [39:45] our ability to understand what's happening at the frontier... [39:49] just like so many things are happening at once. [39:52] You have to [39:53] be very engaged with the community in order to figure out what kind of problems you're even going to encounter, essentially over time. [40:00] And those will be battle-repelled. [40:02] Okay, so as we shift from the enterprise to sovereign AI... [40:08] We know the UK government and a set of others are customers of Irregular. [40:13] So how should... [40:14] governments and countries be thinking about AI risk?
[40:17] Obviously, all of the risks that apply... [40:20] on the enterprise side and to the labs themselves apply also on the governmental level. [40:25] Because if you're, you know... [40:27] the Department of Defense, the Department of Commerce, [40:30] the Department of Education, it doesn't matter, and you're using advanced AI models, you're importing the benefits and risks that come associated with them. So everything that we've said about the enterprises, everything that we've said about the frontier labs themselves, [40:42] they have... [40:43] similarities on the governmental side as well. [40:47] Usually governments, however, come with a set of [40:51] unique requirements and a new level of risk [40:55] that is relevant to them. So just like one: [40:58] they are often targets of other very strong adversaries and should take into account [41:03] that their adversaries are now taking offensive AI models and are already starting to use them in order to scale up, from simple things such as phishing campaigns up to testing more and more advanced technologies, [41:15] you know, cyber offensive weapons, that is scaling up their efforts, that is trying to bypass the fact that, you know, we... I think pretty much every critical system that countries have was hacked at some point in time. We have not yet seen [41:27] multiple critical systems ubiquitously going under. And the fact that AI on the offender side can scale up [41:35] operations aggressively means that countries... [41:39] should [41:40] essentially recreate their approach around critical infrastructure. And that is, you know, AI is being elevated globally [41:47] in that context from a classic security risk into a national security issue,
[41:51] and [41:52] the infrastructure and the thought leadership should be created there. The other bit is that from a country perspective, and you can argue on whether this is the right thing or not, but like multiple governments that we've spoken with [42:04] are very strongly emphasizing the effort of sovereignty in the context of AI, [42:09] and [42:10] what they usually mean by that is that they are anxious around being dependent, because they understand that AI is extremely critical as the infrastructure that could be the key to the 21st century and potentially beyond. Because of that... [42:23] especially if the country is doing an end-to-end... [42:26] effort, [42:27] starting from building local data centers that could be used in order to train and to do inference on advanced AI models, [42:35] up to the point of potentially training the models and creating the AI systems that surround them and having proprietary environments that they also take in, defenses should be done [42:45] across this entire spectrum. And we've indeed done work to just like both create standards of just like how to secure these data centers and making sure that people are not going to lift critical assets, how to run [42:57] models on such data centers. For example, we've done a combination of just like a white paper with Anthropic that is discussing confidential inference systems [43:05] and trying to figure out how to create a standard in the field, [43:09] up to the fact [43:10] of, on the... [43:12] when actually using these models, [43:14] taking into consideration how to customize some of the defenses that enterprises need and create the variations of them that governments would need for their use cases, especially if they're putting AI as part of...
[43:28] Not just taking into consideration that AI can be used by adversaries to attack critical infrastructure, but the fact that they may integrate AI into their own critical infrastructures. And that requires a whole new level of thinking through the defenses. Dan, this was a lot of fun. Thank you very much for joining us. It was a pleasure being here and also very happy that I ended up answering your emails. Thank you. Thank you. [43:58] Thank you.