Google I/O Afterparty: The Future of Human-AI Collaboration, From Veo to Mariner
Fresh off impressive releases at Google’s I/O event, three Google Labs leaders explain how they’re reimagining creative tools and productivity workflows. Thomas Iljic details how video generation is merging filmmaking with gaming through generative AI cameras and world-building interfaces in Whisk and Veo. Jaclyn Konzelmann demonstrates how Project Mariner evolved from a disruptive browser takeover to an intelligent background assistant that remembers context across multiple tasks. Simon Tokumine reveals NotebookLM’s expansion beyond viral audio overviews into a comprehensive platform for transforming information into personalized formats. The conversation explores the shift from prompting to showing and telling, the economics of AI-powered e-commerce, and why being “too early” has become Google Labs’ biggest challenge and advantage.

Hosted by Sonya Huang, Sequoia Capital

00:00 Introduction
02:12 Google's AI models and public perception
04:18 Google's history in image and video generation
06:45 Where Whisk and Flow fit
10:30 How close are we to having the ideal tool for the craft?
13:05 Where do the movie and game worlds start to merge?
16:25 Introduction to Project Mariner
17:15 How Mariner works
22:34 Mariner user behaviors
27:07 Temporary tattoos and URL memory
27:53 Project Mariner's future
29:26 Agent capabilities and use cases
31:09 E-commerce and agent interaction
35:03 Notebook LM evolution
48:26 Predictions and future of AI

Mentioned in this episode:
- Whisk: Image and video generation app for consumers
- Flow: AI-powered filmmaking with new Veo 3 model
- Published Jun 3, 2025
- Uploaded Jun 11, 2026
- File type: Podcast
Full transcript
AI-generated transcript with timestamped sections.
[00:00] you know, I was talking to a founder, he gave me the analogy of [00:02] You know, you want the user to almost be like, [00:05] the way that a director would direct the cast and crew of, you know, change the lighting here. Like, can you say this with a little bit more of an accent there and like almost like natural language, the way that a director would direct a cast and crew? What do you think is the right way to mold the clay? [00:19] I still think it's show and tell everywhere. So I don't think you do everything through text. I think it's kind of actually counterintuitive to have to [00:25] transcribe everything. So I think there's a lot of like showing and acting and mimicking or giving a reference just as inspiration in addition to the text. Yeah. [00:34] But the one thing that's [00:36] starting to become more clear, at least for me, is kind of-- [00:40] video generation, simulation, games, they're kind of like the same thing in this new world. And what that means is basically you're kind of world building. You're saying this is the stage. These are the assets. These are how things are supposed to look. And then you shoot in it and then you can reshoot and refine and pause and correct something and go back in time and regenerate. I think that's where this is heading. [01:00] UI is going to be fairly novel. Yeah. [01:02] Thank you. [01:19] Welcome to Training Data. Fresh off of Google I/O, we're exploring some of the exciting AI updates with three leaders from Google Labs who are the leads on Google's product experiments around generative video, computer use, and Notebook.
[01:31] Thomas Iljic of Whisk and Veo reveals why the future of content isn't just about generation. It's about remixable experiences where the line between movies and games blurs, and where your creations become starting points for others. [01:43] Jaclyn Konzelmann of Mariner explains how computer use agents will fundamentally change e-commerce by removing human friction from purchasing. And Simon Tokumine of NotebookLM shares why personalized AI content, [01:56] designed for an audience of one, represents a completely new media category. [02:00] You'll discover why these teams feel like a new chapter is starting for AI at Google. Enjoy the show. [02:06] It's been exciting to see Google's just cooking in AI. And I/O last week was very exciting. And it seems like the court of public opinion has just turned on its head so quickly. And right now everyone's just like Google's out in front in AI. Why do you think that is? Why did the public opinion change so quickly? [02:23] I mean, the models, you know, to start with, I think they have a big part to play. Good. Good answer. Definitely the models. And I think just the number of products that we have and seeing all of this breakthrough in technology and AI come out into all of those products, but also all the net new products that we're launching and the net new experiences. It just, it was a lot last week and even not just at I/O, but like the week leading up to it, I think you had a big moment the day before. [02:48] Yes, yeah, I did. I did. Yeah, it's definitely validating to see the public opinion on the models and Google's position in AI changing. [02:59] Um, [03:00] maybe recently. It does feel to us on the inside, at least, that it's kind of, it's the result of a lot of work though. So it feels like we've been improving, to me at least, for at least the last three years in this area of Gen AI. And maybe what we're seeing externally is people seeing what we've been up to. 
It helps that we're number one on many of the leaderboards, and it helps that some of the stuff that the models can do
[03:24] is state of the art and I think is only possible with some of the Google models. But I think internally, it just feels like the end of chapter one and the start of chapter two. [03:34] Wonderful. So here's what I'd love to do today. We have three of the leaders from Google Labs in the room with us, for those in the audience. What I'd love to do is spend a little bit of time on each of the topics that you are responsible for, and then we can round up with some overall thoughts on the AI. Does that sound good? [04:04] for lack of a better word. We'll go into Mariner, Google's computer use agent, with Jaclyn. [04:11] And then we'll close on Notebook with Simon and everyone knows Notebook. So that needs no introduction. Awesome. I like to hear that. Okay. Thomas, let's start with you. Tell us about the history of [04:22] you know, how you all have been cooking and building and experimenting in the creative image video generation space. And how long have you been experimenting with these products and what have been the key milestones so far? [04:33] Sure. It's been a really exciting space. I think that's a very long question. So, you know, I'll probably rant a little bit. I think the, I mean, we've had for a long time, like good, you know, imagery models. There was like Imagen. There's been DALL-E, obviously, externally, etc. But something like two, three years ago is when [04:49] at least for us in Labs, when we're thinking about products, we had the ControlNet paper, for people who remember. So it's kind of like, how do you
[04:55] how do you take the model and start channeling it where you want? So it's not just like a push button thing. You can start saying, I want the pose to be like this or the scene to be like this. That was one. And then the second thing was LoRAs, where like you can kind of show the model a range of things. And then suddenly you're able to kind of like, you know, pull from the image and be like, what's the range of possibilities for that particular piece? And so that like iteration, the sense that you can start controlling the outputs, that that felt like the right moment for us to start exploring the creative process. [05:20] When was that? Probably two and a half to three years ago. OK, I remember this. And so then a lot of stumbling and trying things and failing. I think we had things where we trained a bunch of our people internally to see what they could do with ComfyUI type workflows. We even had a little animation thing going on where we could have an episode with artists. [05:39] And, you know, we published the not so super villain, if you want to check it out on YouTube. And then more recently, we ended up with like a bunch of convictions out of that exercise. So we had things like creation has to be iterative. So we need to build kind of these controls next to the models. [05:55] Um, [05:56] Media comes with the blueprint, which is this idea that like, if I generate something, you're able to kind of pick up where I left off. [06:02] And then the third one was like, it should be show and tell. [06:06] This driving force was instead of just telling the model with very long prompts, I can actually show you images, [06:12] say like it should do kind of like this and we can build off of that. [06:17] This is where we started with Whisk on the consumer side for imagery and [06:21] Flow for everything that's [06:22] high-end filmmaking
[06:25] exercise. Yeah. Really cool. And do you imagine Whisk and Flow will be kind of end consumer products in the kind of, you know, Google portfolio of, you know, billion user scale consumer products eventually? Or how do you, is it your playground for kind of testing model UX and, you know, how best to bring this magic to users? [06:44] Yeah, I think [06:46] We see it as a spectrum. So I think Whisk is kind of our play in the, you know, really consumer space and thinking about like everybody now has this visual language at their fingertips. They might not have it necessarily like. [06:56] the most advanced ideas in terms of like storytelling, but they can quickly remix each other's things. And so we're trying to see what those dynamics look like. So I think that's kind of our exploration space with Whisk. Yeah. [07:06] we'll see how it picks up. I think a lot of the lessons will probably also graduate in just how we [07:11] deal with user inputs and treat those across multiple surfaces. And then Flow is the other side of like, you have a vision, you know what you want, and it's kind of like, how do we give you all the [07:21] all the tools to create, you know, [07:23] the best version of this in video. [07:25] Yeah. [07:26] Okay. [07:26] - Super cool. Who's the ideal user do you think for Flow and Whisk? [07:31] For Flow, I think it's pretty clear for us, we're starting with AI filmmakers. And the reason is we want to build this kind of, [07:39] we call it the generative AI camera. Like, you know, you're doing world building, then you're shooting inside this world. How do we actually develop the DSLR camera of, you know, generative AI video? And then we'll distill kind of the Android version of the Pixel, you know, Pixel camera version out of it. [07:54] Whisk is much more a consumer. There's a wide range of audiences.
[07:57] Is it you creating something funny with your friends in a chat? Is it kind of more inside the company you're trying to create some visuals for slides? This whole range that we're [08:09] exploring, we'll see where it lands. Yeah, so cool. Okay, you said AI filmmakers. Is that a thing? Are people calling themselves AI filmmakers now? And does it tend to be... [08:18] existing filmmakers that are looking to be more AI savvy? Are you seeing net new creators [08:24] come in and try to create feature films? I think it's certainly an ill-defined term. But the reason why I like to say AI filmmakers versus filmmakers is I think [08:32] If you take the extreme end of the spectrum, these are people who need very bespoke tools. They have like entire workflows and processes and you need to develop very specific ideas. There's one tier under that, which maybe I classify as a filmmaker, what potentially is, you know, pre visualizations where you're trying to quickly get like a version out and maybe then you do the full process or people who just don't have the budget. So they're like, I don't have a hundred thousand dollars to. [08:55] you know, put my idea out there, but now I can at least take a shot at it. Yeah. And so those people are interesting to us because like, you can really start from the ground up thinking of like, [09:03] If you had this generative AI camera, what would the user flow look like? Like how would you fit those pieces? Yeah. Your answer to my initial question of, you know, the models are the reason that the court of public opinion has flipped so quickly. It's been amazing to see Veo's progress and Veo 3. And, you know, for me, I don't know what evals you all look at to look at performance. But for me, it's the Will Smith spaghetti eating test. And, like, we seem to have passed that. So, like, are we at video AGI?
[09:33] Like, how do you think about the quality and the performance and what's ahead? [09:35] I [09:36] There's still some room, but it's pretty cool. I mean, the GDM team has done really great with Veo 3. [09:41] I think the joke last week was that it beat Veo 2 in the ranking, so it's kind of Veo being Veo. So people were very happy about this. I think it's, you know, adherence is going up. Yes, we don't have the six finger problem. Physics are getting pretty good. There's still things where like, [09:56] you know, if you want to have, for example, multiple characters and kind of choreographed characters, have like full consistency across multiple scenes, like that's where there's still like a lot to come. [10:05] How do you refine your output? Can you propagate changes across clips? There's going to be still a lot of improvements, but in general, [10:12] Yeah, a huge step up and the biggest [10:14] reveal this time was audio. So being able to co-generate audio with the video, that brings kind of like, you know, [10:20] an image is what, like a video is more than an image, and a video with sound is way more than a regular video. [10:26] that certainly has opened up like a lot of virality. Do you think the... [10:30] you know, the R&D left to do to make the ideal tool for the craft. [10:34] How much do you think is in the product and in the UI? And how much do you think is going to need to happen in the model research layer and things like steerability? [10:42] I think it's both, but at least [10:45] And I'm sure people will have a wide range of opinions, but it's almost like we're at a state where everything we imagine in terms of controls, I think we have visibility in how they can be built. You know, you want to have consistency of characters, of scenes, of location. There's like different ideas around this. You want to reshoot. [11:00] So that part, I think the part that's hard is still the abstraction of all of it.
[11:05] So how do you put this in the-- what are the inputs that you want from users? In the context of audio, for example, [11:10] How do I define the voice? How do I attach the voice to the character? How do you define the mannerisms? How does that propagate? So I think there's going to be a lot of work in that. [11:17] abstraction layer on top of the models and on top of the controls. Oh, so interesting. So you think most of the model kind of R&D is almost a [11:25] solved problem is maybe too strong of a word. Not solved, but I think we... We know how to do it. It will happen. I think it's pretty clear that it's moving very fast. And then, you know, we see a lot of things just like week after week coming up. But how we do the connective tissue on top, I think is still like pretty much open. And audio is, you know, one of those new frontiers, for example, of like... [11:43] Should I be talking and driving the audio, then changing my voice? Should I be typing the text? How do I do diarization? There's a lot of like, what are the inputs? How do you give? How do you let people mold clay? [11:54] you know, with all these models. What's your guess for how that future is, for how people will mold clay? And, you know, I was talking to a founder, he gave me the analogy of, [12:02] You know, you want the user to almost be like, [12:04] the way that a director would direct the cast and crew of, you know, change the lighting here. Like, can you say this with a little bit more of an accent there? And like almost like natural language, the way that a director would direct a cast and crew. What do you think is the right way to mold the clay? [12:19] I still think it's show and tell everywhere. So I don't think you do everything through text. I think it's kind of actually counterintuitive to have to [12:25] transcribe everything. So I think there's a lot of like showing and acting and mimicking or giving a reference just as inspiration in addition to the text. Yeah. 
[12:34] But the one thing that's
[12:36] starting to become more clear, at least for me, is kind of-- [12:39] How would I say it? [12:41] the video generation, simulation, games, they're kind of like the same thing in this new world. [12:47] And what that means is basically you're kind of world building. You're saying this is the stage. These are the assets. These are how things are supposed to look. [12:54] and then you shoot in it. [12:56] and then you can reshoot and refine and pause and correct something and go back in time and regenerate. I think that's where this is heading. [13:03] UI is going to be fairly novel. Yeah. You mentioned games. I wanted to ask about this. It feels to me [13:09] Like, you know, the existing way that we consume games versus movies is [13:14] is you know is because there's such a tremendous fixed [13:17] upfront cost of producing a movie. If you imagine that, you know, in a world where every movie frame is generated, [13:24] not pre-rendered. [13:26] and that entire story arcs can unfold. It does feel like the movie and the game worlds [13:32] start to merge. How do you think that plays out? [13:36] I think with [13:37] I mean, so for example, we have the Genie model that's been really interesting. So you give an image and you can kind of move your character and the world builds in front of you. But what's going to be really interesting is how do you ground it? [13:47] Games are fun because there's very set constraints. Movies are good because there's very small details that matter, the expression and the moment and the timing. And so I think it's all about [13:56] It's almost about the constraining of the capabilities [13:59] towards what we need. [14:01] Um... [14:02] So I don't know. I think the other thing that strikes me and I think a couple of people on the team is like, it's not clear that we should think in terms of the static formats that we have today, like an image, a video and a game.
[14:13] Is there something in between almost? And what does that mean? And kind of where is that going to be distributed and interacted with? Like I can share an image with you, but you can instantly turn it into a scene that you're walking into. Yeah. So am I sharing an image or am I sharing an experience? Yeah. [14:28] Lots of questions, I guess. [14:30] It does feel like, you know, the story is almost the common thing that makes a game and a movie good. So and that's different from an image. It's just a visual, right? Yeah, exactly. It's the setting, the constraints. You define the rules of the game, basically, and then you let other people enjoy themselves in it. [14:46] Really cool. Um, [14:47] My understanding is that video is still expensive and somewhat slow to generate. [14:54] Is your sense that that's getting solved quickly? And like, will we have... Everybody's going to be able to generate, you know, two-hour films in their... [15:02] you know, in their pocket in a couple years' time? Or is your sense that this is a... [15:07] you know, longer... [15:09] We got a lot of efficiencies that we need to build in order to make this kind of cost practical. [15:14] I think... [15:15] I mean, we've seen in imagery and we've seen in video kind of like the same speed of cost reductions that we've seen in other places, both, you know, the hardware is getting better. I think the efficiency to your point, we have like the regular models and then we learn how to distill them to kind of. [15:30] so that they just take less processing to get to whatever you asked for. So I'm actually pretty optimistic that, you know, the costs are just going to keep coming down and the speed is going to increase. [15:38] kind of aligned with what we're seeing with other models. Yeah, got it. [15:41] Fantastic. What do you think is ahead for AI in the creative space, at least from a Google Labs perspective?
[15:49] Well, we just launched Flow, so we have a lot of things to do to just like, you know, deliver on that promise of like keeping you iterating. I think that's the first thing. Yeah. Refinement. [15:57] of like outputs and like keeping going there, and like insertion, editing, reshooting, I think is really interesting to us. [16:04] Um, [16:05] But I think the holy grail will be some of these new formats and experiences. Like what does it mean as a creator to share something with you that you can interact with? [16:12] That's something that we want to explore. [16:14] Really cool. I want to be able to talk to Will Smith as he's eating the spaghetti. Maybe he will. [16:20] Really cool. Thank you so much for sharing what you all are doing over in the creative sphere. [16:24] Of course. [16:25] Okay. [16:26] Jaclyn? [16:27] I would love to talk about computer use and Mariner. [16:32] Maybe first off, why is it called Mariner? Great question. So we wanted to give the project a name that really embodied what we were trying to do with this space, which was enable users to just go out and explore, enable agents to go out and explore. And Mariner is sort of this whimsical, open-ended name that just sort of embodies the spirit that we have on the team right now. I love that. [16:52] You all actually, you guys have really good product names across Google Labs. These are all really whimsical. I'm still trying to get rid of the LM bit on Notebook. Apart from that-- You just do it, Simon. I'm pretty happy with Whisk and Flow. I think we did decently there. We're evolving our approach to naming. That's what we evolved at I/O, naming. There we go. It's a statement improvement, yeah. That's funny. Can you say a little bit about how Mariner works? Is it a computer vision model behind the scenes?
[17:22] Just feels like, you know, pure magic in a box, but give us a peek under the hood. [17:27] I'll take pure magic in a box any day. So the way it works is really leveraging the power of [17:33] Gemini, that's kind of, you know, it's an [17:36] action-tuned model on a recent version of Gemini. But what that means is that we have all of the multimodal capabilities that Gemini gives us. So it's able to plan and reason when a user enters in a task. We're able to understand that. We're able to come up with a plan on how we should actually fulfill that task. And then the way it actually works is taking that and understanding the screenshots. So this is where the multimodality of the Gemini model really comes in handy. We're able to continue to take screenshots, continue down the trajectory of what it is that [18:06] from the user's tasks that they gave us and bring it all together that way. [18:09] Yeah, got it. Super interesting. What's the history of the project? And when do you anticipate you'll be rolling it out en masse? [18:16] So the project initially started last year, shortly after this time, actually. If we go back, at I/O... [18:24] Last year, we kind of graduated the Google AI Studio and Gemini API out of the Labs team onto the developer team now. And that freed us up to start exploring what we thought was coming next. And that happened to be agents that could actually take action on behalf of users, not just answer questions or generate content. So the team started working on it. At that point, we started grouping up with a bunch of different folks across Google to kind of bring together what we launched in December.
[18:54] last year, which was Project Mariner as a Chrome extension that took action on your browser. And then we continued to iterate on it based off of a lot of the feedback that we got from the trusted testers of that initial launch. So we actually had a large group of trusted testers that we would be talking with regularly and understanding what was working well for them, what wasn't. And we took that feedback and iterated on the most recent launch of Project Mariner, which we announced last week at Google I/O. [19:20] Really cool. What was some of the feedback and what are the magic sparks when people are really like, this is a game changing product for me? [19:28] Yeah, great question. So it's funny, one of the initial kind of magic moments that everybody had was watching Project Mariner take control of the mouse on the browser and being able to click and scroll. Typing text into text boxes actually felt net different when you realized it was an agent doing it. But quickly, as you were using the initial version, the feedback became: incredible, [19:53] this is super cool, [19:54] can I please use my browser again? Like I'd also like to be able to do work. Yeah. Which makes a lot of sense. And so that was one of the big motivations behind moving towards this idea of users entering a task in the web app that could then run in the background on virtual machines. Like a virtual. Okay. Exactly. But one of the key things that we did also try to [20:15] keep true to the initial vision was how can we [20:19] start to think about bridging the context that a user had on what they were doing in their current [20:24] environment to the tasks that they were sending to the VM and Mariner executing in the background. And the way we tried to do that was if you install the companion extension now, it'll actually be able to see all the tabs you have open. 
So when you're giving Project Mariner a task, let's say you happen to be looking at a recipe on a recipe site and you're like, wouldn't it be great if I could, canonical use case, add all these ingredients to my Instacart cart? Now, when you go to Project Mariner, you could say, hey, add all the ingredients from this
[20:54] and you can select the tab that you have open with that chicken recipe. And Mariner will understand that context, will be able to revisit that site on the VM and complete the task with the context that you had in your local browser as well. [21:06] And it's almost superhuman in a way because as a human, I only, it's like hard to context switch between browser tabs. [21:13] Yes. And you're able to kind of see everything in the tabs all at once. [21:17] Yeah, I think a big a big net win also was the ability for Project Mariner to do 10 tasks. [21:23] at once, not just one. And that was really a big net unlock. I was using it the other day and, you know, I just come back from running an errand and there was a bunch of stuff on my mind that needed to get done. And the first thing I did was open up Project Mariner, enter in three different tasks for it, and then just sent them off to start making progress. And I was able to jump back into the document that I happened to be working on. And it was this like magic moment of just, okay, [21:53] Like I didn't have to keep thinking about it. - Do people wanna see the computer mouse moving around first for a while before they're like, okay, I trust that thing to go off and do things for me? [22:01] If they do, they have that opportunity in... [22:04] the current Project Mariner experience. You can go into full screen mode. You can see the agent moving around and clicking on things, entering text. You could also pause the task at any point and be able to take over it. So having... [22:17] or giving the user the ability to take over and or provide oversight on these tasks is something that we think is still very important when we have an open ended platform like this or an open ended experiment like this that really lets it up to or leaves it up to the user to try out different things. And what's the user behavior you're seeing? Like, are they like, please just take the wheel? I don't want to deal with it. 
Or they actually want to, you know, backseat drive and watch the agent and make sure it's doing what it's supposed to be doing.
[22:44] That's a great question. I think initially watching it is this fun element, but also it develops a comfort for knowing how the agent is thinking and what it's doing. But one of the pieces of feedback we also got from the initial launch was, [23:00] At the end of a task being complete, [23:02] We just... [23:03] save the entire conversation history and it can get quite long. And what users ended up wanting was just a summary of like, what did Project Mariner do to complete this task so I can make sure it did it correctly? And that really kind of points to the question you're getting at, which is I want to just hand the task off to this agent, but then I want to be able to just verify what it did at the end of the task, not sit there the entire time and watch it. Yeah. Yeah. So interesting. What do you think are the solved and the unsolved technical problems so far with [23:33] where maybe in the Will Smith, you know, the spaghetti is still sort of disappearing a little bit phase. And maybe that's an unfair characterization. But I'm curious where you think we are. [23:42] on the evals and the performance so far for computer use, and what are the unsolved problems right now? [23:46] I think that's actually a totally valid comparison. There's a reason we launched this as a research prototype with the experiment label on it right now. I think we've seen really big gains from December to what we launched last week. That said, there's definitely still model quality improvements to go. I think there's also just application level improvements to go.
[24:16] And then there's just more planning and reasoning that we could do, like at inference time or at the application layer time that sort of in addition to the model improvements, you know, improve system instructions, improve checks and calls to different models. And then, of course, right now, Project Mariner is... [24:34] entirely completes a task by... [24:36] actuating or taking action on a browser, you want an agent that has more skills than that. You want an agent that knows when to call the right tools, that has memory, that's able to, you know, take advantage of a lot of the other stuff that we already see out there. So I think it's just... [24:53] integrating a lot of that in and starting to innovate and climb on that. And then, of course... [24:58] Right now, Project Mariner. [24:59] It's in the browser. Um, [25:02] People use computers. So, you know, we call this computer use. So there's that entire dimension as well that I think we're going to continue to see innovations in. Really cool. Were there any contrarian opinions you all took in building Mariner? So, for example, I think some people have said screenshots, it's going to be too slow. It's not going to be fast enough. You should use the website DOM or whatever. Like, any contrarian bets you guys made? [25:26] So the reason we went with the screenshot is we wanted to make sure that it [25:31] was a skill that we could develop that could be applied across things that aren't just websites. I think [25:38] the other aspect of that is like DOM versus accessibility settings or accessibility trees is another, um, lever. [25:45] We're kind of betting on this one right now, but I would say everything's evolving, so we're just...
[25:51] willing to take pivots if and when it makes sense. Um, [25:56] Yeah, makes sense. What is it capable of doing today and [26:00] um, and what is the speed like? If I tell it to go, you know, the canonical go order me a pizza from Domino's, um, can it do that and how long does it take? [26:08] The speed is definitely an area that we want to keep hill climbing on is what I would say. Yeah. But it's interesting you say that because one of the things that... [26:15] I was recently using Mariner to help me complete the task which was [26:22] come up with [26:23] Let me take a step back. [26:24] I have a three-year-old at home. She is going to be four soon. Part of that means organizing a birthday party for her and being able to figure out loot bags for kids at a four-year-old's birthday party. This task, as you can imagine, involves understanding what to put in the loot bag and then actually buying all of those things or like finding links somewhere to go buy them. Yep. [26:42] And I gave Project Mariner this task and it was basically a personal research that turned into an action taking task, which is find me the links and save them. And I [26:53] The thing that really resonated the most with me on that one is as it was performing this task, first it did a search for good ideas to go in a loot bag. And then as it just remembered those five items, that's something any of us could do. That itself wasn't impressive. But the first one was, I think, temporary tattoos. So then it started looking for temporary tattoos. It found a great link for it. Instead of having to copy that link and paste it in a doc somewhere else, it could just remember it. It could remember this massive URL, and then it moved on to the next one.
[27:23] URLs that it had been able to inherently store. So when we talk about speed and efficiency, I think there's two dimensions. One is just the model calls and the taking action and how do we improve it with different tool use. But then the other one is, [27:36] how can agents just do things in a different way that are inherently faster than the way we would do things? And I think we're going to continue to see improvements on both dimensions. Yeah. I wish I could remember five URLs. Oh, gosh. [27:50] OK, good point. Let's see. What do you think is ahead for Mariner? Where do you see it evolving from here? [27:56] I think there's a couple of things. Number one, we had a bunch of announcements last week around Project Mariner-like capabilities making their way into different Google products. And I think that this is a kind of core capability that you'll start to see emerge everywhere from the Gemini app to AI Mode in Search. So I definitely see a lot more coming to Google products with the stuff that we're doing right now in Project Mariner and kind of paving that path forward. Yeah. [28:22] And then I think for Project Mariner itself, I actually like to think of things in three categories. There's the agent [28:29] itself. I think that's going to get smarter, that's going to get better, that's a better model, that's tool use, that's memory, that's context. Then there's the environment. We talked about how in December, it operated on your local desktop in your Chrome browser. So that's in the foreground. [28:42] Then we moved towards this idea of Project Mariner operating in virtual machines, which meant that it's now operating [28:49] on VMs. I think there's this middle layer, which is an agent that can still operate on your device, but in the background. And there's a bunch of reasons and types of tasks where that becomes a really important kind of way for the agent to operate. And then of course, there's all the other devices. 
But really what you want is a capable agent that's able to operate
[29:06] in a way that is omnipresent across all your devices, locally, on VMs. And then the last one is the ecosystem part, which is where you start to get into the agent-to-agent interaction and, like, how does your agent interact with, um, all of the, uh, [29:20] things that exist outside of its own [29:23] world, essentially. Yeah, so cool. I think the canonical examples for computer use are, you know, book me a flight or order me a pizza. [29:32] Is that your sense of what computer use agents will actually be really good for? Or like, what do you think? I'm sure you spend a lot of time thinking about, like, what applications will actually be the bullseye here? How do you think that shapes out? [29:44] So I think we default to those because they're just easy to understand. The travel planner, I mean, literally it's a travel agent. It couldn't be more analogous when you think of agents right now. But no, the way I like to think about it is on a spectrum where you have tasks that are sort of in what I would consider do it with me, where you have your agent alongside and you can easily offload certain tasks to it, but it's really working in unison with you. [30:14] give my agent a bunch of stuff to go do and it will run it in the background. I think part of the reason we see these tasks being used is twofold. [30:23] One, they're just incredibly easy to understand and everybody kind of gets what that use case is. And they're usually starting from scratch. Like there's no context you need up front. You can just send an agent out to go do it. And the demo as a result is pretty easy to put together. And then the other one is just where the capabilities are at today. And so as agents get more capable and you start to have more of these realizations on what they are actually able to do,
[30:53] complex use cases and that also requires the user having more trust that they can give to [30:59] the agent. So I think that that will evolve over time and we'll see people [31:03] come up with even more interesting use cases that they're willing to give an agent to go do on their behalf. Yeah, totally. It's also going to require, I guess... [31:13] it's going to inspire, I think, a shift in business model. Right. Because if you have a bunch of agents going off and browsing, you know, trip planning, for example, [31:22] they're not necessarily looking at the ads and, you know, the first things that show up. And so it is, I think it's going to create some business model [31:29] evolution as well. [31:31] I agree. I think there's a lot of evolution that's going to happen across business models, across how websites work, across how users will always want to... [31:43] use the internet going forward. Like, there's a lot of joy I think we all get in it, from content creation to consumption. Um, [31:50] But there's also a lot of other tasks that are just ripe for disruption in a lot of ways. Yeah. Yeah. I'm thinking humans are suboptimal in some ways. We see the ad, we get excited, distracted, and I go and buy the dress. And my agent, maybe I can instruct it to ignore the ads. Maybe it actually knows it's going to find the best content regardless of what's showing up on the page. So it's kind of interesting to think about how that future plays out as agents do more of our browsing. [32:20] I will say that the dress that maybe you got distracted
[32:24] I always get distracted by things too and end up purchasing stuff that gets sent my way. But I'm always happy [32:30] with it by the time I do end up purchasing it. So I think that there's, like, new opportunities to think about how do you actually involve agents in this new sort of business model ecosystem. And hence that third bucket of, like, there's going to be a lot of evolution happening in that space. And I think that that's where we need to evolve as an entire ecosystem. And it's not just, like, one player that's going to say this is how it's done. So it's been interesting just talking to different companies and different people who are also thinking in that space right now. [33:00] You know, I often don't buy things on the internet because it's such a pain. [33:05] Oh, I've definitely dropped off. Man, I cannot, I can't navigate this thing either. I don't understand it. Yeah, that happens quite a lot. Or it's just like, I've just not got time. That happens as well. Or I'm just, I can't be, I can't be bothered, you know? Yeah. Um, maybe it's just me, but I'm not a fan of shopping, let's put it that way, in the real world and online. Um, but I'm a fan of what I get, you know, I'm a fan of the outcome. And so I don't know, I kind of feel like I [33:30] I might do more. I would probably do more online shopping, [33:34] I think, you know, if I didn't have that barrier of actually having to do the shopping bit. I don't know that that would be me, though. No, I agree. What's really interesting is, uh, I don't know about you, there's certain stores [33:45] that I'll go on to and I'll just, like, accumulate stuff in my cart. [33:49] And I won't want to [33:50] like, pull the trigger until a little bit later on. Yeah. I had a chance to think about it. Yeah. Yeah. But then I end up with a bunch of, like, half-built carts across a bunch of different websites.
[33:59] And part of me also wonders, like, is there... [34:02] a world where my agent is a universal [34:05] cart, essentially, where I'm like, add all this stuff to it or, like, create this aggregate area of all the items that I might be interested in buying. And it can be across any site at this point because the agent represents me [34:15] and it can remember which sites to go on. And then when I'm ready, it's sort of like, okay, one click, like, make this entire purchase, basically. And it can go in and check out on all of the different sites or all the different stores. So that'll be, yeah, that'll be an interesting area to think about. Yeah, okay. What I just heard from you guys is e-commerce conversion is about to skyrocket then. I mean, on my computer, it will go up. That's all I'm saying. I don't know about anyone else. But also diversity as well. You know, like, I go to the same old sites, [34:42] right? But, uh, I would love suggestions. Yeah, you know. Yeah. So, yeah, it's like once you've kind of democratized computer use, then the laziness of humans to get through checkout is no longer [34:53] the determining factor of which e-commerce companies will do well. It's just like the best product wins. Yeah. [34:58] So interesting. [35:00] Yeah. [35:00] Okay, cool. Thank you for sharing. You're welcome. [35:03] Okay, Simon, you last. Hi. Notebook. NotebookLM or Notebook? We'll go with NotebookLM. We're still NotebookLM. I think it's been so long now that it's definitely NotebookLM. There was a period where we were like, okay, is now the time? Yeah. But I think we've gone through [35:19] multiple hockey stick moments, which we can talk about. Yeah, it's going to be hard to remove it. I like it, though. I mean, you know, maybe [35:26] Um, [35:27] Maybe every product that has an acronym or some weird letters after it, and there are a couple of them in the AI space, regrets that. But at the same time, they become part of the team and the identity. 
And yeah, it's nice. I like it. I love that. Okay, so Notebook LM was one of Google's biggest viral hits last year? Last year. It went viral last year. Yeah. But the team had been building it for a long time.
[35:53] a while before it took off. So. Totally. Yeah. Tell me about how it's evolved in the last year. Yeah. Yeah. Well, [36:01] So firstly, the viral moment, you know, so my way into NotebookLM was through audio overviews. So me and the team had a kind of, we were also exploring the future of content, but from a different angle, I think. And, you know, Notebook was the perfect [36:18] balance of user control, but also the power of the technology. And our hypothesis was that there was an opportunity for personal content. [36:29] So not content that is for everybody, actually content that's for an audience of one, maybe two, maybe three, small group maximum. And that was kind of how we shaped the product. We didn't think it was gonna, [36:43] You know, looking at the Notebook user base back then, we didn't [36:45] We thought that it was a great place to... [36:48] you know, kind of like test PMF, just kind of iterate on the product. But we were totally unprepared for the massive success of audio overviews and then through that Notebook as well. So [37:01] It was honestly, the first couple of months was really just kind of hanging on for dear life. Firstly, it was making sure that the TPUs don't fully melt, as Raiza, I think, had a GIF out back then. But there was also just a lot of iterations and fixing things and improving things. And, you know, that was really the first couple of months. I think since the start of this year, maybe we've managed to take stock. At the end of last year, we launched the join mode, the ability to join in a podcast, an audio overview, I should say,
[37:31] and talk with the host and ask questions and all this kind of stuff. [37:34] But at the start of the year, we kind of took stock. And we've really been thinking about, you know, what is a notebook [37:42] for the Notebook users? How are your Notebook users really leaning into notebooks once they've come in the front door through audio overviews? And we've started to think about, and Jaclyn, you kind of touched on this, I think, you know, the criticality of context in really enabling these AI systems to be genuinely useful for you. And we found that a lot of users, when they're using Notebook, they use them for these kind of more longer running, almost like projects that [38:12] these, um, they can be ongoing projects or they can be projects with a goal, [38:17] you know, like I've got to prepare for a presentation or something like that. And so a lot of what we've been really doing is retooling, you know, how we look at Notebook and also, you know, building a strategy as well that leans more into, I think there's more sort of [38:31] longer running opportunities that we see in the Notebook user data. Of course, we've done a whole bunch of kind of improvements too. So we've just launched the mobile applications finally, so they came out last Monday. And we also launched international audio overviews as well, [38:50] which was kind of the end, it was the end of a long road, honestly, of upgrading the underlying AI infrastructure and models, away from the very first, almost like research grade model that we used for the initial launch, to, you know, native Gemini audio. So what you hear now, in the international
[39:10] audio overviews at the very least is native Gemini audio. And that was a big push for many teams across Labs and also GDM. Yeah. [39:17] Super cool. [39:19] It feels like audio overview was almost the viral hook. And you guys have been building out a lot in almost like the RAG UI. Yeah. And just imagining what that workspace [39:32] looks like. Yeah. What do you think the actual just audio overview podcast thing becomes? And actually, I'm curious how you even ended up on the shape of two podcast hosts talking to each other. It's just like, it's such an engaging format. I'm curious how you even landed on that. And yeah, you know, I feel like it's only in its infancy still in terms of, I would love, you know, podcasts every morning that pipe me up for my day and things like that. And so how much of your time is thinking about Notebook, the kind of [39:58] RAG workspace environment, for lack of a better word, versus Notebook, the podcast killer. You know, Training Data is going to be built on Notebook in the future. Yeah. Well, I hope not. But maybe it can help. [40:11] So the way that we're starting to increasingly look at notebooks is they're comprised of [40:16] kind of three, they give you sort of like three superpowers. [40:19] So one of them is they help you really accumulate information, [40:24] you know, over time. And that's, you know, there's a lot of amazing, you know, underlying database technologies that we apply, that I think, you know, lean on first party Google technologies in a pretty unique way. The second is they bundle in intelligence.
[40:40] and [40:41] When we launched last year, [40:42] we used the old Gemini 1.5 Pro model back at that time. But obviously now we've got thinking models and so on. But the third thing is this ability for content and information to be adaptive. [40:57] to your situation. And so, you know, [41:00] podcasts or audio overviews a conversation [41:04] It's one form that information might take, but you can imagine many other forms that that information or knowledge might take as well. So you might imagine it coming at you in the form of a comic book. [41:16] or maybe a short movie, or maybe a mind map, which we've also launched. But you can imagine many other types of [41:23] of media that fit the right circumstance and form and function for the moment. [41:27] for you to understand information, to be able to analyze it, make decisions with it, kind of do something with it. So that's kind of that's the mindset that I think we have when we're thinking about, you know, [41:39] the different... [41:41] You might hear us talk about transforming information from one state to another. I think that's a fine word. It's a little bit technical, to be honest. It's more like adapting to you and fitting you, I think. That's really what we're going for. But in terms of just going back to your actual question around audio overviews and where it's going. [41:59] there is a huge amount, I think, of room left in that area. [42:06] in that technology. So I enjoy audio overviews, and I use them a fair amount, but I also
[42:13] you know, every now and then I'll be like, that's weird. [42:15] You know, why do they say that? Well, they've kind of lost the plot there. I didn't quite get the right narrative. Sometimes it's like the uncanny valley or the illusion is broken [42:25] when you're listening to them. And while it might seem like there's a small amount of work we might need to do [42:32] to kind of [42:33] to fix that last step, there's actually a ton of work that we've got to do. And so there's a lot of effort being placed into all of the various components that you'll need to make the experience feel like something that, you know, where you suspend your disbelief more completely. And alongside that, you know, there are many other different show types. [42:53] We've kind of had one show type for a bit too long, I think, actually. And we're bringing more out. So we're actually working on some really cool things, a lot of them inspired by users, honestly. So one of the things that we saw users do right back at the start, but [43:09] we keep on seeing it, is users putting in their LinkedIn. [43:14] They're putting in their LinkedIn. Well, why? Number one, it's kind of fun to hear people talk about you, but a lot of users are using it to get feedback, [43:22] you know, like to kind of, to understand from another person's perspective, who they may not have access to. You know, feedback is truly a gift, like real feedback is hard to find, you know. So, you know, how would somebody else look at me? How would somebody else talk about my strengths? And how might somebody else talk about areas to improve? You know, this is something that we see users already using audio overviews
[43:46] to kind of access that sort of content or sort of information. We think we can make that easier for people. [43:52] So a lot of what we're thinking about now are different show types that lean into some of the [43:57] the more viral successes that we've seen our users, you know, explore online. Yeah. And also think about, you know, brand new formats as well. I think it's going to be fun. [44:06] Okay, so we're going to have training data, the comic strip. I'm not saying. We're definitely going to have it. But I mean, it's – I think – [44:16] Not everything has a story. [44:18] you know, and so applying different adaptations will almost be sort of context dependent, I think. Yeah. But oftentimes it does help, you know. So one of the things that we were looking at the other day was we were looking at 150 page PhD dissertation on it was invasive, invasive wolves, I think, in some part of Europe. And [44:43] Yeah, you could have looked at a mind map, you could have looked, you could have maybe listened to an audio overview if you had like 10-15 minutes to spare. But actually getting a kind of a... [44:52] a comic book rendition of of that PhD was really helpful just to kind of understand the overall narrative within it. Yeah. So, you know, we're still working on things like that. But I think there's a lot of there's a lot of opportunity there. And of course, you know, comic books are very similar to storyboards and that. I was just thinking exactly that. Thomas is doing too as well. So, yeah, there's a lot of there's a lot of interesting ways that I think labs projects intersect and.
[45:19] We'll continue to explore them. Yeah. You can create a hero's journey comic book of somebody's LinkedIn [45:24] uh, career arc. I mean, for an audience of one person and one person only, that's probably going to be the most awesome movie that we'll ever have seen. So maybe. Yeah, maybe. Yeah, yeah. That's awesome. Really cool. Where do you see Notebook going from here? [45:38] Yeah, well, like I said, we're really, I think our focus is, aside from a whole bunch of different adaptations, we're really thinking about how we can be more useful to our users over their more longer running projects. [45:51] And so both in the world of the knowledge worker, but also in the world of students, these are our kind of like core users, I think. [45:59] Um, the project is really an area where [46:04] those users [46:05] both need the most assistance, [46:07] but it's also the point of highest value, I think, for them. Right. So if you're in the world of work, the project is where value accumulates. It's the atomic unit of work. Yeah. Right. It's a real unit of work. We actually call them units of knowledge, but it's a great way of putting it. And the same for a student as well. You know, the project, if it's a project with a goal, passing a test, that's a big deal. [46:30] Or if it's an ongoing lifelong learning thing, that's also really important as well. So I think really focusing on use cases in those domains is something we're thinking a lot about. I'll say the other thing is, I think one of the things I'm personally very excited about, I've been in the consumer product space for many, many years, and...
[46:50] One of the, [46:52] I guess one of the things that we did at Google when we went kind of mobile first in the mobile first era [46:57] is we moved a lot of our desktop products to mobile. [47:00] And if you look at those mobile products, many of them are the desktop products shrunk down to a small screen. [47:05] And that's okay. [47:07] And I think because we were one of the first, because we built Android and a lot of our big products basically got mobilified at that point, we found it hard to change at that point forwards. But I've always been really interested in thinking about, if you have a desktop experience, what is a companion mobile experience that doesn't have to just be a carbon copy of the desktop experience, that maybe leverages the form factor, the sensors, the actual fact that it's with you at all times, [47:37] to deliver an additive experience [47:40] on top of the desktop experience. So, you know, we've just launched the mobile experience after [47:46] a fair amount of time in development, it's fair to say. But what I'm really most excited about there is the opportunity to actually iterate on that [47:53] kind of novel mobile experience [47:55] going forward. Like for example, wouldn't it be cool if I was, you know, maybe I'm in a discussion with some amazing, really smart people, and I've popped Notebook down, I've opened a voice, its native voice recorder, [48:07] and it's just able to record the conversation for me. And then I can transform that to later dates and accumulate them and all this kind of stuff. That's the thing that... [48:16] It's probably going to be weird if I open my laptop and push record on my laptop. But for the mobile device, it's the perfect opportunity. Totally. Yeah.
[48:24] Really cool. Thank you for sharing. Yeah. We're going to close it out with some predictions on AI as a whole. Please jump in. Hot takes welcome. Let's see. [48:34] Let's start with [48:35] What are your favorite Google Labs projects that we didn't talk about today? Like, what are the gems right now that you're most excited about? The unreleased Google products that we're not allowed to talk about. Not the unreleased ones. But you guys just announced, like, 15, [48:48] 50 things. There has to be others beyond the three we talked about today. Yeah, yeah. I have one, [48:54] which is kind of still in this, like, video and image space. But I think the virtual try-on [48:59] Yeah. [48:59] stuff that we presented, like there's a lot of exploration in it. I think that one to me is really nice because I think it [49:04] meets a real direct user need. It's [49:06] the strength of Google, obviously. We know we have all the inventory and we know how to connect this. And it's just so fun to just see things on yourself. I'm very excited about that one. I think this has like a [49:15] That's my favorite as well. That's so funny. Okay. I think that was a good one. [49:18] Stitch, I think, is really cool, to be able to just talk to the product and describe what design you want and have it actually come out [49:25] with that front end design. I'd been using it a little in dogfood before it was launched. And so it's just, I want to spend more time [49:33] using it now that it's actually live. Really cool. What about you? [49:38] Well, mine is going to be Stitch. [49:41] So I'm going to have to think. Two votes for Stitch, one vote for shopping. I'm with you, two votes for shopping. Yeah, there we go, there we go. What, I guess, what areas do you think will be hottest in the application space for AI broadly in 2025? Like, I think
[49:56] coding was, you know, maybe the breakout application in the last 12 months. What do you think will be the breakout application in the next 12 months? [50:02] Video. [50:03] I think there's, like, something around this remixable content. [50:08] You know, you generate something, I take your thing, I just riff off of it. Yeah, there's something around this that I think is going to pop up somewhere. I hope it's us. [50:16] But that part feels really interesting. It's going to feel like, you know, Whisk is heading a bit that way. Veo obviously can power a lot of this in video. [50:22] Um, [50:23] Yeah, I think that's going to be something this year. [50:25] As you look back at past predictions of what you thought was going to be interesting in AI, where have you been really right and where have you been really wrong? [50:32] We're just... Let's say where we're really wrong altogether. All right. Three, two, one. Timing. [50:39] Okay, say more. A single time. [50:41] I think there's been several examples where we definitely felt like we were onto something, and we were onto something. We were just too early into the space. And so it's been fun to see, like, projects kind of go on pause or, you know, stop for a little bit. And then some of them are starting to even... [50:58] come back around again at this point. And so sometimes we just were a little too early, but it just gives us a jumpstart when the models and the capabilities are ready. [51:07] Good problem to have. Yeah. What do you think you've been really right on and, like, sticking to your convictions on? [51:12] I think this is for me in my space, like the show and tell piece, this idea that, like, you shouldn't, [51:17] you shouldn't ask users to kind of write two pages of text to describe, for example, an image. The idea is, like, you should just be able to show and tell, like you would do with a friend or an artist that's working with you.
[51:26] Um, [51:27] I think that has stuck in this. It's kind of moving people away from prompting and towards kind of instructing and relying on the intelligence that lives behind. So I think that one... [51:36] that one I'm sticking to my guns on. I think it's here to stay. [51:40] I mean, this is probably obvious at this point, but... [51:42] when we all started in Labs, [51:44] um, [51:45] there was no Google LLM API. Google didn't have a functional [51:51] instruction tuned language model or anything like that. [51:54] And [51:56] believe it or not, back then, in fact, I think the general consensus was that these were not really things that are easy to build a business around [52:05] because of their cost. And I think one of the things that we've all done actually is we've kind of stuck with the technology and now it's obvious, right? But in the early days, it certainly was not obvious. So yeah, and we got that bit of timing right. [52:20] Yeah, yeah, yeah. Yeah. Inference costs, just riding that curve and just [52:25] capabilities up, costs down, and what will you build assuming that those curves continue? Yeah, exactly. In fact, when we joined, one of the transitions is right to think. [52:34] inside Labs that Josh started actually. And a lot of the docs that we'd write were around, well, what happens in two years? [52:40] You know, and of course, yeah, that curve is something that I think inspired a lot of us. Yeah. Yeah. Fantastic. Thank you all so much for joining to share what you're doing across the creative sphere, the, you know, computer use sphere and the [52:54] What do I call the Notebook sphere? The podcast killer slash... Let's not say podcast killer. But yeah, we can say knowledge. Knowledge creation, transformation space. It's really, really cool what you all are building. And you guys have such a cool job getting to kind of cook in the little test kitchen of Google. 
And thank you for giving a preview of some of the stuff that's coming down the pipeline.
[53:17] Thanks for having us. Thank you. [53:20] *music* [53:44] Thank you.