Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens Ep44

Episode Description

Carolina Parada and the team have delivered Gemini Robotics, Google DeepMind's vision-language-action (VLA) foundation model. Gemini Robotics provides the general-purpose 'understanding' enabling robots to go from pixels to actions.

How do you teach a machine to understand the physical world well enough to move through it, manipulate it, and help people in it, when every case is a corner case, never experienced in training?

Embodied AI. AI with arms and legs and the ability to interact with the real world. Gemini Robotics is designed to generalize across platforms, so it works for robots that walk, roll, fly, and swim, with any end-effector, be it a hand, gripper, pincer, or suction cup. It is also designed to generalize across tasks and skills, so it can respond to just about any request the robot receives.

I sat down with Carolina to explore Google DeepMind's approach to embodied AI at the Humanoids Summit 2025, hosted and organized by ALM Ventures at the Computer History Museum in Mountain View, California.

Carolina has been working on teaching machines to recognize and respond to the environment in more human-centric ways, starting with speech and voice, then computer vision, and now robotics.

At the heart of her work is Gemini Robotics, a foundation model that takes the multimodal reasoning capabilities of Gemini and extends them into the physical world. It's a VLA—vision-language-action—model. Going beyond "how many cars are in this image?" to "dunk the ball" when playing with a basketball toy. Embodiment-agnostic, it can adapt to control any robot: manipulators, mobile platforms, and the quickly developing humanoids.

Data, Constitutional AI, teleoperation, and video training are all good candidates for the top concepts covered. But what impressed me more was her description of bringing new people in to experience the robots. Inevitably they ask the robots to do things they've never heard before, or interact in Japanese or another language, only to have the robot respond appropriately, creating 'delight, surprise, and joy.'

That is a robot future I can get excited about.

Please join me in welcoming Carolina Parada to Turn the Lens, in collaboration with Humanoids Summit and ALM Ventures.

This interview is a collaboration between Turn the Lens and Humanoids Summit, and was conducted at the Humanoids Summit SV, Computer History Museum, Mountain View, California, December 12, 2025. Humanoids Summit is organized and hosted by ALM Ventures.

Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens with Jeff Frick Ep 44

Learn more about Humanoids Summit at humanoidssummit.com

Episode Links and References

Carolina Parada Interview - Links & References

Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens with Jeff Frick
In collaboration with Humanoids Summit, brought to you by ALM Ventures, from Humanoids Summit 2025 Silicon Valley, December 2025

RECENT ANNOUNCEMENTS & PARTNERSHIPS (2025-2026)

Boston Dynamics Partnership (January 2026)

Waymo Expansion (November-December 2025)

Gemini Robotics On-Device (June 2025)

GEMINI ROBOTICS FOUNDATION MODELS (2025)

Gemini Robotics 1.5 (September 2025)

Initial Gemini Robotics Launch (March 2025)

Analysis & Industry Context

CONSTITUTIONAL AI & SAFETY (2024-2025)

ASIMOV Benchmark & Robot Constitution

  • Generating Robot Constitutions & Benchmarks for Semantic Safety (March 2025)

ROBOTICS TRANSFORMERS (RT-1, RT-2, AutoRT) (2023-2024)

RT-2: Vision-Language-Action Models

AutoRT, SARA-RT, RT-Trajectory (January 2024)

MUJOCO PHYSICS SIMULATOR

Official Resources

WAYMO AUTONOMOUS VEHICLES

Official Waymo Resources

Historical Context

Media Coverage

APPTRONIK & APOLLO HUMANOID ROBOT

Official Resources

Apollo Coverage

GOOGLE DEEPMIND - ORGANIZATION

Official Resources

Background

KEY CONCEPTS & DEFINITIONS

Vision-Language-Action Models (VLAs)

  • VLAs combine visual perception, language understanding, and robotic action
  • Transform multimodal inputs (images, text) directly into motor commands (see the sketch after this list)
  • Google DeepMind describes Gemini Robotics as its most advanced VLA model to date
  • Predecessor: RT-2 (2023) demonstrated emergent capabilities
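
For readers who want the shape of the idea, here is a minimal, illustrative Python sketch of a VLA's input/output contract. This is not Gemini Robotics code; the Observation, Action, and ToyVLA names, the fields, and the fixed action returned are invented placeholders for the large multimodal network a real VLA would run.

    # Illustrative sketch of a VLA's input/output contract (not Gemini Robotics code).
    # A VLA consumes a camera image plus a natural-language instruction and emits
    # low-level robot actions, e.g. end-effector deltas and a gripper command.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Observation:
        rgb_image: bytes        # camera frame (raw bytes here for simplicity)
        instruction: str        # e.g. "dunk the ball"

    @dataclass
    class Action:
        delta_xyz: List[float]  # desired end-effector translation (meters)
        delta_rpy: List[float]  # desired end-effector rotation (radians)
        gripper: float          # 0.0 = open, 1.0 = closed

    class ToyVLA:
        """Stand-in for a vision-language-action model: pixels plus text in, actions out."""
        def predict(self, obs: Observation) -> Action:
            # A real VLA runs a large multimodal network here; this stub just
            # returns a fixed "reach forward and close the gripper" action.
            return Action(delta_xyz=[0.05, 0.0, -0.02],
                          delta_rpy=[0.0, 0.0, 0.0],
                          gripper=1.0)

    if __name__ == "__main__":
        policy = ToyVLA()
        obs = Observation(rgb_image=b"", instruction="pick up the red block")
        print(policy.predict(obs))  # a control loop would stream this to the robot

In a real deployment the control loop would call the policy at a fixed rate so it can react to changes in the scene, which is the 'action' part that separates a VLA from a VLM.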

Vision-Language Models (VLMs)

  • Process visual and language inputs for reasoning
  • Gemini 2.0 is the foundation VLM
  • Gemini Robotics-ER is a specialized VLM for embodied reasoning

Embodied AI

  • AI systems with physical presence that interact with real world
  • "AI with arms and legs" as Carolina describes
  • Requires understanding of physics, spatial relationships, safety

Foundation Models

  • Large-scale models trained on diverse data
  • Can be adapted to many downstream tasks
  • Gemini Robotics designed to be embodiment-agnostic
  • Single model works across different robot types

Embodiment-Agnostic

  • Ability to control different robot forms (humanoids, mobile manipulators, etc.)
  • Transfer learning across robot morphologies
  • Key innovation of Gemini Robotics (adapter pattern sketched after this list)
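
The bullets above describe the goal; one common way to picture it is a thin per-robot adapter beneath a shared policy. The Python sketch below is a generic illustration of that pattern; the adapter classes and command dictionaries are invented and are not how Gemini Robotics actually interfaces with partner robots.

    # Illustrative adapter interface: a single generic policy output is mapped to
    # robot-specific commands per embodiment. Class and method names are invented.
    from abc import ABC, abstractmethod
    from typing import List

    class EmbodimentAdapter(ABC):
        """Translates a generic end-effector action into this robot's command format."""
        @abstractmethod
        def to_commands(self, delta_xyz: List[float], gripper: float) -> dict: ...

    class TwoArmManipulatorAdapter(EmbodimentAdapter):
        def to_commands(self, delta_xyz, gripper):
            # e.g. route the motion to the left arm's Cartesian controller
            return {"left_arm_cartesian": delta_xyz, "left_gripper": gripper}

    class HumanoidAdapter(EmbodimentAdapter):
        def to_commands(self, delta_xyz, gripper):
            # a humanoid might add balance or whole-body terms before execution
            return {"right_hand_cartesian": delta_xyz, "right_hand_close": gripper,
                    "maintain_balance": True}

    if __name__ == "__main__":
        generic_action = ([0.05, 0.0, -0.02], 1.0)   # same policy output for both robots
        for adapter in (TwoArmManipulatorAdapter(), HumanoidAdapter()):
            print(type(adapter).__name__, adapter.to_commands(*generic_action))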

World Models

  • Simulate physics and spatial consistency
  • Used for evaluation and training
  • Complement teleoperation data
  • Critical for safety testing

Constitutional AI

  • Safety framework using natural language rules
  • Inspired by Asimov's Three Laws of Robotics
  • Context-specific behavioral constraints
  • Can be modified for different deployment scenarios (illustrative sketch after this list)
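
As a rough illustration of the 'check against a constitution' idea from the interview, here is a hedged Python sketch. The rule texts, function names, and keyword matching are invented stand-ins; a real system would ask the foundation model itself to judge each rule rather than match keywords.

    # Illustrative check of a planned robot action against a natural-language
    # constitution. The keyword matching below is only a stand-in for the
    # model-based semantic judgment a real system would make.
    CONSTITUTION = [
        "Do not hand sharp objects directly to a person.",
        "Do not place objects at the edge of a table.",
        "Do not enter areas marked as off-limits in the current context.",
    ]

    def violates(rule: str, planned_action: str) -> bool:
        # Stand-in semantic check: flag the action if it mentions terms the rule forbids.
        triggers = {
            "sharp": ["knife", "scissors", "blade"],
            "edge of a table": ["edge of the table"],
            "off-limits": ["restricted area"],
        }
        for trigger, words in triggers.items():
            if trigger in rule and any(w in planned_action.lower() for w in words):
                return True
        return False

    def check_against_constitution(planned_action: str) -> list:
        """Return the rules the planned action appears to violate."""
        return [rule for rule in CONSTITUTION if violates(rule, planned_action)]

    if __name__ == "__main__":
        plan = "pick up the knife and hand it to the visitor"
        broken = check_against_constitution(plan)
        print("refuse:" if broken else "proceed:", broken or plan)

Because the constitution is just text, it can also be swapped or extended in context for a given deployment, which is the point Carolina makes about context-specific rules.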

Teleoperation

  • Human remote control of robots
  • Primary data source for training currently
  • Being supplemented with simulation, world models, video learning

Cross-Embodiment Learning

  • Knowledge transfer between different robot types
  • Accelerates training for new platforms
  • Core capability of Gemini Robotics

Corner Case Problem

  • In autonomous systems, rare/unexpected scenarios
  • Waymo operates assuming "every case is a corner case"
  • Foundation models handle novel situations better than hand-coded rules

COMPUTER VISION & PERCEPTION

Self-Driving Cars (Historical Context)

  • Carolina's background in computer vision for autonomous vehicles
  • Perception systems for understanding environments
  • Foundation for robotics work

TECHNICAL TERMS MENTIONED

  • End-effector: Robot hand/gripper/tool at end of arm
  • Pixels to actions: Direct mapping from visual input to motor commands
  • Planners and controllers: Traditional robotics approach (separate from VLAs)
  • Eval/evaluation: Testing model performance
  • Simulation: Virtual testing environment (MuJoCo)
  • Benchmarks: Standardized tests for comparing models (ASIMOV, ERQA)
  • Fine-tuning: Adapting pre-trained model to specific tasks
  • Inference: Running trained model on new inputs
  • Generalization: Performing well on unseen scenarios
  • Dexterity: Fine motor control and manipulation skills
  • Semantic understanding: Comprehending meaning and context
  • Spatial reasoning: Understanding 3D relationships and geometry

HUMANOIDS SUMMIT

Event Information

  • Humanoids Summit Silicon Valley 2025
    • Location: Computer History Museum, Mountain View, CA
    • Hosted and organized by ALM Ventures
    • Founder: Modar Alaoui
    • First year: 2024
    • London show: June 2025
    • Upcoming: Japan show (May 2026)

Previous Coverage Reference

  • Turn the Lens coverage from the inaugural Humanoids Summit 2024
  • Andra Keay interview from that same event

CAROLINA PARADA - BACKGROUND

Current Role

  • Senior Director and Head of Robotics, Google DeepMind
  • Leads Gemini Robotics team

Career Progression

  • Started in speech and voice recognition
  • Computer vision work
  • Perception for self-driving cars
  • Now leading robotics foundation models (10+ years in robotics)

Public Appearances

  • Humanoids Summit 2025 speaker
  • Quoted in major Google DeepMind announcements
  • Featured in MIT Technology Review, TIME, other major publications

RELATED HUMANOID ROBOTICS COMPANIES (Context)

Competitors & Partners Mentioned

  • Figure AI: Figure 02 humanoid, Helix VLA model
  • Agility Robotics: Digit humanoid
  • Boston Dynamics: Atlas humanoid, Spot quadruped
  • Sanctuary AI: Humanoid development
  • 1X Technologies: Humanoid robots
  • Tesla: Optimus humanoid
  • Mercedes-Benz: Apollo deployment partner
  • Jabil: Manufacturing partner for Apollo

Trusted Tester Program Partners

  • Apptronik
  • Agility Robotics
  • Boston Dynamics
  • Agile Robots
  • Enchanted Tools

GEMINI MODEL FAMILY (Context)

Gemini 2.0

  • Foundation multimodal model
  • Basis for Gemini Robotics
  • Advanced reasoning and world understanding
  • Supports 100+ languages (enables multilingual robots)

DATA SOURCES FOR TRAINING

Current Methods

  1. Teleoperation data: Human remote control
  2. Simulation: MuJoCo physics simulator
  3. World models: Generate synthetic scenarios
  4. Video learning: Research stage
  5. Cross-robot data sharing: Fleet learning (a generic mixing sketch follows this list)
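
The interview does not describe how these sources are combined, but a common pattern in the field is to sample training episodes from each source with tunable mixture weights. The Python sketch below illustrates that generic idea; the source names, weights, and episode IDs are made up and do not describe the Gemini Robotics pipeline.

    # Generic illustration of sampling training episodes from several data sources
    # with tunable weights. Source names, weights, and episode IDs are invented.
    import random

    DATA_SOURCES = {
        "teleoperation": {"weight": 0.60, "episodes": ["teleop_ep_001", "teleop_ep_002"]},
        "simulation":    {"weight": 0.25, "episodes": ["sim_ep_001", "sim_ep_002"]},
        "world_model":   {"weight": 0.10, "episodes": ["wm_ep_001"]},
        "video":         {"weight": 0.05, "episodes": ["video_clip_001"]},
    }

    def sample_batch(batch_size: int, seed: int = 0) -> list:
        """Draw a batch of episode IDs, weighting sources by their mixture weight."""
        rng = random.Random(seed)
        names = list(DATA_SOURCES)
        weights = [DATA_SOURCES[n]["weight"] for n in names]
        batch = []
        for _ in range(batch_size):
            source = rng.choices(names, weights=weights, k=1)[0]
            batch.append(rng.choice(DATA_SOURCES[source]["episodes"]))
        return batch

    if __name__ == "__main__":
        print(sample_batch(8))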

Research Directions

  • Learning from demonstration
  • Learning from videos (abundant internet data)
  • Synthetic data generation
  • Multi-robot collaborative learning

SAFETY APPROACHES

Multi-Layer Safety

  1. Constitutional AI: High-level semantic rules
  2. Pre-action reasoning: "Think before acting"
  3. Collision avoidance: Low-level motor control
  4. Force limits: Joint torque thresholds
  5. Emergency stops: Kill switches
  6. Alignment: Following Gemini Safety Policies (layer ordering sketched after this list)
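
To make the layering concrete, here is a hedged Python sketch of one way such checks could be ordered. The thresholds, function names, and the hard-coded unsafe request are invented for illustration and do not describe Google DeepMind's implementation.

    # Illustrative ordering of safety layers around a general-purpose robot policy.
    # All names, rules, and thresholds are invented for illustration.
    MAX_JOINT_TORQUE_NM = 30.0    # hypothetical per-joint torque limit
    MIN_CLEARANCE_M = 0.05        # hypothetical obstacle clearance before motion is blocked

    def semantic_check(task: str) -> bool:
        # Layers 1-2: constitution check and "think before acting". A real system
        # would query the foundation model; here we reject one hard-coded request.
        return "blade-first" not in task.lower()

    def motion_check(clearance_m: float, torques_nm: list) -> bool:
        # Layers 3-4: collision avoidance and force limits in the low-level controller.
        if clearance_m < MIN_CLEARANCE_M:
            return False
        return all(abs(t) <= MAX_JOINT_TORQUE_NM for t in torques_nm)

    def execute(task: str, clearance_m: float, torques_nm: list, estop: bool) -> str:
        if estop:                         # Layer 5: emergency stop overrides everything
            return "halted: emergency stop"
        if not semantic_check(task):
            return "refused: fails semantic safety check"
        if not motion_check(clearance_m, torques_nm):
            return "blocked: motion-level safety limits"
        return "executing: " + task       # Layer 6 (alignment) governs the policy itself

    if __name__ == "__main__":
        print(execute("bring me a glass of water", 0.12, [5.0, 8.0, 3.5], estop=False))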

ASIMOV Benchmark

  • Dataset for evaluating semantic safety
  • Tests understanding of dangerous scenarios
  • Includes real-world injury reports
  • Measures ability to reject unsafe tasks

MEMORABLE QUOTES FROM INTERVIEW

  • "Embodied AI. AI with arms and legs and the ability to interact with the real world."
  • "Every case is a corner case" (Waymo philosophy applied to robotics)
  • "Delight, surprise, and joy" - Carolina's reactions to unexpected robot capabilities
  • "The robot can understand all the languages" (benefit of Gemini foundation)
  • Speaking to robots in Japanese and seeing them respond
  • Visitors asking robots to do unexpected things

FUTURE DIRECTIONS DISCUSSED

  • Robots learning from experience
  • Adaptation from deployment
  • Continuous improvement flywheel
  • Expanding to more robot types
  • Real-world applications beyond labs
  • International expansion (Japan, UK, global)

ADDITIONAL CONTEXT

Computer History Museum

  • Location of interview
  • Mountain View, California
  • Historic venue for tech events

Humanoids Market Trends

  • Rapid funding growth (Figure: $1B Series C, $39B valuation mentioned in Andra Keay interview context)
  • Pilots moving to commercial deployments
  • Exponential capacity growth
  • LLMs transformed training paradigm

--

Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens with Jeff Frick Ep44
In collaboration with Humanoids Summit, brought to you by ALM Ventures, from Humanoids Summit 2025 Silicon Valley, December 2025
Links and References
© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved

Episode Transcript

Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens with Jeff Frick
In collaboration with Humanoids Summit, brought to you by ALM Ventures, from Humanoids Summit 2025 Silicon Valley, December 2025
English Transcript 
© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved

====================================================================

Intro Cold Open:
It means AI that has a body
I used to just say AI with arms and legs but this is a much cooler way to describe it
It's embodied AI
[laughter] AI that can move
5, 4, 3 ... 

JEFF FRICK:
Hey, welcome back everybody, Jeff Frick here, coming to you from the park. I'm excited to announce a collaboration with Humanoids Summit, which is put on and sponsored by ALM Ventures, with Modar Alaoui and Jesica Chavez a huge part of the team there that really makes it all happen. We got to cover our second Humanoids Summit and we had ten amazing interviews, and I'm excited to share that through this collaboration we're now going to co-release those interviews through 'Turn the Lens,' so I'm psyched.

So the concept here is 'Embodied AI'. And if you think about it, you basically take these amazing foundation models and you put them in the robot, and that is what is really happening. Humanoid robots are a subset of embodied AI. Waymo cars are a subset of embodied AI. And in fact, this little drone that's taking this interview for me is embodied AI. I'm not actually piloting this drone. I'm just telling it to stay over there and film me, and it's wrestling with some wind. I don't have a tripod, and you know, I'm basically giving it a mission, not flying it.

So think of embodied AI as where it's AI, but it has arms and legs and wheels and can roll and fly and sail and do things. So traditionally it was something called a VLM, a Vision Language Model, and now we've got what's called a VLA model, Vision Language Action, where the robots can not only figure out the answer to the question, but then can figure out a way to accomplish a task and, more importantly, react to environmental changes on its way to accomplishing that task.

Really important concept. And what's really cool is these foundation models, of which Carolina's team is creating the Google Gemini foundation model for robotics, that wants to be so foundational that it doesn't really matter, it's kind of platform agnostic as to what type of a robot it goes into, what are the sensors, what are the actuators, the arms, the way that it moves.

And the first interview is a great representative of some of the leading minds in humanoids, and really beyond humanoids. She was a keynote at Humanoids Summit. She was a keynote at the Consumer Electronics Show [CES] this year, which kicked off everything with humanoids, with robotics, with embodied AI.

Carolina Parada, she is the Senior Director and Head of Robotics for Google DeepMind. Carolina is in charge of Gemini Robotics, Google's foundation model for robotics, and the idea is that this is going to be platform agnostic. This foundational model will really allow those computers to use a combination of vision and language and then create an action plan for accomplishing that task, and more importantly, as all sophisticated robots do, not only do they have an action plan but they have a way to react to the environment around them and make a change if they need to make a change to accomplish the task.

I was going to record this in a Waymo. Waymos are a great example of embodied AI. Their goal is to take you from point A to point B on public streets. You can pretty much assume that every situation a Waymo encounters was never in one of its training tapes. So how do you train them with enough to educate the system as to be able to figure things out? We're a long way from perfect, but they're getting a lot closer than I think most people would assume.

The other example I use all the time is autonomous drones, and you've heard my story about Skydio 2 and the original kind of follow-along and do-not-crash drone. Now I have this new one. It's taking my picture. It's recording my video. I don't have a tripod. I don't have a gimbal. I don't have a whole bunch of things. It's all packaged in this little 240-gram package that is amazing, that I don't really have to fly, I just have to tell it what I want it to do and the autonomous vehicle figures out how to accomplish that task.

We talked about some other things like Constitutional AI, where you basically create a constitution with some bounds, and the robot is trained to stay within those constitutional bounds, or check its actions against the constitution. One of the many layers in the safety processes and protocols that we talked about as well.

And that's not even my favorite part. My favorite part is when I asked Carolina to talk about the magic. And she talked about the delight and joy, and really the emotions, as these things become better and better at generalizing tasks. This concept that an increasing level of data for any task increases the success rate for all tasks, and you just drive more data into it, really opens up opportunities that I think are a little further along than most people think, and I think they're going to continue to develop faster than most people think. So I'm really excited. So without further ado, and a big thanks to Humanoids Summit, my interview with Carolina Parada. Thanks for watching. Enjoy.

Interview Cold Open:
5, 4, 3 ... 

JEFF FRICK:
Hey, welcome back everybody. Jeff Frick here. Coming to you from the Computer History Museum at Humanoids Summit [2025]. We were here a year ago. It was the very first year of the show. They did a summer show in London [June 2025] and they just announced they're going to have another summer show next year in Japan [May 2026]. And we're really excited for our next guest. She's been involved in basically getting computers to learn how people operate for years and years and years. She started in computer vision [and voice] and has been very involved in that part of the world, and now has switched over to robotics. So we're excited to have her. She is Carolina Parada. She is the Senior Director and Head of Robotics for Google DeepMind. Carolina, great to see you today.

CAROLINA PARADA:
Thanks for having me. 

JEFF FRICK:
So you've been involved in this for a long time. Let's take a step back to vision, because computer vision seems like that was one of the earliest attempts to try to take human capability and have it work well in a robot, which ultimately ends up with things like Waymos and all kinds of cool things. Talk about the computer vision problem when you started. How hard is it? Were you surprised at how fast you were able to solve it? You and the team. Were there some really critical breakthroughs that happened that moved it along? And how do you think about that compared to where we are now with your latest challenge?

CAROLINA PARADA:
So I certainly have worked on vision models prior to even working in robotics. I was working on self-driving cars and doing perception for self-driving cars. And that is something that is still very, very useful. And a lot of traditional robotics has very good, strong perception systems followed by planners and controllers. So it's still very much used in the industry. The thing that we've been working on for about ten years now is building foundation models for robotics.

Our mission is to build embodied AI responsibly to help people in the physical world. So we are focused on building that intelligence layer that powers general-purpose robots. And we're not doing that in the traditional way in which robotics works today. It's more like a full end-to-end system that goes from pixels all the way to robot actions. So the same way that with Gemini, for example, you can talk to it today and it will give you either text or an image, now we can get Gemini to actually move a robot, and physically embody a robot, and decide how to move in order to complete a task.

JEFF FRICK:
Okay, well let's back up a step and talk about foundation models, because I think a lot of the audience will be familiar, obviously, with ChatGPT and kind of LLMs as their baseline knowledge of a foundation model. And we've also heard here talk about the world model, which is kind of a digital twin of the world and all the environmental factors and gravity and those types of things. So it's a big potential space. How do you break it down, and what are the key pieces to a foundational model that will allow it to be leverageable across all the different platforms and lots of different applications?

CAROLINA PARADA:
Yeah. So there are many different types of foundation models. The foundation models that we built are called Gemini Robotics. So Gemini is a foundation model. It's a VLM [Vision Language Model]. So it takes visual input, language input, and then reasons back to you, right? So it can speak. You can ask a question like, how many cars are in this image? And it will reason about that. Or you can ask a question like, how would I move from point A to point B in this image? And it can reason about that. So that's a VLM [Vision Language Model]. Gemini Robotics is a VLA [Vision Language Action model]. So what we do is that we take a VLM like Gemini and then we adapt it to also predict actions. So now the same kind of reasoning that you have about the real world, about how do I reason about going from A to B, instead of just responding how to do that, it actually decides how to move the robot. So it's a different type of foundation model called a VLA. There's other types of foundation models, like there are video generation models or world models that are essentially producing pixels as you move around in a world, and they're modeling the physics of that world and the spatial consistency over time in that world. And that's yet another type of foundation model.

JEFF FRICK:
So are you guys building lots of different ones that cover particular verticals or particular types of skills or actions or how should people think about it? 

CAROLINA PARADA:
Certainly for robotics we're building one foundation model that tries to do everything, so 

JEFF FRICK:
The mega one for robotics. 

CAROLINA PARADA:
That's correct 

JEFF FRICK:
Targeted for robotics specifically. 

CAROLINA PARADA:
Yes. So we build a single foundation model that can take any natural language request and output the motions of a robot. The other thing is that we really care about not just moving one robot. We care about a foundation model that is capable of moving any robot, including humanoids. Right? So it is an embodiment-agnostic AI that can be adapted to move any robot, and importantly, thus bringing all of Gemini's world understanding to bear when it's making a decision about how to move an arm or how to do any task.

JEFF FRICK:
So is there like an SDK or some interpretive layer between a particular robot that's got particular attributes, you know, runs around on treads versus walking around on two feet, where the robot vendors can leverage the work that you guys have done to get that translation? How does that work?

CAROLINA PARADA:
Yeah, I mean, the way we work with different hardware partners is that we have a trusted tester program, it's called the Gemini Robotics trusted tester program. And essentially what we do through that is that we enable them to test our models, and then we also get feedback about how the models are doing by building benchmarks or building data sets that capture the use cases that they care about.

JEFF FRICK:
Does that get incorporated back into the big model, or is it kind of, almost like a shard for that particular relationship?

CAROLINA PARADA:
Yeah, it depends on the relationship. Definitely there are partners that we can work with to improve the model, and there are partners that are simply just testing it and evaluating it. And then we're just getting feedback, which is also extremely valuable. Right, right. Because it's much better to be able to evaluate this in real-world applications than just in our lab with toys and objects that you can find in a lab setting.

JEFF FRICK:
So it's a great segue, because one of the things we hear over and over is that for LLMs it was easy to train, and there's lots of words. There's a whole internet full of words and books and libraries, and even for visual, there was a lot of data that you guys could pull from in terms of photos that existed. There isn't that giant corpus of kind of robotic movement data. So I wonder if you can speak a little bit about how the training has changed with things like synthetic data and teleoperation and these different levers that have really accelerated the whole training cycle in a major way.

JEFF FRICK:
So you don't want it so that you have to have a foundation model for each platform, whether it's a humanoid, or whether it's a traditional arm with actuators, or whether it's a wheeled device. So you're trying to make it very foundational and agnostic across lots of different types of platforms. 

CAROLINA PARADA:
Yeah, I would say data is still a bottleneck for robotics. I think most people will acknowledge that. I think today our models still use teleoperated data in order to translate all this world understanding into actions. So that's a sure way to ensure that you can teach a model how to operate a new robot. But we're certainly always exploring many different data sources. So as you saw in my talk today, we're starting to explore world models. And we found that critical, because as you start evaluating models on, you know, hundreds, thousands of different scenarios, it becomes really time consuming to even run the evals, let alone collect data for it.

So we're starting to leverage world models in order to evaluate many different scenarios. It's also super important for safety reasons. You don't want to put really dangerous situations in front of the robot just to know whether it can handle it. So you can do this in a world model in a much safer place. We also evaluate different types of data sources. Like we use simulation, for example, the MuJoCo team is part of our team, and so they have really advanced simulation. And then we also look into learning from videos, of which there's abundant information. Right. Right. But all of this is still, I would say, in research stages. There is no very clear recipe that translates this into robot actions just yet. But I think the entire community is really trying to figure out how to leverage other data sources in addition to teleoperation.

JEFF FRICK:
Let's talk a bit about safety, because, you know, another huge difference in this world compared to the kind of classic industrial robot world is those are super high-precision machines. They went to a point, they made the weld, and they were roped off or glassed off. They weren't around people. These things want to be around people. We want them out and about. So how do you think about safety? There's so many kind of layers to safety. How do you kind of bake it in and make sure it's an important piece for every step of the puzzle?

CAROLINA PARADA:
Yeah, I mean, we actually have quite a bit of investment on safety. We think it's a critical component that we need to think about from the start, not just like an add-on at the end of the development cycle. So when we think about safety, we have a multi-layer approach to safety. So there's traditional robotics safety, which is ensuring that, for example, the robot is stable, that the robot is not doing strong collisions, or it has limited forces when it's touching the environment. All of that is something that we can connect to, so our models can work with those existing safety controllers. On top of that, though, as you say, if you now are putting a robot that is general purpose, that can do thousands of tasks, there's all these other scenarios that are going to come up, and a lot of them, us humans deal with them simply through common sense, right? We have common-sense safety. So part of what we're working on is what we call semantic physical safety, which is essentially bringing to robots common sense when it comes to safety scenarios. So we have quite a lot of work in that area, where we essentially are trying to extract from foundation models, because it's all there, like it's all in the human data.

What are the common sense ways in which you should act in a new scenario? So for example, if we asked the robot, hey, bring me a glass of water, it won't put it here at the edge of the table, even though we didn't specify. It knows that it doesn't make sense to put it there. It will bring it here.

JEFF FRICK:
Right

CAROLINA PARADA:
And like that there is, you know, millions of situations where things can go wrong. So that's one of the areas we work on. We also work on Constitutional AI. This is actually the ability that you have to bring a constitution into a model. So

JEFF FRICK:
Explain that concept

CAROLINA PARADA:
Yeah

JEFF FRICK:
I’ve never heard of that

CAROLINA PARADA:
Yeah. So it's super interesting. So the idea in constitutional AI is that there's a set of rules you can give the robot which could be human generated, or it could be data generated. 

JEFF FRICK:
Okay.

CAROLINA PARADA:
And the robot has to follow that set of rules. We use this all like a

JEFF FRICK:
so it's like a 'check against,' like the Constitution is kind of. 

CAROLINA PARADA:
Yeah. It's exactly like that. 

JEFF FRICK:
You got to make sure. 

CAROLINA PARADA:
Yeah. And it's one of the many ways in which we add a safety layer. And the Constitution can also be given in context. So you can say in this context I want to make sure that for example you don't go in that area or you don't approach this type of situation. So it's one of the many ways in which you can bring a robot, bring some context into a robot. 

JEFF FRICK:
It's interesting. And, you know, we're all enjoying the Waymo as this self-driving, autonomous thing that's out and about. And one of the items that came up was, when you get to the point where you have to assume that every case will be a corner case. 

CAROLINA PARADA:
Yeah. 

JEFF FRICK:
There, like, you know, there's just no way for them to anticipate every potential driving situation in the cities. And so to flip the bit and just assume everything is something that you've never seen before. So you're going to have to use your inference and your learning and your foundation models to sort out what's the right thing to do. That's a really different approach. 

CAROLINA PARADA:
Yeah, that's exactly right. The minute that you're asking a robot to now do thousands of tasks, the chances that it's going to have seen that in the training data are non-existent. So not only do you need to build the robot, and the AI in the robot, in a way that is adaptable and general, that it can handle lots of different situations that it's never seen before, but you also need to get it to learn from those situations. That's another research area that we're working on: how do you get robots that adapt from their experience?

JEFF FRICK:
Right. 

CAROLINA PARADA:
But yeah, I'm a big, huge fan of Waymo. I'm super excited to see them all over. I use them all the time, and it's fantastic to see that they're now approaching all of these problems where every case is an edge case.

JEFF FRICK:
Yeah. It's really, it's a completely different way to define the problem. So you've been at this a long time and you've seen amazing things, and you're kind of, you know, behind the curtain and you can see this crazy technology. What still amazes you? What still, where do your eyes just light up when you see some of these things that are starting to come to fruition, which you could have never imagined?

CAROLINA PARADA:
Yeah. 

JEFF FRICK:
Five years ago, six years ago. Forget about 10 or 15. Forget that.

CAROLINA PARADA:
Yeah. No, I mean, I think we've had many moments where we feel amazed, like when all of a sudden... First of all, I bring a lot of people to see the robots and experience it themselves. And I always find a moment where I'm surprised myself, because these policies are very, very general, so people will come and ask all kinds of things I haven't thought of asking the robot. And it's just wonderful to see how deep the understanding is, how it can break things down into different steps. I'm also really amazed when it started doing really complex behavior, like those belts. Those things are extremely hard, and the team has spent years working on them, and all of a sudden you see a general model that can also do that.

JEFF FRICK:
Right. 

CAROLINA PARADA:
Is really cool. I've also found it super delightful to see people speaking to the robot in their own language. So we bring it to conferences, and people will come in and speak to it in, like, Japanese. And the robot just responds. And just the joy of people seeing that the robot understands all of what they're saying is wonderful. Obviously I haven't tested it in Japanese, right?

JEFF FRICK:
Right, right 

CAROLINA PARADA:
But it's something that we inherit simply because Gemini supports all of these languages. Now the robots can understand all the languages. So 

JEFF FRICK:
That's a great line, right? Life has more imagination than we do. 

CAROLINA PARADA:
Yeah.

JEFF FRICK:
Like you can't even think of these things. So that's really cool. 

CAROLINA PARADA:
Yeah. 

JEFF FRICK:
All right. Well, Carolina thanks for taking a few minutes. 

CAROLINA PARADA:
Thanks for having me.

JEFF FRICK:
I know you're super busy. People are lined up like crazy to get at you. So, really, thanks for taking the time. 

CAROLINA PARADA:
Thank you so much. Bye.

JEFF FRICK:
All right. She's Carolina, I'm Jeff, you're at Humanoids Summit here in Mountain View, California at the Computer History Museum. Thanks for tuning in. We'll catch you next time. Take care.

COLD CLOSE:
Clear
Awesome
Thank you
Thank you

==

Carolina Parada: Embodied AI, Gemini Robotics, Delightful Surprise | Turn the Lens with Jeff Frick
In collaboration with Humanoids Summit, brought to you by ALM Ventures, from Humanoids Summit 2025 Silicon Valley, December 2025
English Transcript 
© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved

==

Jeff Frick

Entrepreneur & Podcaster

Jeff Frick has helped tens of thousands of executives share their story.

Disclaimer and Disclosure

All products, product names, companies, logos, names, brands, service names, technologies, trademarks, and registered trademarks (collectively, *identifiers) are the property of their respective owners. All *identifiers used are for identification and illustrative purposes only.

Use of these *identifiers does not imply endorsement.

Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and/or the names of their products, and are the property of their respective owners.

We disclaim proprietary interest in the marks and names of others.
No representation is made or warranty given as to their content.
The user assumes all risks of use.

© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved