Pete Florence: Generalist, Scaling Laws, Train One Improve All | Turn the Lens with Ep46

Or listen on:

Episode Description

Pete Florence: Generalist, Scaling Laws, Train One Improve All

What if training a robot to do ONE thing automatically made it better at EVERYTHING?

Pete Florence, Co-founder & CEO of Generalist and former Google DeepMind Senior Research Scientist, joins Jeff Frick at Humanoids Summit 2025 to reveal a breakthrough that fundamentally changes how we think about robot intelligence.

The big discovery? Robotics has finally found its scaling laws, just like large language models. At 7 billion parameters, models cross an "intelligence threshold" where more data predictably equals more intelligence. No more hitting walls. No more plateaus. Just continuous improvement.

But the real magic is cross-task generalization: when you train on one skill, the robot gets better at all skills. It's not just learning faster, it's learning universally.

Pete explains why Generalist is betting on generalist robots (yes, the double meaning is intentional) when specialists have dominated for decades, how smaller models experience "ossification" and literally stop learning, and why reaching a "data-rich regime" of 270,000+ hours of real-world interaction data changed everything.

He also introduces fascinating concepts like "physical hallucinations" (when robots confidently do the wrong thing) and why teaching robots epistemic humility, the ability to say "I don't know" might be more critical than any task-specific training.

From his award-winning work on Dense Object Nets at MIT to pioneering RT-2 and PaLM-E at Google DeepMind, Pete has been at the cutting edge of embodied AI. Now with GEN-0, he's proving that foundation models can work in the physical world, with all the scaling properties that made LLMs so powerful.

Key Topics:

The 7B parameter intelligence threshold breakthrough
Why training one task improves all tasks (cross-skill learning)
GEN-0: First embodied foundation model with proven scaling laws
Generalist vs specialist: Why Pete's betting against conventional wisdom
Ossification: When models give up and stop learning
Physical hallucinations in robotics
270,000+ hours of real-world data and why it matters
The data-rich regime that enables scaling
Teaching robots to know their limits
Comparing robotics timelines to autonomous vehicles

Guest Bio: Pete Florence is Co-founder & CEO of Generalist, an embodied AI company building foundation models for physical robots. Previously a Senior Research Scientist at Google DeepMind, Pete led groundbreaking research on RT-2 (vision-language-action models) and PaLM-E (embodied multimodal language models). He earned his PhD in Computer Science from MIT under Russ Tedrake, winning multiple Best Paper awards including CoRL 2018 for Dense Object Nets and the IEEE RA-L Best Paper Award 2020. His work has been cited over 20,000 times and featured in the New York Times, WIRED, and CNN.

About the Event: Recorded at Humanoids Summit 2025 (December 11) at the Computer History Museum in Mountain View, California. The Summit brought together 2,000+ attendees from 400+ companies and 40 countries, featuring leaders from Google DeepMind, Boston Dynamics, Physical Intelligence, and dozens of humanoid robotics startups.

This interview is a collaboration between Turn the Lens and Humanoids Summit, and was conducted at the Humanoids Summit SV, Computer History Museum, Mountain View, California, December 11, 2025. Humanoids Summit is organized and hosted by ALM Ventures.

Links:

Pete Florence: https://www.peteflorence.com
Generalist AI: https://generalistai.com
GEN-0 Blog: https://generalistai.com/blog/nov-04-2025-GEN-0
RT-2 Research: https://robotics-transformer2.github.io
Humanoids Summit: https://humanoidssummit.com

Host: Jeff Frick, Turn the Lens / Work 20XXEpisode: 46Series: Humanoids Summit 2025 Interviews

Listen to our full series from Humanoids Summit, including interviews with Carolina Parada (Google DeepMind), Jeff Burnstein (A3), and other robotics leaders.

‍

Episode Links and References

Links and Referencees

Pete Florence

Co-founder & CEO, Generalist
Former Senior Research Scientist, Google DeepMind
PhD in Computer Science, MIT (Advisor: Russ Tedrake, 2019)
Research focus: Robotics, AI, computer vision, manipulation
Google Scholar citations: 20,000+
Personal website: https://www.peteflorence.com
LinkedIn: https://www.linkedin.com/in/peteflorence

Generalist AI

Company Overview

Mission: Make general-purpose robots a reality through embodied foundation models
Founded by Pete Florence (left Google DeepMind Spring 2024)
Focus: Data, models, and hardware intersection for physical AI
Team backgrounds: OpenAI, Boston Dynamics, Google DeepMind, frontier labs
Investors: Spark Capital, NVIDIA, Boldstart Ventures, Bezos Expeditions, NFDG

‍GEN-0: Embodied Foundation Model

Official announcement: November 2024
Blog post: https://generalistai.com/blog/nov-04-2025-GEN-0
Key innovation: First embodied foundation model demonstrating predictable scaling laws with physical interaction data
Dataset: 270,000+ hours of real-world manipulation data (growing 10,000 hours/week)
Architecture: Built on "Harmonic Reasoning" - think and act simultaneously
Model size: 10B+ parameters
Cross-embodiment design: Tested on 6DoF, 7DoF, and 16+ DoF semi-humanoid robots

Key Findings from GEN-0 Research

Intelligence threshold discovered at 7B+ parameters
1B models: Experience "ossification" - weights stop absorbing new information
6B models: Begin showing benefits from pretraining, stronger multi-task capabilities
7B+ models: Fully internalize large-scale robotic pretraining, achieving transfer with minimal fine-tuning
Scaling laws: More pretraining data + larger models = predictably better downstream performance

Pete Florence's Key Research & Publications

At Google DeepMind

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (2023)

Co-authored with 50+ researchers including Chelsea Finn, Sergey Levine, Brian Ichter
Website: https://robotics-transformer2.github.io
Paper: https://arxiv.org/abs/2307.15818
Blog: https://deepmind.google/blog/rt-2-new-model-translates-vision-and-language-into-action
6,000+ robotic evaluation trials
Demonstrates emergent capabilities: chain-of-thought reasoning, semantic understanding, novel object generalization

PaLM-E: An Embodied Multimodal Language Model (2023)

Led by Danny Driess and Pete Florence
Blog: https://research.google/blog/palm-e-an-embodied-multimodal-language-model
Combines large language models with robotic sensor data
562B parameters (largest version)
State-of-the-art vision-language model with robotics capabilities

Additional Google Research

"Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition" (with Huy Ha, Shuran Song)
"Large Language Models as General Pattern Machines" (with Suvir Mirchandani, Fei Xia, et al.)
"Interactive Language: Talking to Robots in Real Time" (RA-L 2023)
"Code as Policies: Language Model Programs for Embodied Control" (ICRA 2023)

At MIT (PhD Work)

Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation (2018)

Co-authors: Lucas Manuelli, Russ Tedrake
Paper: https://arxiv.org/abs/1806.08756
Winner: Best Paper Award, CoRL 2018
Winner: Amazon Robotics Best Technical Paper Award in Manipulation 2018
Self-supervised learning for robotic manipulation
Media coverage: WIRED, CNN, Newsweek, Engadget, VentureBeat

Self-Supervised Correspondence in Visuomotor Policy Learning (2020)

Co-authors: Lucas Manuelli, Russ Tedrake
Paper published in RA-L and ICRA 2020
Winner: IEEE Robotics and Automation Letters Best Paper Award 2020
Best Paper Finalist, CoRL 2021 (for XIRL follow-up work)

PhD Thesis: "Dense Visual Learning for Robot Manipulation" (2019)

Massachusetts Institute of Technology
Advisor: Russ Tedrake (Toyota Professor of EECS, MechE, and Aero/Astro)
Thesis PDF: https://groups.csail.mit.edu/robotics-center/public_papers/Florence19a.pdf

Additional MIT Publications

"kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation" (ISRR 2019)
"LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data" (ICRA 2018, Vision Paper Finalist)
"NanoMap: Fast, uncertainty-aware proximity queries with lazy search over local 3d data" (JFR 2018)
Work on high-speed quadrotor flight and collision avoidance

Topics Discussed in Interview

Scaling Laws in Robotics

Evidence of predictable scaling behavior in embodied AI (similar to LLMs)
Model size thresholds: 1B vs 6B vs 7B+ parameters
Cross-skill generalization: training on one task improves performance on others
"Intelligence threshold" phenomenon

Generalist vs. Specialist Robots

Historical dominance of specialist robots (vacuum cleaners, warehouse AMRs)
Pete's thesis: Generalist capabilities will expand into specialist domains
Specialist domains will eventually need broader capabilities
Strategic bet on generalist approach based on scaling evidence

Data Requirements for Embodied AI

Cannot rely solely on internet data/simulation
Need high-fidelity raw physical interaction data
Generalist's approach: 270,000+ hours real-world data, growing 10,000 hours/week
Data quality and diversity matter as much as quantity
Different data mixtures produce models with different behavioral characteristics

Hallucinations in Robotics

Language model hallucinations: confidently stating false information
Robotics equivalent: performing completely wrong actions ("physical lies")
Importance of robots knowing their limits: "I don't know" capability
Visible mistakes vs. undetectable errors

Timeline Comparisons

Waymo example: 14 years from Google self-driving car project to open San Francisco service
Advantage for robots: Many applications don't require autonomous driving safety levels
Multiple deployment pathways with varying risk profiles
Not a singular problem like autonomous driving

Humanoids Summit Context

Event Details

December 11-12, 2025
Location: Computer History Museum, Mountain View, CA
2,000+ attendees from 400+ companies, 40+ countries
65+ speakers, 60+ exhibiting companies
Organized by ALM Ventures (Modar Alaoui, founder)
Website: https://humanoidssummit.com
Inaugural event: December 2024 (1,400 attendees)
London edition: May 29-30, 2025
Tokyo edition: Announced for 2026

International Humanoids Day

Annual celebration: December 12 (12/12)
Coincides with Humanoids Summit
Focus: Ethical development, innovation, global collaboration

Other Speakers at 2025 Summit

Carolina Parada (Head of Robotics, Google DeepMind): "From Language to Motion: How Gemini Powers the Next Generation of Robots"
Abhinav Gupta (Co-Founder, Skild AI): "Any Task, Any Robot, One Brain"
Allen Ren (Physical Intelligence): "Towards a Model That Controls Any Robot to Perform Any Task"
Moritz Baecher (Director, Disney Research Zurich Lab): Disney's Robotic Character Platform
Companies: Apptronik, XPeng, Boston Dynamics, Agility Robotics, Sanctuary AI, Figure, and many more

Related Companies & Research Mentioned

Similar Approaches to Foundation Models

Physical Intelligence (Pi): π0 series models
Figure AI: Helix vision-language-action model
Skild AI: Unified robot control
OpenAI: Robotics research
Boston Dynamics: Atlas, Spot, Stretch robots + Large Behavior Models

McKinsey Report

"Humanoid robots: Crossing the chasm from concept to commercial reality" (October 2025)
~50 companies worldwide raised $100M+ for humanoid development
~20 companies in China, ~15 in North America
China's government mandate: humanoid ecosystem by 2025

Key Concepts & Technical Terms

Embodied AI / Foundation Models

Models that learn from physical interaction with real world
Differs from language/vision models trained on internet data
Requires direct sensorimotor experience
"Embodied intelligence is when AI meets gravity" (Pete Florence)

Scaling Laws

Predictable improvement in model performance with more compute and data
Established in LLMs (Kaplan et al., 2021)
Now demonstrated for robotics with GEN-0
Enables confident investment in data collection and model development

Ossification

Phenomenon where smaller models stop absorbing new training information
Weight saturation in insufficient model capacity
Occurs in 1B parameter models
Overcome at 7B+ parameter scale

Cross-Embodiment Learning

Single model architecture works across different robot forms
6DoF, 7DoF, 16+ DoF robots
Enables fleet learning and knowledge transfer

Harmonic Reasoning

Proprietary architecture in GEN-0
Enables simultaneous thinking and acting
Integrates high-level reasoning with low-level motor control

Vision-Language-Action (VLA) Models

Architecture that combines vision, language understanding, and robot control
Introduced in RT-2
Actions represented as text tokens
Enables transfer of web knowledge to physical tasks

Moravec's Paradox

What's hard for humans (math) is easy for computers
What's easy for humans (perception, movement) is hard for computers
Core challenge in robotics

Background Reading & Context

Scaling Laws Literature

"Scaling Laws for Neural Language Models" (Kaplan et al., OpenAI, 2021)
"Scaling Laws for Transfer" (Hernandez et al., 2021)
"Learning to reason with LLMs" (OpenAI, 2024)

Related Robotics Research

RT-1: Robotics Transformer 1 (Google, prior to RT-2)
AutoRT: Automated robot data collection system (Google DeepMind, 2024)
SARA-RT: Self-Adaptive Robust Attention for Robotics Transformers (Google DeepMind, 2024)
Droid dataset: Large-scale manipulation dataset
Figure Helix: Humanoid foundation model

Philosophical Concepts

"Mind Children" (Hans Moravec, 1988) - early robotics philosophy
Moravec's Paradox on intelligence and embodiment

Additional Resources

MIT CSAIL Robotics

Russ Tedrake's Lab: https://groups.csail.mit.edu/locomotion/
Research on manipulation, locomotion, control
Drake: Model-based design and verification for robotics

Google DeepMind Robotics

Robotics blog: https://deepmind.google/blog
Robotics at Google team
RT-X: Open-source robotics initiative

Industry Analysis

McKinsey: Humanoid robots research
Silicon Valley Robotics: Industry organization
TechEx / AI & Big Data Expo series

Media Coverage

NY Times Hard Fork Podcast on RT-2
Kevin Roose (NYT) front page article on RT-2
The Cognitive Revolution podcast (Med-PaLM M discussion)
The Gradient Podcast interview with Pete Florence (2023)
The Robot Brains Podcast with Eric Jang and Kevin Zakka (2022)

GEN-0 Video

Robot builds Lego copies | Generalist
https://www.youtube.com/shorts/eF9PqpPB0MM
An early preview of model capabilities | Generalist z
https://www.youtube.com/watch?v=mhfleCK_IAI
Gen-0 Robot from Generalist manipulating objects super fluidly
https://www.reddit.com/r/robotics/comments/1oovscx/gen0_robot_from_generalist_manipulating_objects/
GEN-0 Is The First Truly Intelligent Robot (Learning 500X Faster Than Humans)
https://www.youtube.com/watch?v=muz-ThTlrco

Interview Production

Turn the Lens Podcast

Host: Jeff Frick
Collaboration with Humanoids Summit / ALM Ventures
Location: Computer History Museum, Mountain View, CA
Series: Humanoids Summit 2025 interviews
Other interviews in series: Carolina Parada (Google DeepMind), Jeff Burnstein (A3), and others

Work 20XX Podcast

Host: Jeff Frick
Focus: Future of work, AI adoption, workplace technology
50+ episodes on distributed teams, hybrid work, organizational leadership

Key Quotes from Interview

On generalist vs. specialist: "Any specialty area eventually kind of expands beyond its bounds and gets bigger. And any generalist area is eventually going to come into whatever you thought was kind of a walled garden in your specialty area."

On scaling breakthrough: "At 7 billion parameters, something kind of magical happened. It just kept getting better. The more data, the more trials you pumped into the machine, the better and better it got across skills and behaviors."

On the importance of task: "It doesn't even really matter what the X is, what the skill is, what the behavior is. If you can do more of it and you get better at it by doing more of it, the machine will also get better at doing everything else."

On hallucinations: "For language models, you can either say the correct thing or you can say, I don't know. That's the other way to avoid hallucinations - recognize the limits. And that is a very useful concept to have for physically acting robots as well."

On robotics timeline: "There's just a lot of different types of robots, a lot of different types of use cases that people will want to use them for. It's not as much of a singular problem as self-driving has been."

Interview Date: December 11, 2025 Location: Computer History Museum, Mountain View, California Event: Humanoids Summit 2025

‍Last updated: January 2026

Episode Transcript

Pete Florence: Generalist, Scaling Laws, Train One Improve All | Turn the Lens with Jeff Frick Ep 46
English Transcript
© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved

Introduction

Jeff Frick:
Hey welcome back everybody. Jeff Frick here. Coming to you from the Baylands. One of my favorite places. I'm excited to release the second interview that we have from Humanoids Summit in collaboration with the team at Humanoids Summit and ALM Ventures, is with Pete Florence. He is the Co-founder and CEO of Generalist and Generalist has something called GEN-0 and GEN-0, I’m just going to read it is the ‘embodied foundation model that scales with physical interaction’.

And Pete shared some really interesting slides during his keynote. The one that really got my attention is he was showing what happens with big models. And the ones he was talking about with a billion parameters. At a certain point, the model really stops reacting positively to more data and more training and, and kind of does what they call kind of forcing it to fit in the model anyway, stop scaling. And then at 6 billion parameters the model does a lot better. It gets further before it kind of stalls out, but ultimately it doesn’t continue to get better and better with more data and more trainings. And then at 7 billion parameters something kind of magical happened. It just kept getting better. So the more data, the more trials you pumped into the machine, the better and better it got across skills and behaviors and that becomes pretty compelling because if that's in fact the case as Pete said, it doesn't even really matter what the X is, what the skill is, what the behavior is. If you can do more of it and you get better at it by doing more of it, the machine will also get better at doing everything else.

That's pretty amazing. I mean that kind of stops you in your tracks, right? If that's in fact true. And we've gotten that big in terms of the number of parameters.

The other thing that Pete talked about that I thought was pretty interesting was, the generalist versus specialist discussion and you know, the specialist robots have been winning for a long, long time. The generalists just weren't there. But his point was any specialty area eventually kind of expands beyond its bounds and gets bigger. And any generalist area is eventually going to come into whatever you thought was kind of a walled garden in your specialty area. And so what they decided to do was bet on the generalist. And he's been around long enough to know, and he's seen some things that if he's betting on generalist he's got a pretty good feeling that it's just around the corner, not that far away.

So without further ado, my interview with Pete. Thanks for watching. Thanks for listening. Bye bye.

Main Interview

Cold Open:
Let’s go in. 5, 4, 3…

Jeff Frick:
Hey, welcome back everybody. Jeff Frick here, coming to you from the Computer History Museum. This is our second trip back to the Humanoids Summit. We were here a year ago for the inaugural event, and I think I’ve heard that it’s grown by a factor of three. They had another one, I believe, in London last summer, and they just announced they’re going to be in Tokyo next summer. So as Andra Keay says, humanoids are having a Cambrian moment. And the progress from just 12 months ago is pretty amazing. So we’re excited to have our next guest. He’s Pete Florence. He is the co-founder and CEO of Generalist. Pete, good to see you.

Pete Florence:
Good to see you. Thanks for having me.

Jeff Frick:
Absolutely. So you’ve been in this space for a little while, and you’ve been in deep research around AI and vision and voice and some of these things. So you’ve seen this magic kind of come to fruition. One of the big topics that’s always been around is specialist versus generalist. And probably the most successful robots out there are maybe the vacuum cleaner or the little Amazon robots that are running around at scale. You decided to come at it as a pure generalist. Why did you want to tackle it that way?

Pete Florence:
It’s a good question. I think for us, we really think that the huge untapped opportunity is in making robots that are extremely general purpose. At the same time, you do need to excel and be specialist enough at any one particular thing to actually make it useful in the real world. But I think the lessons of the last handful of years of machine learning have taught us that every time you try to think, “Oh, I’m just going to have this narrow little model in this one little domain, and that’s going to be my niche, and then the general models will do other things but not my thing,” that’s not a long-term bet that we think is the right one to take. A lot of it just comes down to the fact that all of the data makes everything better. And not just the data, but the way the models are trained. Once you sort of take the leap of faith that all of the tasks you can possibly think of, trained all together, do indeed make the model better at all the individual little things, you want a general-purpose system. That said, today it really matters to actually achieve a level of mastery on particular tasks to make them relevant for real use cases. But yeah, generalism is the way to go for sure.

Jeff Frick:
So talk a little bit about how LLMs change the game in terms of training, and also other foundational models, because it seems like there’s a really big shift in training, which is just one of many factors that’s accelerated this whole thing. But it feels like the foundation models and the language models were really a big factor in bumping this thing up a step.

Pete Florence:
Yeah, 100%. There are many different ways to look at it. If you go back to before language models started to take off, and I haven’t been around forever, but I’ve been around long enough that I remember when people were starting to whisper that these language models were really starting to work. Back in the mid-2010s, there were several different frontiers of deep learning that were all happening in parallel. I would say that in the mid-2010s, vision was really the one that had the most momentum. And then into the late 2010s, at least from my memory and personal experience, it started to become clear that these language models were really starting to work. Once the obvious implications started to sink in, what GPT-3 and beyond were going to be capable of, you started to see just like the entire way in which we train these models, the way we understand them, the way we do evaluations, this whole maturation of the model factory, so to speak. A lot of the lessons in how we train those types of models impact, in many different ways, how we train robotics models today. One part of that is taking a lot of different aspects of the general recipe and applying them, sometimes completely separately, in robotics. But then also, and this was part of a bunch of my work back at Google, literally taking the language model and making it the robot brain. Once upon a time, not that long ago, it was a crazy idea to say, okay, we’re just going to take the language model and train the language model itself to also be the robot model. But now, of course, that’s sort of obviously a good idea. There are a lot of nuances there, but that is clearly the direction.

Jeff Frick:
Right, right. Is it the VLA, right? The vision, the language, and then taking action from that, which was a really revolutionary concept, to combine those two.

Pete Florence:
I think in particular, a few years ago, it was just wild times in terms of so many different ways we could think about taking language models and using them in robotics. It was also a time where all the labs were still publishing pretty freely. It’s different now. But it was very easy to come up with lots of different ideas, like, oh, we could do a language model this way, a language model that way, and bring it into robotics. You could have a lot of agentic-type systems where you design how the language model interacts with the robot. Ultimately, though, I do think the most powerful way is to take the language model and make sure it’s a multimodal language model. Back at Google, at the time, there was basically only one multimodal language model that existed before the one we made, which was called PaLM-E. So we had to make our own. Then we made the whole thing directly the brain of the robot, rather than some engineered system layered on top. There are different trade-offs, but it becomes a lens through which you look at everything afresh if you can take the entire model and make it the brain of the robot.

Jeff Frick:
Right, right. So tell us a little bit about what you’re doing with GEN-0. You’ve taken your learnings and you’re doing a new, fun thing. What’s the basis?

Pete Florence:
Yeah, sure. At the heart of GEN-0, the most important thing is that it makes scaling in robotics, in a very general-purpose sense, possible now. What that means in terms of how we think about it is that we have a very general-purpose recipe. And to be honest, it doesn’t even really matter too much what the x-axis is, other than the fact that it’s something we can continually do more of. And the y-axis is some measure of how good the robots are. And we see that we can just continue to pour more and more effort onto that x-axis, and the y-axis continues to get better and better. That’s a little bit of a generic phrasing, but importantly, we’re able to take a model that’s trained on very general-purpose physical interaction data, and we train on more and more of it. And every single task that we’re tracking continues to get better. And that’s a landmark type of moment, at least relative to what we’ve seen before. There have been a lot of attempts at making what you might call a foundation model for robotics. I think most of them, or I would say all of them today, the main attempts have either been like some of the stuff where you’re talking about Google, where the generalization comes from taking internet-scale data and soaking that all in, or you have more of a task-specific paradigm. People might claim it’s general, but it always kind of seeps in through how people design tasks. Instead, with GEN-0, the way we scale pre-training is completely separated from any idea of how we think about any particular task we’re solving. We continue to scale general-purpose data. Every single task that we track continues to get better. This has been a moment that we’ve been excited to share.

Jeff Frick:
Right. Now it’s interesting, you showed a graph earlier in your presentation, and in the model that didn’t have enough parameters, it stalled out.

Pete Florence:
It gave up.

Jeff Frick:
It hit a wall.

Pete Florence:
It gave up, right?

Jeff Frick:
The line stops. So what’s so different? Is it because you’re able to incorporate so many more parameters to get past that critical function? Is there some other magic or secret sauce, or did you just break through that tipping point?

Pete Florence:
In some ways, yeah. The concept there is very simple, but it is very profound. If you go back to machine learning 101, like a decade ago, if your model looked like the validation loss was going up, you would call that overfitting. And you would say, “Okay, I need to somehow reduce my overfitting.” And one way you might try, and it doesn’t always work, is to actually make the model smaller. And it wasn’t forever ago, I forget the exact years, when the field started to understand this concept of double descent. It depends on the regime you’re in, and it depends on how much data you have. There are regimes where, if you have enough data, making your model much bigger is actually much more effective at avoiding these overfitting effects. Again, it’s very simple, but actually having enough data, number one, in robotics to see these types of effects, being in a data-rich regime within robotics, has been a very challenging thing to attain. And then number two, actually having all of the model training set up just right to create the conditions so that we can observe this, that’s been tricky for us. And it wasn’t like the first time we tried all this we beautifully got this result. It took a lot of iteration over more than a year from the team.

Jeff Frick:
So one of the concepts talked about a lot here is synthetic data. Because, you know, to train an LLM there’s lots of text out there, there’s a lot of data. For training robots, there isn’t necessarily this giant corpus of data like there is for language.

Pete Florence:
Yeah.

Jeff Frick:
So there’s this whole concept of synthetic data. For the folks at home that don’t understand, what is synthetic data, and how could synthetic data contribute to real data? Explain synthetic data as a concept and how it’s used in training these things.

Pete Florence:
So generically speaking, synthetic data is any data that we wouldn’t say is real. And for robotics, the most direct way to think about that is any data that’s not from the real world. There are a lot of different ways you can think about getting synthetic data. I think the two main paradigms people think about today are either from a more traditional simulator, maybe a better term would be a physics-based simulator, or you could use some type of learned model of the world, a world model. And there’s a lot of excitement around that as well.

Jeff Frick:
So the concept, for people that aren’t as familiar, is you build basically a digital world with digital attributes around the use case you’re trying to do. And then you can run a million, kajillion “pick up the bottle” scenarios and lock it to a different location within that world. Is that right?

Pete Florence:
Yeah, that’s a good way to think about it.

Jeff Frick:
I can simulate the real world faster to get more trials to feed back to the machine. Is that what it is?

Pete Florence:
Yeah, yeah. So for us, we haven’t talked publicly about how we use synthetic data. But I would say that when we’ve presented GEN-0, and as we’ve talked about publicly, all of the data in GEN-0 that we’ve talked about, including the sheer amount of it, that is all real-world data.

Jeff Frick:
Right.

Pete Florence:
We have many different threads in synthetic, but we really do believe that real-world data is essential.

Jeff Frick:
Right, right. And just to be clear, everyone here has talked about there being lots of different ways to train. There are lots of data sources, real data, synthetic data, teleoperation, and you use them all, right? As much as you can to get the most benefit out of them.

Pete Florence:
Where they fit in, the limit, I think the thing is that focus is very important. Right? If human organizations operated such that focus could be infinitely sharded, or you could work in parallel on as many different things as you want, then yeah, you would want every single data source you could. But the reality is that building a culture of a team where you’re really pushing the frontier, it’s helpful to have a certain amount of focus on the particular bets you’re making in terms of research and how you’re pushing capabilities. For us, yeah, we are primarily focused so far today on real data.

Jeff Frick:
You’ve talked about the marginal cost of labor getting to zero. And I’ve heard that in other robotics talks, which is pretty interesting, because I used to always say, if compute, networking, and storage were zero, what would you build? Because they’re asymptotically approaching that every single day. But now when you add agentic, embodied AI, you put it in something that can move and do things, the possibilities, especially compared to not that long ago, are pretty astounding. As evidenced by Waymos that are driving all around as we walk outside, taking people to the airport.

Pete Florence:
Waymos are amazing

Jeff Frick:
Yeah.

Pete Florence:
I do think that sound bite by itself needs some context. Perhaps a good way to think about it is that over in the LLM model provider world, there’s this concept that we might eventually reach intelligence that’s “too cheap to meter.” And honestly, some of the models these days are amazing, especially the ones on the lower-cost part of the frontier of capabilities. That does exist today, but primarily for limited levels of LLM-type intelligence. I think something similar will happen in the physical world. The way these models will have impact will be very gradual. We very much see humans and machines figuring out how to work together. Now you have a robot that can help you be more productive, get more things done, build more. You can think of it as having a robot that can help you with almost any task you can imagine, and having that be a very productive partner to amplify your productivity. We think that’s very much the world we’re headed toward.

Jeff Frick:
Yeah, yeah. So just a final point before we wrap. You had one other conceptual thing that was really powerful. You want your robots to respond to stimuli, not necessarily just execute the skill. They’re executing the skill, but you want them to have the flexibility to respond and do things. What’s the essence of baking that in? Because you have to do that from training and design and everything, if that’s your holy grail.

Pete Florence:
Yeah, I mean, ideally most good robots need to react to stimuli. The ones that don’t, it depends on your definition of a robot. But having closed-loop interaction with the world, meaning you sense the world, then you take an action in response to sensing the world, that’s kind of the core closed-loop nature of what I would call the definition of a robot. So really figuring out how to make decisions, given observations of the world, including different types of stimuli. I think that is the core of robotics in general.

Jeff Frick:
Right. And when people think about robotics, there’s the old robotics, factory floors, robot arms.

Pete Florence:
At the lowest level they are closed loop, but a lot of robot arms in factories, like putting sheet metal together for cars, don’t have anywhere close to the level of intelligence we’re talking about when we think of the future of robotics, responding to vision and other sensors.

Jeff Frick:
Yeah.

Pete Florence:
Very much the future of robotics is being able to take in all these different multimodal inputs, multimodal sensing of the world, and figure out how to not just do a task, but really importantly, to generalize the types of skills that happen in one task across all the different tasks you can have the robot think of doing. Things like common sense in the physical world, being able to recover from edge cases, being robust no matter what happens if the environment changes, if the packaging changes, or some other notion of the task changes over time. Those are the types of things we take for granted as being very easy, but they’re exactly the things we need to solve for the next generation of robots.

Jeff Frick:
How do you think about hallucinations, just in the context of LLMs, or the model doing not quite exactly what you want? It’s one thing if it gives you a bad answer, but it’s different if you say, “Take the glasses out of the dishwasher,” and it gets a little more active.

Pete Florence:
In language models, we’re all familiar with hallucinations. The way that manifests is the model telling you something it completely made up, often very confidently.

Jeff Frick:
Very confidently.

Pete Florence:
For robotics, it has a different flavor. It would be like you telling the robot to pick up the cup and it decides to do something else entirely, take your glasses off, or just not at all grok what you asked it to do. It doesn’t feel as much like lying, because it’s not verbal, but it’s kind of like a physical lie, if the robot is completely doing the wrong thing or making up an action it really shouldn’t be taking.

Jeff Frick:
Yeah. But at least you can see it, right? I guess the difference is, if you said, “Clean up the dishes while I was out,” and it lied, and you got home and they were

Pete Florence:
Yeah. I mean, there are kind of two ways to avoid a hallucination for a language model, right? You can either say the correct thing, or you can say, “I don’t know.”

Jeff Frick:
Right.

Pete Florence:
That’s the other way to avoid hallucinations, recognize the limits.

Jeff Frick:
Right.

Pete Florence:
And I think that’s a very useful concept to have for physically acting robots as well.

Jeff Frick:
Yes, I like that, because that’s certainly not in all the word ones, that’s for sure. They never come back with, “I don’t know.” All right, Pete. Well, exciting times, and you’re right in the middle of it. I think this is going to go so much faster than anybody expects. Again, I love Waymos just as an example, because everybody can see them. And I think it’s been 14 years since Google launched the self-driving car project until they opened it up to any rider in San Francisco. So is that a long time or a short time? I don’t know. Once it’s here, it’s here. I need to go to the airport, I can dial up the Waymo.

Pete Florence:
I think the thing with self-driving cars is that you really needed to solve the ability to take somebody on public roads from point A to point B in order to make that a useful thing that you could ship into the world. For the next generation of robotics, some aspects are going to be a long journey in terms of full capabilities. Yet at the same time, there are a lot of robots that can be shipped to do things that are not as dangerous to humans as driving on public roads. So I think there are a lot of different types of robots and a lot of different use cases people will want them for. It’s not as singular a problem as self-driving has been.

Jeff Frick:
Right, right. Great. Well, thanks a lot.

Pete Florence:
Great to chat with you.

Jeff Frick:
All righty. He’s Pete, I’m Jeff, you’re watching Humanoid Summit. Thanks for watching. We’ll see you next time. Take care.

Cold Close:
Cool
We're out
Cool
Thank you

—----------------------

Jeff Frick

Entrepreneur & Podcaster

Jeff Frick has helped tens of thousands of executives share their story.

Disclaimer and Disclosure

All products, product names, companies, logos, names, brands, service names, technologies, trademarks, registered trademarks, and registered trademarks (collectively, *identifiers) are the property of their respective owners. All *identifiers used are for identification and illustrative purposes only.

Use of these *identifiers does not imply endorsement.

Other trademarks are trade names that may be used in this document to refer to either the entities claiming the marks and/or names of their products and are the property of their respective owners.

We disclaim proprietary interest in the marks and names of others.
No representation is made or warranty given as to their content.
The user assumes all risks of use.

Latest episodes

Browse Episodes

Werner Kraus: Clean Room Humanoids, Got Particles? Get Certified | Turn the Lens Ep54

"We have around 30 to 40 tests, like can we clean the surface? Does the robot emit particles? Is it biological resistant? How does it behave with temperature, for example, to make sure that the robot can be safely and cleanly operated in a clean room." - Werner Kraus



Listen Now

Chris Kudla: A Humanoid to Hug,‘ Just Right’ Level of Human | Turn the Lens Ep53

"When we put Codey in front of children, they just run up and want to talk to him. They're not looking at a screen, they're not looking at an iPad, they're actually communicating in the real world." - Cris Kudla



Listen Now

Joe Michaels: Teleoperation, Controlling Complex Robots By Feel | Turn the Lens Ep52

"There's a dream out there of just showing enough video to the robots to build big enough VLA models so that the robots can just go out on their own and do everything. But the big problem is that's really not how humans or robots learn how to do complicated things." - Joe Michaels



Listen Now