Pete Florence: Generalist, Scaling Laws, Train One Improve All | Turn the Lens with Jeff Frick Ep 46
English Transcript
© Copyright 2026 Menlo Creek Media, LLC, All Rights Reserved
Introduction
Jeff Frick:
Hey welcome back everybody. Jeff Frick here. Coming to you from the Baylands. One of my favorite places. I'm excited to release the second interview we have from Humanoids Summit, recorded in collaboration with the team at Humanoids Summit and ALM Ventures. It's with Pete Florence. He is the Co-founder and CEO of Generalist, and Generalist has something called GEN-0. GEN-0, and I'm just going to read it, is the 'embodied foundation model that scales with physical interaction.'
And Pete shared some really interesting slides during his keynote. The one that really got my attention is he was showing what happens with big models. With the one he was talking about at a billion parameters, at a certain point the model really stops reacting positively to more data and more training. It kind of forces the data to fit into the model anyway, and stops scaling. Then at 6 billion parameters the model does a lot better. It gets further before it kind of stalls out, but ultimately it doesn't continue to get better and better with more data and more training. And then at 7 billion parameters something kind of magical happened. It just kept getting better. The more data, the more trials you pumped into the machine, the better and better it got across skills and behaviors. And that becomes pretty compelling, because if that's in fact the case, as Pete said, it doesn't even really matter what the X is, what the skill is, what the behavior is. If you can do more of it, and the machine gets better at it by doing more of it, the machine will also get better at doing everything else.
That's pretty amazing. I mean that kind of stops you in your tracks, right? If that's in fact true. And we've gotten that big in terms of the number of parameters.
The other thing that Pete talked about that I thought was pretty interesting was the generalist versus specialist discussion. You know, the specialist robots have been winning for a long, long time. The generalists just weren't there. But his point was that any specialty area eventually expands beyond its bounds and gets bigger, and any generalist area is eventually going to come into whatever you thought was a walled garden in your specialty area. And so what they decided to do was bet on the generalist. And he's been around long enough, and seen enough, that if he's betting on the generalist, he's got a pretty good feeling that it's just around the corner, not that far away.
So without further ado, my interview with Pete. Thanks for watching. Thanks for listening. Bye bye.
Main Interview
Cold Open:
Let’s go in. 5, 4, 3…
Jeff Frick:
Hey, welcome back everybody. Jeff Frick here, coming to you from the Computer History Museum. This is our second trip back to the Humanoids Summit. We were here a year ago for the inaugural event, and I think I’ve heard that it’s grown by a factor of three. They had another one, I believe, in London last summer, and they just announced they’re going to be in Tokyo next summer. So as Andra Keay says, humanoids are having a Cambrian moment. And the progress from just 12 months ago is pretty amazing. So we’re excited to have our next guest. He’s Pete Florence. He is the co-founder and CEO of Generalist. Pete, good to see you.
Pete Florence:
Good to see you. Thanks for having me.
Jeff Frick:
Absolutely. So you’ve been in this space for a little while, and you’ve been in deep research around AI and vision and voice and some of these things. So you’ve seen this magic kind of come to fruition. One of the big topics that’s always been around is specialist versus generalist. And probably the most successful robots out there are maybe the vacuum cleaner or the little Amazon robots that are running around at scale. You decided to come at it as a pure generalist. Why did you want to tackle it that way?
Pete Florence:
It’s a good question. I think for us, we really think that the huge untapped opportunity is in making robots that are extremely general purpose. At the same time, you do need to excel and be specialist enough at any one particular thing to actually make it useful in the real world. But I think the lessons of the last handful of years of machine learning have taught us that every time you try to think, “Oh, I’m just going to have this narrow little model in this one little domain, and that’s going to be my niche, and then the general models will do other things but not my thing,” that’s not a long-term bet that we think is the right one to take. A lot of it just comes down to the fact that all of the data makes everything better. And not just the data, but the way the models are trained. Once you sort of take the leap of faith that all of the tasks you can possibly think of, trained all together, do indeed make the model better at all the individual little things, you want a general-purpose system. That said, today it really matters to actually achieve a level of mastery on particular tasks to make them relevant for real use cases. But yeah, generalism is the way to go for sure.
Jeff Frick:
So talk a little bit about how LLMs change the game in terms of training, and also other foundational models, because it seems like there’s a really big shift in training, which is just one of many factors that’s accelerated this whole thing. But it feels like the foundation models and the language models were really a big factor in bumping this thing up a step.
Pete Florence:
Yeah, 100%. There are many different ways to look at it. If you go back to before language models started to take off, and I haven’t been around forever, but I’ve been around long enough that I remember when people were starting to whisper that these language models were really starting to work. Back in the mid-2010s, there were several different frontiers of deep learning that were all happening in parallel. I would say that in the mid-2010s, vision was really the one that had the most momentum. And then into the late 2010s, at least from my memory and personal experience, it started to become clear that these language models were really starting to work. Once the obvious implications started to sink in, what GPT-3 and beyond were going to be capable of, you started to see the entire way in which we train these models change: the way we understand them, the way we do evaluations, this whole maturation of the model factory, so to speak. A lot of the lessons in how we train those types of models impact, in many different ways, how we train robotics models today. One part of that is taking a lot of different aspects of the general recipe and applying them, sometimes completely separately, in robotics. But then also, and this was part of a bunch of my work back at Google, literally taking the language model and making it the robot brain. Once upon a time, not that long ago, it was a crazy idea to say, okay, we’re just going to take the language model and train the language model itself to also be the robot model. But now, of course, that’s sort of obviously a good idea. There are a lot of nuances there, but that is clearly the direction.
Jeff Frick:
Right, right. Is it the VLA, right? The vision, the language, and then taking action from that, which was a really revolutionary concept, to combine those two.
Pete Florence:
I think in particular, a few years ago, it was just wild times in terms of so many different ways we could think about taking language models and using them in robotics. It was also a time where all the labs were still publishing pretty freely. It’s different now. But it was very easy to come up with lots of different ideas, like, oh, we could do a language model this way, a language model that way, and bring it into robotics. You could have a lot of agentic-type systems where you design how the language model interacts with the robot. Ultimately, though, I do think the most powerful way is to take the language model and make sure it’s a multimodal language model. Back at Google, at the time, there was basically only one multimodal language model that existed before the one we made, which was called PaLM-E. So we had to make our own. Then we made the whole thing directly the brain of the robot, rather than some engineered system layered on top. There are different trade-offs, but it becomes a lens through which you look at everything afresh if you can take the entire model and make it the brain of the robot.
Jeff Frick:
Right, right. So tell us a little bit about what you’re doing with GEN-0. You’ve taken your learnings and you’re doing a new, fun thing. What’s the basis?
Pete Florence:
Yeah, sure. At the heart of GEN-0, the most important thing is that it makes scaling in robotics, in a very general-purpose sense, possible now. What that means in terms of how we think about it is that we have a very general-purpose recipe. And to be honest, it doesn’t even really matter too much what the x-axis is, other than the fact that it’s something we can continually do more of. And the y-axis is some measure of how good the robots are. And we see that we can just continue to pour more and more effort onto that x-axis, and the y-axis continues to get better and better. That’s a little bit of a generic phrasing, but importantly, we’re able to take a model that’s trained on very general-purpose physical interaction data, and we train on more and more of it. And every single task that we’re tracking continues to get better. And that’s a landmark type of moment, at least relative to what we’ve seen before. There have been a lot of attempts at making what you might call a foundation model for robotics. I think most of them, or I would say all of them today, the main attempts have either been like some of the stuff where you’re talking about Google, where the generalization comes from taking internet-scale data and soaking that all in, or you have more of a task-specific paradigm. People might claim it’s general, but it always kind of seeps in through how people design tasks. Instead, with GEN-0, the way we scale pre-training is completely separated from any idea of how we think about any particular task we’re solving. We continue to scale general-purpose data. Every single task that we track continues to get better. This has been a moment that we’ve been excited to share.
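The x-axis/y-axis picture Pete describes is often summarized as a power law: some loss measure falling smoothly as the quantity on the x-axis grows. A minimal sketch in Python, assuming a purely illustrative loss(D) = a · D^(−b) form (the constants are invented for the demo, not GEN-0 measurements); a power law is a straight line in log-log space, so a degree-1 fit recovers the exponent:

```python
import numpy as np

# Hypothetical scaling-law sketch: loss falls as a power law in data volume,
# loss(D) = a * D**(-b). Constants are illustrative, not GEN-0 numbers.
a_true, b_true = 50.0, 0.25

data_sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = a_true * data_sizes ** (-b_true)

# A power law is a straight line in log-log space, so fit with polyfit.
slope, intercept = np.polyfit(np.log(data_sizes), np.log(losses), 1)
b_fit, a_fit = -slope, np.exp(intercept)

print(f"fitted exponent b = {b_fit:.3f}, prefactor a = {a_fit:.1f}")
```

The point of the exercise is the shape, not the numbers: as long as the fitted exponent stays positive as you pour more onto the x-axis, every tracked metric keeps improving.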
Jeff Frick:
Right. Now it’s interesting, you showed a graph earlier in your presentation, and in the model that didn’t have enough parameters, it stalled out.
Pete Florence:
It gave up.
Jeff Frick:
It hit a wall.
Pete Florence:
It gave up, right?
Jeff Frick:
The line stops. So what’s so different? Is it because you’re able to incorporate so many more parameters to get past that critical function? Is there some other magic or secret sauce, or did you just break through that tipping point?
Pete Florence:
In some ways, yeah. The concept there is very simple, but it is very profound. If you go back to machine learning 101, like a decade ago, if your model’s validation loss was going up, you would call that overfitting. And you would say, “Okay, I need to somehow reduce my overfitting.” And one way you might try, and it doesn’t always work, is to actually make the model smaller. And it wasn’t forever ago, I forget the exact years, when the field started to understand this concept of double descent. It depends on the regime you’re in, and it depends on how much data you have. There are regimes where, if you have enough data, making your model much bigger is actually much more effective at avoiding these overfitting effects. Again, it’s very simple, but actually having enough data, number one, in robotics to see these types of effects, being in a data-rich regime within robotics, has been a very challenging thing to attain. And then number two, actually having all of the model training set up just right to create the conditions so that we can observe this, that’s been tricky for us. And it wasn’t like the first time we tried all this we beautifully got this result. It took a lot of iteration over more than a year from the team.
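The double-descent effect Pete references can be reproduced in a toy setting. A minimal sketch, assuming minimum-norm least squares on random ReLU features (everything here is illustrative and unrelated to GEN-0's actual training): test error peaks when model width matches the number of training points, the "interpolation threshold," then falls again as the model gets much bigger.

```python
import numpy as np

# Toy double-descent demo with minimum-norm least squares on random ReLU
# features. Purely illustrative; not GEN-0's training setup.
rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 8

def relu_features(X, W):
    return np.maximum(X @ W, 0.0)

w_star = rng.normal(size=d)                       # ground-truth linear target
X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr = X_tr @ w_star + 0.5 * rng.normal(size=n_train)  # noisy labels
y_te = X_te @ w_star

widths = [10, 20, 40, 80, 400]  # 40 == n_train is the interpolation threshold
errors = []
for width in widths:
    errs = []
    for _ in range(20):  # average over random feature draws
        W = rng.normal(size=(d, width)) / np.sqrt(d)
        Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
        theta = np.linalg.pinv(Phi_tr) @ y_tr     # minimum-norm solution
        errs.append(np.mean((Phi_te @ theta - y_te) ** 2))
    errors.append(float(np.mean(errs)))

for w_, e in zip(widths, errors):
    print(f"width {w_:4d}: test MSE {e:.2f}")
```

The spike at width 40 is the classic "make the model smaller" trap; past the threshold, the much wider model generalizes better, which is the regime Pete describes needing enough data to reach.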
Jeff Frick:
So one of the concepts talked about a lot here is synthetic data. Because, you know, to train an LLM there’s lots of text out there, there’s a lot of data. For training robots, there isn’t necessarily this giant corpus of data like there is for language.
Pete Florence:
Yeah.
Jeff Frick:
So there’s this whole concept of synthetic data. For the folks at home that don’t understand, what is synthetic data, and how could synthetic data contribute to real data? Explain synthetic data as a concept and how it’s used in training these things.
Pete Florence:
So generically speaking, synthetic data is any data that we wouldn’t say is real. And for robotics, the most direct way to think about that is any data that’s not from the real world. There are a lot of different ways you can think about getting synthetic data. I think the two main paradigms people think about today are either from a more traditional simulator, maybe a better term would be a physics-based simulator, or you could use some type of learned model of the world, a world model. And there’s a lot of excitement around that as well.
Jeff Frick:
So the concept, for people that aren’t as familiar, is you build basically a digital world with digital attributes around the use case you’re trying to do. And then you can run a million, kajillion “pick up the bottle” scenarios and move it to a different location within that world each time. Is that right?
Pete Florence:
Yeah, that’s a good way to think about it.
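The "digital world" idea Jeff sketches is usually called domain randomization: before each simulated trial, you sample a new variation of the scene. A minimal, hypothetical sketch (the field names and ranges are invented for illustration, not from any real simulator):

```python
import random

# Hypothetical domain-randomization sketch: generate many "pick up the
# bottle" scenarios by randomizing where the bottle sits in a digital
# world. Names and ranges are illustrative only.
def sample_scenario(rng):
    return {
        "bottle_xy": (rng.uniform(-0.4, 0.4), rng.uniform(-0.3, 0.3)),
        "bottle_yaw_deg": rng.uniform(0.0, 360.0),
        "table_height_m": rng.uniform(0.70, 0.95),
        "lighting": rng.choice(["dim", "normal", "bright"]),
    }

rng = random.Random(7)  # seeded so runs are reproducible
scenarios = [sample_scenario(rng) for _ in range(1000)]
print(scenarios[0])
```

Each dictionary configures one simulated trial, so a policy trained across thousands of them has seen the bottle in many positions, heights, and lighting conditions rather than one fixed setup.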
Jeff Frick:
I can simulate the real world faster to get more trials to feed back to the machine. Is that what it is?
Pete Florence:
Yeah, yeah. So for us, we haven’t talked publicly about how we use synthetic data. But I would say that when we’ve presented GEN-0, and as we’ve talked about publicly, all of the data in GEN-0 that we’ve talked about, including the sheer amount of it, that is all real-world data.
Jeff Frick:
Right.
Pete Florence:
We have many different threads in synthetic, but we really do believe that real-world data is essential.
Jeff Frick:
Right, right. And just to be clear, everyone here has talked about there being lots of different ways to train. There are lots of data sources, real data, synthetic data, teleoperation, and you use them all, right? As much as you can to get the most benefit out of them.
Pete Florence:
Where they all fit in, in the limit, sure. But I think the thing is that focus is very important. Right? If human organizations operated such that focus could be infinitely sharded, or you could work in parallel on as many different things as you want, then yeah, you would want every single data source you could get. But the reality is that in building the culture of a team where you’re really pushing the frontier, it’s helpful to have a certain amount of focus on the particular bets you’re making in terms of research and how you’re pushing capabilities. So for us, yeah, we are primarily focused today on real data.
Jeff Frick:
You’ve talked about the marginal cost of labor getting to zero. And I’ve heard that in other robotics talks, which is pretty interesting, because I used to always say, if compute, networking, and storage were zero, what would you build? Because they’re asymptotically approaching that every single day. But now when you add agentic, embodied AI, you put it in something that can move and do things, the possibilities, especially compared to not that long ago, are pretty astounding. As evidenced by Waymos that are driving all around as we walk outside, taking people to the airport.
Pete Florence:
Waymos are amazing.
Jeff Frick:
Yeah.
Pete Florence:
I do think that sound bite by itself needs some context. Perhaps a good way to think about it is that over in the LLM model provider world, there’s this concept that we might eventually reach intelligence that’s “too cheap to meter.” And honestly, some of the models these days are amazing, especially the ones on the lower-cost part of the frontier of capabilities. That does exist today, but primarily for limited levels of LLM-type intelligence. I think something similar will happen in the physical world. The way these models will have impact will be very gradual. We very much see humans and machines figuring out how to work together. Now you have a robot that can help you be more productive, get more things done, build more. You can think of it as having a robot that can help you with almost any task you can imagine, and having that be a very productive partner to amplify your productivity. We think that’s very much the world we’re headed toward.
Jeff Frick:
Yeah, yeah. So just a final point before we wrap. You had one other conceptual thing that was really powerful. You want your robots to respond to stimuli, not necessarily just execute the skill. They’re executing the skill, but you want them to have the flexibility to respond and do things. What’s the essence of baking that in? Because you have to do that from training and design and everything, if that’s your holy grail.
Pete Florence:
Yeah, I mean, ideally most good robots need to react to stimuli. The ones that don’t, it depends on your definition of a robot. But having closed-loop interaction with the world, meaning you sense the world, then you take an action in response to sensing the world, that’s kind of the core closed-loop nature of what I would call the definition of a robot. So really figuring out how to make decisions, given observations of the world, including different types of stimuli. I think that is the core of robotics in general.
Jeff Frick:
Right. And when people think about robotics, there’s the old robotics, factory floors, robot arms.
Pete Florence:
At the lowest level they are closed loop, but a lot of robot arms in factories, like putting sheet metal together for cars, don’t have anywhere close to the level of intelligence we’re talking about when we think of the future of robotics, responding to vision and other sensors.
Jeff Frick:
Yeah.
Pete Florence:
Very much the future of robotics is being able to take in all these different multimodal inputs, multimodal sensing of the world, and figure out how to not just do a task, but really importantly, to generalize the types of skills that happen in one task across all the different tasks you can have the robot think of doing. Things like common sense in the physical world, being able to recover from edge cases, being robust no matter what happens if the environment changes, if the packaging changes, or some other notion of the task changes over time. Those are the types of things we take for granted as being very easy, but they’re exactly the things we need to solve for the next generation of robots.
Jeff Frick:
How do you think about hallucinations, just in the context of LLMs, or the model doing not quite exactly what you want? It’s one thing if it gives you a bad answer, but it’s different if you say, “Take the glasses out of the dishwasher,” and it gets a little more active.
Pete Florence:
In language models, we’re all familiar with hallucinations. The way that manifests is the model telling you something it completely made up, often very confidently.
Jeff Frick:
Very confidently.
Pete Florence:
For robotics, it has a different flavor. It would be like you telling the robot to pick up the cup and it decides to do something else entirely, take your glasses off, or just not at all grok what you asked it to do. It doesn’t feel as much like lying, because it’s not verbal, but it’s kind of like a physical lie, if the robot is completely doing the wrong thing or making up an action it really shouldn’t be taking.
Jeff Frick:
Yeah. But at least you can see it, right? I guess the difference is, if you said, “Clean up the dishes while I was out,” and it lied, and you got home and they were…
Pete Florence:
Yeah. I mean, there are kind of two ways to avoid a hallucination for a language model, right? You can either say the correct thing, or you can say, “I don’t know.”
Jeff Frick:
Right.
Pete Florence:
That’s the other way to avoid hallucinations, recognize the limits.
Jeff Frick:
Right.
Pete Florence:
And I think that’s a very useful concept to have for physically acting robots as well.
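The "I don't know" option Pete describes maps naturally onto abstention in a policy: if no action is confident enough, the robot refuses rather than acts. A minimal sketch with an invented action head (all names here are illustrative, not any real API):

```python
# Minimal abstention sketch: an action head that refuses to act when its
# top score is below a confidence threshold. This mirrors the language
# model's "I don't know"; names and threshold are illustrative.
def choose_action(action_scores, threshold=0.6):
    """action_scores: dict mapping action name -> probability."""
    best_action = max(action_scores, key=action_scores.get)
    if action_scores[best_action] < threshold:
        return "abstain"  # the physical analogue of saying "I don't know"
    return best_action

print(choose_action({"pick_cup": 0.9, "open_drawer": 0.1}))             # pick_cup
print(choose_action({"pick_cup": 0.4, "open_drawer": 0.35, "wave": 0.25}))  # abstain
```

The design choice is the threshold: set it high and the robot abstains often but rarely "physically lies"; set it low and it acts more, at the cost of confidently doing the wrong thing.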
Jeff Frick:
Yes, I like that, because that’s certainly not in all the language models, that’s for sure. They never come back with, “I don’t know.” All right, Pete. Well, exciting times, and you’re right in the middle of it. I think this is going to go so much faster than anybody expects. Again, I love Waymos just as an example, because everybody can see them. And I think it’s been 14 years since Google launched the self-driving car project until they opened it up to any rider in San Francisco. So is that a long time or a short time? I don’t know. Once it’s here, it’s here. I need to go to the airport, I can dial up the Waymo.
Pete Florence:
I think the thing with self-driving cars is that you really needed to solve the ability to take somebody on public roads from point A to point B in order to make that a useful thing that you could ship into the world. For the next generation of robotics, some aspects are going to be a long journey in terms of full capabilities. Yet at the same time, there are a lot of robots that can be shipped to do things that are not as dangerous to humans as driving on public roads. So I think there are a lot of different types of robots and a lot of different use cases people will want them for. It’s not as singular a problem as self-driving has been.
Jeff Frick:
Right, right. Great. Well, thanks a lot.
Pete Florence:
Great to chat with you.
Jeff Frick:
All righty. He’s Pete, I’m Jeff, you’re watching Humanoid Summit. Thanks for watching. We’ll see you next time. Take care.
Cold Close:
Cool
We're out
Cool
Thank you