Robotics is undergoing a transformation, and it’s not just because of advancements in hardware. The rise of large language models (LLMs), like ChatGPT, is having an unexpected impact on how robots plan, reason, and interact with the world. Jeannette Bohg, an expert in robotics at Stanford University, shared insights into this new era of robotic development on a recent episode of The Future of Everything podcast.
For decades, researchers have dreamed of creating robots that can seamlessly assist us in daily life—whether it’s making dinner, tidying up the house, or working alongside humans in complex environments like hospitals. Yet, despite significant progress, many of these goals have remained just out of reach. Robots, while great at repetitive tasks in controlled environments, often struggle with the unpredictability of the real world. However, recent developments in AI, specifically the use of large language models, are offering a glimpse into a future where robots can not only follow complex instructions but also understand and reason about their tasks in ways that were previously impossible.
Traditionally, robots have relied on pre-programmed instructions to perform tasks, making them excellent at repetitive, structured work like assembling cars in factories. But when it comes to handling unpredictable environments, such as cooking in a cluttered kitchen, robots still face significant challenges. This is where LLMs come in. By leveraging the “common sense” baked into these models, robots can now approach tasks with a higher level of reasoning. The AI can help break down a task into logical steps, such as gathering ingredients, chopping vegetables, and cooking them in sequence. It’s the kind of high-level planning that would otherwise require painstaking manual programming.
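To make that idea concrete, here is a minimal sketch of what LLM-based task planning can look like, assuming access to an OpenAI-style chat API. The model name, skill list, and prompt below are illustrative placeholders, not the planning stack of any particular lab; a real system would validate the returned plan before executing it.

```python
# Minimal sketch: using an LLM as a high-level task planner for a robot.
# The model name, skill names, and prompt are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low-level skills the robot is assumed to already have controllers for.
ROBOT_SKILLS = [
    "go_to(location)", "pick_up(object)", "place(object, location)",
    "chop(object)", "stir(container)", "turn_on(appliance)",
]

def plan_task(instruction: str) -> list[dict]:
    """Ask the LLM to decompose a natural-language instruction into an
    ordered list of calls to the robot's known skills."""
    system_prompt = (
        "You are a task planner for a kitchen robot. Decompose the user's "
        "instruction into an ordered JSON list of steps, each of the form "
        '{"skill": ..., "args": [...]}. Only use these skills:\n'
        + "\n".join(ROBOT_SKILLS)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": instruction},
        ],
    )
    # A production planner would parse and check this output more robustly.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    # The LLM supplies the "common sense" ordering (gather, chop, cook);
    # executing each step still requires real sensorimotor controllers.
    for step in plan_task("Make a simple vegetable stir-fry."):
        print(step)
```

The division of labor is the point: the language model handles the symbolic sequencing, while each named skill still has to be backed by a hand-engineered or learned low-level controller, which is exactly the gap described next.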
But LLMs aren’t a complete solution. While they’re great at understanding tasks on a symbolic level—grasping the general idea of handling a fragile glass or picking up a tool by its handle—they can’t yet give robots the fine motor skills required to carry out these actions. Turning a doorknob, for instance, requires an understanding of physical force, grip, and precise timing, abilities that remain beyond today’s robots. The challenge now is to develop models that bring the same level of sophistication to low-level sensorimotor control that LLMs have brought to high-level reasoning.
One of the biggest hurdles in achieving this is data. While LLMs are trained on vast amounts of text data—trillions of tokens, in fact—robots don’t have the same abundance of training data. Most robotic datasets consist of painstakingly collected information, often involving human operators using joysticks to teach robots how to perform tasks. This slow and costly process means that the gap between AI models trained on text and those trained on real-world robotic actions is vast. There’s simply not enough data to fuel the kind of leap forward that LLMs made possible in language processing.
Interestingly, humans can learn a lot from videos—whether it’s watching someone cook or repair a household appliance on YouTube—but translating this ability to robots isn’t straightforward. A human hand, with its complex system of joints, tendons, and muscles, is an amazing tool. Replicating this in a robot is no easy feat, and even when robots can “watch” a video of a human performing a task, they often lack the mechanical dexterity to replicate it. Understanding how to map human movements from video onto the unique mechanical structure of a robot is an ongoing research challenge, one that could unlock vast amounts of untapped training data for robotic systems.
As exciting as these developments are, there’s still a long way to go before we have fully autonomous robots that can handle all aspects of human tasks. Bohg is quick to point out that while general-purpose robots, the kind seen in science fiction, are still far off, the future of robotics likely lies in more specialized machines designed for specific tasks. Instead of a robot that can do everything, we’re more likely to see robots tailored for particular environments, such as hospitals, where they can shuttle supplies or assist with simple tasks that free up human workers to focus on more critical activities.
Take, for instance, the startup Diligent Robotics, which is working on robots to help nurses by shuttling supplies around hospitals. Rather than trying to create a robot that can take on patient care, Diligent Robotics identified a simpler yet highly impactful task—moving supplies—that robots can do more reliably. These specialized robots are not just augmenting human abilities but helping create more efficient workflows in environments where time and precision are critical.
The hardware side of robotics also remains a significant challenge. Today’s robotic arms are still fragile, expensive, and prone to breaking down. Many research robots can cost tens of thousands of dollars, and their fragility means that researchers spend as much time repairing them as they do testing them. Industrial robots in factories, which are stiff and precise, are reliable but not suitable for environments where they might interact with humans—they’re simply too dangerous. Developing robotic hardware that is robust enough for everyday use, while also safe for human environments, is an ongoing challenge.
Looking ahead, Bohg envisions a future where robots may not look humanoid, despite the popular image in science fiction. Humanoid robots might seem intuitive because they can operate in spaces designed for humans, but in reality, robots designed with simpler, more specialized forms might be more practical and efficient. Think of a Roomba with an arm that could pick up objects off the floor—less humanlike, perhaps, but far more useful for everyday tasks.
The future of robotics, then, isn’t about achieving human-like capabilities all at once, but about carefully designing robots for specific, impactful tasks while continuing to push the boundaries of what AI and machine learning can accomplish. By combining the reasoning power of large language models with more advanced hardware and sensorimotor control, the dream of robots seamlessly integrating into our daily lives is closer than ever—though still just out of reach.