For robots to become genuinely useful in our everyday lives and industries, they need to go beyond simply following instructions and instead be able to reason about the physical world. From navigating intricate facilities to reading a pressure gauge’s needle, a robot’s "embodied reasoning" is what connects its digital intelligence to real-world physical actions.

Today, we’re launching Gemini Robotics-ER 1.6, a major upgrade to our reasoning-first model that allows robots to perceive and understand their surroundings with exceptional accuracy. By boosting spatial reasoning and multi-view comprehension, we are ushering in a new era of autonomy for the next generation of physical agents.

This model is specifically designed for the reasoning skills essential to robotics, such as visual and spatial understanding, task planning, and success detection. It serves as the high-level reasoning engine for a robot, enabling it to perform tasks by directly invoking tools such as Google Search for information retrieval, vision-language-action models (VLAs), or any other user-defined third-party functions.

Gemini Robotics-ER 1.6 delivers substantial gains over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, with notable advances in spatial and physical reasoning abilities including pointing, counting, and success detection. We are also introducing a new capability: instrument reading. This allows robots to interpret complex gauges and sight glasses, a use case we identified through our close collaboration with Boston Dynamics.
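To make the "high-level reasoning engine" role more concrete, here is a minimal Python sketch of the orchestration pattern described above: a planner emits tool calls, and a dispatcher routes each one to a search tool, a VLA, or a user-defined function. This is an illustration only and does not use the Gemini API. The names `ToolCall`, `TOOLS`, `search_web`, `run_vla`, and `execute_plan` are hypothetical, and the plan is hand-written rather than produced by the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """One step the reasoning model asks the robot stack to execute."""
    name: str      # which tool to invoke
    argument: str  # free-form argument passed to that tool


def search_web(query: str) -> str:
    # Hypothetical stand-in for an information-retrieval tool.
    return f"search results for: {query}"


def run_vla(instruction: str) -> str:
    # Hypothetical stand-in for a vision-language-action model.
    return f"executed motion: {instruction}"


# Registry mapping tool names to callables; user-defined functions
# would be registered here in the same way.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_web,
    "vla": run_vla,
}


def execute_plan(plan: List[ToolCall]) -> List[str]:
    """Dispatch each planned call to its tool and collect results in order."""
    results = []
    for call in plan:
        tool = TOOLS.get(call.name)
        if tool is None:
            raise KeyError(f"unknown tool: {call.name}")
        results.append(tool(call.argument))
    return results


# A hand-written plan standing in for the reasoning model's output.
plan = [
    ToolCall("search", "normal operating range for this pressure gauge"),
    ToolCall("vla", "move camera to face the gauge"),
]
results = execute_plan(plan)
```

In a real system the plan would come from the model's response and the registered tools would wrap actual services and robot controllers. The registry simply keeps the set of callable tools explicit, so an unknown tool name is rejected rather than guessed at.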
Starting today, Gemini Robotics-ER 1.6 is available to developers through the Gemini API and Google AI Studio. To help you get started, we are sharing a developer Colab containing examples of how to configure the model and prompt it for embodied reasoning tasks.

Figure 1: Benchmark results comparing Gemini Robotics-ER 1.6 with Gemini Robotics-ER 1.5 and Gemini 3.0 Flash models. The instrument reading evaluations were conducted with agentic vision enabled (except for Gemini Robotics-ER 1.5, which does not support it). All other evaluations were conducted with agentic vision disabled.
Google DeepMind News