Patrick Löber, Member of the Technical Staff, Gemini API. Lucia Loher, Product Manager, Gemini API. Roberto Santana, Product Manager Lead, Google Cloud. Mojtaba Seyedhosseini, Engineering Director, Google DeepMind.

Last week, we released Gemini Embedding 2 to general availability through the Gemini API and the Gemini Enterprise Agent Platform. It’s the first embedding model in the Gemini API that maps text, images, video, audio, and documents into a single embedding space, and it supports over 100 languages. In this post, we explore the use cases this unified model unlocks, from agentic multimodal RAG to visual search, and show you how to start building them.

The model accepts a wide range of inputs in a single call: up to 8,192 text tokens, 6 images, 120 seconds of video, 143 seconds of audio, and 6 PDF pages. Because it projects all of these modalities into a shared semantic space, developers can build rich experiences that “see” and “hear” their proprietary data.

The real strength of Gemini Embedding 2 is its ability to handle interleaved inputs, such as mixed text and images, within a single request.
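Because every modality lands in the same vector space, cross-modal search (for example, a text query that retrieves images or audio) reduces to ordinary nearest-neighbor ranking. The minimal sketch below assumes each item and the query have already been embedded with Gemini Embedding 2. The file names and the 4-dimensional vectors are hypothetical toy values (real embeddings have far more dimensions), and cosine similarity is a common metric for this kind of ranking, not one prescribed by this post.

```python
import math

# Hypothetical corpus of mixed-modality items. In a real app, each vector would
# come from Gemini Embedding 2; these 4-D values are made up for illustration.
CORPUS = {
    "photo_of_red_bicycle.png": [0.9, 0.1, 0.0, 0.1],
    "podcast_clip_on_cycling.mp3": [0.7, 0.3, 0.1, 0.0],
    "quarterly_report_page3.pdf": [0.0, 0.1, 0.9, 0.3],
    "video_of_city_traffic.mp4": [0.4, 0.8, 0.1, 0.2],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Avoid division by zero for degenerate vectors.
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], corpus: dict, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank every item against one query vector, regardless of its modality."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embedding of the text query "bike riding".
query = [0.85, 0.2, 0.0, 0.05]
for name, score in search(query, CORPUS):
    print(f"{score:.3f}  {name}")
```

With these toy vectors, the text query ranks the bicycle photo and the cycling podcast clip above the unrelated PDF and traffic video. In a production pipeline you would likely swap the dictionary for a vector database, but the ranking idea stays the same.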
Google Developers Blog