Chintan Parikh, Product Manager. Dillon Sharlet, Software Engineer. Na Li, Engineering Manager. Gian Marco Iodice, Distinguished Engineer at Arm.

AI is advancing from basic text-based interactions to sophisticated multimodal features, including on-device image and audio generation, which lets developers build deeply personalized experiences for users. CPUs have long been the default choice for inference. However, deploying large, complex models at the edge has traditionally forced a compromise between high-latency CPU execution and fragmented, specialized accelerators. Arm Scalable Matrix Extension 2 (SME2) eliminates this tradeoff by embedding a dedicated matrix-compute unit directly in the CPU cluster. This architecture lets the CPU act as a powerful AI accelerator, delivering up to 5x faster inference for the matrix-intensive workloads that power generative AI.

Google AI Edge, a comprehensive, integrated stack that streamlines the entire development process, significantly simplifies developing and running on-device AI on Arm-based hardware. LiteRT uses Arm SME2 automatically at runtime through its integration with XNNPACK and Arm KleidiAI.
Google Developers Blog