
Accelerating on-device AI: A look at Arm and Google AI Edge optimization

Chintan Parikh, Product Manager; Dillon Sharlet, Software Engineer; Na Li, Engineering Manager; Gian Marco Iodice, Distinguished Engineer, Arm

AI is advancing from basic text-based interactions to sophisticated multimodal features, including on-device generation of images and audio, which lets developers build deeply personalized experiences for users. CPUs have long been the default choice for inference, but running large, complex models at the edge has traditionally forced a compromise between high-latency CPU processing and fragmented, specialized accelerators.

Arm Scalable Matrix Extension 2 (SME2) removes this tradeoff by embedding a dedicated matrix-compute unit directly within the CPU cluster. This architecture allows the CPU to serve as a high-performance AI accelerator, providing up to 5x faster inference for the matrix-intensive workloads that power generative AI.

Google AI Edge, a fully integrated stack built to streamline the entire development process, greatly simplifies developing on-device AI for Arm-based hardware. LiteRT uses Arm SME2 automatically at runtime through its integration with XNNPACK and Arm KleidiAI.

  Google Developers Blog