LUMOS: Semantic OS Layer for Accessibility-Driven AI Agents

The gist: A semantic OS layer lets AI agents interact through accessibility metadata instead of screenshots, reducing token costs and latency.

Researchers have developed LUMOS, a semantic interaction layer that sits between AI agents and operating systems. Instead of requiring agents to process screenshots, it gives them machine-readable representations of UI elements and accessibility metadata. This significantly reduces token costs, latency, and coordinate uncertainty in AI-driven computer-use agents.

The problem is fundamental: today's operating systems expose interfaces designed for humans (pixels, icons, windows, mouse pointers), not for AI agents. Computer-use agents are therefore forced to interpret screenshots, process OCR output, and analyze visually ambiguous image crops. The result is high token costs, added latency, and uncertainty about the coordinates at which to click or type.

LUMOS (Language Model Unified Machine-Readable Operating-System Semantics) converts native accessibility metadata and browser UI structures into machine-readable semantic blueprints with stable identifiers, roles, names, values, bounds, and available actions. The layer also supports live grounding of semantic pointers: the system can query, via OS automation APIs, which UI element lies under or near the cursor. The LLM then acts through an accessibility-driven observe-act loop built on a constrained set of visible UI primitives rather than on application-specific scripts.
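
To make the idea of a semantic blueprint and pointer grounding more concrete, here is a minimal Python sketch. It is not the LUMOS implementation or its API: the names `Bounds`, `UIElement`, `ground_pointer`, and `act`, the `max_distance` threshold, and the rule of preferring the smallest enclosing element are all assumptions made for illustration. Only the field list (identifier, role, name, value, bounds, actions) is taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Bounds:
    """Screen rectangle of an element (hypothetical representation)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def distance_to(self, px: int, py: int) -> float:
        # Euclidean distance from a point to the rectangle (0 if inside).
        dx = max(self.x - px, 0, px - (self.x + self.width))
        dy = max(self.y - py, 0, py - (self.y + self.height))
        return (dx * dx + dy * dy) ** 0.5


@dataclass(frozen=True)
class UIElement:
    """One blueprint entry with the fields named in the article."""
    element_id: str            # stable identifier
    role: str                  # e.g. "button", "textbox"
    name: str                  # accessible name
    value: Optional[str]       # current value, if any
    bounds: Bounds
    actions: Tuple[str, ...]   # actions the element exposes


def ground_pointer(elements: Sequence[UIElement], px: int, py: int,
                   max_distance: float = 24.0) -> Optional[UIElement]:
    """Return the element under the cursor, else the nearest one in range."""
    hits = [e for e in elements if e.bounds.contains(px, py)]
    if hits:
        # Assumption: the smallest enclosing element is the most specific.
        return min(hits, key=lambda e: e.bounds.width * e.bounds.height)
    near = [(e.bounds.distance_to(px, py), e) for e in elements]
    near = [(d, e) for d, e in near if d <= max_distance]
    return min(near, key=lambda pair: pair[0])[1] if near else None


def act(element: UIElement, action: str) -> dict:
    """Build an action request, allowing only actions the element exposes."""
    if action not in element.actions:
        raise ValueError(f"{element.element_id} does not support {action!r}")
    return {"target": element.element_id, "action": action}
```

In this sketch the agent addresses a target by its identifier and one of its advertised actions, so it never has to emit raw pixel coordinates. That is the property the article credits with reducing coordinate uncertainty.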

For engineers, the practical upshot is that agents need less visual context processing when the OS already provides semantics. LUMOS does not replace visual agents; it reduces dependence on screenshot interpretation wherever structured accessibility interfaces are available. The approach sketches a path toward AI-native operating systems and machine-readable interaction layers that could make classic automation more efficient.
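
The "reduce, not replace" point can be read as a simple fallback rule. The sketch below assumes this reading and is not taken from LUMOS itself: `choose_observation` and its two callback parameters are hypothetical names, and the callbacks stand in for whatever accessibility reader and screen-capture routine an agent actually uses.

```python
from typing import Callable, Optional, Sequence


def choose_observation(
    read_semantic_tree: Callable[[], Optional[Sequence[dict]]],
    capture_screenshot: Callable[[], bytes],
) -> dict:
    """Prefer structured semantics; fall back to pixels when none exist."""
    elements = read_semantic_tree()
    if elements:
        # Structured metadata is available, so no image needs to be captured.
        return {"kind": "semantic", "elements": list(elements)}
    # No usable accessibility data (None or empty): use the visual path.
    return {"kind": "screenshot", "image": capture_screenshot()}
```

The screenshot callback is only invoked on the fallback path, which is where the cost saving in this sketch comes from.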


Source: arxiv.org · Published 28 June 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.2.
