HarnessX: Automated Optimization of Agent Runtime Environments

The point: HarnessX automates the assembly and adaptation of agent harnesses from execution traces, achieving an average +14.5% performance improvement without model scaling.

Researchers introduce HarnessX, a system for systematically composing and adapting agent harnesses: the prompts, tools, and control components that let AI agents solve tasks. It uses execution traces to improve harnesses automatically and reports an average gain of 14.5 percent across five benchmark suites.

HarnessX addresses a fundamental challenge in modern AI agents: their success depends not only on the language model itself, but critically on the runtime environment — the prompts, available tools, memory management, and control logic that dictate how an agent processes observations, draws conclusions, and acts. Today, these components are typically constructed manually for each model and task without systematically leveraging the resulting execution traces for improvement.

The new system combines three components. First, HarnessX defines typed primitives (modular building blocks) and assembles them into complete harnesses via a substitution algebra. Second, the AEGIS engine uses execution traces to improve harnesses through a multi-agent evolutionary process that combines symbolic adaptations with reinforcement learning. Third, it closes the feedback loop: agent trajectories yield both directly improved harnesses and training signals for the underlying model.
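To make the idea of typed primitives and substitution more concrete, here is a minimal sketch in Python. It is not the HarnessX implementation, whose code has not been released. The names Primitive, Harness, substitute, evolve, and toy_score are all hypothetical, the scoring function stands in for evaluation on execution traces, and a simple greedy search stands in for the multi-agent evolutionary process of AEGIS, which the summary does not describe in detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Primitive:
    """Hypothetical typed building block of a harness."""
    kind: str     # slot type, e.g. "prompt" or "tool"
    name: str
    payload: str


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness: named slots, each filled by one primitive."""
    slots: Dict[str, Primitive]

    def substitute(self, slot: str, new: Primitive) -> "Harness":
        # Type-checked substitution: a slot only accepts the same kind.
        old = self.slots[slot]
        if old.kind != new.kind:
            raise TypeError(f"slot {slot!r} expects {old.kind}, got {new.kind}")
        # Return a new harness; the original is left unchanged.
        return Harness({**self.slots, slot: new})


def evolve(
    harness: Harness,
    candidates: List[Tuple[str, Primitive]],
    score: Callable[[Harness], float],
) -> Harness:
    """Greedy stand-in for trace-driven adaptation: keep a substitution
    only if it raises the score."""
    best, best_score = harness, score(harness)
    for slot, primitive in candidates:
        trial = best.substitute(slot, primitive)
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best


base = Harness({
    "system_prompt": Primitive("prompt", "terse", "Solve the task."),
    "search": Primitive("tool", "keyword_search", "grep"),
})

candidates = [
    ("system_prompt",
     Primitive("prompt", "stepwise", "Plan, act, then verify each step.")),
    ("search", Primitive("tool", "semantic_search", "embed")),
]


def toy_score(h: Harness) -> float:
    # Assumption for illustration only: in a real system this number
    # would come from running the agent and scoring its traces.
    prompt_ok = "verify" in h.slots["system_prompt"].payload
    tool_ok = h.slots["search"].name == "semantic_search"
    return float(prompt_ok) + float(tool_ok)


best = evolve(base, candidates, toy_score)
print(best.slots["system_prompt"].name, best.slots["search"].name)
```

In this sketch the kind check is what makes a primitive "typed": a tool cannot be dropped into a prompt slot, so every substitution yields a structurally valid harness that can be scored and compared.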

Evaluation spans five benchmark suites: ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified. HarnessX improves performance by 14.5 percent on average, with gains of up to 44.0 percent. Notably, the largest gains occur where baseline performance is lowest, that is, on the most difficult tasks or with weaker models.

The finding challenges the assumption that agent progress comes primarily from larger or better models. Instead, HarnessX demonstrates that systematic composition and evolution of the interfaces between model and runtime environment is an independent, complementary lever. Source code will be provided in a future release.


Source: arxiv.org · Published June 11, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrasing and classification by Lumi News Pipeline v1.7.1.
