Anthropic’s Arbor: AI Agents Conduct Autonomous Research Cycles

At a glance: Arbor coordinates autonomous AI agents via persistent hypothesis trees and achieved 2.5× better results than Codex and Claude Code on six research tasks.

Anthropic presents Arbor, a framework that enables AI agents to independently conduct research loops of exploration, experimentation, and abstraction over extended periods. Compared to Codex and Claude Code, Arbor achieves 2.5× better results on six research tasks.

Arbor combines three components: a long-lived coordinator, short-lived executor processes, and a data structure called Hypothesis Tree Refinement (HTR). The coordinator steers global research strategy across the hypothesis tree, while executor processes implement and test individual hypotheses in isolated work environments.
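The split between coordinator and executors can be illustrated with a minimal sketch. This is not Arbor's actual code: the names `Coordinator`, `run_executor`, and `evaluate` are hypothetical, and the "experiment" is reduced to a scoring callback so the example stays runnable.

```python
import tempfile
from pathlib import Path


def run_executor(hypothesis, evaluate):
    """Short-lived executor: test one hypothesis in its own scratch directory."""
    with tempfile.TemporaryDirectory() as workdir:
        # Assumption: a real executor would modify and run code here.
        # This sketch only records the hypothesis and scores it.
        (Path(workdir) / "hypothesis.txt").write_text(hypothesis)
        score = evaluate(hypothesis)
    # The scratch directory is deleted on exit; only the result survives.
    return {"hypothesis": hypothesis, "score": score}


class Coordinator:
    """Long-lived coordinator: keeps global state across executor runs."""

    def __init__(self, baseline_score):
        self.best_score = baseline_score
        self.history = []

    def step(self, candidates, evaluate):
        """Dispatch each candidate to a fresh executor and keep the best score."""
        for hypothesis in candidates:
            result = run_executor(hypothesis, evaluate)
            self.history.append(result)
            if result["score"] > self.best_score:
                self.best_score = result["score"]
        return self.best_score
```

The point of the design is that executors are disposable while the coordinator's state persists, so a failed or noisy experiment cannot corrupt the overall research record.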

The system persistently stores hypotheses, artifacts, evidence, and distilled insights in a tree structure and links them over time. When executor processes return results, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and integrates verified improvements. This transforms autonomous research from a sequence of isolated attempts into a cumulative process in which strategy, execution, and evidence are carried forward over time.
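A rough sketch of such a tree, under stated assumptions, might look as follows. The `Node` fields, the `record_result` rule (a hypothesis counts as verified if it beats a baseline score), and the choice to propagate lessons to all ancestors are illustrative guesses, not details confirmed by the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    evidence: list = field(default_factory=list)  # raw scores from executors
    lessons: list = field(default_factory=list)   # distilled insights
    status: str = "open"                          # "open", "verified", "rejected"

    def add_child(self, hypothesis):
        """Refine this hypothesis into a more specific child hypothesis."""
        child = Node(hypothesis, parent=self)
        self.children.append(child)
        return child


def record_result(node, score, baseline, lesson):
    """Store evidence, update status, and pass the lesson up to all ancestors."""
    node.evidence.append(score)
    node.status = "verified" if score > baseline else "rejected"
    current = node
    while current is not None:
        if lesson not in current.lessons:
            current.lessons.append(lesson)
        current = current.parent


def frontier(root):
    """Return the open leaf hypotheses, i.e. what could be tested next."""
    stack, open_leaves = [root], []
    while stack:
        node = stack.pop()
        if node.status == "open" and not node.children:
            open_leaves.append(node)
        stack.extend(node.children)
    return open_leaves
```

For example, after a root hypothesis is refined into two children and one of them is tested, the lesson appears on the root and only the untested child remains on the frontier.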

Arbor was evaluated in the “Autonomous Optimization” (AO) setting, in which an agent improves an initial research artifact through iterative experiments without step-by-step human intervention. On six real research tasks spanning model training, harness engineering, and data synthesis, Arbor achieved the best held-out results on all six. On MLE-Bench Lite, Arbor with GPT-5.5 achieved an any-medal score of 86.36 percent, the strongest result among the systems compared.

For CTOs, this development is relevant because it demonstrates how autonomous agents can systematically refine hypotheses over extended research phases while replacing human intuition with structured, evidence-based decision-making. The tree refinement model allows insights to carry over between experiments rather than treating each attempt as a fresh start.


Source: arxiv.org · Published 9 June 2026
Lumi AI News — AI-assisted curation pursuant to Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.6.5.