FAPO: Autonomous Optimization of Multi-Step LLM Pipelines with Claude Code

In brief: FAPO uses Claude Code to automate the optimization of multi-step LLM pipelines. It tries prompt adjustments first and escalates to chain modifications only when a structural bottleneck is identified, with average gains of +33.8 pp in the scenarios where structural changes were made.

Researchers present FAPO, a framework that leverages Claude Code to automatically optimize multi-step language model pipelines. The system identifies bottlenecks not only in prompts but also in the architecture of processing chains.

FAPO (Fully Autonomous Prompt Optimization) addresses a fundamental problem: multi-step pipelines with language models often fail not because of individual prompts, but because of interactions between retrieval, reasoning, and formatting steps. Traditional prompt-only optimization systematically overlooks these chain-level failures. The new framework allows Claude Code to examine, evaluate, and iteratively improve an LLM pipeline within a standardized codebase.

The optimization process follows a defined strategy: FAPO evaluates the pipeline, inspects intermediate steps, diagnoses failures, proposes limited changes, and validates variants against a score function. The system first attempts prompt adjustments. Only when these prove insufficient and attribution analysis identifies a structural bottleneck does FAPO modify the chain structure within the permissible scope.
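The prompt-first strategy described above can be sketched roughly as follows. This is a minimal illustration, not FAPO's actual implementation: the `Pipeline` type, the `optimize` function, the edit callables, the `is_structural_bottleneck` check, and the `target` threshold are all hypothetical names introduced here, and a greedy accept-if-better loop stands in for whatever search the framework really uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical stand-in for a multi-step LLM pipeline."""
    prompts: Tuple[str, ...]  # one prompt per step
    steps: Tuple[str, ...]    # step names, in execution order


Edit = Callable[[Pipeline], Pipeline]


def optimize(
    pipeline: Pipeline,
    score: Callable[[Pipeline], float],
    prompt_edits: Sequence[Edit],
    chain_edits: Sequence[Edit],
    is_structural_bottleneck: Callable[[Pipeline], bool],
    target: float,
) -> Tuple[Pipeline, float]:
    """Greedy prompt-first search with conditional structural escalation."""
    best, best_score = pipeline, score(pipeline)

    # Stage 1: try prompt-only changes, keeping a variant only if it scores higher.
    for edit in prompt_edits:
        candidate = edit(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score

    # Stage 2: touch the chain structure only if prompt edits were insufficient
    # AND the (assumed) attribution check points to a structural bottleneck.
    if best_score < target and is_structural_bottleneck(best):
        for edit in chain_edits:
            candidate = edit(best)
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score

    return best, best_score
```

The two conditions on the second stage mirror the article's description: structural edits are considered only when the prompt-only result falls short and a bottleneck has been diagnosed, so a pipeline that already meets the target keeps its original chain.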

In evaluations across six benchmarks and three task models (18 model-benchmark comparisons in total), FAPO outperforms the GEPA baseline in 15 of 18. In eleven model-benchmark combinations, the gains fall outside the standard-deviation ranges; the average gain is +14.1 percentage points. In the six HoVer and IFBench scenarios where the prompt-first search led to structural changes, FAPO wins all six, with an average gain of +33.8 percentage points.

FAPO is also relevant to security tasks: on CTIBench-RCM, a CVE-to-CWE classification task, prompt-only optimization raises test accuracy by 4.0 pp on GPT-5, 7.1 pp on Foundation-Sec-8B-Instruct, and 2.0 pp on Foundation-Sec-8B-Reasoning. For engineers, this suggests that complex multi-step systems could be debugged and optimized systematically without requiring external architecture changes.

Source: arxiv.org · Published June 16, 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 EU AI Act. Paraphrasing and classification through Lumi News Pipeline v1.7.1.
