HarnessX automates the assembly and adaptation of agent harnesses from execution traces, achieving an average performance improvement of 14.5% without scaling the model.
Agent-EvalKit automates the evaluation of AI agents through structured test-case generation, observability instrumentation, and combined code and LLM-based metrics directly in the development environment.
Aligning router rows with the principal singular directions of their associated expert matrices improves the efficiency and stability of Mixture-of-Experts models.
The Claw-SWE-Bench framework demonstrates that adapter design is critical for code agents: OpenClaw achieves 19.1% Pass@1 with a minimal adapter and 73.4% with a complete adapter.
DiffusionGemma denoises up to 256 tokens in parallel per step instead of generating them sequentially and achieves 1,000 tokens/second on an NVIDIA H100 at batch size 1, with no cloud dependency.
DiffusionGemma replaces the traditional sequential token-generation process with parallel denoising of 256-token blocks, enabling faster inference and improved problem-solving capabilities for complex tasks.
AI tools are assistance instruments with transparency gaps and hallucination risks, while low-code reduces complexity through structured, auditable components; the two can complement each other.
FlowTracer assigns credit to tokens based on their measured information throughput in the attention graph rather than treating all tokens equally, yielding consistent performance gains in reasoning tasks.
Nine Claude Code releases in ten days, Google I/O declares the agent era, two valuable long-reads on architecture and evaluation of long-running agents, plus a sobering IT benchmark.