Building the predictive physics of multi-agent AI

The science of agentic scaling.

Teams of AI agents are deployed on intuition — sometimes they outperform a single model, sometimes they collapse into expensive groupthink. ILLUME turns that guesswork into law: when more agents help, when they fail, how they scale, and what they cost in energy.

2.1–3.4×: more tokens burned by debate for equal or worse accuracy
85.5%: peak rate at which agents abandon correct answers to conform
N ≤ 5 → 30: a 5-agent pilot predicts a 30-agent team
R² > 0.99: scaling-law fit across 38 model × task cells
The problem

Multi-agent AI is engineered by guesswork.

Compound AI systems orchestrate swarms of language models — debating, critiquing, voting — with no theory for when collaboration helps. Three structural failures hide inside production pipelines today.

Non-monotonic scaling

Adding agents can produce synergy, saturation, or collapse depending on topology and error correlation. Accuracy often peaks early, then degrades.

Hidden energy cost

Collaboration multiplies inference calls. Token consumption can rise 14× for trivial or negative gains — a thermodynamic inefficiency no one accounts for.
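To see why calls multiply, here is a back-of-envelope sketch under an assumed protocol (not necessarily the one behind the 14× figure): N agents debate for R rounds, and after the first round every agent re-reads all N answers from the previous round. Calls grow as N·R, and because each call also carries the shared history, input tokens grow roughly as N²·R. The function names and token counts are hypothetical.

```python
def debate_token_cost(n_agents: int, rounds: int, prompt_tokens: int, answer_tokens: int) -> int:
    """Total tokens (input + output) for a simple all-to-all debate.

    Hypothetical protocol: in round 0 each agent sees only the prompt; in every
    later round each agent sees the prompt plus all n_agents answers from the
    previous round, then writes a new answer.
    """
    total = 0
    for r in range(rounds):
        shared_history = n_agents * answer_tokens if r > 0 else 0
        context = prompt_tokens + shared_history
        total += n_agents * (context + answer_tokens)  # n_agents calls this round
    return total


def single_agent_cost(prompt_tokens: int, answer_tokens: int) -> int:
    """One call: read the prompt once, write one answer."""
    return prompt_tokens + answer_tokens


if __name__ == "__main__":
    debate = debate_token_cost(n_agents=3, rounds=2, prompt_tokens=500, answer_tokens=300)
    single = single_agent_cost(prompt_tokens=500, answer_tokens=300)
    print(debate, single, debate / single)  # prints: 7500 800 9.375
```

With three agents and two rounds, these toy numbers already cost about 9.4× a single call, before any accuracy gain is counted.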

No native metrics

Reported gains often reflect variance reduction, not real reasoning. Without agent-native measures we can't tell intelligence from ensembling noise.
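A small illustration of how gains can come from ensembling alone: in the classic Condorcet jury setting, majority voting among independent agents that are each right 60% of the time raises accuracy even though no individual agent reasons any better. This is a textbook result, not an ILLUME metric, and its independence assumption is exactly what correlated errors break.

```python
from math import comb


def majority_vote_accuracy(n_agents: int, p_correct: float) -> float:
    """Probability that a strict majority of n_agents answers a binary question correctly.

    Assumes each agent is independently correct with probability p_correct
    (the Condorcet jury theorem setting). n_agents must be odd to avoid ties.
    """
    if n_agents < 1 or n_agents % 2 == 0:
        raise ValueError("n_agents must be a positive odd integer")
    k_min = n_agents // 2 + 1  # smallest strict majority
    return sum(
        comb(n_agents, k) * p_correct**k * (1 - p_correct) ** (n_agents - k)
        for k in range(k_min, n_agents + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(n, 0.6), 5))
    # prints: 1 0.6, then 3 0.648, then 5 0.68256
```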

What we build

From heuristic engineering to a predictive physics of interaction.

We model an agent team as a thermodynamic system where intelligence is a function of information velocity and energy expenditure. Four pillars turn that into deployable tools.

01

Governing laws

Formal models of agentic teamwork on graph and hypergraph dynamics — capturing instant memory cloning and hallucination cascades that human-team theory can't.

02

Scaling regimes

Mapping the phase transitions (synergy → saturation → collapse) where sycophantic drift and coordination overhead consume the marginal agent's contribution.

03

Agent-native metrics

The Artificial Collective Intelligence (ACI) factor — grounded in information theory, not human IQ — to compare systems under compute parity.

04

Thermodynamic limits

The Energy–Utility Pareto frontier: the absolute minimum energy, in joules, required per bit of collective reasoning gain.
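The frontier itself is a standard Pareto computation. A minimal sketch, assuming each workflow has already been reduced to one energy figure in joules and one scalar utility; ILLUME's bits-based utility is not specified here, so utility is a placeholder, and the workflow names and numbers below are invented for illustration.

```python
from typing import List, Tuple

Workflow = Tuple[str, float, float]  # (name, energy in joules, utility)


def energy_utility_frontier(workflows: List[Workflow]) -> List[Workflow]:
    """Return the workflows not dominated on (lower energy, higher utility).

    A workflow is dominated if another uses no more energy and delivers at
    least as much utility, with one of the two strictly better. Exact
    duplicates are kept once.
    """
    frontier: List[Workflow] = []
    best_utility = float("-inf")
    # Cheapest first; among equal energy, highest utility first.
    for name, energy, utility in sorted(workflows, key=lambda w: (w[1], -w[2])):
        if utility > best_utility:  # strictly better than every cheaper workflow
            frontier.append((name, energy, utility))
            best_utility = utility
    return frontier


if __name__ == "__main__":
    # Hypothetical, made-up measurements for illustration only.
    runs = [
        ("single", 100.0, 0.70),
        ("vote-3", 300.0, 0.74),
        ("debate-3", 900.0, 0.73),
        ("vote-5", 500.0, 0.76),
    ]
    for name, energy, utility in energy_utility_frontier(runs):
        print(f"{name}: {energy:.0f} J, utility {utility:.2f}")
```

In this toy set, "debate-3" falls off the frontier: it spends more energy than "vote-3" for less utility.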

The scaling law

One equation that classifies any agent team.

We measure effective team size — how many of your N agents actually contribute independent evidence. A two-parameter fit from a tiny pilot predicts large-team behavior and whether adding agents will ever pay off.

$$R(N) = \frac{N_{\mathrm{eff}}}{N} = \frac{1}{1 + c\,(N-1)\,N^{-\beta}}$$

Two interpretable parameters: c sets the efficiency floor; β controls how fast added agents stop contributing. Fitted on N ≤ 5, the law extrapolates to N = 30 with under 12% error.
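One way to fit the two parameters from a small pilot is sketched below, assuming the law takes the form R(N) = 1/(1 + c(N−1)N^(−β)) and that R(N) has been measured at a few team sizes. Rearranging gives log((1/R − 1)/(N − 1)) = log c − β·log N, a straight line in log N, so ordinary least squares recovers c and β. The log-linear fit and the function names are illustrative choices, not necessarily ILLUME's estimator.

```python
import math
from typing import Dict, Tuple


def predicted_ratio(n: int, c: float, beta: float) -> float:
    """R(N) = N_eff / N under the assumed form 1 / (1 + c (N-1) N^-beta)."""
    return 1.0 / (1.0 + c * (n - 1) * n ** (-beta))


def fit_scaling_law(pilot: Dict[int, float]) -> Tuple[float, float]:
    """Estimate (c, beta) from measured R(N) at small team sizes.

    Log-linearization: 1/R - 1 = c (N-1) N^-beta, so
    log((1/R - 1) / (N-1)) = log c - beta * log N.
    """
    xs, ys = [], []
    for n, r in sorted(pilot.items()):
        if n < 2:
            continue  # R(1) = 1 by definition and carries no information
        excess = 1.0 / r - 1.0
        if excess <= 0:
            raise ValueError(f"R({n}) must be below 1 for this fit")
        xs.append(math.log(n))
        ys.append(math.log(excess / (n - 1)))
    if len(xs) < 2:
        raise ValueError("need measurements at two or more team sizes N >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (c, beta)


if __name__ == "__main__":
    # Synthetic pilot generated from c = 0.2, beta = 0.5 (not real data).
    pilot = {n: predicted_ratio(n, 0.2, 0.5) for n in range(1, 6)}
    c_hat, beta_hat = fit_scaling_law(pilot)
    n_eff_30 = 30 * predicted_ratio(30, c_hat, beta_hat)
    print(f"c={c_hat:.3f}, beta={beta_hat:.3f}, N_eff(30)={n_eff_30:.1f}")
```

On noise-free synthetic data the fit recovers the generating parameters; with real, noisy pilots a weighted or nonlinear least-squares fit may be preferable.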

β = 0 (hard ceiling): more agents add nothing.
0 < β < 1 (sublinear): diminishing but real gains.
β ≥ 1 (linear): every agent still counts.
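Under the negative-exponent form N_eff = N/(1 + c(N−1)N^(−β)), the three regimes follow directly: N_eff approaches 1/c when β = 0, grows roughly like N^β/c when 0 < β < 1, and stays proportional to N when β ≥ 1. A short sketch that labels the regime and tabulates N_eff, assuming that form; the value c = 0.25 is illustrative, not a measured coupling.

```python
def effective_team_size(n: int, c: float, beta: float) -> float:
    """N_eff = N / (1 + c (N-1) N^-beta), the assumed negative-exponent form."""
    return n / (1.0 + c * (n - 1) * n ** (-beta))


def regime(beta: float) -> str:
    """Map beta to the three regimes named in the text."""
    if beta < 0:
        raise ValueError("beta is assumed non-negative")
    if beta == 0:
        return "hard ceiling"  # N_eff -> 1/c as N grows
    if beta < 1:
        return "sublinear"  # N_eff grows roughly like N**beta / c
    return "linear"  # N_eff stays proportional to N


if __name__ == "__main__":
    c = 0.25  # illustrative coupling, not a measured value
    for beta in (0.0, 0.5, 1.0):
        sizes = [round(effective_team_size(n, c, beta), 1) for n in (5, 30, 1000)]
        print(f"beta={beta}: {regime(beta)}, N_eff at N=5/30/1000 -> {sizes}")
```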

The same form describes debate, self-correction, noise placebos and even classical human group studies — just different points in (c, β).

Peer-reviewed research

The work behind the science.

Grounded in controlled experiments on open-weight models from multiple families, across reasoning benchmarks, not anecdotes.

Why ILLUME

Scaling laws gave training a shared unit of measure. We bring one to inference.

Predict before you spend

Estimate large-team behavior from a five-agent pilot — and know if more agents will ever pay off before you provision the compute.
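A "will more agents ever pay off?" check can be phrased as a marginal-gain test on the fitted law. This sketch assumes the negative-exponent form N_eff = N/(1 + c(N−1)N^(−β)) and a hypothetical threshold `min_gain` (effective agents gained per added agent) standing in for whatever cost criterion a deployment actually uses; `saturation_point` is an illustrative name, not an ILLUME API.

```python
from typing import Optional


def effective_team_size(n: int, c: float, beta: float) -> float:
    """N_eff = N / (1 + c (N-1) N^-beta), the assumed negative-exponent form."""
    return n / (1.0 + c * (n - 1) * n ** (-beta))


def saturation_point(c: float, beta: float, min_gain: float, n_max: int = 1000) -> Optional[int]:
    """Smallest N at which adding one more agent raises N_eff by less than min_gain.

    Returns None if the marginal gain never falls below min_gain up to n_max,
    i.e. more agents keep paying off by this (hypothetical) criterion.
    """
    for n in range(1, n_max):
        gain = effective_team_size(n + 1, c, beta) - effective_team_size(n, c, beta)
        if gain < min_gain:
            return n
    return None


if __name__ == "__main__":
    print(saturation_point(0.25, 0.0, 0.1))  # hard-ceiling regime
    print(saturation_point(0.25, 1.0, 0.1))  # linear regime
```

With c = 0.25 and β = 0, `saturation_point(0.25, 0.0, 0.1)` returns 8: beyond eight agents, each addition contributes under a tenth of an effective agent. With β = 1 it returns None, since every added agent keeps contributing close to 1/(1 + c) of an effective agent.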

Compare under compute parity

Agent-native metrics benchmark architectures fairly, separating real reasoning from variance reduction and lucky ensembling.

Optimize for joules

Place every workflow on the Energy–Utility frontier and design collectives that are scalable, comparable, and sustainable.

Get in touch

Let's make agentic AI predictable.

Whether you deploy multi-agent systems, fund the science, or want to collaborate on the physics of collective intelligence — we'd like to hear from you.