June 23, 2026

Agent swarms are control planes, not group chats

Spawning agents is the easy part. The real work is making their jobs, context, state, tools, and machines line up behind one result the user can trust.

[Image: watercolor agent swarm sharing work across machines. Caption: Many workers. One shared table.]

One agent can be impressive. A hundred agents can give you a hundred mediocre answers faster. If they all read the same context, ask the same questions, launch the same commands, and argue in the same transcript, the system does not become smarter. It becomes louder.

The goal we care about is sharper than "more agents." If a user adds more useful input, more compute, or more agents, the output should get better, faster, or both. That is the larger bet we are making. Cloud GPUs are just the first resource where the problem becomes obvious, expensive, and worth solving carefully.

Swarm unit: Typed work, not chat

Coordination: Events, leases, artifacts

Context rule: Pointers over prompts

Resource rule: Compute follows ownership

A swarm is not a bigger chat room

The first mistake is making every worker a copy of the same assistant. It feels powerful in a demo because the screen fills with activity. Then you look closely and see five agents producing five versions of the same thought.

We made the workers more like a small team. A researcher chooses an approach. A coder turns the plan into running code. A reviewer looks for the thing everyone else missed. A data worker checks the dataset. An ops worker watches hardware and cost. The important part is not the title. It is the anti-scope. Each role knows what it should not do.
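One way to make an anti-scope concrete is to store it next to the role and let it win over everything else. The sketch below is only an illustration: the `Role` class, the action names, and the three outcome strings are hypothetical, not a description of how the roles are actually defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role record: what the worker does and what it must not do."""
    name: str
    scope: frozenset[str]
    anti_scope: frozenset[str]

    def check(self, action: str) -> str:
        # The anti-scope is checked first, so it wins even if the action
        # was also listed in scope by mistake.
        if action in self.anti_scope:
            return "anti_scope"
        if action in self.scope:
            return "allowed"
        return "out_of_scope"


# Illustrative action names, not taken from a real system.
REVIEWER = Role(
    name="reviewer",
    scope=frozenset({"read_diff", "run_tests"}),
    anti_scope=frozenset({"write_code", "launch_gpu"}),
)
```

Separating "anti_scope" from "out_of_scope" lets the platform treat a forbidden action differently from one the role simply was not given.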

Underneath that, the unit of work has to be typed. A real task can say who owns it, what it depends on, what blocked it, how many attempts it has used, which artifacts it produced, and whether the lease is still alive. That is very different from "agent two, go think about this."
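A minimal sketch of what such a typed task might look like, assuming a dataclass with one field per property listed above. The field names, the `max_attempts` cap, and the `can_start` rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    BLOCKED = "blocked"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    """Hypothetical typed work unit; field names are illustrative."""
    task_id: str
    owner: Optional[str] = None                         # which worker owns it
    depends_on: list[str] = field(default_factory=list) # ids of prerequisite tasks
    blocked_by: Optional[str] = None                    # what blocked it, if anything
    attempts: int = 0                                   # attempts used so far
    max_attempts: int = 3                               # assumed cap
    artifacts: list[str] = field(default_factory=list)  # artifact ids or paths
    lease_expires_at: Optional[float] = None            # seconds on a shared clock
    state: TaskState = TaskState.QUEUED

    def lease_alive(self, now: float) -> bool:
        """A lease is alive only if one was granted and has not expired."""
        return self.lease_expires_at is not None and now < self.lease_expires_at

    def can_start(self, done_ids: set[str]) -> bool:
        """Ready when queued, unblocked, under the attempt cap, and all deps are done."""
        return (
            self.state is TaskState.QUEUED
            and self.blocked_by is None
            and self.attempts < self.max_attempts
            and all(dep in done_ids for dep in self.depends_on)
        )
```

With a record like this, "is this task ready?" becomes a question the platform can answer from data, without asking a model.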

Orchestration rule

A useful agent needs a job, a scope, an artifact, and a stop condition. Without all four, parallelism turns into duplicated effort.

That rule sounds simple, but it changes the whole product. The system has to split the user's request into work that can actually be owned. "Make this model better" is not a good swarm assignment. "Audit the dataset for leakage," "try the small baseline," and "review the training loop for device bugs" are.
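The four-part rule can be checked mechanically before anything is dispatched. The sketch below assumes each part is a plain string; `Assignment` and `missing_parts` are hypothetical names, and the scope, artifact, and stop-condition values are invented for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Assignment:
    """Hypothetical assignment record with the four required parts."""
    job: str = ""
    scope: str = ""
    artifact: str = ""
    stop_condition: str = ""


def missing_parts(assignment: Assignment) -> list[str]:
    """Return the names of required parts that are blank, in declaration order."""
    return [
        f.name for f in fields(assignment)
        if not getattr(assignment, f.name).strip()
    ]


vague = Assignment(job="Make this model better")

owned = Assignment(
    job="Audit the dataset for leakage",
    scope="train/validation split only",              # illustrative value
    artifact="table of overlapping sample ids",       # illustrative value
    stop_condition="every split pair compared once",  # illustrative value
)
```

Here `missing_parts(vague)` reports the scope, artifact, and stop condition as missing, while `missing_parts(owned)` is empty, so only the second assignment would be handed to a worker.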

Context is a tool, not a storage unit

The easiest way to coordinate agents is to paste everything into every context window. That works for a toy problem. It breaks when the workspace has papers, logs, datasets, generated files, old attempts, and three workers producing new findings while the next worker is still reading.

The question we ask is not "how much can we fit?" It is "what does this worker need to act well right now?" A reviewer needs the diff, the claim being made, and the test evidence. An ops worker needs runtime health, budget, queue state, and cleanup status. A data worker needs sample counts, transformations, and leakage checks. Giving all of them the full transcript is slower and usually less accurate.

So the context pack becomes part of the product. It carries the goal, the current plan, the relevant files, the constraints, and the receipts from earlier workers. Everything else stays outside the window until it is needed. A summary only earns its place if it helps the next worker make a better move.
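As a rough sketch, a context pack can be built by filtering the workspace down to the kinds of evidence a role needs. The role-to-evidence mapping below follows the examples in the text, but the key names, the `ContextPack` shape, and `build_pack` are assumptions; the `evidence` field stands in for both relevant files and earlier receipts.

```python
from dataclasses import dataclass, field

# Assumed mapping from role to the evidence kinds it needs.
ROLE_NEEDS = {
    "reviewer": {"diff", "claim", "test_evidence"},
    "ops": {"runtime_health", "budget", "queue_state", "cleanup_status"},
    "data": {"sample_counts", "transformations", "leakage_checks"},
}


@dataclass
class ContextPack:
    """Hypothetical pack: shared framing plus role-specific evidence."""
    goal: str
    plan: str
    constraints: list[str]
    evidence: dict[str, str] = field(default_factory=dict)


def build_pack(role: str, goal: str, plan: str,
               constraints: list[str], workspace: dict[str, str]) -> ContextPack:
    """Keep only the workspace entries this role needs; leave the rest outside."""
    wanted = ROLE_NEEDS[role]
    evidence = {kind: text for kind, text in workspace.items() if kind in wanted}
    return ContextPack(goal, plan, list(constraints), evidence)
```

A reviewer built from a workspace that also holds budget and sample-count entries would receive only the diff, the claim, and the test evidence.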

That pushed us toward pointers instead of giant prompts. The worker can search local context history, retrieve a relevant file header, inspect a saved memory, or continue from a checkpoint after a context reset. The agent still reasons in language, but the platform decides which evidence deserves to enter the room.
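To illustrate the difference between a pointer and a pasted file, here is a small sketch in which the prompt carries a one-line reference and the content is fetched only on request. The `Pointer` class, the in-memory `store`, and `file_header` are hypothetical stand-ins for whatever retrieval the platform provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Hypothetical pointer: cheap to include in a prompt, resolved on demand."""
    key: str  # where the content lives in the store
    why: str  # one line on why a worker might want it

    def render(self) -> str:
        """The short form that actually enters the context window."""
        return f"[{self.key}] {self.why}"


def file_header(store: dict[str, str], pointer: Pointer, max_lines: int = 3) -> str:
    """Resolve a pointer to the first few lines only, not the whole file."""
    return "\n".join(store[pointer.key].splitlines()[:max_lines])


# Illustrative data.
store = {"train.log": "epoch 1\nepoch 2\nepoch 3\nepoch 4"}
log_pointer = Pointer("train.log", "latest training run")
```

The worker sees `log_pointer.render()` by default and pays for the header only when it decides the log is relevant.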

The practical trick

Send less context, but make it sharper. The worker should not feel under-informed. It should feel unburdened.

The shared memory has to live outside the agents

An agent's context window is a working surface. It is not a database. If the swarm learns something important, that finding has to land somewhere another worker can inspect without replaying the whole conversation.

We use a few different shapes for that. Research plans keep the work decomposed into nodes. Blackboard entries hold small shared notes, configs, and decisions. Files in the workspace hold anything larger. Session events record what happened, which worker did it, what tools ran, and whether the episode succeeded, failed, or landed somewhere in the middle.

The most useful shape has been the append-only event ledger. We can record an agent changing state, a task changing state, a lease being granted, a routing decision, or a topology update. Then the current swarm is not whatever the latest transcript remembers. It is a snapshot rebuilt from durable events.

That gives us a useful property: the swarm view can be rebuilt from receipts. Workers can be queued, running, blocked, done, or failed. A conductor decision can say which episodes it was based on. The UI can show a timeline of the work instead of asking the user to read every message.
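A minimal sketch of the rebuild-from-events idea, assuming each event carries a sequence number, a kind, a subject id, and a value. The `Event` and `Ledger` names and the "last write wins per kind" replay rule are assumptions; a real ledger would be durable rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int      # position in the ledger
    kind: str     # e.g. "task_state" or "lease_granted"
    subject: str  # task or agent id
    value: str


class Ledger:
    """Hypothetical append-only ledger; events are never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, subject: str, value: str) -> Event:
        event = Event(len(self._events), kind, subject, value)
        self._events.append(event)
        return event

    def snapshot(self, upto: Optional[int] = None) -> dict[str, dict[str, str]]:
        """Replay the first `upto` events (all if None); later values overwrite earlier ones."""
        view: dict[str, dict[str, str]] = {}
        for event in self._events[:upto]:
            view.setdefault(event.subject, {})[event.kind] = event.value
        return view
```

Because the snapshot is a pure function of the events, replaying a prefix of the ledger gives the swarm view at an earlier point, which is what a timeline UI needs.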

The UI has its own version of the same problem. If hundreds of workers and tool calls arrive as raw events, React cannot politely rerender on every packet. We fan out from one shared run-event bus and coalesce updates into frame-sized batches, so the interface can stay calm while the system is busy.
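The coalescing idea is not tied to React. The sketch below shows it in Python, assuming it is acceptable to keep only the latest status per worker within a frame. `FrameCoalescer` and its methods are hypothetical, and the clock is passed in explicitly so the behavior is easy to check.

```python
from typing import Optional


class FrameCoalescer:
    """Buffer raw status events and emit at most one batch per frame."""

    def __init__(self, frame_seconds: float = 1 / 60) -> None:
        self.frame_seconds = frame_seconds
        self._pending: dict[str, str] = {}
        self._last_flush = float("-inf")  # so the first flush is never delayed

    def push(self, worker_id: str, status: str) -> None:
        # Later events for the same worker replace earlier ones in the buffer.
        self._pending[worker_id] = status

    def maybe_flush(self, now: float) -> Optional[dict[str, str]]:
        """Return one batch if a frame has elapsed and there is something to show."""
        if not self._pending or now - self._last_flush < self.frame_seconds:
            return None
        batch, self._pending = self._pending, {}
        self._last_flush = now
        return batch
```

Three events for two workers inside one frame become a single update, so the rendering layer does work proportional to frames rather than to packets.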

Resource allocation is part of the intelligence

Cloud GPUs made this problem impossible to ignore. If ten workers all decide to launch expensive hardware, the system is not intelligent. It is just burning money in parallel. If the platform refuses to use more hardware when the work really is independent, then it is leaving speed on the table.

That is why we treat compute as something the swarm negotiates through the control plane. Some work belongs on the primary workspace. Some work can run as a short task worker. Some work needs a resident GPU. Some work should run serverlessly and vanish. The same idea can stretch beyond cloud GPUs: a laptop, an edge box, a robotics rig, or a lab machine can all become resources an agent knows how to request, use, and release.
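One way to picture "negotiating through the control plane" is a broker that grants hardware per owning task and enforces shared limits. This is a sketch under assumptions: `ComputeBroker`, the GPU-count cap, and the hourly budget are all hypothetical, and real allocation would also involve leases and cleanup.

```python
class ComputeBroker:
    """Hypothetical broker: one grant per owning task, within shared limits."""

    def __init__(self, max_gpus: int, hourly_budget: float) -> None:
        self.max_gpus = max_gpus
        self.hourly_budget = hourly_budget
        self._grants: dict[str, float] = {}  # task_id -> hourly cost

    def request(self, task_id: str, hourly_cost: float) -> bool:
        """Grant a GPU to a task unless it would exceed the count or spend cap."""
        if task_id in self._grants:
            return True  # the owning task already holds a GPU; do not double-launch
        spend = sum(self._grants.values())
        over_count = len(self._grants) >= self.max_gpus
        over_budget = spend + hourly_cost > self.hourly_budget
        if over_count or over_budget:
            return False
        self._grants[task_id] = hourly_cost
        return True

    def release(self, task_id: str) -> None:
        """Return the hardware; releasing twice is harmless."""
        self._grants.pop(task_id, None)
```

Ten workers can all ask, but only the requests that fit the limits are granted, and a repeated request from the same task does not launch a second machine.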

Once you look at it that way, dispatch starts looking less like chat routing and more like a scheduler. You need rate budgets, queues, retry rules, cancellation, dead-letter behavior, and a supervision tree so one bad branch does not poison the whole run. The model call is only one part of the system.
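Two of those pieces, retry rules and dead-letter behavior, fit in a few lines. The sketch below is deliberately small and assumes a single-threaded queue with a fixed attempt cap; `run_queue` is a hypothetical name, and rate budgets, cancellation, and supervision are not shown.

```python
from collections import deque
from typing import Callable


def run_queue(jobs: list[str], handler: Callable[[str], None],
              max_attempts: int = 3) -> tuple[list[str], list[tuple[str, str]]]:
    """Run each job; requeue failures up to max_attempts, then dead-letter them."""
    queue = deque((job, 1) for job in jobs)
    done: list[str] = []
    dead: list[tuple[str, str]] = []  # (job, last error message)
    while queue:
        job, attempt = queue.popleft()
        try:
            handler(job)
            done.append(job)
        except Exception as exc:
            if attempt < max_attempts:
                # Retry at the back of the queue so other jobs keep moving.
                queue.append((job, attempt + 1))
            else:
                # Out of attempts: park the job with its error instead of looping.
                dead.append((job, str(exc)))
    return done, dead
```

Requeuing at the back means one failing job cannot starve the rest, and the dead-letter list keeps the failure visible instead of silently dropping it.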

The hard question is not "can we run more things?" It is "will running this in parallel make the final answer better?" If the work cannot be split cleanly, more compute adds merge cost. If the outputs do not have receipts, more agents add confusion. If the cleanup path is weak, more hardware adds risk.

The merge is the product

The part people picture is the swarm spreading out. The part that matters is the swarm coming back together.

We learned to make every lane return something mergeable: a patch, a table, a failing test, a metric, a file, a decision, or a clear blocker. We learned to let independent lanes keep moving when one lane blocks. We learned to keep a skeptical role in the loop, because parallel work is only useful if somebody checks whether the pieces still fit.
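A sketch of what "mergeable" could mean in code: every lane returns a result with a known kind, and a blocker is a valid result rather than a silent stall. The kind names follow the list above, but `LaneResult`, `collect`, and the choice to reject unknown kinds outright are assumptions.

```python
from dataclasses import dataclass

# Result kinds named in the text; "blocker" is handled separately below.
MERGEABLE = {"patch", "table", "failing_test", "metric", "file", "decision"}


@dataclass(frozen=True)
class LaneResult:
    lane: str
    kind: str
    ref: str  # artifact id or path, or a description of the blocker


def collect(results: list[LaneResult]) -> tuple[list[LaneResult], list[LaneResult]]:
    """Split lane outputs into mergeable artifacts and blockers."""
    artifacts: list[LaneResult] = []
    blockers: list[LaneResult] = []
    for result in results:
        if result.kind in MERGEABLE:
            artifacts.append(result)
        elif result.kind == "blocker":
            # A blocked lane is recorded, but it does not stop the other lanes.
            blockers.append(result)
        else:
            raise ValueError(
                f"lane {result.lane} returned unmergeable kind {result.kind!r}"
            )
    return artifacts, blockers
```

The artifacts list is what a skeptical reviewer role would then inspect, while the blockers list tells the planner which lanes need attention.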

This is where what we're building gets bigger than a GPU product. The product is the environment around the agents: the place where they can share context, control machines, leave evidence, manage cost, and build on each other's work. We started with cloud GPUs because they make the problem concrete. The same foundation is what lets agents use any resource well.

We are not pretending the full swarm problem is solved. The next layer is harder: deeper dependency graphs, stronger lease recovery, task-level model and provider routing, and better ways for a planner to own groups of work without turning into a single bottleneck. But the direction is clear. More agents only matter when the environment makes their work compound.