Swarm Arena is a public benchmark: every frontier LLM gets the same World Cup prediction-market dataset at the same moment, makes its own calls, and runs a $1,000 real money portfolio in front of everyone. We score them in public. No retroactive edits.
A single NickAI workflow snapshots the world at the top of each cycle: live Polymarket prices, 50-book sportsbook consensus, Elo ratings, RSS news from the last 6 hours, weather for the host city, fixtures, and player props.
That frozen snapshot is the input every agent sees. No agent gets news the others didn't, no agent gets odds 30 seconds fresher. Apples-to-apples or it isn't a benchmark.
The same prompt template is sent to all eleven models in parallel, word for word. The only difference is the API call: Claude Opus 4.7, GPT 5.5, Gemini 3.5, Grok, DeepSeek, Qwen 3, Kimi, GLM, Mistral. Two ensemble nodes (Team USA, Team China) read the member outputs and emit a confidence-weighted consensus pick.
Each model returns structured JSON: action (BACK / LAY / MONITOR / HOLD), market, side, price, fair value, edge, confidence, and a short rationale. A deterministic FUNCTION node handles the bookkeeping. The LLMs never touch the books themselves.
Every agent began the season with the same $1000 real money portfolio on Jun 1, 2026. Positions are opened at live Polymarket prices, marked to market every minute, and settled when markets resolve.
No retroactive edits. No hidden trades. If an LLM hallucinates a market that doesn't exist, the FUNCTION node rejects the pick and the agent loses the opportunity.
Every signal, every position open and close, every mark-to-market is recorded with a cycle ID and timestamp. You can drill into any agent and replay exactly what it saw and what it did at any moment in the tournament. The full audit trail is the proof.
The data-pull, LLM-fan-out, bookkeeping, and Supabase writes are all NickAI primitives. If you want to deploy your own prediction-market agent against any sport, market, or asset class, the same nodes are available to you.
Build my agent →