
We tested the insider trading claim on Polymarket with Taleb-proof methods. The answer is stranger than insider trading.

Before you read this: test your own calibration in 2 minutes. No account needed. Then come back and see how you compare to the top 100 Polymarket whales.

Take the quiz

Over the past few years, as Polymarket has moved from niche to mainstream, I would see the same claim bubble up every few months: there has to be insider trading happening on Polymarket. What followed was some great work: Joshua Mitts at Columbia Law and Moran Ofir at Haifa have a 2026 working paper estimating $143 million in "abnormal profits" from informed trading on the platform based on trade timing. There's also a solid arXiv paper on the anatomy of Polymarket's 2024 election market. Blockchain firm Chainalysis identified at least 10 wallets belonging to a single French trader named "Theo" whose ~$85M in profits dominate the leaderboard.

What I didn't see was a Taleb-proof calibration audit. Brier scores. Rank correlations. Outlier-robustness. The question: are these people actually good forecasters, or is the leaderboard a concentration ranking of one directional bet on the 2024 election? So here they are.

I'm a fan of Taleb's work. I've tried to make this pass his scrutiny, because there is nothing worse than writing up an analysis and watching a quant with a grudge tear it apart on X. You be the judge.

Five numbers to remember

  • 4.66x: The worst-calibrated top-100 Polymarket whales (Brier score above 0.25, i.e. worse than random) earn 4.66 times the median profit of the better-calibrated whales. Polymarket's biggest earners are its worst forecasters.
  • +0.61: Spearman rank correlation between Brier score and realized profit: +0.608 (p = 1e-7). Bootstrap 95% CI: [+0.41, +0.75]. Kendall's tau confirms at +0.43. This correlation gets stronger when you drop the top 10 by profit (+0.72), ruling out outlier-driven noise. On Polymarket, worse calibration predicts bigger profits.
  • 21/95: 21 of 95 wallets with resolved positions scored worse than a coin flip (Brier above 0.25). Several of them are in the top 15 by realized profit.
  • 69.8%: Median single-event concentration across the top 100. The typical whale earned the majority of their money from one market. 20 of the top 100 had a 2024 election market as their single biggest profit source, together accounting for $28.2M (39% of aggregate profit).
  • 8: After removing the four wallets publicly attributed to "Theo" by Chainalysis, 8 wallets still show a same-day sweep pattern on popular-vote markets in a narrow window around election day. Combined: $7.2M risked, $12.3M captured, IQR of 8 days centered on October 31, 2024.

All numbers are reproducible from public Polymarket APIs. The full 47-column CSV is linked at the end.

What we did

We wanted to test whether Polymarket's all-time profit leaderboard rewards forecasting skill. For each of the top 100 wallets we:

  1. Pulled the full trade-fill history from Polymarket's public data API (capped at 2,000 fills per wallet).
  2. Resolved each fill against the matching market via Polymarket's gamma-api events endpoint.
  3. Aggregated fills into unique positions (keyed on conditionId and outcome index), computing volume-weighted entry prices and total risked capital per position.
  4. Computed per-wallet Brier score (with Murphy decomposition), realized P&L, single-event concentration ratio, and volume-weighted average winning entry price.
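Step 3 above is where most of the subtlety lives: without deduplication, a single large bet filled across many orders would be counted many times. A minimal sketch of the aggregation, using a simplified fill shape (the dict keys below are illustrative, not the raw API schema):

```python
from collections import defaultdict

def aggregate_positions(fills):
    """Aggregate trade fills into unique positions keyed on
    (conditionId, outcome index), with volume-weighted entry prices."""
    buckets = defaultdict(lambda: {"usd": 0.0, "shares": 0.0})
    for f in fills:
        key = (f["conditionId"], f["outcomeIndex"])
        buckets[key]["usd"] += f["price"] * f["size"]
        buckets[key]["shares"] += f["size"]
    return {
        key: {
            "vwap_entry": b["usd"] / b["shares"],  # volume-weighted average entry
            "risked_usd": b["usd"],                # total capital risked
        }
        for key, b in buckets.items()
        if b["shares"] > 0
    }

fills = [  # toy fills, illustrative values
    {"conditionId": "0xabc", "outcomeIndex": 0, "price": 0.30, "size": 1000},
    {"conditionId": "0xabc", "outcomeIndex": 0, "price": 0.40, "size": 1000},
]
print(aggregate_positions(fills)[("0xabc", 0)]["vwap_entry"])  # ~0.35
```

Two fills at 30c and 40c for equal size collapse into one position with a 35c entry, which is the probability the Brier scoring then uses.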

95 of the 100 wallets had at least one resolvable position. Total realized P&L across those 95: $72.2M. Total unique resolved positions: 3,651.

A note on fat tails before we get into findings

The profit distribution across the top 100 is fat-tailed. The Hill estimator (a standard tool for measuring tail heaviness; see Taleb, Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications, 2020, Chapter 4) returns alpha ~ 1.6 at k=20, which puts us in Extremistan territory: infinite variance is plausible, averages are unreliable, and any statistical test designed for Gaussian-distributed data will systematically mislead.
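The Hill estimator itself is only a few lines: keep the k largest observations and average their log-ratios against the (k+1)-th. A sketch (the alpha ~ 1.6 above came from the real profit column; the demo below uses exact Pareto quantiles as stand-in data):

```python
import math

def hill_alpha(values, k):
    """Hill tail-index estimator on the k largest positive observations.
    alpha below ~2 implies infinite variance, i.e. Extremistan."""
    xs = sorted((v for v in values if v > 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < number of positive observations")
    return k / sum(math.log(xs[i] / xs[k]) for i in range(k))

# Demo on exact Pareto(alpha = 1.5) quantiles; the estimate lands near 1.5:
n = 10_000
vals = [((i + 0.5) / n) ** (-1 / 1.5) for i in range(n)]
print(round(hill_alpha(vals, k=100), 2))  # close to 1.5
```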

We therefore use rank-based correlations (Spearman, Kendall) instead of Pearson, conditional medians (median profit by calibration group) instead of means, and outlier-stripping robustness checks instead of relying on a single correlation coefficient. Everything below follows that protocol.
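Spearman's r is just the Pearson correlation computed on ranks, which is what makes it indifferent to monotonic transformations and fat tails. A dependency-free sketch:

```python
def average_ranks(xs):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy columns (not the real data): any monotone relationship scores 1.0,
# no matter how wild the profit magnitudes are.
print(spearman([0.10, 0.18, 0.22, 0.30], [3.2e5, 8.0e5, 1.5e6, 2.2e7]))  # 1.0
```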

Finding 1: The leaderboard is a concentration ranking

Before we get to anything about calibration or insiders, the most important fact about the top 100 is how concentrated their P&L is on a single event.

We computed, for each wallet, the share of total realized P&L from their single largest event. The median across the 67 wallets with positive realized P&L: 69.8%. 28 of those 67 got over 80% of their money from one market.

20 of the top 100 wallets had a 2024 US election market (Presidential Election Winner, Popular Vote Winner, or state-level results) as their single biggest source of realized P&L. Their combined biggest-event P&L: $28.2M, which is 39% of total realized profit across the 95 valid wallets.

[Figure: single-event P&L concentration for the 67 profitable wallets. Each dot is a top-100 profit wallet; dots on the right side made nearly all their money on a single market. Theo's four known wallets are labeled in blue.]

If you're reading this leaderboard as a ranking of forecasting skill, you're reading the wrong leaderboard.

Finding 2: Worse calibration predicts bigger profits

A Brier score is the mean squared error between your probability forecasts and outcomes. A forecaster who always predicts 50% scores exactly 0.25 on a balanced set of binary questions. For context: NWS weather forecasters score below 0.10 on next-day precipitation. Good Judgment Project superforecasters average about 0.13 on geopolitics. You can test your own Brier score in 2 minutes with our Calibration Challenge.
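The score itself is one line of arithmetic; a sketch with toy numbers:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who always says 50% scores exactly 0.25 on a balanced set:
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25
# A sharp, mostly-right forecaster scores far lower:
print(brier_score([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))  # ~0.025
```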

The top 100 Polymarket whales: mean 0.183, median 0.199. Better than random, but only modestly. And 21 of 95 wallets score worse than 0.25, meaning their forecasts are, on average, worse than a coin flip's. Several of those are in the top 15 by profit.

Now the real finding. We computed the Spearman rank correlation between Brier score and realized profit. In a market that rewards forecasting skill, this should be negative: better calibration, bigger profits.

It isn't.

[Figure: Brier score vs realized profit (log scale) for the 64 whales with both metrics valid. Spearman r = +0.608; the correlation strengthens when you drop the top 10 by profit. Theo's four wallets labeled in blue.]

Spearman r = +0.608 (p = 1e-7). Kendall's tau confirms at +0.43 (p = 4e-7). Both rank-based and robust to monotonic transformations. Both highly significant.

Does this survive outlier removal?

| Drop top-k by profit | Spearman r |
|---|---|
| Drop top-1 | +0.61 |
| Drop top-5 | +0.63 |
| Drop top-10 | +0.72 |

The correlation gets stronger when you remove the biggest profit wallets. This is the opposite of what outlier-dependent correlations do. The signal is genuinely a property of the middle of the distribution, not one trader pulling the line.
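The drop-top-k check in the table above can be sketched as follows (synthetic columns standing in for the real wallet data; the rank helper assumes no tied values):

```python
import numpy as np

def spearman_no_ties(x, y):
    """Spearman r as Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def drop_topk_spearman(brier, profit, ks=(0, 1, 5, 10)):
    """Recompute the Brier-vs-profit rank correlation after stripping the
    top-k wallets by profit. If the correlation were driven by a few giant
    outliers, it would collapse as k grows."""
    order = np.argsort(profit)[::-1]  # wallet indices, biggest profit first
    return {k: spearman_no_ties(brier[order[k:]], profit[order[k:]]) for k in ks}

brier = np.linspace(0.05, 0.35, 60)   # synthetic, perfectly monotone toy data,
profit = np.exp(10 * brier) * 1e5     # not the real wallet columns
result = drop_topk_spearman(brier, profit)
print(result)  # perfectly monotone data -> every r is 1.0
```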

The Taleb-proof version of the finding

Rather than relying on a single correlation coefficient, here is the conditional-median version:

Median profit (Brier > 0.25): $1,472,827

Median profit (Brier ≤ 0.25): $316,053

Ratio: 4.66x

The worst forecasters on the leaderboard earn nearly five times the median profit of the better-calibrated ones. This is a nonparametric, outlier-robust statement about conditional medians in a fat-tailed distribution.
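The ratio reduces to two medians and a division. A sketch with made-up wallet pairs (not the real dataset):

```python
import statistics

def calibration_profit_ratio(wallets, threshold=0.25):
    """Median profit of poorly calibrated wallets (Brier > threshold)
    over median profit of the better-calibrated rest.
    `wallets` is a list of (brier, profit) pairs."""
    worse = [profit for brier, profit in wallets if brier > threshold]
    better = [profit for brier, profit in wallets if brier <= threshold]
    return statistics.median(worse) / statistics.median(better)

# Toy numbers in which badly calibrated wallets out-earn the rest 5x:
toy = [(0.30, 400_000), (0.35, 600_000), (0.10, 90_000), (0.20, 110_000)]
print(calibration_profit_ratio(toy))  # 5.0
```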

Finding 3: The October 2024 popular-vote cluster

This is the finding that sits on the boundary between pattern recognition and inference.

First, the context you need. Chainalysis published in November 2024 that at least 10 wallets on Polymarket were controlled by a single French former bank trader publicly known as "Theo." The confirmed wallets include four that sit in our top 100: Theo4 (rank 1, $22M), Fredi9999 (rank 2, $16.6M), PrincessCaro (rank 8, $6.1M), and Michie (rank 30, $3.1M). Theo has been interviewed by WSJ, Bloomberg, and 60 Minutes. His methodology is public: he commissioned private YouGov neighbor-polling in swing states, decided the market was mispricing Trump, and spread orders across multiple accounts to manage market impact. This is the opposite of insider trading. It is a sophisticated quant with a proprietary data advantage.

With Theo's wallets accounted for, we looked at what remains.

The residual cluster

After removing Theo's four confirmed wallets, 8 wallets remain that placed concentrated bets on the popular-vote markets in a narrow window around election day. They bet on three linked markets: "Will Donald Trump win the popular vote?", "Kamala Harris wins the popular vote?" (bet NO), and "Will a Republican win the popular vote and the Presidency?"

[Figure: timeline of winning positions on Trump/Harris popular-vote markets, October-November 2024. Each bubble is a winning position; bubble size = USD risked. Theo's 4 wallets (blue) at top, the 8 residual wallets (orange) below. BetTom42's November 5 entry at 28% odds is the latest and cheapest.]
| Wallet | Date | Positions | Avg entry | Risked | Edge captured |
|---|---|---|---|---|---|
| deetown | Oct 10 | 1 | 35.1% | $1.14M | $2.20M |
| RepTrump | Oct 22 | 3 | 39.3% | $1.51M | $2.19M |
| BabaTrump | Oct 24 | 3 | 39.0% | $1.21M | $1.84M |
| Len9311238 | Oct 29-Nov 1 | 3 | 36.8% | $549K | $963K |
| Jenzigo | Oct 31 | 3 | 37.4% | $491K | $824K |
| alexmulti | Oct 31 | 3 | 37.1% | $1.41M | $2.36M |
| mikatrade77 | Nov 1-4 | 3 | 37.3% | $575K | $1.10M |
| BetTom42 | Nov 4-5 | 3 | 28.2% | $325K | $829K |
| Combined | | | | $7.20M | $12.31M |

IQR of first-fill dates: 8 days, centered on October 31. Nearly every one of these wallets placed multiple six-to-seven-figure positions on the same day across linked popular-vote markets, at implied odds of 28% to 39%.

BetTom42 is the most striking: three positions entered on election eve and election day itself at 28% implied odds. The market was giving Trump a roughly 28% chance of winning the popular vote on the day he won it.

Important caveats

Chainalysis identified at least 10 wallets belonging to Theo, and we have confirmed only 4 by public name. Up to 6 more of these 8 residual wallets could be additional Theo accounts that we cannot verify without on-chain funding-flow analysis. The cluster is a real descriptive finding, but the number of truly independent traders in it is unknown.

Notably, Chainalysis did not link BetTom42 or alexmulti to the Theo cluster, which suggests at least two of the eight residual wallets are independent.

What would a regulator think?

We are not accusing any specific wallet. We are describing a pattern in public data. But Polymarket received CFTC approval as a Designated Contract Market (DCM) in November 2025, which means two enforcement provisions are now live:

  • CFTC Rule 180.1 prohibits the use of any manipulative or deceptive device in connection with commodity contracts. Trading on material nonpublic information is relevant, though the legal framework for prediction markets is still evolving and enforcement mechanisms remain untested (see Mitts & Ofir, 2026, for a detailed analysis of the regulatory gaps).
  • Commodity Exchange Act sections 4c and 6c empower the CFTC to investigate traders whose order flow shows signs of information advantage.

A compliance analyst at a DCM looking at the residual cluster would see: 8 accounts, concentrated on one event, entry prices at 28-39% on the side the market priced as a 2-to-1 underdog, combined $12.3M captured. That pattern gets a review. Not because it proves anything, but because precautionary-principle regulatory frameworks (see Taleb, Silent Risk: Lectures on Fat Tails, (Anti)Fragility, and Asymmetric Exposures, 2015) treat patterns as actionable when the downside of ignoring them is asymmetric.

Three explanations, all consistent with the data:

  1. Independent contrarian conviction. Each of these 8 wallets independently decided the market was mispricing Trump's popular-vote chances. They sized up, were correct, and kept their winnings. Plausible for one or two wallets. For eight, in the same week, on the same market, it stretches.
  2. Coordinated or herded positioning. A small network made the same bet at roughly the same time, either by direct coordination or by following observable flow (e.g., watching Theo's known accounts move). Legal on a prediction market, but statistically distinct from (1).
  3. Information-based trading. Someone had nonpublic data (internal polling, early exit polls, voter-file analytics) that gave them confidence the market lacked. This is what Rule 180.1 would address at a DCM, assuming the framework is interpreted broadly enough to cover prediction markets.

The data alone cannot distinguish between these. Distinguishing them requires evidence we do not have: per-fill timing relative to specific information-disclosure events, wallet-to-identity attribution, and the actual source of each trader's conviction.

What we cannot conclude

The findings above describe patterns. They do not prove any specific wallet violated any specific rule. Several limitations are worth naming explicitly:

  1. Survivorship bias. The top-100 profit leaderboard is the right tail of a fat-tailed distribution. Under any null hypothesis, someone will be in the top 100 by luck alone. The correct comparison would be this top 100 vs a random sample of all ~50,000+ Polymarket accounts. We don't have that data.
  2. Small-sample Brier is noisy. Some wallets have as few as 3-5 resolved positions. At a strict threshold of 25+ positions, the "worse than random" count drops from 21/95 to 6/48. At 50+ positions, it drops to 2/22. The calibration-profit correlation survives every threshold (Spearman r stays above +0.58 even at 50+ positions), but the individual-wallet claims get noisier.
  3. We cannot see fill-level timing relative to information events. A wallet that bought "Trump wins popular vote" in July 2024 at 25% is doing something very different from a wallet that bought it on November 4 at 25%. We treat them identically in the aggregate. This matters for the insider question.
  4. Correlated outcomes. Multiple bets on the same event (Trump wins popular vote, Trump wins presidency, Republican wins Wisconsin) are not independent. A wallet with 4 "wins" on correlated election outcomes has an effective sample size closer to 1-2, not 4. Any per-wallet significance test that assumes independence is structurally flawed.
  5. One wallet is not one trader. Some accounts are likely funds, DAOs, or syndicates. Others are individuals.
  6. We are not accusing anyone. None of the named wallets are alleged to have done anything illegal.

Related work

Our calibration audit complements three existing lines of research on Polymarket:

  • Mitts & Ofir (2026), "From Iran to Taylor Swift: Informed Trading in Prediction Markets." Timing-based insider-trading detection using a five-signal composite score on 93,000+ markets and 50,000 wallets. Found $143M in abnormal profits. Their research question is "who traded on insider info?" Ours is "does the leaderboard reward forecasting skill?" Different methodology, complementary findings.
  • Tsang & Yang (2026), "The Anatomy of Polymarket: Evidence from the 2024 Presidential Election." Market-level microstructure analysis. Volume decomposition, Kyle's lambda, three-episode framework. Does not compute per-wallet calibration or profit metrics.
  • Reichenbach & Walther (2025) analyzed 124 million Polymarket trades for accuracy, skill, and bias at the trader level, finding that only 30% of traders are profitable and that trading skill is persistent. Their skill metric is profit persistence, not Brier calibration.

None of these works compute per-wallet Brier scores, calibration-vs-profit rank correlations, or single-event concentration ratios. Our contribution sits in that gap.

Methodology (reproducible from public APIs)

Everything in this post comes from public Polymarket endpoints. No credentials, no scraping of logged-in pages. Anyone with Python and an afternoon can reproduce the numbers.

  1. Wallet selection. Top 100 by all-time profit from polymarket.com/leaderboard/overall/all/profit. The list starts with Theo4 at rank 1 ($22.05M) and ends with Dropper at rank 100 ($1.03M).
  2. Trade fills. For each wallet: data-api.polymarket.com/activity?user=<addr>&type=TRADE with pagination, capped at 2,000 fills per wallet.
  3. Resolution lookup. Each fill carries a conditionId and eventSlug. We resolve via gamma-api.polymarket.com/events?slug=<eventSlug>, which returns the parent event with nested markets. We match each fill's conditionId to the nested markets to get final outcome prices.
  4. Position deduplication. Fills aggregated into unique positions keyed on (conditionId, outcome_index). Volume-weighted average entry price per position. Total unique resolved positions: 3,651. Without this step, the same $1M bet appears 80 times and calibration metrics become meaningless.
  5. Brier scoring. For each resolved position: implied probability = volume-weighted entry price. Outcome = 1 if resolved in wallet's favor, else 0. Per-wallet Brier = mean squared error across all resolved positions. Also computed Murphy decomposition into reliability, resolution, uncertainty.
  6. Rank correlations and robustness. Spearman and Kendall rank correlations between Brier and realized profit. Bootstrap 95% CI (10,000 resamples). Outlier-stripping (drop top-k by profit, recompute). Conditional expectations by Brier category. Hill alpha for fat-tail verification (Taleb, Statistical Consequences of Fat Tails, 2020).
  7. Concentration metric. For each wallet: the single event contributing the largest absolute realized P&L, and its share of total wallet P&L.
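The Murphy decomposition in step 5 can be sketched as a binned computation. The identity Brier = reliability - resolution + uncertainty is exact when all forecasts in a bin share the same value, and approximate otherwise:

```python
def murphy_decomposition(forecasts, outcomes, n_bins=10):
    """Decompose a Brier score into (reliability, resolution, uncertainty)
    by binning forecasts into n_bins equal-width probability bins."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    uncertainty = base_rate * (1 - base_rate)
    bins = {}
    for p, o in zip(forecasts, outcomes):
        bins.setdefault(min(int(p * n_bins), n_bins - 1), []).append((p, o))
    reliability = resolution = 0.0
    for members in bins.values():
        w = len(members) / n
        p_bar = sum(p for p, _ in members) / len(members)
        o_bar = sum(o for _, o in members) / len(members)
        reliability += w * (p_bar - o_bar) ** 2     # miscalibration within bin
        resolution += w * (o_bar - base_rate) ** 2  # discrimination between bins
    return reliability, resolution, uncertainty

# Toy check: forecasts constant within bins, so the identity holds exactly.
fc = [0.8] * 5 + [0.2] * 5
oc = [1, 1, 1, 1, 0] + [0, 0, 0, 0, 1]
rel, res, unc = murphy_decomposition(fc, oc)
print(round(rel - res + unc, 3))  # 0.16, equal to the raw Brier score here
```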

How to verify a single wallet yourself

Pick any wallet from the CSV. Say Theo4 at 0x56687bf447db6ffa42ffe2204a05edaa20f55839. Run:

curl "https://data-api.polymarket.com/activity?user=0x56687bf447db6ffa42ffe2204a05edaa20f55839&type=TRADE&limit=5"

You'll get JSON with the first 5 fills, including slug, eventSlug, conditionId, price, side. Then resolve any of those events:

curl "https://gamma-api.polymarket.com/events?slug=presidential-election-winner-2024"

Look for outcomePrices in the nested markets array. Match by conditionId. The rest is arithmetic.
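The same matching step in Python, run against a hand-written stub. Only the nesting shape and the conditionId / outcomePrices keys mirror what we relied on; the values are illustrative, not real API output:

```python
import json

def resolve_outcome_prices(events, condition_id):
    """Scan a gamma-api-style events payload for the market whose
    conditionId matches a fill, and return its final outcome prices.
    Handles outcomePrices arriving either as a JSON-encoded string
    (the form we saw) or as a plain list."""
    for event in events:
        for market in event.get("markets", []):
            if market.get("conditionId") == condition_id:
                raw = market["outcomePrices"]
                prices = json.loads(raw) if isinstance(raw, str) else raw
                return [float(p) for p in prices]
    return None  # conditionId not found in this event payload

# Hand-written stub payload with illustrative values:
stub = [{
    "slug": "presidential-election-winner-2024",
    "markets": [
        {"conditionId": "0xdeadbeef", "outcomePrices": '["1", "0"]'},
    ],
}]
print(resolve_outcome_prices(stub, "0xdeadbeef"))  # [1.0, 0.0]
```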

The data

Full dataset: 100 wallets, 47 columns. Brier decomposition, win rates, entry-price distributions, event concentration, edge captured, consensus-defying rates, biggest-event identifiers. Use it however you want. If you find something we missed, email research@convexly.app.

Download polymarket-whales-data.csv (32 KB)

We plan to publish updated whale audits quarterly. Get the next one in your inbox.

Bottom line

The Polymarket profit leaderboard is not a ranking of the platform's best forecasters. It is a ranking of whoever bet biggest on a small number of concentrated events, with the 2024 US Presidential Election dominating the list. The worst-calibrated whales earn nearly 5x the median profit of the best-calibrated whales, and this correlation is robust under every test we ran.

Inside that leaderboard is one publicly-documented trader (Theo) who controls at least 10 wallets and earned ~$85M through a legitimate proprietary polling advantage. After removing Theo's known wallets, 8 wallets remain in a narrow cluster on the popular-vote markets, placing concentrated same-day sweeps at 28-39% implied odds in the weeks around election day. Combined captured edge: $12.3M. Whether any of them had nonpublic information is a question we cannot answer from trade data alone. But the pattern exists, the data is public, and the CSV is one click away.

Polymarket is now a DCM under the CFTC. Under the precautionary frameworks that govern regulated exchanges, patterns like this one get investigated. Whether anyone investigates is a question for Polymarket's compliance team and the CFTC. We're just the ones who ran the numbers.

Convexly measures calibration, not P&L

I built Convexly because most decision-tracking tools fall back on win rate (noisy) or P&L (confuses forecasting skill with bet sizing). The Brier scoring math in this post is the same math that runs inside the Convexly dashboard. Log a decision, set your probability, and it will calibrate you against reality. Free tier includes unlimited Brier scoring on up to 100 decisions.

Published by Convexly Research. Data aggregated and compiled using Anthropic's Claude. All statistical methods, interpretations, and editorial decisions are human-authored. The full dataset and methodology are provided for independent verification. We have no position on Polymarket and no financial interest in any named wallet.