Brier Score for Prediction Markets
A Brier score measures how closely a trader's stated probabilities match the outcomes that actually happened. On prediction markets, it is the mean squared error between a position's implied probability at entry and the market's resolved outcome (0 for no, 1 for yes). Lower is better: 0 is perfect, and 1 means being fully confident and wrong on every prediction. On Convexly's 8,656-wallet Polymarket reference cohort, the median Brier is approximately 0.188 and the top quartile is below 0.105.
Formula
For N predictions with probabilities p_i and binary outcomes o_i:
BS = (1 / N) · Σ (p_i − o_i)²
Each squared error lies between 0 and 1, so the average does too. Brier is a proper scoring rule: a trader minimizes their expected score by stating the probability they actually believe, so it rewards honest probability assignment. A trader who is systematically overconfident (always says 90% when they are right 60% of the time) has an expected Brier of 0.33, worse than the 0.24 earned by a trader who states 60% and resolves at 60%.
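As a minimal sketch of the formula and the overconfidence comparison above, the following Python computes a Brier score from probability and outcome pairs, plus the expected per-prediction score when a trader's stated probability differs from the true hit rate. The function names `brier_score` and `expected_brier` are illustrative and not part of any Convexly tool.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


def expected_brier(stated, true_rate):
    """Expected squared error when the event happens with probability
    `true_rate` but the trader always states `stated`."""
    return true_rate * (stated - 1) ** 2 + (1 - true_rate) * stated ** 2


# Overconfident trader: says 90%, is right 60% of the time.
print(round(expected_brier(0.9, 0.6), 4))  # 0.33
# Calibrated trader: says 60%, is right 60% of the time.
print(round(expected_brier(0.6, 0.6), 4))  # 0.24
```

For a fixed true rate, `expected_brier` is smallest when `stated` equals `true_rate`, which is what "proper scoring rule" means in practice.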
Empirical distribution on Polymarket
From the Convexly Research 10,000-wallet study (8,656 with five or more resolved positions, 582,921 total resolved positions):
- Median Brier score: ~0.188
- 25th percentile (top quartile): ~0.105
- 75th percentile: ~0.247
- Top decile: below 0.08
Context: always predicting 0.5 yields a Brier of exactly 0.25, whatever the outcomes turn out to be. Good Judgment Project superforecasters average near 0.10 on broad geopolitical questions. The Polymarket leaderboard median is worse than that, but the top of the distribution compares favorably: the top-decile threshold (below 0.08) sits under the superforecaster average.
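To make these reference points concrete, here is a small sketch that checks the constant-0.5 baseline and places a score in a cohort band. The cut points are the approximate percentiles quoted above; the band labels and the names `COHORT_BANDS` and `cohort_band` are assumptions made for illustration, not the output of Convexly's tools.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


# (upper bound, label), using the approximate cohort percentiles quoted above.
COHORT_BANDS = [
    (0.08, "top decile"),
    (0.105, "top quartile"),
    (0.188, "second quartile"),
    (0.247, "third quartile"),
]


def cohort_band(score):
    """Return a rough band for a Brier score (lower scores are better)."""
    for upper, label in COHORT_BANDS:
        if score < upper:
            return label
    return "bottom quartile"


# A constant 0.5 forecast scores 0.25 regardless of how markets resolve.
baseline = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])
print(baseline, cohort_band(baseline))
```

Because the quoted 75th percentile (about 0.247) lies just below 0.25, the constant-0.5 baseline falls in the bottom band under these approximate cut points.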
Why Brier does not predict Polymarket profit
Across the full 8,656-wallet leaderboard, Spearman rank correlation between Brier score and signed log PnL is only +0.148. Among the top 100 wallets by profit, the correlation actually flips: worse-calibrated whales earn more (Spearman +0.42). The reason is fat tails. The Hill estimate of the tail index (alpha) on realized Polymarket PnL is 1.28, and a tail index below 2 means the variance of the distribution is formally infinite. One large concentrated bet dominates most wallets' PnL outcomes, and a well-calibrated forecaster who never concentrates never captures the upside.
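For readers unfamiliar with the Hill estimator, the sketch below shows the standard form: take the k largest observations and average their log-ratios to the (k+1)-th largest. How the study chose k, and whether it used absolute PnL or only one tail, is not stated here, so the choice of k and the use of positive magnitudes are assumptions. The data is simulated Pareto draws, not Polymarket PnL.

```python
import math
import random


def hill_alpha(magnitudes, k):
    """Hill estimate of the tail index from the k largest positive values."""
    xs = sorted((x for x in magnitudes if x > 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must be between 1 and len(positive values) - 1")
    threshold = xs[k]  # the (k+1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("top-k values are identical; tail index undefined")
    return 1.0 / mean_log_excess


# Synthetic check: draws from a Pareto distribution with alpha = 1.28.
rng = random.Random(0)
sample = [rng.paretovariate(1.28) for _ in range(20000)]
alpha_hat = hill_alpha(sample, k=2000)
print(round(alpha_hat, 2))  # should land near 1.28

# Share of the total carried by the single largest draw.
print(round(max(sample) / sum(sample), 3))
```

An estimate below 2 is the condition under which the variance is infinite; the second print illustrates how much of a heavy-tailed total one observation can carry.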
Convexly's Edge Score V3b adds two pillars to calibration: conviction (concentration of PnL in a wallet's single largest event) and discipline (low resolved-position count). The composite achieves an out-of-fold Spearman of +0.514 against signed log PnL, whereas Brier alone achieves +0.147. A Fama-French bootstrap null with 10,000 PnL permutations places the observed Spearman outside every permuted sample, which corresponds to p < 0.0001.
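The sketch below illustrates the general idea of testing a Spearman correlation against a permutation null: shuffle the PnL values, recompute the correlation, and count how often the shuffled value is at least as extreme as the observed one. It is a plain two-sided permutation test written with the standard library and may differ from the procedure in the methodology paper. The wallet scores and PnL values are made-up illustrative numbers.

```python
import math
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(score, pnl, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the observed Spearman correlation."""
    rng = random.Random(seed)
    observed = spearman(score, pnl)
    shuffled = list(pnl)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(score, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical composite scores and PnL for six wallets.
scores = [0.08, 0.12, 0.19, 0.22, 0.25, 0.30]
pnl = [5.1, -2.0, 7.3, 0.4, 9.8, 12.5]
rho, p = permutation_p_value(scores, pnl, n_perm=2000, seed=0)
print(round(rho, 3), round(p, 4))
```

Because Spearman depends only on ranks, any strictly increasing transform of PnL, such as a signed log, leaves the correlation unchanged. With 10,000 permutations and no shuffled sample reaching the observed value, the add-one estimate is 1/10,001, which is just under 0.0001.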
Computing your own Brier
Two paths:
- Paste any Polymarket wallet address into the wallet analyzer. Free, no signup, 15 to 30 seconds.
- Enter a list of probability and outcome pairs at the Brier score calculator. Client-side compute, nothing saved.
Both tools rank your result against the 8,656-wallet reference cohort so you know where a score sits on the real distribution, not a hypothetical one.
Open methodology
All cohort statistics cited above are published in the 13-page Edge Score Methodology V1 paper, together with frozen coefficients, pre-registered validation experiments, and the full Fama-French bootstrap null.
Read the methodology paper
Related questions
Is there a free Polymarket calibration tool?