What is a Brier score for prediction markets?

A Brier score is the mean squared error between a trader's stated probabilities and the actual binary outcomes (0 for no, 1 for yes). For N predictions with probabilities p_i and outcomes o_i: BS = (1/N) · Σ (p_i − o_i)². The score ranges from 0 (every prediction certain and correct) to 1 (every prediction certain and wrong). On prediction markets, the Brier input is the probability implied by a trade's entry price, and the outcome is the market's resolution.
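The formula can be written as a minimal Python sketch. It assumes each probability is already the YES probability in [0, 1] (for example, a YES entry price of 0.70 read as 0.70) and that outcomes are coded 0/1. The function name `brier_score` and the three sample trades are hypothetical, not Convexly's code or data.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between stated YES probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, o in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0 or o not in (0, 1):
            raise ValueError("probabilities must be in [0, 1], outcomes 0 or 1")
        total += (p - o) ** 2  # squared error for one prediction
    return total / len(probs)


# Hypothetical trades: YES entry prices 0.70, 0.20, 0.90; resolved yes, no, no.
print(brier_score([0.70, 0.20, 0.90], [1, 0, 0]))
```

The three squared errors are 0.09, 0.04 and 0.81, so the result is about 0.313; the single confident miss at 0.90 contributes most of it.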

What is a good Brier score on Polymarket?

On Convexly's reference cohort of 8,656 Polymarket wallets with at least five resolved positions, the median Brier score is approximately 0.188 and the 25th percentile is approximately 0.105. A Brier below 0.15 is strong; below 0.10 is elite on this cohort. Good Judgment Project superforecasters average Brier near 0.10 on broad geopolitical questions, which is roughly in line with the top quartile of this cohort.

Why does Brier score alone barely predict profit on Polymarket?

Because Polymarket PnL is fat-tailed (Hill alpha = 1.28 per the Convexly V1 paper), a few large events dominate realized profit. Across the full 8,656-wallet leaderboard, the Spearman rank correlation between Brier and signed log PnL is only +0.148. Calibration measures only whether stated probabilities match realized frequencies; profit is the joint outcome of calibration, position sizing, and concentration. Convexly's Edge Score combines all three pillars.

Brier score vs log loss: which should I use on prediction markets?

Brier score penalizes errors quadratically, so a single prediction can cost at most 1; log loss penalizes logarithmically and grows without bound when a near-certain prediction (close to 0 or 1) turns out wrong. Brier is therefore less sensitive to extreme overconfidence, while log loss punishes it heavily. For Polymarket-style binary markets, where traders sometimes buy at 0.99 on near-certainties, Brier gives a stable ranking; log loss can be swung by a few bad extreme predictions. Convexly uses Brier throughout because fat-tailed payoffs make log loss's sensitivity undesirable.
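The difference is easiest to see on a single wrong, confident prediction. This sketch scores one YES forecast that resolves NO under both rules; the helper names and the clipping constant `eps` are illustrative choices, not anything Convexly specifies.

```python
import math


def brier_term(p: float, o: int) -> float:
    # Squared error for one binary forecast; always between 0 and 1.
    return (p - o) ** 2


def log_loss_term(p: float, o: int, eps: float = 1e-15) -> float:
    # Negative log-likelihood for one binary forecast.
    # Clip p away from 0 and 1 so log(0) is never evaluated.
    p = min(max(p, eps), 1.0 - eps)
    return -(o * math.log(p) + (1 - o) * math.log(1.0 - p))


for p in (0.90, 0.99, 0.999):
    # Each forecast is on YES and the market resolves NO (o = 0).
    print(f"p={p}: Brier={brier_term(p, 0):.4f}  log loss={log_loss_term(p, 0):.2f}")
```

Brier creeps toward its cap of 1 (0.81, 0.98, 0.998), while log loss keeps climbing (about 2.30, 4.61, 6.91), which is why a handful of 0.99 misses can dominate a log-loss ranking.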

Can I compute my own Brier score on Polymarket?

Yes. Paste any Polymarket wallet at convexly.app/tools/polymarket-wallet-analyzer to see your Brier score, calibration curve, and percentile ranking against 8,656 other wallets. Your first wallet analysis is free and needs no signup; analyzing more wallets is free with an account. Or use the Brier Score Calculator at convexly.app/tools/brier-score-calculator to compute Brier from your own list of probability and outcome pairs.
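How the analyzer draws its calibration curve is not described here. A common construction bins forecasts by stated probability and compares each bin's average forecast with its realized YES rate. The sketch below assumes equal-width bins; the function name, the 10-bin default and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def calibration_curve(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean stated probability, observed YES rate, count) per non-empty bin."""
    bins: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, o in zip(probs, outcomes):
        # Equal-width bins; p == 1.0 is placed in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    curve = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        curve.append((mean_p, hit_rate, len(pairs)))
    return curve


probs = [0.15, 0.15, 0.85, 0.85, 0.85, 0.85]
outcomes = [0, 0, 1, 1, 1, 0]
for mean_p, hit_rate, n in calibration_curve(probs, outcomes):
    print(f"stated {mean_p:.2f} -> observed {hit_rate:.2f} (n={n})")
```

A perfectly calibrated trader's points sit on the diagonal; here the 0.85 bin resolves YES only 75% of the time, a mild overconfidence signal.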

Brier Score for Prediction Markets: Definition, Formula, and What's Actually Good

A Brier score measures how closely a trader's stated probabilities match the outcomes that actually happened. On prediction markets, it is the mean squared error between a position's implied probability at entry and the market's resolved outcome (0 for no, 1 for yes). Lower is better: 0 is perfect, 1 is systematically backwards. On Convexly's 8,656-wallet Polymarket reference cohort, the median Brier is approximately 0.188 and the top quartile is below 0.105.

Formula

For N predictions with probabilities p_i and binary outcomes o_i:

BS = (1 / N) · Σ (p_i − o_i)²

Each squared error is bounded between 0 and 1. Brier is a strictly proper scoring rule: a trader minimizes expected Brier only by reporting their true belief, so it rewards honest probability assignment. A trader who is systematically overconfident (always says 90% but is right 60% of the time) accumulates Brier faster than a trader who says 60% and is right 60% of the time.
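The overconfidence example can be checked with expected values. This sketch assumes a trader who always states the same YES probability while YES occurs at a fixed hit rate; `expected_brier` is a hypothetical helper. The expected score simplifies to s² − 2hs + h, which is minimized at s = h, the propriety property.

```python
def expected_brier(stated: float, hit_rate: float) -> float:
    """Expected per-forecast Brier when a trader always states `stated`
    for YES and YES actually happens with frequency `hit_rate`."""
    return hit_rate * (stated - 1) ** 2 + (1 - hit_rate) * stated ** 2


overconfident = expected_brier(0.90, 0.60)  # says 90%, right 60% of the time
calibrated = expected_brier(0.60, 0.60)     # says 60%, right 60% of the time
print(f"overconfident: {overconfident:.3f}, calibrated: {calibrated:.3f}")

# Scan stated probabilities 0.00..1.00: the minimum sits at the true hit rate.
best = min((expected_brier(s / 100, 0.60), s / 100) for s in range(101))
print(f"best stated probability: {best[1]}")
```

The overconfident trader expects about 0.330 per forecast versus 0.240 for the calibrated one, and no stated value beats reporting 0.60.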

Empirical distribution on Polymarket

From the Convexly 10,000-wallet study (8,656 with five or more resolved positions, 582,921 total resolved positions):

Median Brier score: ~0.188
25th percentile (top quartile): ~0.105
75th percentile: ~0.247
Top decile: below 0.08

Context: always predicting 0.5 yields a Brier of exactly 0.25 whatever the outcomes, because every error is 0.5. Good Judgment Project superforecasters average near 0.10 on broad geopolitical questions. The Polymarket cohort median (~0.188) beats that 0.25 coin-flip baseline but is worse than superforecasters; the top quartile (below ~0.105) is comparable to them.
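A quick check of the coin-flip baseline: every 0.5 forecast contributes (0.5)² = 0.25 whether the market resolves yes or no. The outcome list below is arbitrary, and the last line simply expresses the cohort median from this page relative to that baseline.

```python
def brier_score(probs, outcomes):
    # Mean squared error between YES probabilities and 0/1 outcomes.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


outcomes = [1, 0, 0, 1, 1, 1, 0]  # any mix of resolutions
print(brier_score([0.5] * len(outcomes), outcomes))

# Cohort median 0.188 relative to the 0.25 baseline.
print(f"median improvement over coin flip: {1 - 0.188 / 0.25:.1%}")
```

The first line prints 0.25 for any outcome list; the median wallet sits roughly 25% below that baseline.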

Why Brier does not predict Polymarket profit

Across the full 8,656-wallet leaderboard, the Spearman rank correlation between Brier score and signed log PnL is only +0.148. Among the top 100 wallets by profit, the correlation actually flips: worse-calibrated whales earn more (Spearman +0.42). The reason is fat tails. The Hill tail index (alpha) of realized Polymarket PnL is 1.28, and any alpha below 2 means the variance is formally infinite. For most wallets, one large concentrated bet dominates the PnL outcome, and a well-calibrated forecaster who never concentrates never captures that upside.
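For readers unfamiliar with the tail index, this is a minimal sketch of the standard Hill estimator on the k largest positive values. Which tail of signed PnL Convexly used, and its choice of k, are not stated here; applying it to positive values only and the k below are assumptions. The demo draws from a synthetic Pareto distribution, not Polymarket data.

```python
import math
import random
from typing import Sequence


def hill_alpha(values: Sequence[float], k: int) -> float:
    """Hill estimate of the tail index alpha from the k largest positive values.
    Assumption: only the positive (profit) tail is used; losses are dropped."""
    tail = sorted((v for v in values if v > 0), reverse=True)
    if not 0 < k < len(tail):
        raise ValueError("k must be between 1 and (number of positive values - 1)")
    threshold = tail[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in tail[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic check: Pareto draws with true alpha = 1.28.
random.seed(42)
sample = [random.paretovariate(1.28) for _ in range(100_000)]
print(round(hill_alpha(sample, k=2_000), 2))
```

On this synthetic sample the estimate lands close to 1.28. An alpha between 1 and 2 means the mean exists but the variance does not, so sample averages of PnL are dominated by the few largest events.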

Convexly's Edge Score adds two pillars to calibration: conviction (concentration of PnL in a wallet's single largest event) and discipline (low resolved-position count). In the original V3b fit the composite reached out-of-fold Spearman +0.514 against signed log PnL where Brier alone reached +0.148; the constants were refit on 2026-07-13, so those fit statistics describe that fit, not current scores (dated method-change note at convexly.app/methodology). A Fama-French bootstrap null with 10,000 PnL permutations placed the observed Spearman outside every permuted sample (p < 0.0001). In its one out-of-sample forward test, a pre-set per-wallet temporal holdout, the frozen composite held a Spearman of +0.11 (95% CI [0.05, 0.18]) with forward PnL, below the +0.30 bar set in advance; treat Edge Score as a descriptive behavioral profile, not a validated skill ranking.
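The Spearman statistic and a permutation null can be sketched as follows. This is a generic one-sided permutation test that shuffles one variable, not a reproduction of the Fama-French bootstrap procedure; function names, the seed and the add-one p-value correction are illustrative choices. Pure Python is slow at 8,656 wallets times 10,000 permutations, so the demo uses a small input.

```python
import random
from typing import List, Sequence, Tuple


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def permutation_p_value(
    x: Sequence[float], y: Sequence[float], n_perm: int = 10_000, seed: int = 0
) -> Tuple[float, float]:
    """Observed rho and one-sided p-value: share of shuffles of y with rho >= observed."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if spearman(x, y_shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


print(spearman([1, 2, 2, 3], [1, 2, 3, 4]))
print(permutation_p_value(list(range(20)), list(range(20)), n_perm=999))
```

With no permuted sample reaching the observed value, the smallest attainable p-value is 1 / (n_perm + 1), which is why 10,000 permutations are needed to report p below 0.0001.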

Computing your own Brier

Two paths:

Paste any Polymarket wallet at the wallet analyzer. First wallet free, no signup; more wallets free with an account. Results in 15 to 30 seconds.
Enter a list of probability and outcome pairs at the Brier score calculator. Client-side compute, nothing saved.

Both tools rank your result against the 8,656-wallet reference cohort so you know where a score sits on the real distribution, not a hypothetical one.
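The exact ranking rule the tools use is not given here. One simple version reports the share of cohort wallets with a worse (higher) Brier, counting ties as half. The function name, the tie convention and the ten-score cohort are hypothetical stand-ins for the 8,656-wallet reference data.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def percent_better_than(score: float, cohort: Sequence[float]) -> float:
    """Percent of cohort Brier scores that are worse (higher) than `score`.
    Assumption: tied scores count as half, a common mid-rank convention."""
    ranked = sorted(cohort)
    at_or_below = bisect_right(ranked, score)
    ties = at_or_below - bisect_left(ranked, score)
    worse = len(ranked) - at_or_below
    return 100.0 * (worse + 0.5 * ties) / len(ranked)


# Hypothetical cohort of Brier scores, not Convexly's data.
cohort = [0.07, 0.10, 0.12, 0.15, 0.19, 0.19, 0.22, 0.25, 0.28, 0.31]
print(percent_better_than(0.15, cohort))
```

Sorting once and using binary search keeps each lookup logarithmic, which matters when many scores are ranked against a large fixed cohort.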
