What is a Brier score?
The mean squared error between what you said would happen and what happened. Lower is better, 0 is perfect, a coin-flip forecaster scores 0.25. The formula, a worked example, and the baseline-adjusted variant that makes it comparable across traders.
The answer first
The Brier score measures probability-forecast accuracy as the mean squared error between the stated probability and the realized outcome, coded 0 or 1:
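$$BS = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2$$
where $p_i$ is the stated probability for forecast $i$, $o_i \in \{0, 1\}$ is its realized outcome, and $N$ is the number of resolved forecasts.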
The range is [0, 1] for binary outcomes. A perfect forecast contributes 0; the worst possible forecast contributes 1; a forecaster who always says 50% scores exactly 0.25 on any binary sequence. It was introduced by Glenn Brier in 1950 for weather forecasting and remains the standard scalar accuracy metric for probability forecasts, including on prediction markets, where every trade is an implied probability statement.
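A minimal Python sketch of the formula, assuming forecasts are probabilities in [0, 1] and outcomes are coded 0 or 1 as above. The function name `brier_score` is illustrative, not part of any Convexly API. The sketch also checks the two range endpoints and the always-50% claim on an arbitrary, made-up outcome sequence.

```python
def brier_score(probs, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Made-up binary sequence; the always-50% result does not depend on it.
outcomes = [1, 0, 0, 1, 1, 1, 0]
print(brier_score([0.5] * len(outcomes), outcomes))         # 0.25

# Perfect and worst-possible forecasts sit at the ends of the [0, 1] range.
print(brier_score([float(o) for o in outcomes], outcomes))  # 0.0
print(brier_score([1.0 - o for o in outcomes], outcomes))   # 1.0
```

Every always-50% forecast misses by exactly 0.5, so each squared error is 0.25 whatever the outcome, which is why that baseline is the same on every binary sequence.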
Worked example
Take three resolved forecasts. You said 80% on an event that resolved yes, 60% on an event that resolved no, and 90% on an event that resolved yes:
$(0.8 - 1)^2 = 0.04$
$(0.6 - 0)^2 = 0.36$
$(0.9 - 1)^2 = 0.01$
$BS = (0.04 + 0.36 + 0.01) / 3 \approx 0.137$
Two things to notice. The one miss (60% on an event that did not happen) contributes nine times as much to the score as the 80% hit: its raw error is three times larger (0.6 versus 0.2), and squaring turns 3× into 9× (0.36 versus 0.04). And 0.137 is meaningless on its own: it only becomes interpretable against a baseline, which the reference values and skill-Brier sections below supply.
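The worked example's arithmetic, reproduced in Python as a quick check. The inputs are exactly the three forecasts above; nothing else is assumed.

```python
forecasts = [0.8, 0.6, 0.9]  # stated probabilities from the worked example
outcomes = [1, 0, 1]         # 1 = resolved yes, 0 = resolved no

# Squared error of each forecast against its outcome.
contributions = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
bs = sum(contributions) / len(contributions)

for p, o, c in zip(forecasts, outcomes, contributions):
    print(f"p={p:.1f} outcome={o} squared error={c:.2f}")
print(f"Brier score = {bs:.3f}")                                       # 0.137
print(f"miss vs 80% hit: {contributions[1] / contributions[0]:.0f}x")  # 9x
```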
Reference values
On the Good Judgment Project's IARPA geopolitical tournament, the median forecaster scored around 0.20 and the top-2% superforecaster cohort around 0.16. US National Weather Service next-day precipitation probabilities run around 0.10. Prediction markets sit between those extremes, but the absolute number depends heavily on the category mix a trader bets on: a wallet that lives in 50/50 sports toss-ups faces a structurally higher Brier floor than one betting extreme base-rate markets. Raw cross-trader Brier comparisons are therefore not meaningful without an adjustment.
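The "structurally higher Brier floor" can be made concrete under an idealized assumption: each market resolves yes with exactly the probability q that the forecaster states, so the forecaster is perfectly calibrated and as well informed as possible. The expected Brier score per forecast is then q(1 − q)² + (1 − q)q² = q(1 − q). The function name and the three market mixes below are hypothetical illustrations, not Convexly data.

```python
def calibrated_brier_floor(q):
    """Expected Brier score of a forecaster who states the true probability q.

    With probability q the event resolves yes (squared error (q - 1)^2),
    otherwise no (squared error q^2); the expectation simplifies to q * (1 - q).
    """
    return q * (q - 1) ** 2 + (1 - q) * q ** 2


# Hypothetical market mixes: toss-ups versus lopsided base-rate markets.
for label, q in [("50/50 toss-up", 0.5), ("70/30 favorite", 0.7), ("95/5 lopsided", 0.95)]:
    print(f"{label:>14}: expected Brier floor = {calibrated_brier_floor(q):.4f}")
# 0.2500, 0.2100 and 0.0475 respectively
```

Both idealized forecasters are equally good, yet the toss-up specialist expects about 0.25 and the lopsided-market specialist about 0.05; that gap is what a baseline adjustment has to remove.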
Skill-Brier: the baseline-adjusted version
Convexly computes skill-Brier: a wallet's observed Brier score minus the Brier score that a trivial forecaster, always predicting the base rate, would have earned on the same set of events. Negative means the wallet beats the trivial baseline; positive means it does worse. This borrows the reference-forecast idea behind the Brier Skill Score from the weather-forecasting literature, although the Brier Skill Score is conventionally a ratio, 1 − BS/BS_ref (where positive is better), rather than a difference. Skill-Brier is the input to the posture pillar of Edge Score.
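A sketch of skill-Brier, assuming the base rate is the share of the wallet's own resolved events that resolved yes. The page does not say how Convexly derives the base rate (it could, for instance, be computed per category), so treat this as an illustration rather than Convexly's implementation; all function names are hypothetical. With the base rate taken from the same events, the baseline Brier score equals ō(1 − ō), where ō is that base rate. For comparison, the sketch also computes the conventional ratio-form Brier Skill Score on the same inputs.

```python
def brier_score(probs, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def baseline_brier(outcomes):
    """Brier of always predicting the base rate.

    ASSUMPTION: the base rate is the yes-share of these same events.
    """
    base_rate = sum(outcomes) / len(outcomes)
    return brier_score([base_rate] * len(outcomes), outcomes)


def skill_brier(probs, outcomes):
    """Observed minus baseline Brier: negative beats the baseline, positive is worse."""
    return brier_score(probs, outcomes) - baseline_brier(outcomes)


def brier_skill_score(probs, outcomes):
    """Conventional ratio form 1 - BS / BS_ref: positive is better than the reference."""
    ref = baseline_brier(outcomes)
    if ref == 0:
        raise ValueError("all outcomes identical: reference Brier is 0")
    return 1 - brier_score(probs, outcomes) / ref


probs, outcomes = [0.8, 0.6, 0.9], [1, 0, 1]  # the worked example above
print(f"skill-Brier = {skill_brier(probs, outcomes):+.3f}")        # -0.086
print(f"BSS         = {brier_skill_score(probs, outcomes):+.3f}")  # +0.385
```

On the worked example the base rate is 2/3, so the baseline scores 2/9 ≈ 0.222, and the observed 0.137 beats it by about 0.086.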
What a Brier score does NOT tell you
A good Brier score does not imply a profitable trader. Across the 8,656-wallet Polymarket cohort in the V1 study, the Spearman rank correlation between Brier score and realized PnL is only +0.148. Prediction-market profit is fat-tailed and dominated by a few large concentrated positions, so per-forecast accuracy is one input among several. The full story of why calibration alone barely predicts profit, including the sign flip inside the top-100-by-PnL cohort, is on the calibration explainer. The formal derivation lives in the V1 methodology paper.
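Spearman's rank correlation, the statistic behind the +0.148 figure, is the Pearson correlation of the two variables' ranks. It therefore measures whether higher values of one tend to go with higher values of the other, and a single huge PnL outlier counts only as the top rank. A pure-Python sketch, assuming the common average-rank convention for ties; the five wallets below are made-up values, not data from the V1 study, and the study's own sign conventions are not reproduced here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant


# Hypothetical per-wallet values, NOT data from the V1 study.
brier = [0.18, 0.22, 0.15, 0.25, 0.20]
pnl = [1200.0, -300.0, 50.0, 8000.0, -40.0]
print(f"Spearman rho = {spearman(brier, pnl):+.3f}")  # +0.100
```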
Check a wallet's Brier score
Paste any Polymarket wallet address at the analyzer to see its raw Brier, skill-Brier, and where that places it against the 8,656-wallet reference cohort. Free, no signup, public on-chain data only.
Related explainers
- /learn/calibration: the broader concept, the Murphy decomposition, and why calibration alone barely predicts profit on Polymarket
- /learn/realized-edge: the entry-price-based skill read Convexly pairs with a bootstrap interval
- /learn/edge-score: the composite that uses skill-Brier as one of three pillars