What is a Brier score?

The mean squared error between what you said would happen and what happened. Lower is better: 0 is perfect, and a forecaster who always says 50% scores 0.25. This page covers the formula, a worked example, and the baseline-adjusted variant that makes scores comparable across traders.

The answer first

The Brier score measures probability-forecast accuracy as the mean squared error between the stated probability and the realized outcome, coded 0 or 1:

BS = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2

The range is [0, 1] for binary outcomes. A perfect forecast contributes 0; the worst possible forecast contributes 1; a forecaster who always says 50% scores exactly 0.25 on any binary sequence. It was introduced by Glenn Brier in 1950 for weather forecasting and remains the standard scalar accuracy metric for probability forecasts, including on prediction markets where every trade is an implied probability statement.
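The 0.25 figure for the always-50% forecaster follows directly from the formula, because each term is the same whichever way the event resolves. A short derivation, using p and o as defined above:

```latex
(0.5 - o_i)^2 = 0.25 \quad \text{for } o_i \in \{0, 1\}
\quad\Longrightarrow\quad
BS = \frac{1}{N} \sum_{i=1}^{N} 0.25 = 0.25
```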

Worked example

Take three resolved forecasts. You said 80% on an event that resolved yes, 60% on an event that resolved no, and 90% on an event that resolved yes:

(0.8 − 1)^2 = 0.04
(0.6 − 0)^2 = 0.36
(0.9 − 1)^2 = 0.01
BS = (0.04 + 0.36 + 0.01) / 3 ≈ 0.137

Two things to notice. The one miss (60% on an event that did not happen) contributes nine times as much error as the 80% hit, because the error is squared. And 0.137 is meaningless on its own: it only becomes interpretable against a baseline, which is what the next section fixes.
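The same arithmetic can be checked in a few lines of Python. This is a minimal sketch: the function names `brier_terms` and `brier_score` are illustrative and not part of any Convexly API.

```python
def brier_terms(probs, outcomes):
    """Per-forecast squared errors (p - o)^2 for binary outcomes coded 0 or 1."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return [(p - o) ** 2 for p, o in zip(probs, outcomes)]


def brier_score(probs, outcomes):
    """Mean of the per-forecast squared errors."""
    terms = brier_terms(probs, outcomes)
    if not terms:
        raise ValueError("need at least one resolved forecast")
    return sum(terms) / len(terms)


# The three forecasts from the worked example: 80% yes, 60% no, 90% yes.
probs = [0.8, 0.6, 0.9]
outcomes = [1, 0, 1]

terms = brier_terms(probs, outcomes)
score = brier_score(probs, outcomes)

print([round(t, 2) for t in terms])  # [0.04, 0.36, 0.01]
print(round(score, 3))               # 0.137
```

Keeping the per-forecast terms separate from the mean makes it easy to see which forecasts dominate the score; here the single 60% miss accounts for 0.36 of the 0.41 total.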

Reference values

On the Good Judgment Project's IARPA geopolitical tournament, the median forecaster scored around 0.20 and the top-2% superforecaster cohort around 0.16. US National Weather Service next-day precipitation probabilities run around 0.10. Prediction markets sit between those extremes, but the absolute number depends heavily on the category mix a trader bets on: a wallet that lives in 50/50 sports toss-ups faces a structurally higher Brier floor than one that bets on markets with extreme base rates. Raw cross-trader Brier comparisons are therefore not meaningful without an adjustment.
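One way to see the structural floor: if an event truly occurs with probability q and the forecaster states exactly q, the expected per-forecast Brier is q(1 − q)^2 + (1 − q)q^2, which simplifies to q(1 − q). The sketch below evaluates that expression. The function name and the example probabilities are illustrative assumptions, not values from Convexly's data.

```python
def calibrated_brier_floor(q):
    """Expected Brier of a forecaster who states the true probability q.

    With probability q the outcome is 1 (squared error (q - 1)^2);
    otherwise it is 0 (squared error q^2). The expectation simplifies
    to q * (1 - q).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    return q * (q - 1) ** 2 + (1 - q) * q ** 2


# Illustrative true probabilities: a toss-up, a favorite, a near-certainty.
for q in (0.5, 0.8, 0.95):
    print(q, round(calibrated_brier_floor(q), 4))
# 0.5 0.25
# 0.8 0.16
# 0.95 0.0475
```

Under these assumptions, even a perfectly calibrated trader in pure toss-ups cannot expect to beat 0.25, while the same calibration on 95/5 markets yields an expected score below 0.05.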

Skill-Brier: the baseline-adjusted version

Convexly computes skill-Brier: observed Brier minus the Brier a trivial always-predict-the-base-rate forecaster would have scored on the same set of events. Negative means the wallet beats the trivial baseline; positive means it is worse. This uses the same reference-forecast idea as the Brier Skill Score from the weather forecasting literature, which normalizes by ratio (BSS = 1 − BS/BS_ref) rather than by difference. Skill-Brier is the input to the posture pillar of Edge Score.
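The definition above can be sketched as follows. This is an illustration of the stated definition, not Convexly's implementation: the function names are hypothetical, and the base rate is assumed to be the fraction of the same events that resolved yes.

```python
def brier_score(probs, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if not outcomes or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def skill_brier(probs, outcomes):
    """Observed Brier minus the Brier of always predicting the base rate.

    Assumption: the base rate is computed from these same events.
    Negative values mean the forecasts beat the trivial baseline.
    """
    base_rate = sum(outcomes) / len(outcomes)
    baseline = brier_score([base_rate] * len(outcomes), outcomes)
    return brier_score(probs, outcomes) - baseline


# Illustrative inputs: 80% on a yes, 60% on a no, 90% on a yes.
probs = [0.8, 0.6, 0.9]
outcomes = [1, 0, 1]

print(round(skill_brier(probs, outcomes), 3))  # -0.086
```

With two of three events resolving yes, the base-rate forecaster scores about 0.222, so an observed Brier of about 0.137 gives a skill-Brier of roughly −0.086, meaning these forecasts beat the trivial baseline.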

What a Brier score does NOT tell you

A good Brier score does not imply a profitable trader. Across the 8,656-wallet Polymarket cohort in the V1 study, the Spearman rank correlation between Brier score and realized PnL is only +0.148. Prediction-market profit is fat-tailed and dominated by a few large concentrated positions, so per-forecast accuracy is one input among several. The full story of why calibration alone barely predicts profit, including the sign flip inside the top-100-by-PnL cohort, is covered in the calibration explainer. The formal derivation is in the V1 methodology paper.

Check a wallet's Brier score

Paste any Polymarket wallet address into the analyzer to see its raw Brier, skill-Brier, and where that places it against the 8,656-wallet reference cohort. It is free, requires no signup, and reads public on-chain data only.

Frequently asked

What is the Brier score formula?
For N binary forecasts, the Brier score is the average of (p - o) squared, where p is the stated probability and o is the realized outcome coded 0 or 1. A perfect forecast contributes 0, the worst possible forecast contributes 1, and a forecaster who always says 0.5 scores exactly 0.25 on any binary sequence.

What is a good Brier score?
It depends on the domain. On the Good Judgment Project's geopolitical tournament the median forecaster scored around 0.20 and the top-2% superforecaster cohort around 0.16. US National Weather Service next-day rain probabilities run around 0.10. On prediction markets the absolute number depends heavily on the category mix, which is why Convexly reports the baseline-adjusted version (skill-Brier) rather than comparing raw Brier scores across wallets.

Is a lower Brier score better?
Yes. The Brier score is an error measure: 0 is a perfect record, 1 is the worst possible record, and 0.25 is the always-say-50% baseline on binary outcomes. Lower is always better.

Does a good Brier score mean a profitable trader?
Not by itself. Across the 8,656-wallet Polymarket cohort in the Convexly V1 study, the Spearman rank correlation between Brier score and realized PnL is only +0.148. Profit on prediction markets is fat-tailed and dominated by a few large concentrated positions, so per-forecast accuracy is one input among several, not the whole story.

What is skill-Brier?
Skill-Brier is observed Brier minus the wallet's own marginal-frequency Brier: the score a trivial forecaster who always predicts the base rate of resolution would have earned on the same events. Negative skill-Brier means the wallet beats that trivial baseline. The adjustment makes scores comparable across traders who bet on markets with very different base rates, and it is the input to the posture pillar of Edge Score.

Where can I see the Brier score of a Polymarket wallet?
Paste the wallet address at /tools/polymarket-wallet-analyzer. The free analyzer reports the wallet's raw Brier score, its skill-Brier (baseline-adjusted), and where that places it against an 8,656-wallet reference cohort. It reads public on-chain data only; no signup or signature required.

Related explainers

  • /learn/calibration: the broader concept, the Murphy decomposition, and why calibration alone barely predicts profit on Polymarket
  • /learn/realized-edge: the entry-price-based skill read Convexly pairs with a bootstrap interval
  • /learn/edge-score: the composite that uses skill-Brier as one of three pillars

Related reading

  • Answers: Brier score prediction markets
  • Blog: What is a Brier score
  • Blog: Convexly edge score difference
  • Research: Edge score methodology v1 5