Free Brier Score Calculator
Enter your probabilities and outcomes. Get your mean Brier score plus a percentile ranking against 8,656 real Polymarket wallets, computed live in your browser.
Built by the team behind the 10,000-wallet Polymarket calibration audit. Percentiles are not guesses; they are measured.
Mean Brier score
0.104
Lower is better. 0 is perfect; 0.25 is what always predicting 50% earns; 1 is the worst possible (fully confident and always wrong).
Reference cohort
95th percentile
of 8,656 real Polymarket wallets.
How the Brier score works
The formula
For each prediction, square the difference between your probability and the actual outcome (0 or 1). Average those squared errors across all predictions. Lower is better. The formula in math notation: BS = (1/N) · Σ(p_i − o_i)².
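The steps above can be sketched in a few lines of Python. This is an illustrative sketch with made-up values, not the calculator's actual implementation:

```python
def brier_score(probs, outcomes):
    """BS = (1/N) * sum((p_i - o_i)^2), with each outcome o_i in {0, 1}."""
    assert len(probs) == len(outcomes) and len(probs) > 0
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Four hypothetical forecasts and what actually happened:
probs = [0.8, 0.3, 0.9, 0.5]
outcomes = [1, 0, 1, 1]
print(brier_score(probs, outcomes))  # (0.04 + 0.09 + 0.01 + 0.25) / 4 ≈ 0.0975
```

The calculator performs this same averaging live as you add predictions.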
Why proper scoring rules matter
A proper scoring rule is one where the forecaster minimizes their expected score only by reporting their true probability. Brier and log loss both satisfy this; raw accuracy does not. If you score forecasts with accuracy alone, rational forecasters learn to round every prediction to 1 or 0, which destroys the information content.
What the reference cohort tells you
Textbook examples show Brier scores of 0.1 or 0.2. Real prediction-market traders score closer to 0.24 on average across their full trading history. The percentile ranking here answers the only question that matters: where do you sit on the real distribution of active forecasters?
Frequently asked questions
What is a Brier score?
A Brier score is a proper scoring rule for probabilistic forecasts. It is the mean squared error between predicted probabilities and actual binary outcomes, ranging from 0 (perfect prediction) to 1 (fully confident and always wrong). 0.25 is the Brier score of always predicting 0.5. The metric was introduced by Glenn Brier in 1950 for weather forecasting and is now the standard accuracy metric for probabilistic forecasters.
How do I calculate a Brier score?
For each prediction, square the difference between your predicted probability (between 0 and 1) and the actual outcome (0 if it did not happen, 1 if it did). Then average those squared errors across all predictions. In formula form: BS = (1/N) · Σ(p_i − o_i)², where p_i is your probability for event i and o_i is 0 or 1. Our calculator does this automatically and updates live as you add predictions.
What is a good Brier score?
It depends on the difficulty of the predictions. On a broad set of binary forecasts, a score below 0.20 is strong, below 0.15 is elite, and below 0.10 puts you in the top 1% of most cohorts. The Good Judgment Project's superforecasters average near 0.10 on diverse geopolitical questions. Our calculator shows your percentile against 8,656 real Polymarket wallets so you know where you stand on real-money predictions, not just textbook examples.
How is Brier score different from accuracy?
Accuracy rewards you for being right more than half the time, treating a 51% forecast the same as a 99% forecast. Brier score rewards you for being right with the right amount of confidence. If you say 95% and the event happens, you score better than if you said 70%; but if it fails to happen, the 95% forecast is penalized far more heavily. That is why Brier is a 'proper' scoring rule and accuracy is not: only Brier incentivizes honest probability assignment.
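The asymmetry above is easy to verify numerically. A minimal sketch (hypothetical forecasts, not tied to any real market):

```python
def penalty(p, outcome):
    """Squared-error penalty for one forecast: (p - o)^2. Lower is better."""
    return (p - outcome) ** 2

# Event happens (outcome = 1): higher confidence scores better.
print(penalty(0.70, 1), penalty(0.95, 1))  # ≈ 0.09 vs ≈ 0.0025

# Event fails (outcome = 0): the confident forecast is punished much harder.
print(penalty(0.70, 0), penalty(0.95, 0))  # ≈ 0.49 vs ≈ 0.9025
```

Under plain accuracy, both forecasts count identically as "predicted yes," which is exactly the information Brier preserves and accuracy discards.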
Can I use the Brier score for more than two outcomes?
Yes, the multi-class Brier score generalizes naturally: compute the squared error between the predicted probability vector and the one-hot outcome vector, summed across classes. This calculator is focused on binary outcomes (the most common forecasting case) but the same principle extends. Libraries such as scikit-learn implement the binary case as brier_score_loss.
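A minimal sketch of the multi-class generalization described above, using a hypothetical three-way outcome (e.g. win/draw/loss):

```python
def multiclass_brier(prob_vectors, outcome_indices):
    """Mean over predictions of sum_k (p_k - o_k)^2, where o is one-hot."""
    n_classes = len(prob_vectors[0])
    total = 0.0
    for probs, winner in zip(prob_vectors, outcome_indices):
        onehot = [1.0 if k == winner else 0.0 for k in range(n_classes)]
        total += sum((p - o) ** 2 for p, o in zip(probs, onehot))
    return total / len(prob_vectors)

# One forecast of [0.6, 0.3, 0.1]; class 0 actually occurred:
print(multiclass_brier([[0.6, 0.3, 0.1]], [0]))  # 0.4^2 + 0.3^2 + 0.1^2 ≈ 0.26
```

With two classes this reduces to exactly twice the binary Brier score, which is why the binary convention drops the sum over classes.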
What is the difference between Brier score and log loss?
Both are proper scoring rules for probabilistic forecasts. Brier penalizes errors quadratically: being off by 0.2 is four times worse than being off by 0.1. Log loss (cross-entropy) penalizes errors logarithmically and blows up when you assign near-zero probability to an outcome that actually happens (which is why extreme overconfidence is very costly under log loss). Brier is more robust to extreme predictions; log loss is more sensitive.
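The difference in penalty shape is easiest to see side by side. A sketch of the per-forecast penalties when the event happens (o = 1), using arbitrary probability values:

```python
import math

def brier_penalty(p):
    """Quadratic penalty: bounded above by 1."""
    return (1 - p) ** 2

def log_loss_penalty(p):
    """Logarithmic penalty: unbounded, blows up as p -> 0."""
    return -math.log(p)

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p}: brier={brier_penalty(p):.4f}, log_loss={log_loss_penalty(p):.4f}")
```

Note that Brier caps out at 1 no matter how wrong you are, while log loss grows without bound; that bounded-vs-unbounded tail is the robustness difference described above.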
Where does the 8,656-wallet reference cohort come from?
Every Polymarket wallet on the public leaderboard with at least 5 resolved positions, as of April 2026. Convexly scored each wallet's calibration (Brier) on their historical bets and froze that distribution as a reference cohort. When you use this calculator, your Brier score is matched against the distribution of per-wallet mean Brier scores across the cohort, so your percentile reflects how you would rank if you were a Polymarket trader. Methodology: convexly.app/blog/polymarket-10k-wallet-study.
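The percentile lookup described above amounts to ranking one score inside a frozen, sorted distribution. A minimal sketch of that idea, with a toy six-wallet cohort standing in for the real 8,656 per-wallet means (the values below are invented, not Convexly's data):

```python
import bisect

def percentile_rank(score, cohort_scores):
    """Percent of cohort wallets with a strictly worse (higher) mean Brier."""
    ranked = sorted(cohort_scores)
    # Lower Brier is better, so count cohort scores above yours.
    beaten = len(ranked) - bisect.bisect_right(ranked, score)
    return 100.0 * beaten / len(ranked)

toy_cohort = [0.18, 0.22, 0.24, 0.25, 0.27, 0.31]  # hypothetical values
print(percentile_rank(0.104, toy_cohort))  # beats all six toy wallets -> 100.0
```

Because the cohort distribution is frozen, the lookup is a fast in-browser operation with no server round trip.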
Does this calculator save my predictions?
No. All computation happens in your browser. Nothing is stored, nothing is sent to our servers unless you sign up for a Convexly account and explicitly save a session. If you want persistent tracking across predictions, the Calibration Challenge quiz (free, 10 questions, 2 minutes) saves your progress across attempts.
Track calibration across every prediction
Convexly scores your calibration on Polymarket, Kalshi, and Manifold in one dashboard. Free to start. Works with any wallet address, no signature or private key required.
Other Convexly tools and research
Calibration Challenge
10-question forecasting quiz. Brier score and percentile ranking in under 2 minutes. No signup required.
Polymarket Wallet Analyzer
Paste a wallet. Get Brier, Edge Score, sizing diagnostics, and category breakdown in 30 seconds.
10K Wallet Study
The calibration audit behind the reference cohort. Methodology, results, and open coefficients.