Free Brier Score Calculator
Enter your probabilities and outcomes. Get your mean Brier score plus a percentile ranking against 8,656 real Polymarket wallets, computed live in your browser.
Built by the team behind the 10,000-wallet Polymarket calibration audit. Percentiles are not guesses; they are measured.
Mean Brier score
0.104
Lower is better. 0 is perfect; 0.25 is what always predicting 50% earns; 1 is the worst possible (fully confident and always wrong).
Reference cohort
95th percentile
of 8,656 real Polymarket wallets.
How the Brier score works
The formula
For each prediction, square the difference between your probability and the actual outcome (0 or 1). Average those squared errors across all predictions. Lower is better. The formula in math notation: BS = (1/N) · Σ(p_i − o_i)².
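The steps above can be sketched in a few lines of Python. This is an illustrative sketch with made-up values, not the calculator's actual implementation:

```python
def brier_score(probs, outcomes):
    """BS = (1/N) * sum((p_i - o_i)^2), with each outcome o_i in {0, 1}."""
    assert len(probs) == len(outcomes) and len(probs) > 0
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Four hypothetical forecasts and what actually happened:
probs = [0.8, 0.3, 0.9, 0.5]
outcomes = [1, 0, 1, 1]
print(brier_score(probs, outcomes))  # (0.04 + 0.09 + 0.01 + 0.25) / 4 ≈ 0.0975
```

The calculator performs this same averaging live as you add predictions.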
Why proper scoring rules matter
A proper scoring rule is one where the forecaster minimizes their expected score only by reporting their true probability. Brier and log loss both satisfy this; raw accuracy does not. If you score forecasts with accuracy alone, rational forecasters learn to round every prediction to 1 or 0, which destroys the information content.
What the reference cohort tells you
Textbook examples show Brier scores of 0.1 or 0.2. Real prediction-market traders score closer to 0.24 on average across their full trading history. The percentile ranking here answers the only question that matters: where do you sit on the real distribution of active forecasters?
Frequently asked questions
What is a Brier score?
A Brier score is a proper scoring rule for probabilistic forecasts. It is the mean squared error between predicted probabilities and actual binary outcomes, ranging from 0 (perfect prediction) to 1 (fully confident and always wrong). 0.25 is the Brier score of always predicting 0.5. The metric was introduced by Glenn Brier in 1950 for weather forecasting and is now the standard accuracy metric for probabilistic forecasters.
How do I calculate a Brier score?
For each prediction, square the difference between your predicted probability (between 0 and 1) and the actual outcome (0 if it did not happen, 1 if it did). Then average those squared errors across all predictions. In formula form: BS = (1/N) · Σ(p_i − o_i)², where p_i is your probability for event i and o_i is 0 or 1. Our calculator does this automatically and updates live as you add predictions.
What is a good Brier score?
It depends on the difficulty of the predictions. On a broad set of binary forecasts, a score below 0.20 is strong, below 0.15 is elite, and below 0.10 puts you in the top 1% of most cohorts. The Good Judgment Project's superforecasters average near 0.10 on diverse geopolitical questions. Our calculator shows your percentile against 8,656 real Polymarket wallets so you know where you stand on real-money predictions, not just textbook examples.
How is Brier score different from accuracy?
Accuracy rewards you for being right more than half the time, treating a 51% forecast the same as a 99% forecast. Brier score rewards you for being right with the right amount of confidence. If you say 95% and the event happens, you score better than if you said 70%; but if it fails to happen, the 95% forecast is penalized far more heavily. That is why Brier is a 'proper' scoring rule and accuracy is not: only Brier incentivizes honest probability assignment.
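The asymmetry above is easy to verify numerically. A minimal sketch (hypothetical forecasts, not tied to any real market):

```python
def penalty(p, outcome):
    """Squared-error penalty for one forecast: (p - o)^2. Lower is better."""
    return (p - outcome) ** 2

# Event happens (outcome = 1): higher confidence scores better.
print(penalty(0.70, 1), penalty(0.95, 1))  # ≈ 0.09 vs ≈ 0.0025

# Event fails (outcome = 0): the confident forecast is punished much harder.
print(penalty(0.70, 0), penalty(0.95, 0))  # ≈ 0.49 vs ≈ 0.9025
```

Under plain accuracy, both forecasts count identically as "predicted yes," which is exactly the information Brier preserves and accuracy discards.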
Can I use the Brier score for more than two outcomes?
Yes, the multi-class Brier score generalizes naturally: compute the squared error between the predicted probability vector and the one-hot outcome vector, summed across classes. This calculator is focused on binary outcomes (the most common forecasting case) but the same principle extends. Libraries such as scikit-learn implement the binary case as brier_score_loss.
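A minimal sketch of the multi-class generalization described above, using a hypothetical three-way outcome (e.g. win/draw/loss):

```python
def multiclass_brier(prob_vectors, outcome_indices):
    """Mean over predictions of sum_k (p_k - o_k)^2, where o is one-hot."""
    n_classes = len(prob_vectors[0])
    total = 0.0
    for probs, winner in zip(prob_vectors, outcome_indices):
        onehot = [1.0 if k == winner else 0.0 for k in range(n_classes)]
        total += sum((p - o) ** 2 for p, o in zip(probs, onehot))
    return total / len(prob_vectors)

# One forecast of [0.6, 0.3, 0.1]; class 0 actually occurred:
print(multiclass_brier([[0.6, 0.3, 0.1]], [0]))  # 0.4^2 + 0.3^2 + 0.1^2 ≈ 0.26
```

With two classes this reduces to exactly twice the binary Brier score, which is why the binary convention drops the sum over classes.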
What is the difference between Brier score and log loss?
Both are proper scoring rules for probabilistic forecasts. Brier penalizes errors quadratically: being off by 0.2 is four times worse than being off by 0.1. Log loss (cross-entropy) penalizes errors logarithmically and blows up when you assign near-zero probability to an outcome that actually happens (which is why extreme overconfidence is very costly under log loss). Brier is more robust to extreme predictions; log loss is more sensitive.
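The difference in penalty shape is easiest to see side by side. A sketch of the per-forecast penalties when the event happens (o = 1), using arbitrary probability values:

```python
import math

def brier_penalty(p):
    """Quadratic penalty: bounded above by 1."""
    return (1 - p) ** 2

def log_loss_penalty(p):
    """Logarithmic penalty: unbounded, blows up as p -> 0."""
    return -math.log(p)

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p}: brier={brier_penalty(p):.4f}, log_loss={log_loss_penalty(p):.4f}")
```

Note that Brier caps out at 1 no matter how wrong you are, while log loss grows without bound; that bounded-vs-unbounded tail is the robustness difference described above.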
Where does the 8,656-wallet reference cohort come from?
Every Polymarket wallet on the public leaderboard with at least 5 resolved positions, as of April 2026. Convexly scored each wallet's calibration (Brier) on their historical bets and froze that distribution as a reference cohort. When you use this calculator, your Brier score is matched against the distribution of per-wallet mean Brier scores across the cohort, so your percentile reflects how you would rank if you were a Polymarket trader. Methodology: convexly.app/blog/polymarket-10k-wallet-study.
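The percentile lookup described above amounts to ranking one score inside a frozen, sorted distribution. A minimal sketch of that idea, with a toy six-wallet cohort standing in for the real 8,656 per-wallet means (the values below are invented, not Convexly's data):

```python
import bisect

def percentile_rank(score, cohort_scores):
    """Percent of cohort wallets with a strictly worse (higher) mean Brier."""
    ranked = sorted(cohort_scores)
    # Lower Brier is better, so count cohort scores above yours.
    beaten = len(ranked) - bisect.bisect_right(ranked, score)
    return 100.0 * beaten / len(ranked)

toy_cohort = [0.18, 0.22, 0.24, 0.25, 0.27, 0.31]  # hypothetical values
print(percentile_rank(0.104, toy_cohort))  # beats all six toy wallets -> 100.0
```

Because the cohort distribution is frozen, the lookup is a fast in-browser operation with no server round trip.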
Does this calculator save my predictions?
No. All computation happens in your browser. Nothing is stored, nothing is sent to our servers unless you sign up for a Convexly account and explicitly save a session. If you want persistent tracking across predictions, the Calibration Challenge quiz (free, 10 questions, 2 minutes) saves your progress across attempts.
Track calibration across every prediction
Convexly scores your calibration on Polymarket, Kalshi, and Manifold in one dashboard. Free to start. Works with any wallet address, no signature or private key required.
Other Convexly tools and research
Calibration Challenge
10-question forecasting quiz. Brier score and percentile ranking in under 2 minutes. No signup required.
Polymarket Wallet Analyzer
Paste a wallet. Get Brier, Edge Score, sizing diagnostics, and category breakdown in 30 seconds.
10K Wallet Study
The calibration audit behind the reference cohort. Methodology, results, and open coefficients.