March 28, 2026 | 8 min read

Overconfidence and Position Sizing on Polymarket

On a venue with fat-tailed returns, overconfidence does not just embarrass the trader. It ends the account. The calibration research explains why the error is nearly universal, and the Convexly 8,656-wallet cohort shows what oversized bets actually do over a full year of resolved positions.

The gap the trader cannot feel

Decades of calibration research show a consistent pattern: when people say they are 90 percent confident, the event occurs roughly 70 to 80 percent of the time. Lichtenstein, Fischhoff, and Phillips surveyed the literature in 1982 ("Calibration of probabilities: The state of the art to 1980," in Judgment under Uncertainty) and the pattern has replicated since. Moore and Healy's 2008 paper reports that the overconfidence effect is stable across professions and task types ("The trouble with overconfidence," Psychological Review, 115:2).

The subjective experience of being 90 percent confident is the same whether the true probability is 90 or 70. No internal signal distinguishes the two states. Only post-hoc scoring of many predictions reveals the gap. This is the structural reason a Polymarket trader who has not run a calibration audit has almost no reliable sense of their own probability calibration.

Why overconfidence is particularly expensive on Polymarket

The overconfidence literature was developed on finite-variance domains: trivia questions, weather forecasts, medical diagnoses. The cost of an error in those domains is bounded. A weather forecaster who claims 90 percent and hits 75 percent is embarrassed but not bankrupt.

Polymarket is not that kind of domain. The Hill tail-index estimator on realized Polymarket PnL across the Convexly 8,656-wallet cohort returns α = 1.28, with a 95 percent confidence interval of 1.20 to 1.36 (Edge Score Methodology V1). For any α below 2, the variance of the underlying distribution is formally infinite. An overconfident trader on a fat-tailed venue sizes to a variance that does not converge.
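For readers who have not met the Hill estimator, here is a minimal sketch of how a tail index can be computed from the k largest observations. The function name, the choice of k, and the synthetic Pareto sample are assumptions made for illustration; the source does not say which tail of PnL or which k the Convexly methodology uses.

```python
import math

def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest positive values.

    alpha_hat = k / sum(ln(X_(i) / X_(k+1)) for i in 1..k),
    where X_(1) >= X_(2) >= ... are the order statistics.
    """
    xs = sorted((x for x in values if x > 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < number of positive values")
    threshold = xs[k]  # the (k+1)-th largest value
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    return k / log_excess

# Synthetic check (not Polymarket data): evenly spaced Pareto quantiles
# generated with a known tail index, so the estimate should land near it.
alpha_true = 1.28
n = 10_000
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / alpha_true) for i in range(n)]

alpha_hat = hill_tail_index(sample, k=500)
print(f"Hill estimate: {alpha_hat:.3f}")  # close to 1.28 for this synthetic sample
```

On real data the estimate moves with k, so it is common to compute it over a range of k values and look for a stable region before quoting a single number.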

The operational consequence: a 10 percentage point overconfidence gap costs a weather forecaster a slightly worse Brier score. The same gap costs a Polymarket trader a position that is 30 to 50 percent too large under Kelly-family math, on a payoff distribution where tail draws wipe the account. The Convexly 10K-wallet study found the top 1 percent of wallets by absolute PnL captures 36.2 percent of signed profit (10,000 Polymarket Wallets Scored). Staying in the distribution long enough to be counted depends on not sizing past survival.
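One way to see where an oversizing figure of that magnitude can come from is the full-Kelly fraction for a binary contract. The sketch below assumes a contract bought at price p that pays 1 on success, no fees, and a hypothetical trader who states 0.90 when the true probability is 0.80. The function names and the example prices are illustrative assumptions, not Convexly's calculation.

```python
def kelly_fraction(prob, price):
    """Full-Kelly bankroll fraction for buying a binary contract at `price`.

    Buying at price p wins (1 - p) with probability q and loses p otherwise,
    which gives f* = (q - p) / (1 - p). Negative edge is clipped to zero.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    return max(0.0, (prob - price) / (1.0 - price))

def oversize_ratio(stated_prob, true_prob, price):
    """How much larger the stake is at the stated probability than at the true one."""
    true_stake = kelly_fraction(true_prob, price)
    if true_stake == 0.0:
        raise ValueError("no positive edge at the true probability")
    return kelly_fraction(stated_prob, price) / true_stake - 1.0

# Hypothetical 10-point gap: trader says 0.90, reality is 0.80.
for price in (0.50, 0.60):
    extra = oversize_ratio(0.90, 0.80, price)
    print(f"price {price:.2f}: stake is {extra:.0%} too large")
```

Under these assumptions the oversizing grows as the market price approaches the true probability, because the real edge shrinks while the imagined edge does not.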

The calibration-vs-PnL disconnect

There is a surprise in the Convexly data worth pricing in. Across the full 8,656-wallet cohort, the Spearman rank correlation between Brier score (calibration) and signed log PnL is +0.148. Among the top 100 wallets by absolute PnL, the correlation flips negative: the worst-calibrated quartile earns roughly 2.02 times the median PnL of the best-calibrated quartile (top-100 whale audit).
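For reference, a Spearman rank correlation is a Pearson correlation computed on ranks. The sketch below is a plain-Python version with average ranks for ties; the function names and the toy inputs are assumptions for illustration and are not Convexly data. Note that a Brier score is lower-is-better, so the sign of any correlation with PnL depends on whether the raw score or an inverted calibration measure is ranked, which the source does not spell out.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Toy inputs (hypothetical): per-wallet Brier scores and signed log PnL.
brier = [0.12, 0.18, 0.22, 0.25, 0.31]
log_pnl = [1.4, 0.9, 2.1, -0.3, 1.7]
print(f"Spearman rho: {spearman(brier, log_pnl):+.3f}")
```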

A surface reading of that result is "calibration does not matter." It is the wrong reading. The correct one is that calibration is a survival metric, not a profit metric. Good calibration is what keeps the trader from sizing a 90 percent bet that should have been 70 percent, and that matters most on exactly the fat-tailed payoffs where a single mis-sized position is terminal. The whales at the top of the leaderboard earned their concentration by surviving long enough to place the bet they had edge on. Overconfidence is the error that prevents that survival.

What the research-backed traders do differently

Tetlock and Gardner's Superforecasting (2015) identifies a small group of forecasters who are materially better calibrated than the typical forecaster. Two of their characteristic behaviors translate directly into prediction-market trading:

Smaller adjustments from base rates. Superforecasters typically adjust 5 to 15 percentage points from the reference class, not 40. Overconfident traders treat every contract as if their private view overrides the historical frequency by an order of magnitude.
Frequent updates in small steps. Superforecasters revise probability estimates in 2 to 5 percentage point increments as information arrives. Overconfident traders hold the initial number and size up when the market moves their way.

Both behaviors show up in the Convexly wallet analyzer as higher posture percentile, which is the closest operational proxy the framework has for calibration discipline.

A self-check the trader can run today

Write down the next 10 probability estimates on resolvable Polymarket contracts. Use any confidence value between 0.5 and 0.95. Do not cluster at 0.8 because it feels safe. Two weeks later, score the results: of the estimates in the 0.7 to 0.8 band, roughly 70 to 80 percent should have resolved in the predicted direction. Of those in the 0.9 to 0.95 band, 90 to 95 percent should have. Systematic miscalibration shows up immediately: if estimates made at 0.9 resolve correctly only 70 percent of the time, the trader is sizing to a probability that is not there.
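The scoring step can be done by hand, or with a few lines of code. The sketch below is one possible way to do it, assuming each entry in the log is a (stated probability, resolved as predicted) pair; the band edges, function names, and the ten sample entries are hypothetical.

```python
def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (1 if correct, else 0)."""
    return sum((p - float(hit)) ** 2 for p, hit in forecasts) / len(forecasts)

def hit_rate_by_band(forecasts, bands):
    """Return {(low, high): (count, hit_rate)} for forecasts with low <= p < high."""
    result = {}
    for low, high in bands:
        hits = [hit for p, hit in forecasts if low <= p < high]
        if hits:  # skip bands with no forecasts
            result[(low, high)] = (len(hits), sum(hits) / len(hits))
    return result

# Hypothetical two-week log: (stated probability, resolved as predicted?)
log = [
    (0.90, True), (0.90, True), (0.90, False), (0.95, True), (0.90, False),
    (0.75, True), (0.70, True), (0.75, False), (0.60, True), (0.55, False),
]
bands = [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 0.9), (0.9, 1.0)]

for (low, high), (count, rate) in hit_rate_by_band(log, bands).items():
    print(f"{low:.1f}-{high:.1f}: {count} forecasts, {rate:.0%} resolved as predicted")
print(f"Brier score: {brier_score(log):.3f}")
```

With only a handful of forecasts per band the hit rates are noisy, so the same log is worth extending over further rounds before the stake sizes are adjusted.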

The wallet analyzer runs this calibration check automatically on any on-chain Polymarket history. Posture percentile benchmarks the trader against the 8,656 ranked wallets in the cohort. A posture percentile below 40 means the calibration gap is real and the stake sizes need to come down until it is corrected.

Sources. Lichtenstein, Fischhoff, Phillips (1982), "Calibration of probabilities: The state of the art to 1980," in Judgment under Uncertainty. Moore and Healy (2008), "The trouble with overconfidence," Psychological Review 115:2. Tetlock and Gardner (2015), Superforecasting. Convexly (2026), Edge Score Methodology V1 and the 10K-wallet study.