Polymarket Calibration Tool

Convexly's Polymarket Wallet Analyzer is a free, no-signup tool that computes a Brier score, a calibration curve by confidence bucket, and a composite Edge Score measured against a reference cohort of 8,656 real Polymarket wallets. Paste any Polymarket wallet address at convexly.app/tools/polymarket-wallet-analyzer and results return in 15 to 30 seconds. No private key, no signature, no wallet connection. The methodology is open and pre-registered, shipped as a 13-page paper on 2026-04-18.

What the tool returns

The analyzer typically finishes in 15 to 30 seconds and returns a full breakdown:

  • Brier score (lower is better, 0 perfect to 1 worst) computed over every resolved position
  • Calibration curve binned by stated probability (50-60%, 60-70%, etc.) with observed win rate and edge per bucket
  • Skill-Brier (baseline-adjusted) showing whether the wallet beats a baseline that always predicts its own marginal frequency
  • Concentration risk score showing the share of PnL attributable to the wallet's single largest event
  • Edge Score percentile against the frozen 8,656-wallet reference cohort, with posture / conviction / discipline pillar breakdown
  • Top winning positions, over-sizing risk vs fractional Kelly, and a category calibration slice (crypto, politics, sports, etc.)
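
As a rough illustration of the first three items, here is a minimal Python sketch of a Brier score, a baseline-adjusted skill score, and a bucketed calibration curve. It is not Convexly's implementation. The function names, the 10-point bucket width, the use of the observed win rate as the "marginal frequency" baseline, and the definition of edge as observed win rate minus mean stated probability are all assumptions made for this example.

```python
from collections import defaultdict

def brier_score(probs, outcomes):
    """Mean squared gap between stated probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def skill_brier(probs, outcomes):
    """1 - Brier / baseline Brier, where the baseline always predicts
    the observed win rate (an assumed reading of 'marginal frequency')."""
    base_rate = sum(outcomes) / len(outcomes)
    baseline = brier_score([base_rate] * len(outcomes), outcomes)
    if baseline == 0:
        return 0.0  # every outcome identical: nothing to improve on
    return 1.0 - brier_score(probs, outcomes) / baseline

def calibration_curve(probs, outcomes, width=0.10):
    """Bucket positions by stated probability; report count, mean stated
    probability, observed win rate, and edge (win rate minus mean prob)."""
    n_buckets = round(1 / width)
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        # The small epsilon keeps values like 0.6 in the 60-70% bucket
        # despite floating-point division error.
        idx = min(int(p / width + 1e-9), n_buckets - 1)
        buckets[idx].append((p, o))
    curve = {}
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_prob = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(o for _, o in rows) / len(rows)
        key = (round(idx * width, 2), round((idx + 1) * width, 2))
        curve[key] = {
            "n": len(rows),
            "mean_prob": mean_prob,
            "win_rate": win_rate,
            "edge": win_rate - mean_prob,
        }
    return curve

# Hypothetical resolved positions: stated probability and 0/1 outcome.
probs = [0.55, 0.55, 0.65, 0.65, 0.75, 0.75, 0.85, 0.85]
outcomes = [1, 0, 1, 0, 1, 1, 1, 1]

print("Brier:", brier_score(probs, outcomes))
print("Skill-Brier:", skill_brier(probs, outcomes))
for bucket, stats in calibration_curve(probs, outcomes).items():
    print(bucket, stats)
```

A positive skill value means the wallet's stated probabilities carried more information than its own base rate. A value at or below zero means they did not.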

What makes this tool different from other whale trackers

Polywhaler, Polysights, Hashdive, and similar services optimize for copy-trading: they show you what the top wallets are buying right now. Convexly asks a different question: not “what did this wallet bet,” but “how skilled is this wallet at the bets it has made.” The distinction matters because calibration alone correlates only +0.148 (Spearman) with realized PnL across the full Polymarket leaderboard. Copy-trading a wallet whose PnL is concentrated in a single macro event replicates the concentration, not the skill.

How to interpret your result

The Edge Score is a 0 to 100 percentile against the training cohort. Above 90 is elite (top decile). Between 70 and 90 is strong. Between 30 and 70 is around the cohort median. Below 10 is the bottom decile.
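
To make the percentile reading concrete, here is a small Python sketch that ranks a raw score against a reference cohort and maps the result to the bands described above. The function names, the midpoint handling of ties, and the treatment of exact boundary values are assumptions for this example. The text does not name a band for scores between 10 and 30, so the sketch returns None there rather than guessing.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, cohort_scores):
    """Percentile (0 to 100) of `score` within the cohort.
    Ties count as half below and half above (an assumed convention)."""
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

def band(edge_score):
    """Map an Edge Score percentile to the bands named in the text.
    Boundary handling (for example, exactly 90) is an assumption."""
    if edge_score > 90:
        return "elite"
    if edge_score >= 70:
        return "strong"
    if edge_score >= 30:
        return "cohort median"
    if edge_score < 10:
        return "bottom decile"
    return None  # 10 to 30: no label given in the text

# Hypothetical raw scores for a tiny stand-in cohort.
cohort = [0.2, 0.4, 0.5, 0.7, 0.9]
pct = percentile_rank(0.7, cohort)
print(pct, band(pct))
```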

The pillar breakdown is where the story lives. A high Edge Score driven by discipline (few resolved positions) is a different profile from one driven by conviction (concentration) or posture (calibration vs baseline). The posture pillar in particular is counterintuitive: on this cohort, the top of the leaderboard is in the worst calibration quartile. The full explanation is in the companion post Posture, not Calibration.

Methodology

Edge Score V3b was fit on 8,656 Polymarket wallets with at least five resolved positions as of 2026-04-15. Out-of-fold Spearman correlation with signed log PnL is +0.514. A Fama-French 2010 bootstrap null with 10,000 PnL permutations places the observed Spearman outside every permuted sample at one-sided p < 0.0001. Hill alpha on realized PnL is 1.28, so all inference is rank-based.
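
For readers who want to see the shape of a rank-based significance check, here is a minimal Python sketch of a Spearman correlation with a one-sided permutation p-value. It is a plain permutation test, not a reproduction of the Fama-French 2010 bootstrap procedure used in the paper. The function names, the seed, and the toy data are assumptions for this example. Because Spearman depends only on ranks, the signed log transform of PnL would not change the result here.

```python
import random

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_p_value(scores, pnl, n_perm=10_000, seed=0):
    """One-sided p-value: share of PnL shuffles whose Spearman is at
    least the observed value, with the usual +1 correction."""
    rng = random.Random(seed)
    score_ranks = ranks(scores)
    pnl_ranks = ranks(pnl)
    observed = pearson(score_ranks, pnl_ranks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pnl_ranks)  # break any score-to-PnL pairing
        if pearson(score_ranks, pnl_ranks) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 30 hypothetical wallets whose PnL rises with score.
scores = list(range(30))
pnl = [s ** 3 - 5000 for s in scores]
rho, p = permutation_p_value(scores, pnl, n_perm=2000)
print(rho, p)
```

With the +1 correction, an observed value that beats all 10,000 shuffles yields p = 1/10,001, just under 0.0001.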

The full methodology is published as Edge Score Methodology V1, a 13-page PDF with citations, validation experiments, limitations, and a dedicated responses section addressing six anticipated critiques.

Try the tool now.

Free, no signup, no private key. 15 to 30 seconds per wallet.

Open the wallet analyzer

Related questions

What is a Brier score for prediction markets?

How do you distinguish skill from luck in trading PnL?

How do you measure forecasting calibration?