What is Edge Score?

A composite skill measure for prediction-market traders. Three pillars (calibration, conviction, discipline), one frozen coefficient set, fit on 8,656 real Polymarket wallets. Open methodology, public data, scorable on any wallet for free.

The one-paragraph version

Edge Score combines three z-scored predictors of trader behavior on Polymarket: posture (the standardized negation of baseline-adjusted Brier score), conviction (PnL concentration in the wallet's single largest event), and discipline (the log of resolved position count, with a negative sign in the composite). The frozen coefficients were fit by ordinary least squares against signed log realized PnL on a reference cohort of 8,656 Polymarket wallets, sampled April 15-16, 2026, and are never refit at inference time. The raw composite is mapped to a 0-100 percentile rank against the training cohort distribution. The out-of-fold Spearman rank correlation with signed log realized PnL, measured within the training cohort, is +0.514, against +0.148 for a Brier-only baseline. This +0.514 is a cross-sectional, within-cohort association, not established forward skill: the per-wallet temporal holdout we committed to in advance reached only an out-of-sample Spearman of +0.11 (95% CI 0.05 to 0.18) with forward PnL and did not clear the +0.30 threshold it was filed against.

Why three pillars and not just one

If calibration were the dominant skill on prediction markets, a trader's Brier score alone would predict their profit rank cleanly. It does not. Across the full 8,656-wallet Polymarket cohort, the Spearman rank correlation between raw Brier score and realized PnL is only +0.148. Among the top 100 wallets by realized profit, the relationship actually flips: worse-calibrated wallets in that group earn more (Spearman +0.42 in the whale audit). The empirical story is that prediction-market profit is fat-tailed (Hill tail index = 1.28, below the alpha = 2 threshold under which variance is no longer finite and OLS variance estimates stop being well-behaved), and a few large concentrated positions dominate realized PnL.
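The Hill tail index cited above can be estimated from the largest observations in a sample. What follows is a minimal sketch of the standard Hill estimator, not the paper's own procedure: the function name, the choice of k (how many top observations to use), and the synthetic input are illustrative assumptions, and the source does not state which k or which PnL transformation produced 1.28.

```python
import math


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest positive values.

    Hypothetical helper for illustration; k is a tuning choice the source
    does not specify.
    """
    xs = sorted((v for v in values if v > 0), reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < number of positive values")
    threshold = xs[k]  # the (k+1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("top k values are all tied; the estimate is undefined")
    return 1.0 / mean_log_excess


# Synthetic check: exact quantiles of a Pareto distribution with alpha = 1.28.
# This is made-up data, not the Polymarket cohort.
n, alpha = 10_000, 1.28
synthetic = [(i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]
estimate = hill_tail_index(synthetic, k=500)
print(round(estimate, 2))
```

An estimate below 2 is the regime the paragraph describes: a handful of extreme observations dominates the totals, and variance-based summaries become unreliable.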

Adding conviction (concentration) and discipline (position count) to the composite lifts the in-sample out-of-fold Spearman rank correlation with PnL to +0.514. The intuition: a trader who is well-calibrated but spreads tiny bets across many markets does not capture much of the available edge; a trader who concentrates the right way at the right times tends to outperform. Edge Score does not tell you HOW to find the right concentration; it tells you what the historical pattern of the profitable cohort looks like, and how a given wallet ranks against that pattern.
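The rank correlations quoted here are Spearman coefficients: the Pearson correlation computed on ranks rather than raw values. The sketch below is a stdlib-only illustration with tie-averaged ranks; the helper names and the toy inputs are assumptions, and it does not reproduce the out-of-fold procedure behind the +0.514 figure.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: hypothetical composite scores against hypothetical PnL.
scores = [1, 2, 3, 4, 5]
pnl = [2, 1, 4, 3, 5]
print(round(spearman(scores, pnl), 3))  # 0.8
```

Because only ranks enter the calculation, applying any strictly increasing transform to PnL leaves the coefficient unchanged.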

The three pillars in plain English

Posture (calibration)

Posture is the standardized negation of baseline-adjusted Brier. Baseline-adjusted Brier is observed Brier minus the wallet's own marginal-frequency Brier (the trivial always-predict-the-base-rate baseline). The coefficient on this pillar is +0.79. Higher posture means worse calibration relative to the baseline-trivial alternative, which on this cohort empirically aligns with higher realized profit. The pillar does not measure forecasting accuracy in the traditional sense; it measures the sign-aligned contribution of calibration to PnL on this specific cohort. The renaming from "calibration" to "posture" in the V1 paper preserves the measured effect without overclaiming what the pillar tracks.
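The baseline adjustment described above can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: the function names are hypothetical, and how the analyzer derives a forecast probability for each resolved position is not specified in this explainer, so the inputs here are simply a probability and a 0/1 outcome per position.

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def baseline_adjusted_brier(probs, outcomes):
    """Observed Brier minus the Brier of always predicting the wallet's own base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    baseline = brier([base_rate] * len(outcomes), outcomes)
    return brier(probs, outcomes) - baseline


# Hypothetical wallet with four resolved positions.
probs = [0.9, 0.8, 0.2, 0.1]
outcomes = [1, 1, 0, 0]
adjusted = baseline_adjusted_brier(probs, outcomes)
print(round(adjusted, 3))  # -0.225 (observed 0.025 minus baseline 0.25)
```

The adjusted value is zero when the wallet's forecasts score exactly the same as always predicting its own base rate. The posture pillar is the negation of this quantity, z-scored against the training cohort.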

Conviction (concentration)

Conviction is the standardized share of total realized PnL attributable to the wallet's single largest event. The coefficient on this pillar is +2.72, the largest of the three. Higher conviction means a more barbell-concentrated profit profile: most of the wallet's return comes from one event. On the training cohort, the wallets that compound the most are not the ones that distribute risk evenly across the catalog; they are the ones that concentrate when the conviction trade appears.
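A rough sketch of the underlying concentration measure, before standardization, is shown below. The input shape (pairs of event ID and realized PnL) and the function name are hypothetical, and the explainer does not say how losing events enter the share; this sketch assumes absolute PnL magnitudes so the result always falls between 0 and 1.

```python
from collections import defaultdict


def largest_event_share(positions):
    """Share of total absolute realized PnL that comes from the single largest event.

    positions: iterable of (event_id, realized_pnl) pairs.
    Assumption: magnitudes are used so that losses do not cancel gains.
    """
    per_event = defaultdict(float)
    for event_id, pnl in positions:
        per_event[event_id] += pnl
    total = sum(abs(v) for v in per_event.values())
    if total == 0:
        return 0.0
    return max(abs(v) for v in per_event.values()) / total


# Hypothetical wallet: two positions on event "a", one each on "b" and "c".
positions = [("a", 80.0), ("b", 15.0), ("a", 10.0), ("c", -5.0)]
share = largest_event_share(positions)
print(round(share, 3))  # 0.818 (event "a" contributes 90 of 110)
```

A value near 1 corresponds to the barbell profile described above, where one event accounts for almost all of the wallet's realized PnL.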

Discipline (position count)

Discipline is the standardized log1p of resolved position count, with a negative sign in the composite. The coefficient on this pillar is -1.15. Higher discipline (in the composite-contribution sense) corresponds to fewer resolved positions: the most profitable wallets on the training cohort hold fewer, larger positions. A trader who places hundreds of small bets across the catalog tends to score lower on Edge Score even with comparable calibration.

What the score number actually means

Edge Score is on a 0-100 percentile scale by construction. A score of 50 is exactly the cohort median. A score of 90 means the wallet is in the top decile of the 8,656-wallet reference cohort by composite ranking. The current top of the daily Polymarket leaderboard sits around 95-100. A score below 30 means the composite places the wallet in the bottom 30% by skill ranking, regardless of where its realized PnL lands.

Two important caveats. First, the percentile is computed against the frozen training cohort, not against any new wallet population the analyzer encounters. A new wallet that is materially different from the training cohort (e.g., a wallet that only bets on one category, or one that has very few resolved positions) is being scored by extrapolation. Second, the composite ranks wallets cross-sectionally; it does not bound expected returns for any individual wallet. Realized PnL on Polymarket is fat-tailed and individual outcomes vary widely.
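The first caveat is easier to see in code. The sketch below maps a raw composite to a percentile against a fixed, pre-sorted list of training-cohort scores. The function name, the tie convention (counting scores less than or equal to the input), and the five-element cohort are illustrative assumptions; the real cohort has 8,656 entries.

```python
from bisect import bisect_right


def percentile_rank(raw_score, sorted_cohort_scores):
    """Percent of frozen cohort scores that are <= raw_score.

    sorted_cohort_scores must be sorted ascending and is never updated
    with newly scored wallets.
    """
    at_or_below = bisect_right(sorted_cohort_scores, raw_score)
    return 100.0 * at_or_below / len(sorted_cohort_scores)


# Toy stand-in for the frozen training distribution.
frozen_cohort = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(percentile_rank(0.0, frozen_cohort))    # 60.0
print(percentile_rank(10.0, frozen_cohort))   # 100.0
print(percentile_rank(-10.0, frozen_cohort))  # 0.0
```

A wallet whose raw composite falls outside the range seen in training is pinned to 0 or 100, which is one concrete form of the extrapolation described above.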

What Edge Score does NOT do

Edge Score does not predict whether a particular wallet will profit on the next market. It does not separate skill from luck on a single wallet's realized PnL history: the per-wallet temporal holdout that would address this ran in the V1.5 follow-up paper and did not clear its ex-ante threshold. It does not transfer cleanly across venues: per the V1-M paper, the fitted coefficients diverge materially between Polymarket and Manifold, with the discipline pillar flipping sign at permutation p = 0.0001. And it does not substitute for category-specific calibration analysis, time-period analysis, or position-sizing diagnostics that depend on bankroll context.

Where the methodology lives

The V1 methodology paper (frozen coefficients, full validation suite, Fama-French bootstrap null at 10,000 permutations) is at /research/edge-score-methodology-v1. The V1-M cross-venue extension (15,106-user Manifold cohort, sweepcash within-user paired comparison plus the 2026-05-04 politics-excluded sensitivity update) is at /research/edge-score-methodology-v1m. The V1.5 deferred experiments (per-wallet temporal holdout + per-quarter Information Coefficient stability) were filed ex-ante before any analysis ran. The reproducibility data bundle, including 15,106 aggregated user records and a stdlib-only Python script that re-runs the analysis, is downloadable as a 1.2 MB tar.gz at /research/v1m/v1m-data-bundle.tar.gz.

Score a wallet

Paste any Polymarket wallet address at the analyzer (/tools/polymarket-wallet-analyzer). No signup, no signature, no private key. The analyzer reads public on-chain data only.

Frequently asked

What is Edge Score?

Edge Score is a composite skill measure for prediction-market traders, fit on a frozen reference cohort of 8,656 Polymarket wallets. It combines three z-scored pillars: calibration (baseline-adjusted Brier score, coefficient +0.79), conviction (PnL concentration in the trader's single largest event, coefficient +2.72), and discipline (log of resolved position count, coefficient -1.15). The raw composite is mapped to a 0-100 percentile rank against the training cohort. Out-of-fold Spearman rank correlation with signed log PnL is +0.514, against +0.148 for calibration alone.

How is Edge Score calculated?

The raw composite is: 0.7876 * z(-skill_brier) + 2.7220 * z(concentration) - 1.1508 * z(log1p(n_positions)), where each pillar is z-scored against the frozen training cohort moments. The raw score is then mapped to a 0-100 percentile rank against the training cohort distribution. Coefficients are frozen and never refit at inference time. Full derivation, OLS validation, and the Fama-French bootstrap null (p < 0.0001 on 10,000 permutations) are in the V1 methodology paper at /research/edge-score-methodology-v1.
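As a hedged illustration of the formula above, the raw composite can be written as follows. The coefficients are the ones quoted in this answer; the function names and the structure of the `moments` mapping are hypothetical, and the mean and standard deviation values in the example are placeholders, not the frozen training-cohort moments, which this page does not list.

```python
import math

# Frozen coefficients as quoted in the text.
COEF_POSTURE = 0.7876
COEF_CONVICTION = 2.7220
COEF_DISCIPLINE = -1.1508


def zscore(value, mean, std):
    """Standardize a value against fixed (frozen) cohort moments."""
    return (value - mean) / std


def raw_composite(skill_brier, concentration, n_positions, moments):
    """Raw Edge Score composite before the percentile mapping.

    moments maps each pillar name to a (mean, std) pair taken from the
    training cohort; the key names here are illustrative.
    """
    z_posture = zscore(-skill_brier, *moments["neg_skill_brier"])
    z_conviction = zscore(concentration, *moments["concentration"])
    z_discipline = zscore(math.log1p(n_positions), *moments["log1p_n_positions"])
    return (
        COEF_POSTURE * z_posture
        + COEF_CONVICTION * z_conviction
        + COEF_DISCIPLINE * z_discipline
    )


# Placeholder moments (mean 0, std 1), NOT the real cohort values.
placeholder_moments = {
    "neg_skill_brier": (0.0, 1.0),
    "concentration": (0.0, 1.0),
    "log1p_n_positions": (0.0, 1.0),
}
raw = raw_composite(-0.1, 0.5, 0, placeholder_moments)
print(round(raw, 5))  # 1.43976
```

The raw value only becomes an Edge Score after it is ranked against the training cohort's raw composite distribution.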

What is a good Edge Score?

Edge Score is on a 0-100 percentile scale by construction, so a score of 50 is exactly the cohort median. A score above 80 puts a wallet in the top 20% of the 8,656-wallet reference cohort by composite skill ranking. Top decile is 90+. The current top of the daily Polymarket leaderboard sits around 95-100. Note that a high Edge Score does not guarantee future profit: the composite ranks wallets cross-sectionally on the historical training cohort. Realized future PnL is fat-tailed (Hill alpha = 1.28) and individual outcomes vary widely.

Why three pillars instead of just calibration?

Because calibration alone is a weak predictor of profit on prediction markets. Across the full 8,656-wallet Polymarket cohort, Spearman rank correlation between Brier score and realized PnL is only +0.148. Adding conviction (PnL concentration) and discipline (position count) lifts the composite Spearman to +0.514 out-of-fold. The intuition: Polymarket PnL is dominated by a few large concentrated positions; a trader who is well-calibrated but spreads tiny bets across many markets does not capture much of the available edge. The three-pillar composite captures the joint behavior that empirically tracks profit on this cohort.

Is Edge Score the same as PnL or rank by realized profit?

No. PnL ranks wallets by historical realized profit, which is a backward-looking number heavily distorted by fat tails. Edge Score is a cross-sectional skill ranking that captures the behavioral profile (calibration + concentration + discipline) that correlates with realized profit on the training cohort, with the explicit intent of being a more stable signal than PnL alone. V1.5 explicitly tested per-wallet temporal predictive power and reported that both primary tests failed at their ex-ante thresholds; the Edge Score composite (V3b) is therefore supported as a cross-sectional ranker, not a per-wallet forecast oracle. The PnL-vs-Edge comparison at /truth-leaderboard shows the same Top-50 cohort ranked both ways side-by-side; the columns disagree on roughly half the entries.

Where is the methodology published?

The V1 paper is at /research/edge-score-methodology-v1. The V1-M cross-venue extension is at /research/edge-score-methodology-v1m. The V1.5 follow-up paper, at /research/edge-score-methodology-v1-5, covers per-wallet temporal holdout and per-quarter Information Coefficient stability. Code and reproduction scripts are in the public Convexly repository.

Can I score my own wallet?

Yes, free and without signup. Paste any Polymarket wallet address at /tools/polymarket-wallet-analyzer to see the wallet's Edge Score, the per-pillar breakdown, the percentile against 8,656 reference wallets, and a plain-English narrative verdict. The analyzer reads public on-chain data only; no wallet signature, no private key, no personal data collected.

What does Edge Score NOT do?

Edge Score is a cross-sectional ranking instrument on a survivor cohort. It does not bound expected returns for any individual wallet, predict returns under a different incentive regime (per the V1-M cross-venue findings, fitted coefficients differ materially across Polymarket vs Manifold), or substitute for category-specific or time-period-specific analysis. It also does not separate skill from luck on a single wallet's history; the per-wallet temporal holdout that would address this ran in the V1.5 follow-up paper and failed its ex-ante threshold (see /research/edge-score-methodology-v1-5).

Related explainers

/learn/calibration: what a Brier score actually measures, baseline-adjusted Brier (skill-Brier), and why calibration alone barely predicts profit on Polymarket
/learn/conviction: what concentration means, how to read a barbell PnL profile (coming soon)
/learn/discipline: why position count predicts profit inversely on Polymarket and oppositely on Manifold (coming soon)
/learn/kelly: fractional Kelly under fat tails, why full-Kelly is unsafe when alpha < 2 (coming soon)