Data hooks for journalists · April 2026

Three Polymarket findings

Three empirical findings from the Convexly V1 (8,656 Polymarket wallets) and V1-M (15,106 Manifold users plus a paired sub-cohort of 1,647) reference datasets. Each is reported with the underlying number, the cohort, the test, the confidence interval where applicable, and a link to the methodology source. All three are reproducible against the public data bundle at /research/v1m/v1m-data-bundle.tar.gz. Using these findings requires no attribution beyond a Convexly link, but please cite the V1 paper for any methodology claim.

Finding 01

PnL leaderboards are not skill rankings

The realized-PnL distribution in the Convexly V1 cohort has a Hill tail-index estimate of alpha = 1.28. With alpha below 2, variance is formally infinite, so a ranking by realized PnL is dominated by tail-event winners. The top 1% of ranked wallets in the cohort captures 36.2% of signed profit. Within the top 100 by profit, calibration (Brier score) correlates with signed log PnL at Spearman r = +0.42, and the direction is the opposite of what a skill ranking would show: the worse-calibrated wallets earn more, not the better-calibrated ones. PnL leaderboards on Polymarket rank wallets by luck on a fat-tailed distribution, not by skill. The Truth Leaderboard at convexly.app/truth-leaderboard shows the two rankings side by side on the same cohort.

Cohort: 8,656 Polymarket wallets, V1 frozen 2026-04-25
Test: Hill tail-index alpha = 1.28; Spearman within top-100 PnL = +0.42
Visual reference: Truth Leaderboard
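For readers who want to see what a Hill tail-index estimate and a top-1% share look like in practice, here is a minimal sketch. It runs on synthetic Pareto draws, not the V1 data, and the function names (`hill_tail_index`, `top_share`) and the choice of k = 500 tail observations are assumptions made for illustration; the V1 paper's own threshold choice is not stated here. The synthetic profits are all positive, whereas the cohort figure is computed on signed profit.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimator of the tail index alpha from the k largest observations.

    Assumes all of the k+1 largest values are strictly positive.
    """
    top = sorted(values, reverse=True)[: k + 1]
    threshold = top[k]  # the (k+1)-th largest value acts as the tail threshold
    mean_log_excess = sum(math.log(x / threshold) for x in top[:k]) / k
    return 1.0 / mean_log_excess


def top_share(values, fraction):
    """Share of the total captured by the top `fraction` of observations."""
    ordered = sorted(values, reverse=True)
    n_top = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n_top]) / sum(ordered)


rng = random.Random(0)
# Synthetic stand-in: 8,656 Pareto draws with true alpha = 1.28 (illustrative only).
profits = [rng.paretovariate(1.28) for _ in range(8656)]

alpha_hat = hill_tail_index(profits, k=500)  # k = 500 is an arbitrary choice
share = top_share(profits, 0.01)

print(f"Hill alpha estimate: {alpha_hat:.2f}")
print(f"Top 1% share of total: {share:.1%}")
```

For an exact Pareto distribution the top-p share is p^(1 - 1/alpha), which is about 0.37 at alpha = 1.28 and p = 0.01. Any single finite sample can land well away from that value, precisely because the tail is so heavy.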

Quotable

On Polymarket, PnL leaderboards order wallets by luck on a fat-tailed distribution (Hill alpha = 1.28), not by skill. The top 1% of 8,656 ranked wallets captures 36.2% of signed profit; within the top 100 by profit, worse-calibrated whales earn more, not better-calibrated ones.

Finding 02

Real money rewires concentration

In a within-user paired sample of 1,647 Manifold traders (333 with concentration delta defined on both sides of the play-money / real-money switch), real money reduced median per-trader concentration by 8.9 percentage points. The 95% confidence interval is [-17.0pp, -1.1pp]; the bootstrap p-value against the null of no concentration change is < 0.0001. Under the same Edge Score V3b methodology that ranks Polymarket wallets, the discipline-pillar coefficient flips sign on Manifold (p = 0.0001). This cross-venue divergence indicates that skill is venue-specific; it is not a contradiction in the methodology. The V1-M paper documents the divergence and its statistical significance.

Cohort: 1,647 paired Manifold traders (n=333 with concentration delta defined)
Test: Within-user paired bootstrap, 10,000 reps
Effect size: Median delta -8.9pp (95% CI [-17.0, -1.1], p < 0.0001)
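A within-user paired bootstrap of a median delta can be sketched as follows. This is an illustration on synthetic data, assuming a plain percentile interval; the V1-M paper may use a different interval construction, and the helper name `bootstrap_median_ci`, the synthetic concentration values, and the reduced replicate count are all assumptions made here.

```python
import random
import statistics


def bootstrap_median_ci(deltas, reps=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the median of paired differences."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample traders with replacement and recompute the median each time.
    medians = sorted(
        statistics.median(rng.choices(deltas, k=n)) for _ in range(reps)
    )
    lo_idx = round(reps * (1 - level) / 2)
    hi_idx = reps - lo_idx - 1
    return statistics.median(deltas), medians[lo_idx], medians[hi_idx]


rng = random.Random(1)
# Synthetic stand-in for 333 traders: (play-money, real-money) concentration.
pairs = []
for _ in range(333):
    play = rng.uniform(0.3, 0.9)
    real = play + rng.gauss(-0.09, 0.2)  # illustrative shift, not the real data
    pairs.append((play, real))

# Each trader serves as their own control: delta = real - play.
deltas = [real - play for play, real in pairs]

# 2,000 replicates keeps the demo fast; the text reports 10,000.
point, ci_lo, ci_hi = bootstrap_median_ci(deltas, reps=2000)
print(f"median delta = {point:+.3f}, 95% CI [{ci_lo:+.3f}, {ci_hi:+.3f}]")
```

Resampling whole traders, rather than individual trades, is what keeps the comparison within-user: each resampled unit carries both its play-money and real-money observation.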

Quotable

In a within-user natural experiment on 1,647 paired Manifold traders (n=333 with concentration delta defined), switching from play money to real money reduced median portfolio concentration by 8.9 percentage points (95% CI [-17.0, -1.1], p < 0.0001). Skin in the game changes risk-taking; it does not change calibration.

Finding 03

Calibration explains 1.5% of who profits

Across 8,656 Polymarket wallets, Brier score (lower is better-calibrated) correlates with signed log realized PnL at Spearman r = +0.148. That is, calibration explains roughly 1.5% of the variance in who profits on Polymarket. The Convexly V1 frozen-coefficient composite (Edge Score V3b) brings the out-of-fold Spearman to +0.514 by weighting calibration with two other pillars (conviction and discipline) at frozen weights of +0.7876, +2.7220, and -1.1508. The Fama-French (2010) bootstrap null on 10,000 PnL permutations rejects the null of no association at p < 0.0001. Calibration alone is necessary but not sufficient to predict forward PnL on Polymarket; conviction and discipline measurably matter, and both are observable in the public position tape.

Cohort: 8,656 Polymarket wallets (full V1 cohort)
Test: Spearman rank correlation; Fama-French (2010) bootstrap null
Calibration only: Spearman r = +0.148
Edge Score V3b (in-sample, cross-wallet): OOF Spearman = +0.514, p < 0.0001
Forward-validation (rolling, 26 windows): Mean Spearman = +0.391; all 26 above +0.147 baseline
V1.5 per-wallet temporal holdout: Spearman = +0.111 [+0.046, +0.175], FAILED +0.30 threshold
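The mechanics of a fixed-weight composite scored by Spearman correlation against a permutation null can be sketched as below. Several things here are assumptions: the pillar inputs are synthetic standardized values, the weights are applied in the order calibration, conviction, discipline as listed in the text, the null is a plain label-permutation test standing in for the Fama-French (2010) bootstrap, and no out-of-fold splitting is done. The helper names are hypothetical.

```python
import random

# Frozen weights quoted in the text; the pillar order is assumed.
WEIGHTS = (0.7876, 2.7220, -1.1508)  # calibration, conviction, discipline


def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman_with_permutation_p(score, pnl, reps=1000, seed=0):
    """Spearman rho plus a two-sided permutation p-value."""
    rng = random.Random(seed)
    rs, rp = ranks(score), ranks(pnl)
    observed = pearson(rs, rp)  # Spearman = Pearson on ranks
    shuffled = list(rp)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any score/PnL link
        if abs(pearson(rs, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (reps + 1)


rng = random.Random(2)
# Synthetic wallets: three standardized pillar values each (illustrative only).
pillars = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(500)]
score = [sum(w * x for w, x in zip(WEIGHTS, p)) for p in pillars]
pnl = [s + rng.gauss(0, 3.0) for s in score]  # noisy synthetic outcome

rho, p_value = spearman_with_permutation_p(score, pnl)
print(f"Spearman rho = {rho:+.3f}, permutation p = {p_value:.4f}")
```

With R permutations the smallest reportable p-value is 1/(R + 1), which is why a claim of p < 0.0001 needs on the order of 10,000 permutations.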

Quotable

Across 8,656 Polymarket wallets, calibration alone (Brier score) correlates with profit at Spearman r = +0.148, explaining roughly 1.5% of who profits. A three-pillar composite (calibration, conviction, discipline) with frozen coefficients reaches out-of-fold Spearman +0.514, p < 0.0001 against a Fama-French bootstrap null. Calibration is necessary but not sufficient; conviction and discipline measurably matter.

How to use these

Each finding has been pre-checked against the audit rules in the Convexly codebase: every point estimate ships with a confidence interval or significance test, every sub-sample (such as the n=333 with concentration delta) is disclosed in the same sentence as the headline N, and no claim of pre-registration is made about the V1 cohort itself. The V1.5 deferred experiments (per-wallet temporal holdout, IC temporal stability) are externally pre-registered at AsPredicted #287368.

For sourcing in print, "Convexly Research" is the byline. Numbers come from the V1 paper at /research/edge-score-methodology-v1 and the V1-M paper at /research/edge-score-methodology-v1m. The convergence with the Gomez-Cram, Guo, Jensen, Kung SSRN paper (April 20, 2026, revised April 25, 2026) is documented at /research/convexly-v1-vs-gomez-cram-comparison.