Data hooks for journalists · April 2026
Three Polymarket findings
Three empirical findings from the Convexly V1 (8,656 Polymarket wallets) and V1-M (15,106 Manifold users plus a paired sub-cohort of 1,647 users) reference datasets. Each is reported with the underlying number, the cohort, the test, the confidence interval where applicable, and a link to the methodology source. All three are reproducible against the public data bundle at /research/v1m/v1m-data-bundle.tar.gz. Using these findings requires no attribution beyond a Convexly link, but we request a citation to the V1 paper for any methodology claim.
Finding 01
PnL leaderboards are not skill rankings
The PnL distribution has a Hill tail index of alpha = 1.28. Because alpha is below 2, variance is formally infinite, and a ranking by realized PnL is dominated by tail-event winners. The top 1% of ranked wallets in the Convexly V1 cohort captures 36.2% of signed profit. Within the top 100 by profit, Brier score correlates with signed log PnL at Spearman r = +0.42; since a higher Brier score means worse calibration, the worse-calibrated wallets earn more, not the better-calibrated ones. PnL leaderboards on Polymarket therefore measure luck on a fat-tailed distribution rather than skill. The Truth Leaderboard at convexly.app/truth-leaderboard shows the two rankings side by side on the same cohort.
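For readers who want to see how the two statistics above are typically computed, here is a minimal Python sketch of a Hill tail-index estimate and a top-1% profit share. It is not the Convexly pipeline. The function names, the choice of k (how many of the largest values count as the tail), and the synthetic Pareto sample are all assumptions made for illustration.

```python
import math
import random


def hill_alpha(values, k):
    """Hill estimate of the tail index from the k largest positive values."""
    tail = sorted((v for v in values if v > 0), reverse=True)
    if not 0 < k < len(tail):
        raise ValueError("k must satisfy 0 < k < number of positive values")
    threshold = tail[k]  # the (k+1)-th largest value is the tail cutoff
    mean_log_excess = sum(math.log(x / threshold) for x in tail[:k]) / k
    return 1.0 / mean_log_excess


def top_share(profits, fraction=0.01):
    """Share of total signed profit held by the top `fraction` of wallets."""
    ranked = sorted(profits, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("share is only meaningful when total profit is positive")
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for per-wallet profits, NOT the V1 data.
    sample = [rng.paretovariate(1.28) for _ in range(20_000)]
    print("Hill alpha estimate:", round(hill_alpha(sample, k=500), 2))
    print("Top 1% share:", round(top_share(sample), 3))
```

The estimate is sensitive to k, so a single alpha is usually reported alongside the tail cutoff used. An alpha below 2 implies infinite variance, which is why a few tail wallets can dominate a profit ranking.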
Quotable
On Polymarket, PnL leaderboards order wallets by unit of luck on a Hill alpha = 1.28 distribution, not unit of skill. The top 1% of 8,656 ranked wallets captures 36.2% of signed profit; within the top 100 by profit, worse-calibrated whales earn more, not better-calibrated ones.
Finding 02
Concentration differs across the sweepcash window
The original public bundle reported median per-trader concentration 8.9 percentage points lower in the sweepcash window than in the pre-sweepcash bridge window (95% bootstrap CI [-17.0pp, -1.1pp]; Wilcoxon signed-rank p = 0.0137 on the n=333 of 1,647 paired users for whom a concentration delta is defined). The 2026-05-04 recovered-cohort, politics-excluded rerun reports -9.2pp (95% CI [-12.8pp, -3.6pp]; p = 0.0021 on the n=515 of 1,208 paired users for whom a concentration delta is defined). The concentration conclusion survives, but causal attribution remains guarded because the sweepcash window overlapped the 2024 US election catalog. Separately, the same Edge Score V3b methodology that ranks Polymarket wallets has its discipline-pillar coefficient flip sign on Manifold (permutation p = 0.0001).
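The interval above can be illustrated with a short Python sketch of a bootstrap confidence interval for the median of paired per-user deltas (sweepcash-window concentration minus bridge-window concentration, in percentage points). The source does not say which bootstrap variant was used. The percentile method, the function name, and the defaults here are assumptions.

```python
import random
import statistics


def bootstrap_median_ci(deltas, n_boot=10_000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the median of paired differences.

    `deltas` holds one value per paired user (after minus before).
    Returns (sample median, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(deltas)
    # Resample users with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(deltas, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return statistics.median(deltas), medians[lo_idx], medians[hi_idx]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic deltas in percentage points, NOT the V1-M data.
    fake_deltas = [rng.gauss(-9.0, 40.0) for _ in range(333)]
    point, lo, hi = bootstrap_median_ci(fake_deltas)
    print(f"median delta {point:.1f}pp, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Resampling whole users keeps each before/after pair together, which is what makes this a paired-window comparison. The Wilcoxon signed-rank test quoted above is a separate calculation; `scipy.stats.wilcoxon` is one widely used implementation.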
Quotable
In the original V1-M bundle, median portfolio concentration was 8.9 percentage points lower in the sweepcash window than in the pre-sweepcash bridge window (95% CI [-17.0, -1.1], Wilcoxon signed-rank p = 0.0137 on n=333 of 1,647 paired users with concentration delta defined). A 2026-05-04 politics-excluded recovered-cohort rerun reports -9.2pp, 95% CI [-12.8, -3.6], p=0.0021 on n=515 of 1,208 paired users. Treat this as a paired-window result, not a clean causal estimate of sweepcash alone.
Finding 03
Calibration explains 1.5% of who profits
Across 8,656 Polymarket wallets, Brier score (lower is better-calibrated) correlates with signed log realized PnL at Spearman r = +0.148. That is, calibration explains roughly 1.5% of the variance in who profits on Polymarket. The Convexly V1 frozen-coefficient composite (Edge Score V3b) brings the out-of-fold Spearman to +0.514 by combining calibration with two other pillars (conviction and discipline) at frozen weights of +0.7876, +2.7220, and -1.1508. A Fama-French (2010) bootstrap null built from 10,000 PnL permutations rejects the hypothesis of no association at p < 0.0001. Calibration is necessary but not sufficient to predict forward PnL on Polymarket; conviction and discipline measurably matter, and both are observable in the public position tape.
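To make the statistics in this finding concrete, the Python sketch below computes a Spearman rank correlation, a plain permutation p-value for it, and a linear three-pillar composite. Several things are assumptions and not documented Convexly behavior: the linear form of the composite, the mapping of the three quoted weights to calibration, conviction, and discipline in that order, and all function names. The permutation test is a generic one, not the Fama-French (2010) procedure itself.

```python
import random

# Frozen weights quoted in the text. Assigning them to the pillars in this
# order (calibration, conviction, discipline) is an ASSUMPTION.
WEIGHTS = (0.7876, 2.7220, -1.1508)


def edge_score(calibration, conviction, discipline, weights=WEIGHTS):
    """Hypothetical linear composite of three pillar scores."""
    w_cal, w_conv, w_disc = weights
    return w_cal * calibration + w_conv * conviction + w_disc * discipline


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson(rx, shuffled)) >= observed:
            hits += 1
    # Add-one correction: the smallest reportable p is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

With 10,000 permutations, the smallest p-value this estimator can return is 1/10,001, which is why results at that resolution are reported as p < 0.0001 and not as an exact smaller value.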
Quotable
Across 8,656 Polymarket wallets, calibration alone (Brier score) correlates with profit at Spearman r = +0.148, explaining roughly 1.5% of who profits. A three-pillar composite (calibration, conviction, discipline) with frozen coefficients reaches out-of-fold Spearman +0.514, p < 0.0001 against a Fama-French bootstrap null. Calibration is necessary but not sufficient; conviction and discipline measurably matter.
How to use these
Each finding has been pre-checked against the audit rules in the Convexly codebase: every point estimate ships with a confidence interval or significance test, every sub-sample (such as the n=333 with concentration delta) is disclosed in the same sentence as the headline N, and no claim of pre-registration is made about the V1 cohort itself. The V1.5 deferred experiments (per-wallet temporal holdout, IC temporal stability) are externally pre-registered at AsPredicted #287368.
For sourcing in print, "Convexly Research" is the byline. Numbers come from the V1 paper at /research/edge-score-methodology-v1 and the V1-M paper at /research/edge-score-methodology-v1m. The convergence with the SSRN paper by Gomez-Cram, Guo, Jensen, and Kung (April 20, 2026, revised April 25, 2026) is documented at /research/convexly-v1-vs-gomez-cram-comparison.