Research
Frozen-coefficient empirical research on prediction-market skill, calibration, concentration, and position sizing. Every paper is reproducible against a frozen cohort and links to its raw data or computation.
What is proven?
Wallet-rating research has frozen coefficients, reproducible cohorts, and published negative-result follow-ups. It supports behavior ranking, not a profit promise.
What is still pending?
Coherence Signals and Market Trust need resolved outcomes, stable freshness windows, and matched controls before stronger performance language is allowed.
What can I cite?
Cite externally verified receipts and dated papers. Treat manifest-only preregistrations as filed but not externally verifiable until title and filing date resolve.
Why does this matter?
The research is the trust layer: it decides what Convexly is allowed to claim in product, sales, X posts, and enterprise conversations.
- Hantavirus cluster: cross-market mispricing scan on five linked Polymarket events
Ad-hoc cross-market belief-decomposition scan of five linked Polymarket hantavirus events (US case by May 15, PHEIC by June 30, lab leak by June 30, pandemic in 2026, FDA vaccine in 2026), all outside the daily CME pipeline universe. One MEDIUM flag: implied P(pandemic | PHEIC by year-end) ~38-55% vs historical 2 of 8 PHEICs (point 25%, 95% Clopper-Pearson 3-65%). The short-horizon 'US case by May 15' market at 27.5% sits ~5-15 pp above a 48-hour Poisson baseline of ~15-20% (P(≥1) = 1 − e^(−λ·t), λ = 0.082-0.110/day, t = 2 days); the gap is within thin-liquidity friction at $14.7K. An earlier draft used the annual base rate and overclaimed a ~68-72 pp gap; corrected inline.
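The two base-rate numbers in the hantavirus entry can be reproduced with the standard library. This is a minimal sketch that uses only the quoted inputs (λ = 0.082-0.110/day, t = 2 days, 2 of 8 PHEICs). It evaluates the Poisson probability of at least one case and computes an exact Clopper-Pearson interval by bisecting on the binomial CDF. The function names are illustrative and do not come from Convexly's code.

```python
import math


def poisson_at_least_one(rate_per_day: float, days: float) -> float:
    """P(N >= 1) = 1 - exp(-lambda * t) for a Poisson process."""
    return 1.0 - math.exp(-rate_per_day * days)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); an empty sum (k < 0) gives 0."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of a monotone f on [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion x / n."""
    half = alpha / 2.0
    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p) - half, 0.0, 1.0)
    # Upper bound solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p) - half, 0.0, 1.0)
    return lower, upper


if __name__ == "__main__":
    for lam in (0.082, 0.110):
        print(f"lambda = {lam}/day, t = 2 days: P(>=1) = {poisson_at_least_one(lam, 2):.3f}")
    lo, hi = clopper_pearson(2, 8)
    print(f"2 of 8: point {2 / 8:.2f}, 95% Clopper-Pearson [{lo:.3f}, {hi:.3f}]")
```

The sketch bisects instead of calling a Beta-quantile function so that it runs without SciPy. The same bounds follow from the Beta-quantile form of the interval, Beta(α/2; x, n − x + 1) and Beta(1 − α/2; x + 1, n − x).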
- Macro Coverage + Coherence v0.2: source registry, mapping queue, and daily bundle outputs
A method freeze and live substrate for macro prediction-market coverage. Adds official benchmark-source registry rows, reviewed macro equivalence keys, cross-venue mapping debt, daily Macro CME bundle-readiness rows, and macro-specific Market Trust caveats without claiming complete coverage, venue-approved mappings, tradable expected value, or macro forecasting skill.
- Macro Market Trust v0.1: prediction markets as macro infrastructure
A focused taxonomy and live ledger-mapping surface for macro prediction markets across inflation, labor, rates, growth, volatility, policy/fiscal, and crypto-macro. Defines macro-specific Market Trust fields, candidate CME bundles, data-rights caveats, and low-hype public framing without claiming complete coverage or tradable alpha.
- Cross-venue consensus probability PoC: co-listed markets with visible tolerances
An inspectable proof page for venue-neutral consensus probability. Reads venue_markets and venue_market_price_snapshots, requires either an explicit cross_venue_key or conservative normalized question match, then publishes spread-weighted probability, venue dispersion, match basis, caveats, and row hash. This is a diagnostic substrate for Market Trust and CME, not a skill-weighted alpha claim.
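Below is a sketch of how one spread-weighted consensus row might be assembled. The entry does not specify the weighting rule, the dispersion measure, or any field name other than cross_venue_key. The sketch therefore assumes inverse bid-ask spread weights and uses the range of venue mids as dispersion, and all other field names are hypothetical. The row hash is a SHA-256 digest of canonical JSON, so re-running the same inputs reproduces the same hash.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueQuote:
    venue: str  # hypothetical venue label
    bid: float  # best bid, in probability units [0, 1]
    ask: float  # best ask, in probability units [0, 1]


def consensus_row(cross_venue_key: str, quotes: list[VenueQuote]) -> dict:
    """Spread-weighted consensus probability for one co-listed question.

    Assumption: each venue mid is weighted by 1 / (ask - bid), so tighter
    books count for more. This weighting is illustrative only.
    """
    if not quotes:
        raise ValueError("need at least one venue quote")
    mids, weights = [], []
    for q in quotes:
        spread = q.ask - q.bid
        if spread <= 0:
            raise ValueError(f"non-positive spread on {q.venue}")
        mids.append((q.bid + q.ask) / 2.0)
        weights.append(1.0 / spread)
    prob = sum(w * m for w, m in zip(weights, mids)) / sum(weights)
    row = {
        "cross_venue_key": cross_venue_key,
        "consensus_probability": round(prob, 6),
        # Dispersion here is the max-minus-min range of venue mids.
        "venue_dispersion": round(max(mids) - min(mids), 6),
        "venues": sorted(q.venue for q in quotes),
    }
    # Canonical JSON (sorted keys, fixed separators) so identical rows hash identically.
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    row["row_hash"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return row


if __name__ == "__main__":
    quotes = [VenueQuote("venue_a", 0.40, 0.42), VenueQuote("venue_b", 0.38, 0.46)]
    print(consensus_row("example-key", quotes))
```

In the example, the tighter venue_a book (2-cent spread) pulls the consensus toward its 0.41 mid. The wider venue_b book (8-cent spread) carries a quarter of that weight.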
- V3b over V1: model selection under market-selection bias
Why the V3b Edge Score variant is preferred over V1 when the training cohort is selected on the outcome under study. Fold-local refit and a Fama-French bootstrap null make V3b and V1 empirically distinguishable.
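Fold-local refit means that the coefficients used to score a held-out fold are estimated only on the other folds, so no wallet is scored by a model that saw it during fitting. The sketch below follows that reading and uses only the standard library. A generic feature matrix stands in for the Edge Score pillars, and the fold count, seed, and helper names are hypothetical.

```python
import random


def average_ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


def ols_fit(X, y):
    """OLS with intercept via the normal equations, solved by Gauss-Jordan elimination."""
    rows = [[1.0, *r] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, b1, b2, ...]


def oof_spearman(X, y, k=5, seed=0):
    """Out-of-fold Spearman: coefficients are refit per fold on training rows only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    preds = [0.0] * len(y)
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        beta = ols_fit([X[i] for i in train], [y[i] for i in train])
        for i in held:
            preds[i] = beta[0] + sum(b * x for b, x in zip(beta[1:], X[i]))
    return spearman(preds, y)
```

Freezing, in this reading, would mean refitting once on the full cohort after validation and publishing those coefficients. The out-of-fold Spearman is the number that describes how well they generalize.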
- Coherence Signals: daily three-gate filter on Polymarket cross-market clusters
Live daily feed of cross-market structural-mispricing candidates surfaced by the Coherent Markets Engine. Each Polymarket event group is screened against probability axioms (binary additivity, mutually-exclusive sums, conditional Bayes), then through three statistical gates: BH-FDR-corrected significance, cost-adjusted expected value, and UMA-oracle resolution-risk filter. Public surface shows aggregate counts + per-signal characterizations with event identities redacted. Researcher tier ($99/mo) unlocks the full feed at /research/cme/feed with event slugs, trade construction, day-over-day signal delta, and a daily email digest. Each archive declares its pipeline_version; the V0.2 methodology is tracked in manifest #288046, with public receipt not yet verified and the backtest scheduled around 2026-07-29.
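The three gates can be read as a sequential filter. This sketch first applies the Benjamini-Hochberg step-up procedure across all candidates. It then drops any candidate whose cost-adjusted expected value is not positive or whose resolution-risk score exceeds a cap. The Candidate fields, the per-$1 cost units, the 0-1 risk scale, and the 0.2 cap are hypothetical placeholders, not the engine's parameters.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cluster_id: str          # hypothetical identifier
    p_value: float           # significance of the axiom violation
    gross_edge: float        # expected value per $1 before costs
    fee: float               # venue fee per $1
    slippage: float          # estimated slippage per $1
    lockup_cost: float       # cost of capital locked until resolution, per $1
    resolution_risk: float   # hypothetical 0-1 score; higher = more dispute-prone


def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # keep the largest rank that clears its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def three_gate_filter(cands, q=0.05, max_resolution_risk=0.2):
    """Return ids passing significance, positive net EV, and the resolution-risk cap."""
    significant = benjamini_hochberg([c.p_value for c in cands], q)
    passed = []
    for c, sig in zip(cands, significant):
        net_ev = c.gross_edge - c.fee - c.slippage - c.lockup_cost
        if sig and net_ev > 0 and c.resolution_risk <= max_resolution_risk:
            passed.append(c.cluster_id)
    return passed
```

Because BH is a step-up procedure, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list clears its threshold. For example, with p = [0.01, 0.03, 0.035, 0.2] at q = 0.05, the first three are all rejected.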
- Coherent Markets Engine V0.1: constraint-projection structural inference
An observation-side port of the Pennock-Lahaie-Kroer LCMM coherence projection, applied to Polymarket. Reads gamma metadata, builds an event-relationship graph (binary additivity, mutually-exclusive sums, conditional Bayes), projects observed prices onto the constraint-feasible region, and emits Coherence Signals through a three-gate filter: BH-FDR-corrected statistical significance, cost-adjusted expected value (Polymarket fee + slippage + capital lockup), and UMA-oracle resolution risk score. Each signal carries a SHA-256 hash-chain audit-log row. Per the V2.8.2 result tracked in manifest #287983, the Coherent Markets Engine uses market prices as the inference input, not skill-weighted aggregation; the public receipt for that manifest entry is not yet verified. Closest published peer: Saguillo et al. 2025 (IMDEA Networks, AFT 2025, arXiv 2508.03474), retrospective per-pair analysis of $39.6M Polymarket arbitrage; Convexly extends that framing to scheduled live-market n-market polytope projection.
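Two mechanics from the V0.1 description are sketched here under simplifying assumptions. For a mutually exclusive set, projecting observed prices onto the constraint-feasible region can be illustrated as a Euclidean projection onto the probability simplex, using the sort-and-threshold method. LCMM-style engines generally project under a divergence tied to the market maker's cost function, so this is a stand-in and not the engine's objective. The audit log is a minimal SHA-256 hash chain in which each row commits to its predecessor's hash. The row layout and genesis sentinel are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical sentinel for the first row's predecessor


def project_to_simplex(prices):
    """Euclidean projection of a price vector onto {p >= 0, sum(p) = 1}."""
    u = sorted(prices, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying the condition fixes the shift
    return [max(p - theta, 0.0) for p in prices]


def _row_digest(prev_hash, payload):
    body = json.dumps({"prev_hash": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_audit_row(log, payload):
    """Append a row whose hash commits to its payload and the previous row's hash."""
    prev = log[-1]["row_hash"] if log else GENESIS_HASH
    row = {"prev_hash": prev, "payload": payload, "row_hash": _row_digest(prev, payload)}
    log.append(row)
    return row


def verify_chain(log):
    """Recompute every hash; an edited payload or broken link returns False."""
    prev = GENESIS_HASH
    for row in log:
        if row["prev_hash"] != prev or row["row_hash"] != _row_digest(prev, row["payload"]):
            return False
        prev = row["row_hash"]
    return True


if __name__ == "__main__":
    observed = [0.5, 0.4, 0.3]  # three mutually exclusive outcomes summing to 1.2
    coherent = project_to_simplex(observed)
    log = []
    append_audit_row(log, {"observed": observed, "projected": [round(p, 4) for p in coherent]})
    print(coherent, verify_chain(log))
```

The distance between the observed and projected vectors is one way to size an incoherence. The hash chain makes any later edit to a logged signal detectable, because every subsequent row_hash would stop verifying.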
- MarketAlpha V2.8.2: 24 frozen aggregators, none beat the market on V1-M
Frozen V2.8.2 methodology test tracked in the preregistration manifest (#287436 + #287442 + V1-M cohort substitution amendment #287714) on the V1-M Polymarket cohort; public receipt URLs for those entries are not yet verified. The PRIMARY hypothesis (W-EXP beta=4 + a=2.0 + S-RAW skill-weighted aggregation) does not reject H0 against the market-implied baseline (delta_market = +0.179, 95% CI [+0.164, +0.193]) on 6,256 held-out markets. A 24-aggregator sweep extends the result: zero of 24 specifications beat the market on the held-out window. Substantive finding: PnL-skill on Polymarket is not a usable forecast-aggregation weight. This reproduces and reinforces V1-M's null finding. The wash-filter equivalence test is tracked in manifest #287983, also with public receipt not yet verified.
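The entry does not state which scoring rule underlies delta_market. The sketch below assumes a Brier-score difference (aggregator minus market, so a positive value means the aggregator did worse) and estimates it with a paired percentile bootstrap over held-out markets. The function names and bootstrap settings are illustrative.

```python
import random


def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def delta_market_ci(agg, market, outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Brier(aggregator) - Brier(market) with a paired percentile-bootstrap CI.

    Assumed sign convention: positive means the aggregator scored worse.
    """
    n = len(outcomes)
    point = brier(agg, outcomes) - brier(market, outcomes)
    # Per-market loss differences; resampling these keeps each market's pair intact.
    diffs = [(a - o) ** 2 - (m - o) ** 2 for a, m, o in zip(agg, market, outcomes)]
    rng = random.Random(seed)
    boots = sorted(sum(diffs[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_boot))
    lo = boots[int(alpha / 2 * (n_boot - 1))]
    hi = boots[int((1 - alpha / 2) * (n_boot - 1))]
    return point, (lo, hi)
```

Resampling the per-market differences, rather than the two score series independently, is what makes the bootstrap paired. Both forecasts are judged on the same resampled set of markets in each draw.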
- Edge Score Methodology V1.5: pre-registered E2 + E7 deferred-experiment results
Pre-registered V1.5 follow-up to the V1 paper (AsPredicted #287368), run on 2026-04-27 on the V1-M position tape (8,778 wallets, 542,241 resolved positions). Both primary tests failed their ex-ante pre-registered thresholds: E2 per-wallet temporal holdout produced Spearman = +0.111 (95% CI [+0.046, +0.175], N=805) vs threshold +0.30; E7 per-quarter IC stability produced median ρ = +0.038 with 3 of 5 quarters positive vs the 5/6 requirement. Five supplementary analyses surface where V3b holds up (S4 partial Spearman +0.494 controlling for log_capital; S7 cross-category transfer ρ +0.17 to +0.33) and where it does not (S6 persistent-wallet inversion ρ = -0.31 for 229 wallets active 4+ quarters). Honest reframing: V3b is a cross-sectional ranker of wallet behavior, not a per-wallet temporal predictor.
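S4's partial Spearman controls for log_capital. The sketch assumes one common construction: rank-transform all three variables, then apply the first-order partial-correlation formula to the rank correlations. The synthetic demo, whose variable names are stand-ins, shows why the control matters. Two variables driven by a shared third have a clearly positive raw Spearman but a partial Spearman near zero.

```python
import random


def average_ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5


def partial_spearman(x, y, z):
    """Partial Spearman: the partial-correlation formula applied to rank-transformed data."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(z)
    return partial_corr(pearson(rx, ry), pearson(rx, rz), pearson(ry, rz))


if __name__ == "__main__":
    rng = random.Random(7)
    z = [rng.gauss(0, 1) for _ in range(2000)]  # stand-in for log_capital
    x = [v + rng.gauss(0, 1) for v in z]        # stand-in for a behavior score
    y = [v + rng.gauss(0, 1) for v in z]        # stand-in for an outcome
    print("raw Spearman:", round(pearson(average_ranks(x), average_ranks(y)), 3))
    print("partial Spearman given z:", round(partial_spearman(x, y, z), 3))
```

Under this construction, a partial Spearman of +0.494 means the ranking signal survives after the part explained by capital size is removed.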
- Forward-validation: rolling Spearman of Edge Score V3b on the V1-M cohort
Live monitor of the V3b composite against forward signed-log PnL across 26 rolling 30-day windows on the V1-M reference cohort, October 2025 through April 2026. Mean rolling Spearman +0.391 with all 26 windows finishing above the Brier-only baseline of +0.147. Each window paired with a 95% bootstrap CI; reference lines mark the in-sample OOF Spearman of +0.514 and the Brier-only baseline.
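A rolling monitor of this kind can be sketched in three steps. First, slice the tape into 30-day windows. Next, compute the Spearman correlation between score and forward PnL within each window. Finally, attach a percentile-bootstrap 95% CI by resampling rows inside the window. The 7-day step and the (day, score, forward_pnl) row format are assumptions; the entry gives neither the step size nor the data layout.

```python
import math
import random


def average_ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    return cov / (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5


def rolling_spearman(rows, window_days=30, step_days=7, n_boot=500, seed=0):
    """rows: (day_index, score, forward_pnl). Returns [(start_day, rho, (lo, hi)), ...]."""
    rng = random.Random(seed)
    last = max(r[0] for r in rows)
    start = min(r[0] for r in rows)
    results = []
    while start + window_days - 1 <= last:  # only complete windows
        window = [r for r in rows if start <= r[0] < start + window_days]
        s = [r[1] for r in window]
        p = [r[2] for r in window]
        if len(set(s)) > 1 and len(set(p)) > 1:
            boots = []
            for _ in range(n_boot):
                idx = [rng.randrange(len(window)) for _ in window]
                bs, bp = [s[i] for i in idx], [p[i] for i in idx]
                if len(set(bs)) > 1 and len(set(bp)) > 1:  # skip degenerate resamples
                    boots.append(spearman(bs, bp))
            boots.sort()
            ci = ((boots[int(0.025 * (len(boots) - 1))], boots[int(0.975 * (len(boots) - 1))])
                  if boots else (math.nan, math.nan))
            results.append((start, spearman(s, p), ci))
        start += step_days
    return results
```

Overlapping windows share most of their rows, so adjacent window estimates are strongly correlated. The count of windows above a baseline is therefore not a count of independent confirmations.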
- Convexly V1 vs Gomez-Cram et al.: methodology comparison
Two independent papers, published within 48 hours of each other and using different statistical tests, both find that skill on Polymarket is concentrated. Side-by-side comparison of Convexly V1 (frozen-coefficient OLS, OOF Spearman +0.514, Fama-French (2010) bootstrap null p < 0.0001 on 8,656 wallets, published 2026-04-18) and Gomez-Cram, Guo, Jensen, Kung, 'Prediction Market Accuracy: Crowd Wisdom or Informed Minority?' (sign-randomization on 1.72M Polymarket accounts, SSRN 6617059, posted 2026-04-20, revised 2026-04-25).
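Here is a generic sign-randomization test for readers unfamiliar with the method. Under a null that per-account outcomes are symmetric around zero, randomly flipping each sign should often produce totals at least as extreme as the observed total. This sketch illustrates the test family only. It is not Gomez-Cram et al.'s exact statistic or procedure, and the function name is hypothetical.

```python
import random


def sign_randomization_pvalue(values, n_perm=10_000, seed=0):
    """Two-sided sign-flip test of H0: values are symmetric around zero."""
    rng = random.Random(seed)
    observed = abs(sum(values))
    hits = 0
    for _ in range(n_perm):
        flipped = sum(v if rng.random() < 0.5 else -v for v in values)
        if abs(flipped) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one draw from the null.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(sign_randomization_pvalue([0.4, 1.1, -0.2, 0.9, 0.7, 1.3, -0.1, 0.8]))
```

The test needs no distributional assumption beyond symmetry under the null, which suits heavy-tailed trading outcomes where t-tests are fragile.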
- Three Polymarket findings: data hooks for journalists
Three quotable empirical findings from the V1 (8,656 Polymarket wallets) and V1-M (15,106 Manifold users with a paired sub-cohort of 1,647) reference cohorts. PnL leaderboards rank by luck on a heavy-tailed distribution (Hill alpha 1.28); concentration differs across the Manifold sweepcash window (causal attribution pending politics-excluded rerun); calibration alone explains 1.5% of who profits. Each is reported with a confidence interval, statistical test, and methodology link.
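The Hill alpha quoted for PnL is a tail-index estimate, and smaller values mean heavier tails. For a power-law tail, an index below 2 implies infinite variance, so a handful of outcomes can dominate a ranking. Below is a minimal sketch of the Hill estimator computed on the k largest positive values. The entry does not say how the tail was selected, so both the choice of k and the restriction to positive values are assumptions.

```python
import math
import random


def hill_alpha(values, k):
    """Hill estimate of the tail index from the k largest positive values.

    alpha_hat = k / sum_{i=1..k} ln(X_(i) / X_(k+1)), with X_(1) >= X_(2) >= ...
    Smaller alpha means a heavier tail.
    """
    xs = sorted((v for v in values if v > 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < number of positive values")
    threshold = xs[k]  # X_(k+1) in 1-based order-statistic notation
    return k / sum(math.log(x / threshold) for x in xs[:k])


if __name__ == "__main__":
    rng = random.Random(11)
    # Pareto(alpha = 1.5) draws by inverse transform; 1 - random() lies in (0, 1].
    sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20_000)]
    for k in (500, 2_000):
        print(k, round(hill_alpha(sample, k), 3))
```

In practice the estimate is read across a range of k, as in a Hill plot, because a small k gives a noisy estimate and a large k pulls in non-tail observations.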
- Posture, not calibration: aligning a pillar with its coefficient
Method-explainer note. The Edge Score posture pillar was originally labelled 'calibration', but its fitted coefficient points the other way. Renaming it to 'posture' preserves the measured effect without changing the math.
- Edge Score V1-M: methodology extension and cross-venue invariance measurement (PDF)
18,105,158 Manifold bets, 15,106 users, and a within-user paired comparison across the sweepcash window. The original bundle reported a -8.9 pp median concentration delta; the 2026-05-04 recovered-cohort politics-excluded rerun reports -9.2 pp, 95% CI [-12.8, -3.6], Wilcoxon p = 0.0021 on n=515 of 1,208 paired users with a defined concentration delta. The secondary calibration claim is superseded. Pillar coefficients diverge categorically between Polymarket and Manifold, with the discipline pillar flipping sign (permutation p = 0.0001). Hill alpha = 0.86 on Manifold per-user PnL vs 1.28 on Polymarket, with non-overlapping 95% CIs.
- Edge Score Methodology V1: a composite skill measure for prediction-market traders (PDF)
A composite skill measure fit on 8,656 Polymarket wallets. Posture plus conviction plus discipline with frozen OLS coefficients, fold-local refit, and a Fama-French bootstrap null (p < 0.0001 on 10,000 permutations).
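Fama-French (2010) build their null by bootstrapping returns with skill set to zero. This sketch swaps in a simpler permutation null for illustration. It shuffles outcomes against scores and counts how often the shuffled Spearman reaches the observed value. Ranks are computed once and then permuted, which is equivalent to re-ranking because ranking commutes with shuffling. The one-sided alternative and the function name are assumptions.

```python
import random


def average_ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def spearman_permutation_test(score, outcome, n_perm=10_000, seed=0):
    """One-sided permutation p-value for H0: no positive rank association."""
    rx, ry = average_ranks(score), average_ranks(outcome)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting ranks equals ranking a permuted outcome
        if pearson(rx, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With the (hits + 1) / (n_perm + 1) convention, the smallest reportable p-value is 1 / (n_perm + 1). The permutation count therefore caps how small a p-value can honestly be reported.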
- 10,000 Polymarket wallets scored. Calibration barely predicts profit.
A calibration audit of the full Polymarket profit leaderboard. 8,656 wallets, 582,921 resolved positions. Spearman r = +0.148 across the full sample; the top 1% by absolute PnL captures 36.2% of signed profit. Sizing and concentration dominate.
- Testing the insider-trading claim on Polymarket with Taleb-proof methods
A calibration audit of the top 100 Polymarket profit wallets. The worst-calibrated quartile earns 2.02x as much as the best-calibrated quartile. Spearman r = +0.42, stable under outlier removal. Eight wallets placed the same bet within an eight-day window.
Method-explainer notes ship on a rolling Monday cadence. New papers appear above as they publish.