For external methodology changes filed after 2026-04-25, Convexly attempts to lock hypotheses before analysis and publishes the receipt status here. An external link is shown as verified only when the linked page contains the expected AsPredicted ID, title, and filing date; otherwise it is marked pending, stale, or broken. The verdict, run date, and effect-size CI are reported within 24 hours of the test running. The original V1 and V1-M papers were not retroactively pre-registered; they remain frozen-coefficient methodologies whose ex-ante commitment is version-controlled via the SHA-256 audit chain. Failed methodology tests land in the negative-result registry; the audit chain is verifiable in your browser at /research/verify.
Last updated 2026-06-01T18:00:00Z · Receipts checked 2026-06-01T18:00:00Z · 10 entries
Receipt health: 1 externally verified · 9 pending public URL · 0 broken or stale
AsPredicted #287368
Filed 2026-04-25 · Ran 2026-04-27
V1.5 follow-up experiments E2 + E7
Failed (rejected)
External receipt verified
E2 per-wallet temporal holdout: ρ = +0.111 [+0.046, +0.175], well below the +0.30 pre-reg threshold. E7 per-quarter IC stability: median ρ = +0.038, only 3 of 5 quarters positive vs ≥5/6 required. Both failed.
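The two comparisons in this entry can be checked with a few lines of arithmetic. This is a minimal sketch, not the registry's own evaluation code: the variable names are hypothetical, and because the entry does not say whether the E2 gate applies to the point estimate or to a CI bound, or whether the E7 gate is a count or a fraction, the sketch evaluates each reading separately.

```python
from fractions import Fraction

# Figures quoted in the entry above.
e2_rho = 0.111
e2_ci = (0.046, 0.175)
e2_threshold = 0.30

e7_positive_quarters = 3
e7_observed_quarters = 5
e7_required_count = 5                   # reading "≥5/6" as a count of quarters
e7_required_fraction = Fraction(5, 6)   # reading "≥5/6" as a fraction

# E2: the gate fails whether it is applied to the point estimate
# or, more leniently, to the upper CI bound.
e2_passes_on_point = e2_rho >= e2_threshold
e2_passes_on_upper_bound = e2_ci[1] >= e2_threshold

# E7: the gate fails under both the count and the fraction reading.
e7_passes_on_count = e7_positive_quarters >= e7_required_count
e7_passes_on_fraction = (
    Fraction(e7_positive_quarters, e7_observed_quarters) >= e7_required_fraction
)

print(e2_passes_on_point, e2_passes_on_upper_bound)
print(e7_passes_on_count, e7_passes_on_fraction)
```

Under every reading considered here the gates fail, which is consistent with the "Failed (rejected)" verdict.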
Initial in-sample test of skill-weighted aggregation as per-market price prior. 24 aggregator variants tested; all rejected. Cohort substitution amendment filed as #287714.
Cohort substitution from V1 (8,656 wallets) to V1-M (8,778 wallets) to verify the negative result is not cohort-specific. All 24 aggregator variants rejected on V1-M as well; consistent with the original finding.
V2.8.2 wash-filter TOST equivalence test on V1-M Polymarket cohort (Sirolly-adapted)
Passed
Public receipt not verified
Wash-filter robustness check on the V2.8.2 negative result. Brier delta CI [+0.16028, +0.19287] sits inside the pre-registered TOST equivalence range [+0.154, +0.204]. Movement after wash filtering: +0.00243 Brier (1.4% relative). The V2.8.2 finding (skill-weighted aggregation rejected) is robust to wash-trader filtering at composite-z >= 3.0.
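A TOST equivalence decision can be read as an interval-inclusion check: equivalence is declared when the confidence interval lies entirely inside the pre-registered range. The sketch below applies that check to the figures quoted above. It assumes the reported CI is the interval the pre-registration ties to the TOST decision; the function name is hypothetical.

```python
def ci_within_equivalence_range(ci_low, ci_high, eq_low, eq_high):
    """Return True when the whole CI lies inside the equivalence range."""
    if ci_low > ci_high or eq_low > eq_high:
        raise ValueError("bounds must be ordered low <= high")
    return eq_low <= ci_low and ci_high <= eq_high


# Brier delta CI and pre-registered equivalence range quoted in the entry.
passed = ci_within_equivalence_range(0.16028, 0.19287, 0.154, 0.204)
print("PASSED" if passed else "FAILED")
```

Both margins matter: the lower CI bound clears the lower equivalence bound by 0.00628 and the upper CI bound sits 0.01113 below the upper equivalence bound.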
CME V0.2 backtest: 90-day walk-forward on Polymarket constraint-projection signals
Pending
Public receipt not verified
Pre-registers a 90-day walk-forward backtest of the CME V0.2 constraint-projection pipeline. Hyperparameters (thresholds, sizing, cost model, performance metrics) are frozen ex ante. The pre-registration allows no hyperparameter tuning based on backtest results.
V2-Perps Edge Score: skill ranking with CRPS + funding-capture pillars
Pending
Public receipt not verified
Pre-registers the form (4 pillars: CRPS-posture, conviction, discipline, funding-capture) and 7 validation gates for the V2-Perps Edge Score composite. Form locked at freeze commit 8c86dd4; coefficients TBD pending Hyperliquid 90-day cohort fit. The composite reduces to V1 / V3b on binary outcomes (Brier-equivalence identity) and extends across crypto perps, equity perps, compute futures, AI benchmark markets, valuation futures, and prediction markets per spec Section 6.
Strictly-prospective realized-vs-control validation of CME signals. H1: realized PnL of the CME-chosen side at the USD 1,000 capacity tier exceeds the mean of K=20 matched-noise controls, one-sided paired permutation at alpha = 0.025, AND the 95% bootstrap CI lower bound for mean paired difference is > 0. Evidence window = 92 calendar days beginning the first full UTC signal-emission day after the filing timestamp; pre-filing/same-day signals excluded. Reports insufficient_sample if fewer than 30 resolved signal/control pairs by the analysis date; no threshold tuning or window extension without a new pre-registration. CME methodology frozen for the window.
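The H1 decision rule above can be sketched as follows. This is an illustration under stated assumptions, not the frozen CME analysis code: the paired permutation is implemented as random sign flips of the paired differences, the bootstrap CI is a plain percentile interval (the entry does not name the bootstrap variant), and all function names and resample counts are hypothetical.

```python
import random
from statistics import fmean

MIN_PAIRS = 30   # floor quoted in the entry
ALPHA = 0.025    # one-sided level quoted in the entry


def paired_sign_flip_pvalue(diffs, n_perm=5_000, seed=0):
    """One-sided p-value for mean(diffs) > 0 using random sign flips."""
    rng = random.Random(seed)
    observed = fmean(diffs)
    hits = 0
    for _ in range(n_perm):
        flipped = fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if flipped >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def bootstrap_lower_bound(diffs, n_boot=5_000, seed=0):
    """Lower end of a two-sided 95% percentile bootstrap CI for the mean."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    return means[int(0.025 * n_boot)]


def evaluate_h1(realized_pnl, control_pnls):
    """realized_pnl[i] is one signal's PnL; control_pnls[i] holds its K controls."""
    if len(realized_pnl) != len(control_pnls):
        raise ValueError("each signal needs its own list of controls")
    diffs = [r - fmean(c) for r, c in zip(realized_pnl, control_pnls)]
    if len(diffs) < MIN_PAIRS:
        return "insufficient_sample"
    p_value = paired_sign_flip_pvalue(diffs)
    lower = bootstrap_lower_bound(diffs)
    # Both conditions are required, matching the "AND" in H1.
    return "PASSED" if (p_value <= ALPHA and lower > 0) else "FAILED"
```

The sample-size floor is checked before any test statistic is computed, so an underpowered window reports insufficient_sample rather than a verdict.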
Strictly-prospective forward-persistence holdout for the in-sample FDR-cleared discretionary wallet set (178 of 3,871 wallets clear BH-FDR at q = 0.10 for positive realized edge over entry prices on the frozen 2026-04-25 tape; expected false discoveries among the cleared set at most ~17.8; this is in-sample skill-vs-luck separation, NOT validated forward skill). Frozen objects: realized-edge skill measure mean(won - vwap_prob); the 178-wallet candidate set; the 3,693-wallet control set; the micro-market exclusion rule. H1 (both legs required): (a) candidate-set pooled forward edge exceeds control-set pooled forward edge, one-sided wallet-label permutation at alpha = 0.025; (b) candidate-set pooled forward edge has a 95% BCa lower bound > 0. Evidence window = 90 calendar days of newly-resolved discretionary positions beginning the first full UTC day after the filing timestamp; 2026-04-25 in-sample positions excluded. Floors: >=10 forward positions/wallet, >=40 candidate wallets, >=1,000 candidate forward positions, else insufficient_sample. A pre-registered null (no_persistence) or insufficient_sample is a valid, publishable outcome.
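The ~17.8 figure follows from the FDR level times the number of cleared wallets (0.10 × 178), and the candidate and control sets partition the tested wallets (178 + 3,693 = 3,871). The sketch below shows a generic Benjamini-Hochberg step-up procedure, that arithmetic, and one reading of the sample floors. The function names are hypothetical, and the way the per-wallet floor combines with the other two (counting only wallets that meet it) is an assumption, not something the entry spells out.

```python
MIN_POSITIONS_PER_WALLET = 10
MIN_CANDIDATE_WALLETS = 40
MIN_CANDIDATE_POSITIONS = 1_000


def benjamini_hochberg(p_values, q):
    """Return the indices rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank  # keep the largest rank that meets its threshold
    return sorted(order[:k_max])


def expected_false_discovery_bound(n_cleared, q):
    """Approximate ceiling on expected false discoveries among cleared items."""
    return q * n_cleared


def forward_sample_status(positions_per_wallet):
    """Apply the three floors; input maps wallet id -> forward position count."""
    eligible = [n for n in positions_per_wallet.values()
                if n >= MIN_POSITIONS_PER_WALLET]
    if len(eligible) < MIN_CANDIDATE_WALLETS:
        return "insufficient_sample"
    if sum(eligible) < MIN_CANDIDATE_POSITIONS:
        return "insufficient_sample"
    return "sufficient"


print(expected_false_discovery_bound(178, 0.10))
print(178 + 3_693 == 3_871)
```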
Filing rule: For post-2026-04-25 methodology changes that affect external claims, Convexly either files a pre-registration before analysis runs or marks the item internal-only / pending-public-url until an external receipt can be verified.
Verdict update rule: When a pre-registered test runs, the verdict, run_at_utc, and verdict_summary fields are updated within 24 hours of the run completing. Verdicts are PASSED, FAILED, or PENDING. Failed pre-registrations are added to the negative-result registry at /research/negative-results.
Supersession rule: When a pre-registration is superseded by an amendment, the original entry is kept (verdict noted as superseded) and the amendment is added as a separate entry. Original entries are never removed.
Audit-chain link: Every entry's audit_chain_anchor field references the SHA-256-hash-chained run identifier in apps/web/public/research/cme/audit_log.jsonl (or a paper-specific provenance log). The /research/verify page walks the chain in client-side JavaScript and renders a green stamp if every prev_hash matches its parent's row_hash.
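The linkage check described in the audit-chain rule can be sketched in a few lines. The real verifier runs as client-side JavaScript; this Python version is only an illustration. It assumes the log holds one JSON object per line in chain order, with the parent of each row being the row immediately before it. It checks linkage only and does not recompute row_hash, because the hashing recipe is not stated here. The function names are hypothetical.

```python
import json


def load_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def chain_links_are_intact(rows):
    """Return True if every row's prev_hash equals the preceding row's row_hash."""
    for parent, child in zip(rows, rows[1:]):
        if child["prev_hash"] != parent["row_hash"]:
            return False
    return True


# Toy two-row log with placeholder hashes, for illustration only.
sample_log = (
    '{"row_hash": "aa", "prev_hash": null}\n'
    '{"row_hash": "bb", "prev_hash": "aa"}\n'
)
print(chain_links_are_intact(load_jsonl(sample_log)))
```

A linkage check on its own detects reordered or removed rows but not a row whose contents were altered while its stored hashes were left in place; catching that also requires recomputing each row_hash from the row's contents.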