Entry #001
AsPredicted #287983
Skill-weighted aggregation as a per-market price prior
Twenty-four pre-registered skill-weighted aggregator variants (linear, exponential, rank-based, top-k, conviction-weighted, posture-weighted, calibration-weighted, multiplicative, and others) were tested as forecasts. All twenty-four were less accurate than the market price. For the wash-trader-filtered variant, the Brier delta CI [+0.16028, +0.19287] sits inside the pre-registered TOST equivalence range [+0.154, +0.204]; the delta moved by +0.00243 Brier (1.4% relative) after wash filtering.
- Specific claim disproved
- Aggregating Polymarket trader forecasts by any skill-weighted shape produces a per-market probability prior that is more accurate than the market price.
- Effect-size CI (Brier delta)
- +0.179 [+0.16028, +0.19287]
- Direction
- Aggregator was WORSE than the market price by 0.179 Brier on average.
- Pre-registered TOST equivalence range
- [+0.154, +0.204]
- Filed / ran
- Filed 2026-04-25, ran 2026-04-26, wash-filter robustness 2026-04-29
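The equivalence check above can be sketched in a few lines. This is a minimal illustration, not the pre-registered analysis code: the function names are hypothetical, the sign convention (aggregator Brier minus market Brier, so positive means the aggregator is worse) is inferred from the entry, and the relative figure is assumed to be taken against the +0.179 point estimate.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_delta(aggregator_probs, market_probs, outcomes):
    """Assumed convention: positive delta means the aggregator is worse."""
    return brier_score(aggregator_probs, outcomes) - brier_score(market_probs, outcomes)


def ci_inside_range(ci, bounds):
    """True when the whole confidence interval lies within the equivalence range."""
    ci_low, ci_high = ci
    lower, upper = bounds
    return lower <= ci_low and ci_high <= upper


# Numbers reported in this entry.
delta_ci = (0.16028, 0.19287)
tost_range = (0.154, 0.204)

ci_contained = ci_inside_range(delta_ci, tost_range)

# Movement after wash filtering, relative to the point estimate (assumption).
relative_shift_pct = 100 * 0.00243 / 0.179
```

With the reported numbers, `ci_contained` is true and `relative_shift_pct` rounds to 1.4, which matches the figures in the entry.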
What remains open
- Per-wallet ranking (V1, OOF Spearman +0.514, NOT closed by V2.8.2)
- Within-wallet z-score for insider-pattern detection MVP (NOT closed)
- Wash-trading detection (Sirolly methodology, NOT closed)
- Structural / constraint-projection inference (CME V0.1, uses MARKET PRICES not skill-weighted aggregation as forecast input, NOT closed)
Implications for product
Bright line: when evaluating any new commercial proposal, first ask whether it uses skill-weighted aggregation as a per-market price prior. If it does, reject it unless there is a substantive reason to believe the proposed weighting shape differs materially from the 24 already-rejected shapes.
Entry #002
AsPredicted #287368
V1 V3b composite survives strict per-wallet temporal holdout
V1.5 experiment E2 (pre-registered): refit V3b OLS on a 2024-01-01 to 2025-09-30 training window with a 14-day purge, score on a 2025-10-15 to 2026-04-15 held-out window. Frozen V1 coefficients applied to training-window pillars produced Spearman ρ = +0.111 with 95% CI [+0.046, +0.175] and p ≈ 0.001. The correlation is positive and its CI excludes zero, but it falls well below the pre-registered ρ ≥ +0.30 pass threshold. The failure is reported per the pre-reg's no-post-hoc-re-analysis rule.
- Specific claim disproved
- The V1 V3b composite, fit on full-cohort data and applied to training-window pillars, predicts forward held-out PnL at Spearman ρ ≥ +0.30 on the V1-M cohort.
- Effect-size CI (Spearman ρ)
- Spearman ρ = +0.111, 95% CI [+0.046, +0.175], p ≈ 0.001
- Direction
- Positive but below the +0.30 pre-registered threshold.
- Pre-registered pass criterion
- ρ ≥ +0.30; observed +0.111 fails the threshold.
- Filed / ran
- Filed 2026-04-25, ran 2026-04-27
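The E2 outcome turns on two separate questions: whether the CI excludes zero, and whether the point estimate clears the pre-registered threshold. The sketch below keeps them apart and also checks the purge gap between the two windows. It is illustrative only; the class and function names are hypothetical and this is not the experiment's own code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RankCorrelationResult:
    rho: float
    ci_low: float
    ci_high: float


def evaluate_e2(result, pass_threshold=0.30):
    """Separate 'distinguishable from zero' from 'meets the pre-registered bar'."""
    return {
        "ci_excludes_zero": result.ci_low > 0 or result.ci_high < 0,
        "meets_threshold": result.rho >= pass_threshold,
    }


def purge_days(train_end, holdout_start):
    """Whole days strictly between the last training day and the first scoring day."""
    return (holdout_start - train_end).days - 1


# Numbers reported in this entry.
e2 = RankCorrelationResult(rho=0.111, ci_low=0.046, ci_high=0.175)
verdict = evaluate_e2(e2)
gap = purge_days(date(2025, 9, 30), date(2025, 10, 15))
```

On the reported numbers the CI excludes zero but the threshold is not met, so the entry is a fail under the pre-registered rule. The window dates leave 14 days between training and scoring, consistent with the stated purge.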
What remains open
- V3b as a CROSS-SECTIONAL ranker (the per-period ranking + S4 partial Spearman +0.494 controlling for log capital remain open)
- V3b as a per-wallet TEMPORAL forecast: closed by E2; do not market it as such
Implications for product
V3b is positioned as a cross-sectional ranker, not a per-wallet temporal predictor and not a forecast-aggregation weight. External framing should never claim 'V3b predicts your forward PnL across windows' without citing this E2 rejection.
Entry #003
AsPredicted #287368
V1 V3b per-quarter IC stability across the pre-registered window
V1.5 experiment E7 (pre-registered): bucket V1-M positions by quarter, recompute V3b with quarter-local standardization, compute Spearman of frozen V3b vs signed log PnL in each quarter. Pre-registered window 2024Q1 to 2025Q2 (six quarters); 5 of 6 quarters present (2024Q1 had 29 wallets, below the ≥5-position cohort filter). Median per-quarter Spearman = +0.038, range across quarters [-0.164, +0.155]; 3 of 5 quarters positive. Pre-registered pass criterion required median ρ ≥ +0.30 AND ≥5/6 positive quarters. Both legs failed.
- Specific claim disproved
- The frozen V1 V3b composite produces a stable per-quarter Spearman with realized log PnL across the 2024Q1 to 2025Q2 window.
- Effect size (per-quarter Spearman ρ)
- Median per-quarter Spearman = +0.038, range [-0.164, +0.155] across 5 quarters
- Direction
- Per-quarter IC is unstable; one quarter (2025Q1) is meaningfully negative with CI [-0.221, -0.102].
- Pre-registered pass criterion
- Median ρ ≥ +0.30 AND ≥5/6 positive quarters. Observed: median +0.038, 3 of 5 positive. Both legs fail.
- Filed / ran
- Filed 2026-04-25, ran 2026-04-27
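The two-leg E7 criterion can be written as a small check on the summary statistics reported above. This is a sketch with a hypothetical function name, not the experiment's code; it uses only the median and the count of positive quarters, because the individual per-quarter values are not listed in this entry.

```python
def evaluate_e7(median_rho, positive_quarters, min_median=0.30, min_positive=5):
    """Return (median_leg, sign_leg, overall); both legs must hold to pass."""
    median_leg = median_rho >= min_median
    sign_leg = positive_quarters >= min_positive
    return median_leg, sign_leg, median_leg and sign_leg


# Summary numbers reported in this entry.
median_leg, sign_leg, passed = evaluate_e7(median_rho=0.038, positive_quarters=3)
```

Because only 5 of the 6 pre-registered quarters were present, the sign leg could have passed only if every present quarter had been positive.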
What remains open
- Exploratory extension to 2025Q3 to 2026Q2 (NOT in pre-reg) shows 2026Q1 ρ = +0.281, CI [+0.249, +0.314]; whether IC stability returns in more recent windows is open and pre-registered for replication as #11 in the 12-month plan
- Cross-quarter autocorrelation of per-quarter ρ values is +0.366 lag-1, so when the V3b signal is on it tends to persist; the failed E7 test was on the level of the IC, not its persistence
Implications for product
V3b is not marketed as a stable per-quarter IC predictor. Forward-validation rolling-Spearman publication (#11 in the 12-month plan, ships 2026-10-01) is the next opportunity to revisit the per-quarter stability claim with a fresh pre-reg.