Edge Score Methodology V1.5
Pre-registered follow-up to the V1 paper. Two primary tests (E2 + E7) plus five supplementary analyses on the V1-M position tape (8,778 wallets, 542,241 resolved positions). Pre-registration: AsPredicted #287368, filed 2026-04-25, before any code was written.
Headline result
Both pre-registered primary tests failed the ex-ante pass criteria set in the V1 paper §5.3 / §5.7. Per the pre-registration's no-post-hoc-re-analysis rule, this is reported as a partial falsification of V1's stability claims when the composite is run on the V1-M position tape with strict temporal splits.
The five supplementary analyses give a fuller picture of where V3b holds up (cross-category transfer; a partial correlation that survives controlling for log_capital) and where it does not (forward-temporal prediction; high-volume wallets; persistent-wallet PnL alignment).
E2 - Per-wallet temporal holdout (PRE-REGISTERED, FAIL)
Training window 2024-01-01 to 2025-09-30, followed by a 14-day purge; held-out window 2025-10-15 to 2026-04-15. V3b is refit by OLS on the training-window pillar values, and paired wallets are scored on their training-window pillars. The test statistic is the Spearman correlation between refit V3b and held-out signed log PnL.
Refit V3b: Spearman = -0.082, 95% CI [-0.149, -0.006], p = 0.020, N = 805 paired wallets. The pre-reg pass criterion was Spearman ≥ +0.30 with CI lower bound > 0; the result is on the wrong side of zero. The refit posture coefficient flipped sign vs the frozen V1 coefficient (+0.79 in V1, -1.29 in the V1.5 training fold).
Secondary (frozen V1 coefficients): Spearman = +0.111, 95% CI [+0.046, +0.175]. Positive and significant, but well below +0.30.
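The split and pass check described above can be sketched as follows. This is a minimal illustration, not the project's code: every function name here is hypothetical, and both the signed-log transform sign(x) * log(1 + |x|) and the percentile bootstrap for the confidence interval are assumptions, since the text does not say how signed log PnL or the CI were computed.

```python
from datetime import date, timedelta

import numpy as np

# Window boundaries as stated in the text; the purge gap is dropped entirely.
TRAIN_START, TRAIN_END = date(2024, 1, 1), date(2025, 9, 30)
PURGE_DAYS = 14
TEST_START = TRAIN_END + timedelta(days=PURGE_DAYS + 1)  # 2025-10-15
TEST_END = date(2026, 4, 15)


def assign_fold(ts):
    """Return 'train', 'test', or None (purge gap or outside both windows)."""
    if TRAIN_START <= ts <= TRAIN_END:
        return "train"
    if TEST_START <= ts <= TEST_END:
        return "test"
    return None


def signed_log(pnl):
    """Assumed transform (not confirmed by the text): sign(x) * log(1 + |x|)."""
    pnl = np.asarray(pnl, dtype=float)
    return np.sign(pnl) * np.log1p(np.abs(pnl))


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ordinal = np.empty(len(x))
    ordinal[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return np.bincount(inverse, weights=ordinal)[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def bootstrap_ci(x, y, n_boot=2000, seed=42):
    """Percentile bootstrap 95% CI for Spearman rho, resampling wallets."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        draws[b] = spearman(x[idx], y[idx])
    low, high = np.percentile(draws, [2.5, 97.5])
    return float(low), float(high)


def passes_e2(rho, ci_low):
    """Pre-registered E2 criterion: rho >= +0.30 and CI lower bound > 0."""
    return rho >= 0.30 and ci_low > 0.0


# Reported results from the text, checked against the criterion.
print(passes_e2(-0.082, -0.149))  # refit V3b -> False
print(passes_e2(0.111, 0.046))    # frozen V1 coefficients -> False
```

Because the assumed signed-log transform is strictly increasing, it leaves the ranks unchanged, so under this assumption Spearman on signed log PnL equals Spearman on raw PnL.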
E7 - IC temporal stability (PRE-REGISTERED, FAIL)
Per-quarter Spearman of frozen V3b vs signed log PnL, pillars recomputed within each quarter.
- 2024Q2 (N=70): ρ = +0.155, CI [-0.103, +0.401]
- 2024Q3 (N=321): ρ = +0.056, CI [-0.066, +0.162]
- 2024Q4 (N=924): ρ = +0.038, CI [-0.026, +0.100]
- 2025Q1 (N=1230): ρ = -0.164, CI [-0.221, -0.102]
- 2025Q2 (N=1254): ρ = -0.040, CI [-0.103, +0.018]
Median per-quarter Spearman = +0.038. Quarters with positive ρ: 3 of 5. The pre-reg pass criterion was median ρ ≥ +0.30 AND ≥5 of 6 quarters positive; both are missed.
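The summary figures above can be reproduced directly from the per-quarter values in the list. The snippet below is only an arithmetic check; the variable names are illustrative, and the count threshold encodes the "at least 5 positive quarters" part of the criterion as stated.

```python
import statistics

# Per-quarter Spearman values as reported in the list above.
quarterly_rho = {
    "2024Q2": 0.155,
    "2024Q3": 0.056,
    "2024Q4": 0.038,
    "2025Q1": -0.164,
    "2025Q2": -0.040,
}

MEDIAN_THRESHOLD = 0.30      # pre-registered: median rho >= +0.30
MIN_POSITIVE_QUARTERS = 5    # pre-registered: at least 5 quarters positive

median_rho = statistics.median(quarterly_rho.values())
n_positive = sum(rho > 0 for rho in quarterly_rho.values())
passes_e7 = median_rho >= MEDIAN_THRESHOLD and n_positive >= MIN_POSITIVE_QUARTERS

print(median_rho, n_positive, passes_e7)  # 0.038 3 False
```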
Per-wallet quarter-over-quarter V3b ranking is stable (lag-1 Spearman median +0.31 across 8 consecutive-quarter pairs, all p < 0.01). V3b measures something persistent about wallet behavior, but its alignment with PnL is unstable across quarters.
Supplementary findings
- S3 (sample-size stratification): V3b strengthens at low position counts and weakens at high ones. 5-29 positions: OOF ρ = +0.371 [+0.339, +0.402]. 30-100: +0.150. 101+: +0.013 (CI crosses zero). Opposite of the naive "more data is better" expectation.
- S4 (partial correlation): V3b vs PnL marginal ρ = +0.32; controlling for log_capital it rises to +0.494 [+0.477, +0.512]. V3b is not a capital proxy; capital is a suppressor.
- S5 (half-life across horizons): No conventional decay pattern. Median refit ρ across three cutoffs, by horizon: -0.03 (30d), +0.01 (60d), +0.06 (90d), +0.11 (180d). Forward predictive power is essentially zero at short horizons.
- S6 (per-wallet autocorrelation): Lag-1 V3b ranking ~stable (median ρ = +0.31). For 229 persistent wallets active 4+ quarters, mean V3b vs multi-quarter signed log PnL: ρ = -0.31 [-0.42, -0.20]. Striking inversion: top-ranked persistent wallets underperform on cumulative PnL.
- S7 (cross-category transfer, heuristic): Refit V3b transfers across Sports / Crypto / Politics with Spearmans +0.17 to +0.33, comparable to within-category baselines. The composite is more stable across categories than across time.
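The suppressor pattern reported in S4 can be illustrated with a small sketch. The text does not say which partial-correlation estimator was used; the version below applies the standard first-order partial correlation formula to pairwise Spearman coefficients, which is one common choice. The data are synthetic, and all names (skill, score, pnl) are hypothetical stand-ins, not fields of the V1-M tape.

```python
import math

import numpy as np


def spearman(x, y):
    """Spearman rho for tie-free data: Pearson correlation of ordinal ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Synthetic suppressor setup (illustrative only, not the V1-M data):
# the score mixes a PnL-relevant component with capital, while PnL itself
# is unrelated to capital.
rng = np.random.default_rng(42)
n = 5000
skill = rng.normal(size=n)
log_capital = rng.normal(size=n)
score = skill + log_capital
pnl = skill + rng.normal(size=n)

marginal = spearman(score, pnl)
partial = partial_corr(
    marginal,
    spearman(score, log_capital),
    spearman(pnl, log_capital),
)
print(f"marginal={marginal:.3f} partial={partial:.3f}")
```

In this toy setup the partial correlation exceeds the marginal one because capital adds variance to the score that is unrelated to PnL; removing it sharpens the relationship. That is the same qualitative pattern S4 reports, though the mechanism in the real data may differ.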
How V1.5 reframes the V1 claims
V3b is a consistent ranker of wallet behavior whose alignment with PnL is sample-size and time-window dependent. This is a more limited claim than the V1 paper's framing of V3b as a forward predictor, and it is the claim the V1.5 data supports.
Concretely: V3b reliably picks out wallets with a particular behavioral fingerprint (lower Brier on their entry probabilities, higher single-event concentration, fewer positions). That fingerprint is correlated with PnL in some cohorts (low-volume sports / crypto traders, V1 cohort, recent-quarter cohorts) and uncorrelated or negatively correlated with PnL in others (high-volume market-makers, multi-quarter persistent wallets, short-horizon out-of-sample).
Pre-registration audit trail
- AsPredicted #287368, filed 2026-04-25 before any code was written. Pre-reg URL: https://aspredicted.org/4gn42g.pdf
- Code: services/api/scripts/v1_5_analyses/ (committed before analysis ran).
- Inputs: services/api/scripts/output/wallet_all_positions_20260425_201800.csv (V1-M position tape).
- Outputs: services/api/scripts/v1_5_analyses/output/ (one JSON per experiment).
- Random seed: 42 (matches V1 §8 reproducibility note).
- Both primary tests reported as falsifications. No post-hoc re-analysis applied to recover a positive result. The supplementary analyses (S3-S7) were planned in the V1.5 workplan as exploratory and are reported as such.
Limitations
- Resolution timestamp is proxied by last_fill_ts in the position tape; true resolution events are not in the V1-M extract. For most Polymarket binary markets the last fill is within hours-to-days of resolution, but for long-running markets there could be drift.
- Train and test cohorts in E2 differ in composition. Many wallets are active in only one of the two windows.
- S7 categorization is heuristic (regex on slug + title); Polymarket's official category metadata is not in the extract.
- The V1-M position tape is capped at 2,000 positions per wallet, which under-samples the highest-volume wallets' full history.
References (internal)
- V1 paper: /research/edge-score-methodology-v1
- Pre-reg doc: docs/research/preregistrations/2026-04-25-v1-5-experiments-aspredicted.md
- Results doc: docs/research/v1_5/v1_5_analyses_results_20260427_191717.md
- Lopez de Prado 2018 Ch. 7 (purging + embargo protocol)
Citation: Convexly Research. (2026). Edge Score Methodology V1.5: Pre-registered E2 + E7 deferred experiments. AsPredicted #287368. convexly.app/research/edge-score-methodology-v1-5. Published 2026-04-27.