# Edge Score V1-M: public data bundle

This bundle accompanies the Convexly Research paper *Edge Score V1-M:
methodology extension and cross-venue invariance measurement* (2026).
Paper: https://www.convexly.app/research/edge-score-methodology-v1m

## What is in this bundle

`users_aggregated.csv` – 15,106 rows, one per Manifold user who placed
at least 20 resolved bets on BINARY cpmm-1 or MULTIPLE_CHOICE
cpmm-multi-1 markets between December 2021 and April 2026. User
identifiers are SHA-256 hashed with a release-specific salt and
truncated to 16 hex characters. Raw bet records are not redistributed.
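
The hashing step described above can be sketched as follows. This is a minimal illustration, not the release code: the function name `hash_user_id`, the choice to prepend the salt, and the UTF-8 encoding are all assumptions, since this bundle does not document the exact construction and the release salt is not published.

```python
import hashlib


def hash_user_id(user_id: str, salt: str) -> str:
    """Return the first 16 hex characters of SHA-256(salt + user_id).

    Assumption: the salt is prepended and the string is UTF-8 encoded.
    The real construction used for this release may differ.
    """
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:16]  # truncate the 64-char hex digest to 16 chars


# Hypothetical inputs, for illustration only.
example = hash_user_id("example-user-id", "example-salt")
print(example, len(example))
```

Because the salt is release-specific, hashes from this bundle should not be expected to match hashes in any other release.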

Columns:

    user_hash          – 16-hex-char hashed identifier
    n_resolved_bets    – count of user's resolved bets in scope
    n_binary           – count on BINARY cpmm-1 markets
    n_multi            – count on MULTIPLE_CHOICE cpmm-multi-1 markets
    n_unique_events    – count of distinct (contract, answer) pairs
    avg_brier          – mean per-bet Brier score (lower is better)
    baseline_brier     – user's marginal baseline Brier
    skill_brier        – baseline_brier minus avg_brier (higher is better)
    baseline_prob      – user's marginal YES-rate on binary bets
    realized_pnl       – total realized PnL in mana
    biggest_event_pnl  – signed PnL on the user's single largest event
    concentration      – |biggest_event_pnl| / |realized_pnl|
    win_rate           – share of the user's bets that resolved YES
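
Two of the columns are defined as functions of others, so they can be sanity-checked after loading. The sketch below does this on a small synthetic sample; the sample values are made up, `check_row` is a hypothetical helper, and it assumes the CSV has a header row using the column names listed above and no missing values in these fields.

```python
import csv
import io
import math

# Synthetic rows in the documented column layout (values are invented).
SAMPLE = """user_hash,avg_brier,baseline_brier,skill_brier,realized_pnl,biggest_event_pnl,concentration
0123456789abcdef,0.20,0.25,0.05,400.0,-100.0,0.25
fedcba9876543210,0.30,0.24,-0.06,-50.0,-75.0,1.5
"""


def check_row(row: dict, tol: float = 1e-6) -> dict:
    """Check the two derived columns against their stated definitions."""
    avg = float(row["avg_brier"])
    base = float(row["baseline_brier"])
    pnl = float(row["realized_pnl"])
    biggest = float(row["biggest_event_pnl"])
    # skill_brier = baseline_brier - avg_brier
    skill_ok = math.isclose(float(row["skill_brier"]), base - avg, abs_tol=tol)
    # concentration = |biggest_event_pnl| / |realized_pnl|; the ratio is
    # undefined when realized_pnl is zero, so that case is skipped.
    conc_ok = pnl == 0 or math.isclose(
        float(row["concentration"]), abs(biggest) / abs(pnl), abs_tol=tol
    )
    return {"skill_ok": skill_ok, "concentration_ok": conc_ok}


results = [check_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(results)
```

To run the same check on the real file, replace `io.StringIO(SAMPLE)` with `open("users_aggregated.csv", newline="")`. The tolerance may need loosening depending on how the published values were rounded.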

`sweepcash_analysis.json` – Window-level cohort summaries and
within-user deltas across the four windows used in Section 5.5 of the
paper: bulk (pre-July 2024), gap_pre_sweepcash (July-September 2024),
sweepcash (September 2024 through March 2025), post_sweepcash (after
March 2025). Within-user records are anonymized.

`refit_coefficients.json` – OLS fitted coefficients for the Manifold
V1-M cohort, with bootstrap 95% CIs (2,000 resamples) and permutation
null p-values (10,000 shuffles). Includes the frozen Polymarket V1
coefficients for comparison.

`window_stats.json` – The `window_summaries` block extracted from
`sweepcash_analysis.json`, provided separately for convenience.

`headline_numbers.json` – Canonical machine-readable version of the
headline statistics reported in the paper. Use this if you want to
cite exact numbers in a downstream analysis.

`reproduce.py` – Standalone Python script (no private dependencies)
that queries the Manifold /v0 public API, recomputes the three pillars
and the Hill-alpha tail index for a supplied set of user handles, and
writes a CSV in the same shape as `users_aggregated.csv`. Requires
Python 3.10+ and only the standard library.
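
As background on the tail index mentioned above, the standard Hill estimator can be sketched in a few lines. This is the textbook form, not an excerpt from `reproduce.py`: the function name `hill_alpha`, the restriction to positive values, and the choice of `k` are assumptions, and the script may select its inputs and threshold differently.

```python
import math
import random


def hill_alpha(values, k):
    """Hill estimate of the tail index from the k largest positive values.

    With descending order statistics X(1) >= ... >= X(n), the estimate is
    k / sum_{i=1..k} ln(X(i) / X(k+1)).
    """
    xs = sorted((v for v in values if v > 0), reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < number of positive values")
    threshold = xs[k]  # X(k+1) in 1-based notation
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    if log_excess == 0:
        raise ValueError("top-k values are all equal to the threshold")
    return k / log_excess


# Synthetic Pareto sample with true tail index 2, for illustration only.
sample_rng = random.Random(7)
sample = [sample_rng.paretovariate(2.0) for _ in range(5000)]
alpha_hat = hill_alpha(sample, k=500)
print(f"Hill alpha estimate: {alpha_hat:.2f}")
```

The estimate is sensitive to `k`: too small and it is noisy, too large and it picks up the non-tail part of the distribution, so it is worth checking stability across several values.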

## License

Code in this bundle: MIT. See `LICENSE.md`.

Data in this bundle: aggregated summary statistics derived from the
Manifold Markets public bulk dump (https://docs.manifold.markets/data)
and the Manifold /v0 public API. Manifold publishes this data for
research use. This bundle redistributes only aggregated summary data,
which is permitted for research purposes; it does not redistribute raw
per-bet records.

Kalshi trade-level Hill alpha summaries are derived from the Kalshi
public /trades API. Only aggregated statistics are redistributed.

## How to cite

Convexly Research (2026). Edge Score V1-M: methodology extension and
cross-venue invariance measurement. Working paper. Convexly.
https://www.convexly.app/research/edge-score-methodology-v1m

## Feedback

research@convexly.app
