Most “whales” are not skilled
Convexly’s first weekly read on prediction-market wallets, signals, and methodology.
Headline finding · 2026-05-03 snapshot
Across the Top-50 Polymarket cohort, the rank-correlation between Edge Score and realized PnL is Spearman ρ = −0.32 (p = 0.026, n = 50).
A negative correlation. In this slice, higher Edge Score is associated with lower realized PnL, not higher. The dollar leaderboard and the skill leaderboard disagree about which wallets look skilled, and the disagreement is systematic rather than a small amount of noise.
This is the first issue of a recurring weekly post. Every Sunday we’ll share what changed in the Top-50 leaderboard, surface one wallet whose PnL overstates skill, summarize any CME signals that resolved that week, and explain one methodology choice in plain language.
The Brief is built off the same public artifacts the rest of Convexly operates on: the edge-score-top100.json snapshot, the CME signal ledger, and the public claim registry at /evidence. Every number has a receipt.
What Edge Score V3b says today
The Top-50 leaderboard at /research/wallet-rankings ranks 50 Polymarket wallets with at least 20 resolved positions, by the Edge Score V3b composite (calibration / conviction / discipline). The 2026-05-03 snapshot had a Polygon block anchor at height 86,337,463.
The current top 5:
| Rank | Address | Edge Score | Realized PnL | n positions |
|---|---|---|---|---|
| 1 | 0x0e9be66f…ce4e | 99.5 | $5,183 | 22 |
| 2 | 0xf386fb70…f22c | 99.4 | $5,788 | 22 |
| 3 | 0x48200256…1c86 | 99.4 | $6,024 | 23 |
| 4 | 0x8988e1d0…5dd5 | 99.3 | $9,530 | 26 |
| 5 | 0xd68cd8fe…f3e9 | 99.1 | $11,070 | 20 |
The numbers above are headline-friendly but they are not the interesting finding. The interesting finding is what happens if you re-rank by raw PnL.
The PnL-rank vs Edge-rank gap
If you sort the same Top-50 cohort by realized PnL alone, the order shuffles substantially. The top 5 by raw PnL:
| PnL rank | Address | Realized PnL | Edge Score | Edge rank |
|---|---|---|---|---|
| 1 | 0xc2e7800b…be51 | $1,532,710 | 94.4 | 36 |
| 2 | 0x2cc5404d…91d7 | $132,265 | 59.2 | 44 |
| 3 | 0xffa43496…1f7f | $124,549 | 60.8 | 42 |
| 4 | 0xbf49d1ad…5608 | $82,905 | 96.5 | 20 |
| 5 | 0x682e3886…70bd | $78,351 | 55.7 | 45 |
Three of the top-five PnL wallets in the cohort rank in the bottom quartile by Edge Score. Their dollar outcomes are real; their prospective skill signal is not particularly strong.
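The bottom-quartile check can be reproduced from the five rows printed above. This is a minimal sketch, assuming "bottom quartile" means an Edge rank in the last 25% of the 50-wallet cohort; the tuple layout and the helper name `in_bottom_quartile` are illustrative and not part of any Convexly artifact.

```python
# Rows copied from the PnL-rank table above:
# (address, realized PnL in USD, Edge Score, Edge rank)
COHORT_SIZE = 50

pnl_top5 = [
    ("0xc2e7800b…be51", 1_532_710, 94.4, 36),
    ("0x2cc5404d…91d7", 132_265, 59.2, 44),
    ("0xffa43496…1f7f", 124_549, 60.8, 42),
    ("0xbf49d1ad…5608", 82_905, 96.5, 20),
    ("0x682e3886…70bd", 78_351, 55.7, 45),
]


def in_bottom_quartile(edge_rank, cohort_size=COHORT_SIZE):
    """Assumed definition: the rank falls in the last 25% of the cohort."""
    return edge_rank > 0.75 * cohort_size


# Wallets that are top-5 by PnL but bottom-quartile by Edge Score.
flagged = [addr for addr, _pnl, _edge, edge_rank in pnl_top5
           if in_bottom_quartile(edge_rank)]

# Gap between Edge rank and PnL rank (positive = PnL flatters the wallet).
rank_gap = {addr: edge_rank - pnl_rank
            for pnl_rank, (addr, _pnl, _edge, edge_rank)
            in enumerate(pnl_top5, start=1)}

for addr in flagged:
    print(addr, "rank gap:", rank_gap[addr])
```

Under that definition the cutoff is rank 38 and below, which is why the wallet at Edge rank 36 is not flagged even though it sits in the lower half of the cohort.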
False whale of the week
0x682e3886…70bd made $78,351 in realized PnL on Polymarket. Edge Score V3b puts it 45th out of 50 in the cohort, near the bottom of the calibration-weighted skill ranking we surface publicly.
The disconnect is in the framing. A $78K realized return over a typical Polymarket position cycle (months) is a great-looking absolute number. The Convexly composite reads the same wallet’s posture, conviction, and discipline percentiles and finds the return more consistent with high-conviction bets that happened to work than with repeatable skill. The cohort median PnL is $13,357, so $78K is real outperformance, but Edge Score ranks the wallet near the bottom of the cohort because its underlying components rank there too.
If you want to copy a wallet’s behavior, copy from the Edge Score top — not the PnL top. Most “whales” are not skilled in the calibration-and-conviction sense; some are lucky in the small sample of resolved positions they have.
The negative-correlation finding from the lede is the cohort-wide version of this: the false whale isn’t an outlier, it’s a pattern. Edge Score is picking up a different signal than the dollar column. The ρ = −0.32 result is one-snapshot descriptive — meaningful for “is the disagreement real” but not a forward-validation claim. The forward-validation work lands in Brief #2 once enough daily snapshots accumulate.
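For readers who want to see what a rank correlation measures, here is a minimal pure-Python sketch of Spearman's ρ (Pearson correlation on average ranks, so ties are handled). It runs on only the ten wallets printed in the two tables of this post, so it will not reproduce the cohort-wide ρ = −0.32; the function names are illustrative and this is not the reproduce.py script.

```python
from math import sqrt


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# The ten (Edge Score, realized PnL) pairs printed in this post's tables.
edge = [99.5, 99.4, 99.4, 99.3, 99.1, 94.4, 59.2, 60.8, 96.5, 55.7]
pnl = [5183, 5788, 6024, 9530, 11070, 1532710, 132265, 124549, 82905, 78351]

rho_ten_rows = spearman_rho(edge, pnl)
print(round(rho_ten_rows, 3))
```

Because these ten rows are the five highest wallets by each ranking, the result is strongly negative by construction. It illustrates the calculation, not the size of the cohort-wide effect.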
The Edge Score V1-M paper at /research/edge-score-methodology-v1m documents the construction, and the public reproduce.py script lets anyone re-run the ranking on the underlying dataset.
CME signals this week
The Coherent Markets Engine (CME) emitted signals through the daily 06:30 UTC pipeline this week as usual. The realized track record at /research/cme/track-record shows the cumulative count of emitted signals; the per-signal table is gated to the Researcher tier so paying customers retain exclusive access to the methodology disclosure.
We do not yet have enough resolved signals to publish a hit-rate point estimate this week. The first batch of resolutions is expected in late July 2026 once the V0.2 backtest target window closes, per the methodology referenced on the track-record page.
When the resolved-signal count clears 10, we’ll publish a rolling realized scorecard with confidence intervals. Until then, the value of the signal feed is the methodology + audit chain, not the realized numbers.
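To show why a hit rate on a handful of resolved signals says little, here is a sketch of a 95% Wilson score interval. The choice of Wilson is an assumption made for illustration, since this post does not say which interval the scorecard will use, and the 7-of-10 count is hypothetical rather than a Convexly result.

```python
from math import sqrt


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = hits / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 7 correct out of 10 resolved signals.
low, high = wilson_interval(hits=7, n=10)
print(f"hit rate 0.70, 95% interval roughly [{low:.2f}, {high:.2f}]")
```

At ten resolutions the interval spans roughly half of the unit range, which is consistent with holding back a point estimate until more signals resolve.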
One methodology note: why we don’t rank by PnL
Edge Score V3b is a frozen-coefficient composite of three pillars (calibration / conviction / discipline). Realized PnL is not a pillar. There are two reasons:
- Sample size. A wallet with 22 resolved positions and a $33K realized PnL has noisy enough returns that a single high-stakes correct call can dominate the dollar number. Edge Score weights calibration across positions, so it is less dominated by any single bet.
- Forward predictability. A wallet’s PnL rank in cohort N is a weak predictor of its PnL rank in cohort N+1. A wallet’s calibration percentile is a stronger predictor: the V1-M paper documents out-of-fold Spearman +0.514 on the V1 cohort versus +0.147 for the Brier-only baseline (permutation p < 0.0001, n = 8,656).
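For context on the Brier-only baseline named above: a Brier score is the mean squared error between a stated probability and the binary outcome, so lower is better. The sketch below uses hypothetical forecasts and outcomes. It does not reproduce Edge Score's calibration pillar, whose construction is documented in the V1-M paper.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities in [0, 1] and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical wallet: implied probability at entry vs. how each market resolved.
forecasts = [0.9, 0.8, 0.7, 0.6]
outcomes = [1, 1, 0, 1]

wallet_brier = brier_score(forecasts, outcomes)

# Reference point: always saying 50% scores 0.25 regardless of outcomes.
coin_flip_brier = brier_score([0.5] * len(outcomes), outcomes)

print(wallet_brier, coin_flip_brier)
```

Every position contributes equally to the average, which is the property the sample-size bullet relies on: one large winning bet cannot dominate the score the way it can dominate a dollar total.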
Edge Score is what we’d want to copy. Realized PnL is what we’d want to follow once Edge Score has already made the pick. The leaderboard ranks by the first; the per-wallet detail surfaces both so you can compare.
What’s coming next
- A daily wallet-rank persistence metric: how often a wallet in the top decile this week is in the top decile next week. The Convexly hypothesis is that calibration-weighted ranks move less week-over-week than PnL ranks. We will publish the side-by-side test once enough daily snapshots accumulate (starting in two weeks).
- An alert when one of the wallets in your saved watchlist crosses a meaningful threshold (a rank move of 5 positions or more, or an edge-score delta of 10 percentile points or more). The email channel arrives this week; Telegram and Discord land after.
- A “high-Edge wallet entered this market” signal that fires when one of the top-decile wallets opens a new position above a notional threshold. Gated behind upcoming Ledger 2 work.
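The first two items above can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped implementation: rankings are assumed to be plain dicts mapping a wallet address to its 1-based rank, the top decile is taken as the best 10% of ranks, and the function names are hypothetical. The alert thresholds are the ones quoted in the list.

```python
def top_decile(ranks):
    """Wallets whose rank falls in the best 10% (at least one wallet)."""
    cutoff = max(1, len(ranks) // 10)
    return {wallet for wallet, rank in ranks.items() if rank <= cutoff}


def decile_persistence(this_week, next_week):
    """Share of this week's top-decile wallets still in the top decile next week."""
    current = top_decile(this_week)
    if not current:
        return 0.0
    return len(current & top_decile(next_week)) / len(current)


def crosses_alert_threshold(old_rank, new_rank, old_pct, new_pct,
                            rank_move=5, pct_delta=10.0):
    """True if rank moved >= 5 positions or edge-score percentile moved >= 10 points."""
    return (abs(new_rank - old_rank) >= rank_move
            or abs(new_pct - old_pct) >= pct_delta)


# Hypothetical 20-wallet rankings: w2 and w3 swap places between weeks.
this_week = {f"w{i}": i for i in range(1, 21)}
next_week = dict(this_week, w2=3, w3=2)

persistence = decile_persistence(this_week, next_week)
print(persistence)
print(crosses_alert_threshold(old_rank=10, new_rank=15, old_pct=50.0, new_pct=52.0))
```

Running the same persistence function on a PnL ranking and an Edge Score ranking over the same weeks would give the side-by-side comparison the first item describes.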
Next steps if you’re new here
- Browse the Top-50 leaderboard
- Check any Polymarket wallet’s Edge Score
- Save a wallet for the Monday digest (no signup; one-line email when behavior changes)
- Read the Edge Score V1-M methodology paper
- Verify any claim on this page at /evidence
The Brief is published Sundays. Reply to research@convexly.app with what would be useful in next week’s issue.