April 22, 2026 · 8 min read · Operator's guide

Stop Optimizing Calibration. Start Optimizing Conviction.

The 10,000-wallet study told you what is not true. This post tells you what to do about it. Five operational rules, grounded in the data. Three moves to make this week. Three to stop.

Convexly scored 8,656 Polymarket wallets and published the methodology paper this weekend. Both artifacts answered a research question: is calibration the profit driver on prediction markets? The answer is no. Across the full leaderboard, Spearman rank correlation between Brier score and realized PnL is +0.148. Among the top 100 wallets by profit, the correlation flips negative: worse-calibrated whales earn 2.02 times the median profit of the best-calibrated ones.

That is a research finding. It tells you what is not true. It does not tell you what to do. A Polymarket trader reading the 10K study learns that spending more time getting better calibrated is probably not going to make them more money. Then the post ends. The trader closes the tab, no better equipped than before.

This post closes that gap. Every finding in the V1 research stack implies a specific operational change. Here they are, one at a time, with the numbers that back them and the action that follows.

Rule 1: Stop optimizing your Brier score.

The evidence. Across 8,656 ranked Polymarket wallets, Spearman correlation between Brier score and signed log PnL is +0.148. That is not a strong predictor. It explains roughly 2% of the rank variance in profit. More damning: 62% of the best-calibrated quartile on the leaderboard is net negative. Being in the top quarter of forecasting accuracy puts you in a pool where most people lose money.
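
A quick way to see why +0.148 is weak: Spearman's rho is the Pearson correlation of the ranks, and its square is the share of rank variance it accounts for. The stdlib-only sketch below is ours, not Convexly's pipeline. The signed-log form sign(x)·log(1 + |x|) is an assumption about what "signed log PnL" means. Because Spearman only looks at ranks, any monotone transform of PnL, including that one, leaves rho unchanged.

```python
import math


def signed_log(x: float) -> float:
    """Assumed signed-log transform: sign(x) * log(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Share of rank variance accounted for by the reported rho.
rho = 0.148
print(f"rho^2 = {rho ** 2:.1%}")  # rho^2 = 2.2%
```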

The implication. If your practice on Polymarket is taking calibration quizzes, journaling probabilities, and checking your Brier score weekly to improve it, you are tuning the wrong dial. Calibration is table stakes. It prevents you from blowing up on overconfident bets. It does not make you money. The traders making money are not the most careful. They are the most concentrated on asymmetric single events.

What to do instead. Measure calibration once (the wallet analyzer does this in 30 seconds). Confirm you are not systematically overconfident. Move on. Put the hours you would have spent on calibration drills into identifying the one or two market categories where you have structural edge.
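
The one-off check can be this small. Here is a hedged sketch. The Brier score is the mean squared gap between forecast and 0/1 outcome. The overconfidence gap below is our own simple diagnostic, not necessarily what the wallet analyzer computes. It compares your average confidence in the side you favoured with how often that side actually won. A clearly positive gap is the systematic overconfidence worth fixing. A gap near zero or below means move on.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between forecast probability and the 0/1 outcome (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def overconfidence_gap(probs: list[float], outcomes: list[int]) -> float:
    """Average confidence in the favoured side minus that side's hit rate.

    Positive values suggest systematic overconfidence.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    confidence, hits = 0.0, 0
    for p, o in zip(probs, outcomes):
        if p >= 0.5:  # you favoured YES
            confidence += p
            hits += o
        else:  # you favoured NO
            confidence += 1 - p
            hits += 1 - o
    n = len(probs)
    return confidence / n - hits / n


probs = [0.9, 0.8, 0.7, 0.9]
outcomes = [1, 0, 1, 0]
print(f"Brier {brier_score(probs, outcomes):.4f}")        # Brier 0.3875
print(f"gap {overconfidence_gap(probs, outcomes):+.3f}")  # gap +0.325
```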

Rule 2: Size for the one bet you are right about, not the ten you might be.

The evidence. In the Edge Score V3b composite, the conviction pillar weight is 2.7220 on standardized concentration. The posture (calibration) pillar weight is 0.7876. Conviction dominates posture by a factor of roughly 3.5 (2.7220 / 0.7876 ≈ 3.46). Empirically: the median wallet in the 8,656-wallet cohort derives 66.2% of its realized PnL from a single event. Separately, 69.9% of wallets have more than 50% of PnL from one event. This is not diversification. This is one directional bet that paid off.
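
The single-event figures are easy to reproduce on your own history. This is a minimal sketch, and the assumptions are ours rather than the study's. Trades arrive as (event_id, realized_pnl) pairs. The share is the most profitable event's net PnL divided by the wallet's total realized PnL. Wallets with zero or negative total PnL are excluded, since the study's treatment of those wallets is not stated here. The share can exceed 100% when losses elsewhere offset part of the winner.

```python
from collections import defaultdict
from statistics import median


def top_event_share(trades):
    """Share of a wallet's total realized PnL that came from its best single event.

    trades: iterable of (event_id, realized_pnl) pairs.
    Returns None when total realized PnL is zero or negative.
    """
    by_event = defaultdict(float)
    for event_id, pnl in trades:
        by_event[event_id] += pnl
    total = sum(by_event.values())
    if total <= 0:
        return None
    return max(by_event.values()) / total


# Toy wallets (illustrative numbers, not from the cohort).
wallets = {
    "A": [("e1", 700.0), ("e2", 200.0), ("e3", 100.0)],
    "B": [("e1", 300.0), ("e2", 300.0), ("e3", 400.0)],
    "C": [("e1", 900.0), ("e2", -400.0)],
    "D": [("e1", -50.0)],
}
shares = [s for s in map(top_event_share, wallets.values()) if s is not None]
print(f"median single-event share: {median(shares):.1%}")  # 70.0%
print(f"wallets above 50%: {sum(s > 0.5 for s in shares) / len(shares):.1%}")  # 66.7%
```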

The implication. The winning shape on Polymarket is barbell concentration, not spread-your-bets portfolio construction. Most of your capital should sit in cash or the smallest possible positions. A small fraction should sit in one or two events where you have real edge. This pattern is what the leaderboard rewards. It is also what classical forecasting advice would tell you to avoid.

What to do instead. Identify one to three market categories per quarter where your view differs from consensus by a margin you can defend (use your existing category knowledge, not broad Polymarket reading). Size 5-15% of bankroll on each qualifying bet in those categories. Do not participate in the other ninety categories regardless of how interesting the headlines are.

Rule 3: Barbell first. Quarter-Kelly is a secondary constraint inside it.

The evidence. The Hill tail index estimator on realized Polymarket PnL returns α = 1.28 (95% CI 1.20 to 1.36). For any α below 2, the variance of the underlying distribution is formally infinite. The Kelly criterion assumes that you know the true probability and that returns have finite variance. On Polymarket, neither assumption holds. Under those conditions full Kelly overbets as soon as your probability estimate is too optimistic. Overbetting is not symmetric: staking roughly twice the true Kelly fraction already drives long-run geometric growth to zero or below.
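
For readers who want to check the tail index on their own data, here is a minimal Hill estimator. Sort the positive observations in descending order as X(1) ≥ X(2) ≥ …. The estimate from the k largest is α̂ = 1 / [(1/k) Σ ln(X(i) / X(k+1))], summing i = 1..k. The study does not state here which slice of PnL it fed in (gains only, or absolute values) or which k it chose, so both are assumptions. In practice the estimate is sensitive to k, so it is best read across a range of k rather than at a single value.

```python
import math
import random


def hill_alpha(samples, k):
    """Hill estimate of the tail index alpha from the k largest positive values."""
    xs = sorted((x for x in samples if x > 0), reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < number of positive samples")
    threshold = xs[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("top k values are all equal; alpha is undefined")
    return 1.0 / mean_log_excess


# Sanity check on synthetic Pareto data with a known alpha of 1.28.
rng = random.Random(7)
sample = [rng.paretovariate(1.28) for _ in range(50_000)]
for k in (500, 1_000, 2_000):
    print(k, round(hill_alpha(sample, k), 3))
```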

The Taleb frame. Under non-ergodic single-path realizations, Kelly is not the right reference at all. Taleb's barbell (Antifragile, Chapter 11, "Never Marry the Rock Star") is the operational primitive: 80 to 90 percent of bankroll in cash or the safest instrument available, 10 to 20 percent in maximally convex speculative positions, and nothing in the middle. The speculative leg is where Polymarket lives. Within that leg, Peters (2019) on ergodicity rules out sizing by expected value: the bet size that maximizes the ensemble expectation is not the one that maximizes growth along a single path over time. Kelly is the time-average answer, but only when the true probability is known. With an estimated probability, full Kelly tends to overbet. Silent Risk (§3.2) and Statistical Consequences of Fat Tails make the same case from the fat-tail side: the variance estimate that Kelly-style sizing depends on is itself unstable under power-law payoffs, and fractional Kelly inherits the instability.

The implication. Start with the barbell cap: never put more than 10 to 20 percent of total bankroll into Polymarket-type positions collectively. That is the primary constraint and it is non-negotiable. Inside that speculative leg, quarter-Kelly is a compromise between Kelly-optimal and Taleb-optimal. Kelly wants more. Taleb's barbell wants the bet sized so that the speculative leg can be wiped out without impairing the safe leg. Quarter-Kelly approximates that. It is not the Taleb position; it is the Kelly-family concession to fat tails inside a barbell structure. Yoder's lecture on capital management recommends half-Kelly as the defensible regime for normal-tail markets. Under Hill α = 1.28, even half-Kelly is aggressive inside a barbell, and quarter-Kelly is the most you should size.
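
How much growth does fractional Kelly give up? Here is a sketch for a single binary contract. Assume you buy YES at price p with subjective probability q, so the net odds are b = (1 − p)/p and full Kelly is f* = (q − p)/(1 − p). The exact expected log growth per bet is q·ln(1 + f·b) + (1 − q)·ln(1 − f). The q = 0.60, p = 0.50 example is illustrative only.

```python
import math


def kelly_fraction(q: float, p: float) -> float:
    """Full-Kelly stake for buying YES at price p with subjective probability q."""
    return max(0.0, (q - p) / (1 - p))


def log_growth(f: float, q: float, p: float) -> float:
    """Expected log growth per bet when staking fraction f of bankroll."""
    b = (1 - p) / p  # net odds paid on a win
    return q * math.log1p(f * b) + (1 - q) * math.log1p(-f)


q, p = 0.60, 0.50
f_star = kelly_fraction(q, p)  # 0.2 of bankroll
for mult in (2.0, 1.0, 0.5, 0.25):
    kept = log_growth(mult * f_star, q, p) / log_growth(f_star, q, p)
    print(f"{mult:>4} x Kelly keeps {kept:.0%} of full-Kelly growth")
```

On these numbers, half-Kelly keeps roughly three quarters of full-Kelly growth and quarter-Kelly keeps a bit over 40 percent. Doubling Kelly turns growth negative. The asymmetry is the point: underbetting costs growth gradually, while overbetting destroys it.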

What to do instead. First, set your total Polymarket exposure cap at 10 to 20 percent of total bankroll (the barbell's speculative leg). Second, inside that cap, compute the Kelly-optimal fraction per position from your subjective probability and take a quarter of it as the per-position ceiling. Third, apply a hard per-event cap at 10 percent of the speculative leg. Size every position so that you could survive losing it completely. Quarter-Kelly alone gives up more than half of the theoretical geometric growth rate. In the standard small-edge approximation, staking a fraction c of Kelly keeps 2c − c² of the maximum growth, about 44 percent at c = 1/4, and the caps can give up more. That is the price of surviving fat-tail drawdowns that would otherwise end your career.
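
The three steps can be combined into one sizing function. This is a sketch under our own assumptions. The Kelly fraction is applied to the speculative leg rather than to total bankroll, because the rule above does not pin down the base. The leg defaults to 15 percent of total bankroll. A hypothetical `already_deployed` parameter tracks open Polymarket exposure so that the leg's collective cap holds.

```python
def position_size(
    total_bankroll: float,
    q: float,
    p: float,
    already_deployed: float = 0.0,
    leg_fraction: float = 0.15,
    kelly_mult: float = 0.25,
    event_cap: float = 0.10,
) -> float:
    """Dollar stake for one YES position bought at price p with subjective probability q.

    Step 1: the speculative leg is leg_fraction of total bankroll (10-20%).
    Step 2: quarter-Kelly of that leg is the per-position ceiling.
    Step 3: no single event may take more than event_cap of the leg.
    """
    if not 0.10 <= leg_fraction <= 0.20:
        raise ValueError("speculative leg should be 10-20% of total bankroll")
    leg = total_bankroll * leg_fraction
    full_kelly = max(0.0, (q - p) / (1 - p))
    kelly_stake = kelly_mult * full_kelly * leg
    room_left = max(0.0, leg - already_deployed)  # collective exposure cap
    return min(kelly_stake, event_cap * leg, room_left)


# $10,000 bankroll, 60% belief on a 50-cent YES share.
print(round(position_size(10_000, q=0.60, p=0.50), 2))  # 75.0
```

Notice which constraint binds. At a strong conviction (q = 0.90 at p = 0.50), quarter-Kelly asks for $300, and the 10 percent per-event cap cuts it to $150.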

Rule 4: Do not copy whales unless you understand their edge.

The evidence. The 10K study found that the top 100 wallets by profit cluster in the worst-calibrated quartile. Their Edge Score is driven overwhelmingly by the conviction pillar (concentration of PnL in a single event), not by forecasting accuracy. A recent public example: Convexly scored one such wallet at Edge Score 83.7 on an Ethereum March event; 86.4% of that wallet's realized $56,970 PnL came from a single market position. The wallet is structurally not copy-tradeable because the edge was specific to that trader, that market, and that sizing judgement at a specific moment.

The implication. Copy-trade bots and services that promise "replicate whale X" replicate the wrong thing. They copy the positions but not the judgement. The positions are the artifact of an edge the whale had; the edge is not the positions. If a whale bet 10% of their bankroll on ETH March at the right moment and you copy that trade next week at 10% of your bankroll, you are not replicating their edge. You are replicating their concentration risk without their reason for taking it.

What to do instead. Do not copy-trade whales unless you are running at least five parallel copies of different wallets to diversify the fat-tail single-winner risk. Or, if you must copy a single whale, understand why that wallet's concentration pattern is repeatable for the next twelve months. If you cannot explain it in two sentences, skip it. Run the Edge Score breakdown on the wallet first. If the high score is driven by conviction percentile 90+, that is a concentrated single-event story, not a repeatable edge.
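
The single-whale rule reads naturally as a small decision function. The field names below are hypothetical stand-ins for the three pillar percentiles the breakdown reports (posture, conviction, discipline). They are not Convexly's API.

```python
from dataclasses import dataclass


@dataclass
class PillarPercentiles:
    """Hypothetical container for an Edge Score breakdown (percentiles 0-100)."""
    posture: float
    conviction: float
    discipline: float


def single_whale_copy_decision(pillars: PillarPercentiles, can_explain_repeatability: bool) -> str:
    """Apply the single-whale rule: conviction 90+ reads as a one-event story."""
    if pillars.conviction >= 90:
        return "skip"  # concentrated single-event story, not a repeatable edge
    if not can_explain_repeatability:
        return "skip"  # cannot say in two sentences why the pattern repeats
    return "consider"


print(single_whale_copy_decision(PillarPercentiles(35, 96, 60), True))  # skip
```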

Rule 5: Expected value is not your friend here.

The evidence. Under Hill α = 1.28 on realized PnL, the first moment of the distribution (the expected value) exists, because α > 1. The second moment (the variance) does not. The law of large numbers still applies, but without finite variance the central limit theorem does not. The sample average therefore converges far more slowly than the familiar square-root-of-n rate most traders implicitly assume. If you plan around expected value ("this bet has +EV of $50 so I'll do it 100 times and make $5,000"), the math is deceiving you. A single tail event can outweigh a long run of +EV decisions.
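
A rough scaling argument shows how much slower "slowly" is. Assume PnL sits in the domain of attraction of an α-stable law with 1 < α < 2 (our modelling assumption). Then the error of the sample mean shrinks like n^-(1 - 1/α) instead of the finite-variance n^-1/2. The sketch computes how many times more bets you need to halve that error.

```python
def error_exponent(alpha: float) -> float:
    """Exponent e in: sample-mean error ~ n ** -e."""
    if alpha <= 1:
        raise ValueError("the mean does not exist for alpha <= 1")
    # Finite variance gives the usual 1/2; heavier tails give 1 - 1/alpha.
    return 0.5 if alpha >= 2 else 1 - 1 / alpha


def samples_to_halve_error(alpha: float) -> float:
    """Factor by which the number of bets must grow to halve the error."""
    return 2 ** (1 / error_exponent(alpha))


for alpha in (3.0, 1.36, 1.28, 1.20):
    print(alpha, round(samples_to_halve_error(alpha), 1))
```

The factor is 4 with finite variance. It is roughly 24 at the point estimate α = 1.28 and ranges from about 14 to 64 across the reported confidence interval.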

The implication. Planning around expected value under infinite variance is the same category of mistake as planning a sailing trip based on the average wave height. The average is accurate and useless. The rogue wave is what matters. For prediction-market sizing, the operational anchor is not expected value; it is median outcome and worst-case drawdown survival.

What to do instead. When sizing a position, do not ask "what is my expected value?" Ask two questions: "What is the median outcome if I make this bet 20 times?" and "What is the worst outcome if it resolves against me three times in a row?" If the answer to the second question is "I blow up," the position is too large regardless of positive EV.
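
Both questions have closed-form answers for a simple binary bet. This sketch assumes you stake fraction f of current bankroll on a YES share at price p each time, that outcomes are independent, and that your probability q is the true one (the biggest assumption here). Final bankroll rises with the number of wins, so the median outcome is simply the bankroll at the median win count.

```python
from math import comb


def binomial_median(n: int, q: float) -> int:
    """Smallest w with P(W <= w) >= 0.5 for W ~ Binomial(n, q)."""
    cdf = 0.0
    for w in range(n + 1):
        cdf += comb(n, w) * q**w * (1 - q) ** (n - w)
        if cdf >= 0.5:
            return w
    return n


def median_multiple(f: float, q: float, p: float, n: int = 20) -> float:
    """Median bankroll multiple after n compounding bets of fraction f."""
    b = (1 - p) / p  # net odds on a win
    wins = binomial_median(n, q)
    return (1 + f * b) ** wins * (1 - f) ** (n - wins)


def three_loss_multiple(f: float) -> float:
    """Bankroll multiple after the bet resolves against you three times in a row."""
    return (1 - f) ** 3


for f in (0.05, 0.15, 0.50):
    print(f"f={f:.0%}: median x{median_multiple(f, q=0.60, p=0.50):.2f}, "
          f"three losses x{three_loss_multiple(f):.3f}")
```

At half the bankroll per bet, the median path loses money even though every bet is +EV, and three straight losses leave 12.5 percent of the bankroll.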

Three moves to make this week

  1. Audit your biggest realized PnL event from the last six months. Was it a decision you can repeat? Or was it a lucky macro call you have no procedure for identifying next time? If the latter, treat it as the outlier it probably was. Do not extrapolate strategy from a single favourable fat-tail draw.
  2. Identify the one to three market categories where you understand the domain better than the median Polymarket participant. For most traders with any domain expertise at all, this list is short. Politics in your home country, sports you follow closely, specific regulatory outcomes in an industry you work in. Write them down. Those are the only markets you should be betting in meaningfully.
  3. Cut your active position count by half. Look at your open Polymarket positions right now. Close anything that is not in one of your identified edge categories, or where you cannot defend your probability estimate against someone who knows more than you. The discipline pillar of Edge Score rewards fewer, larger, concentrated bets. The data is clear that the winning shape on this platform involves restraint, not breadth.

Three moves to stop making

  1. Stop using weekly calibration drills as a trading-improvement strategy. They do not move the profit needle. They move the don't-blow-up needle, which is important but different. Check your calibration once. If it is in the middle of the distribution or better, move on to sizing and concentration.
  2. Stop copy-trading whales without running their Edge Score pillar breakdown first. A high Edge Score driven by conviction is structurally different from one driven by posture or discipline. Conviction-driven whales are not copy-tradeable at retail sizes without the fat-tail diversification that retail sizes cannot afford.
  3. Stop sizing by feel. "This one feels like a 10% position" is how traders give away the geometric-growth advantage that careful Kelly-adjusted sizing would give them. Either compute a quarter-Kelly fraction, or use a flat percentage rule (e.g., 2% of bankroll per qualifying bet) and stick to it.

What Taleb would still push back on

This post leans heavily on positions that Nassim Taleb has argued publicly for decades: fat tails break Gaussian-era sizing, Kelly breaks under power laws, calibration is a proxy that does not guarantee survival, and concentration with convexity beats diversification with mediocrity. The recommendations above are consistent with Taleb's work. They are not complete from his standpoint. Three critiques a careful reader of Taleb would still raise:

First, ergodicity. The cohort result (top 1% captures 36.2% of signed profit) is a cross-sectional observation. Taleb and Peters would push back: a cross-sectional average tells you what happens across traders at one moment; it does not tell you what any single trader experiences over time. The time-average path of a representative trader is worse than the cross-sectional average suggests, because the traders who blew up in year one are not in the year-three sample. This post's advice about concentration is defensible because it is framed as "survive the fat tail to stay in the game," but the statistical framing should be read with that caveat in mind.

Second, the Peso problem. Polymarket is a young venue. The full tail of possible outcomes has not materialized in sample. A catastrophic event (venue hack, regulatory shutdown, market manipulation at scale) is inherently absent from the 8,656-wallet historical cohort. Any sizing rule learned from history underestimates the drawdown it needs to survive. Taleb would require a position-size ceiling below what purely statistical analysis of the historical cohort would suggest. This is why the hard per-event cap matters more than the Kelly math.

Third, skin in the game. The Edge Score composite measures pattern-fit to historical profit. It does not measure whether the trader has exposure commensurate with the bets they make. A trader playing with someone else's capital, or with small sums that would not materially affect their life, is not in the same risk regime as a trader whose livelihood depends on the outcome. Edge Score does not adjust for this. Neither does this post's advice. Treat your own skin in the game as an input that no metric can capture.

These three critiques are part of what V1.5 of the methodology paper is designed to address directly. In the meantime: read the rules above as operational defaults, not as substitutes for the basic Taleb discipline of surviving before optimizing.

Who this is for, and who this is not for

This post is for active Polymarket traders with at least a few thousand dollars deployed who want an operational framework grounded in data rather than intuition. It is for anyone who has read the methodology paper or the 10K-wallet study and asked "so now what?"

This post is not for casual bettors who enjoy prediction markets as entertainment. Nothing here is wrong for them, but the cost-benefit of running quarter-Kelly optimization on a $50 bankroll is low. It is also not for anyone looking for picks. There are no picks here. The only specific markets mentioned are historical examples, not forward bets.

The bottom line. Most Polymarket traders lose money. The ones who make money do not win by being more calibrated than the crowd. They win by being more concentrated, more disciplined, and more patient than the crowd, on the specific markets where they have edge. The data supports this. The practice follows.

Run the Edge Score breakdown on your own wallet.

See your posture, conviction, and discipline percentiles against the 8,656-wallet training cohort. Free, no signup, 15 to 30 seconds.

Related reading

The underlying research: 10,000 Polymarket Wallets Scored and the full 13-page Edge Score Methodology V1 paper.

On why the Calibration pillar was renamed to Posture after the OLS fit showed the direction of the effect was opposite to the label: Posture, not Calibration.

On the concentration pattern at the top of the leaderboard: the top-100 whale calibration audit.