What is the false discovery rate?

Test enough wallets and some look skilled by chance alone. The false discovery rate correction is how a leaderboard stops manufacturing winners. Plain-English mechanics plus the worked example from our own published cohort.

The answer first

The false discovery rate (FDR) is the expected share of your "positive" findings that are actually chance. Every statistical test has a false-positive rate; run one test and it is a footnote, run dozens and it is a factory. At a 2.5 percent one-sided threshold, about one test in forty comes back positive on pure noise. A wallet leaderboard is implicitly one test per wallet, so an uncorrected screen of thousands of wallets will surface skilled-looking records that are nothing of the kind, at a predictable rate. Controlling the FDR at a level q (Convexly's primary screen uses Benjamini-Hochberg at q = 0.10) bounds the expected share of chance findings among everything reported as positive.

Worked example: our own top-50 cohort

In the frozen 2026-06-09 scan of Convexly's own published top-50 Polymarket cohort, 35 of the 50 wallets had at least 30 resolved positions and could be tested. The arithmetic before any correction:

35 tests × 0.025 one-sided threshold = 0.875 expected false positives under a null of zero skill

Observed: exactly 1 of 35 intervals cleared zero on the positive side before correction, which is almost exactly what chance predicts. After the Benjamini-Hochberg correction at q = 0.10, that single positive does not survive: 0 of 35 cleared. The same wallet's net PnL on the board is negative, which is its own caution against reading one uncorrected interval as a verdict. The full per-wallet table is at /research/top50-skill-scan.

For scale: the platform-wide corrected screen at q = 0.10 cleared 178 of 3,871 wallets, and none of the published top-50 cohort are among them. Had that screen run uncorrected at 2.5 percent, roughly 97 of 3,871 would have cleared on chance alone even with no skill anywhere in the population.
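
The arithmetic above can be checked in a few lines. This is a minimal sketch, not Convexly's code: the function names are hypothetical, and the two probability helpers assume the tests are independent, an assumption that wallets trading the same markets may violate.

```python
from math import comb

ALPHA = 0.025  # one-sided threshold quoted in the text


def expected_false_positives(n_tests: int, alpha: float = ALPHA) -> float:
    """Expected number of chance positives when no wallet has any skill."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = ALPHA) -> float:
    """P(at least one chance positive), ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def prob_exactly(k: int, n_tests: int, alpha: float = ALPHA) -> float:
    """Binomial P(exactly k chance positives), ASSUMING independent tests."""
    return comb(n_tests, k) * alpha**k * (1.0 - alpha) ** (n_tests - k)


if __name__ == "__main__":
    # Top-50 cohort: 35 testable wallets.
    print(round(expected_false_positives(35), 3))    # 0.875
    # Platform-wide screen: 3,871 wallets, roughly 97 expected by chance.
    print(round(expected_false_positives(3871), 3))  # 96.775
    # Chance of seeing one or more uncorrected positives among 35 null tests.
    print(round(prob_at_least_one(35), 3))           # about 0.59
    # Chance of seeing exactly one, as observed in the cohort.
    print(round(prob_exactly(1, 35), 3))             # about 0.37
```

Under those assumptions, a batch of 35 null tests produces at least one uncorrected positive about 59 percent of the time, so a single cleared interval is unremarkable on its own.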

How Benjamini-Hochberg works

Sort all the p-values from smallest to largest. Find the largest rank k such that the k-th p-value is at or below (k / m) · q, where m is the number of tests. Every test up to rank k clears; everything after does not. The intuition: the smallest p-value in a batch of 35 has had 35 chances to be small, so when it is the only candidate it must beat q / 35 (about 0.0029 at q = 0.10), not q, to count. The bar at each rank tightens automatically as the number of tests grows. In enterprise cohort work Convexly also reports the survivor count at q = 0.05 and q = 0.20 as a sensitivity sweep, so a reader can see whether a result depends on the choice of q.
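
The procedure described here can be sketched in plain Python. This is an illustrative implementation of the standard Benjamini-Hochberg step-up rule, not Convexly's production code; the function name `benjamini_hochberg` and the sample p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.10) -> List[bool]:
    """Return one flag per test, True where the test clears at level q."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered from smallest p-value to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value is at or below (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Every test up to rank k_max clears, including any earlier rank
    # that missed its own threshold (the "step-up" behaviour).
    cleared = [False] * m
    for idx in order[:k_max]:
        cleared[idx] = True
    return cleared


if __name__ == "__main__":
    # Hypothetical p-values for illustration only, not Convexly data.
    p = [0.001, 0.012, 0.020, 0.030, 0.200, 0.450, 0.600, 0.900]
    # Sensitivity sweep over q, in the spirit of the one described above.
    for q in (0.05, 0.10, 0.20):
        print(q, sum(benjamini_hochberg(p, q)))
```

On this made-up batch the survivor count is 2 at q = 0.05 and 4 at both q = 0.10 and q = 0.20. A count that holds steady across the sweep is less dependent on the choice of q than one that moves with it.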

What FDR-cleared does NOT mean

FDR-cleared is a statistical property: the read survives being one test among many. It is not a certification, not an endorsement, and not an instruction to act. It does not forecast future performance, and it says nothing about whether the record would survive a different screen (the concentration flag, for example, demotes records that a p-value alone would pass). The frozen wording lives in the lexicon, and the same correction is one pillar of the enterprise cohort audit.

See the corrected read on any wallet

Paste a Polymarket wallet address into the free analyzer to get its realized-edge read with its interval, the same statistic the corrected screen is built on.

Frequently asked

What is the false discovery rate in plain English?
The expected share of your "positive" findings that are actually chance. If you run enough tests, some come back positive even when nothing real is there: at a 2.5 percent one-sided threshold, about one in forty tests does. Controlling the false discovery rate at, say, q = 0.10 means that of everything you report as positive, no more than about 10 percent is expected to be noise.

How does the Benjamini-Hochberg correction work?
Sort the p-values from all tests smallest to largest. Walk down the list and find the largest rank k where the k-th p-value is at or below (k divided by the number of tests) times q. Everything up to that rank clears; everything after does not. The bar tightens automatically as you run more tests, which is exactly the discipline a leaderboard of many wallets needs.

Why do wallet leaderboards need a multiple-testing correction?
Because a leaderboard is implicitly running one test per wallet. Screen 3,871 wallets for a positive edge and, even if none had real skill, roughly 97 would clear an uncorrected 2.5 percent threshold by chance. Without a correction the leaderboard manufactures skilled-looking wallets at a predictable rate. Convexly's platform-wide corrected screen (Benjamini-Hochberg, q = 0.10) cleared 178 of 3,871 wallets; none of its own published top-50 cohort are among them.

What does FDR-cleared mean on a Convexly surface?
That the wallet's realized-edge read survives being one test among many at the corrected bar. It is a statistical property of the resolved record, not a certification, not an endorsement, and never an instruction to copy a wallet. A past read is not a forecast.

Did any wallet in Convexly's own top-50 cohort clear the corrected bar?
No. In the frozen 2026-06-09 scan, 35 of the 50 had enough resolved positions to test, exactly one interval cleared zero before correction (about 0.9 expected by chance across 35 tests), and zero survived the Benjamini-Hochberg correction at q = 0.10. That null is published in full at /research/top50-skill-scan.

Related reading

Answers: What is a good win rate on Polymarket

Blog: Base rate neglect

Learn: Brier score

Learn: Calibration