What is a negative control?
Run the same skill test on inputs that should show nothing. If random, size-matched wallet cohorts light up the machinery at the same rate as the cohort under review, the finding was never a finding.
The answer first
A negative control is the placebo arm of a measurement. You run the identical test on inputs where the effect should be absent, and whatever the pipeline reports there is your chance baseline. A drug trial gives sugar pills to a control group; a lab assay runs the reagents with no sample; a wallet-cohort audit runs the identical realized-edge test, with the identical FDR correction, on random cohorts of the same size as the cohort under review. A real finding has to beat that baseline, not just beat zero.
The reason this matters for wallet cohorts specifically: cohorts are almost always selected after outcomes are known. "Audit these 20 winners" is an outcome-conditioned sample, and outcome-conditioning inflates the apparent skilled-rate all by itself. The negative control is the anchor that separates "this cohort is unusual" from "any random handful of wallets looks like this".
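The inflation from outcome-conditioning can be seen in a small simulation. The sketch below is illustrative only: the population size, the number of bets per wallet, and the use of win rate as the outcome are all assumptions made for the example, not figures from any audit. Every simulated wallet has zero skill, yet the 20 picked after the fact look like winners.

```python
import random

rng = random.Random(0)  # fixed seed so the illustration is reproducible

N_WALLETS = 1000  # hypothetical population size
N_BETS = 50       # hypothetical resolved positions per wallet
TOP_K = 20        # size of the outcome-selected cohort

# Every wallet has zero skill: each bet wins with probability 0.5.
win_rates = [
    sum(rng.random() < 0.5 for _ in range(N_BETS)) / N_BETS
    for _ in range(N_WALLETS)
]

population_mean = sum(win_rates) / N_WALLETS

# "Audit these 20 winners": select on the outcome, then measure the outcome.
winners = sorted(win_rates, reverse=True)[:TOP_K]
winners_mean = sum(winners) / TOP_K

print(f"population mean win rate: {population_mean:.3f}")
print(f"top-{TOP_K} mean win rate: {winners_mean:.3f}")
```

The selected group sits well above the population mean even though no wallet has any edge, which is why a cohort has to be compared against matched random draws rather than against zero.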
How the baseline is built
In Convexly's enterprise cohort audits, the control is constructed like this:
- Draw 500 random cohorts, each exactly the size of the cohort under review, from a pool of scoreable wallets, with the reviewed wallets excluded. The PRNG seed is recorded, so the draws are reproducible.
- Run each draw through the identical pipeline: realized entry edge with a 95 percent BCa (bias-corrected and accelerated) bootstrap interval, the concentration screen, and the Benjamini-Hochberg correction.
- Report the mean skilled-rate across the 500 draws (the chance baseline) and an empirical p-value: the fraction of random draws whose FDR survivor count is at least the reviewed cohort's.
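The steps above can be sketched in a few lines. This is a simplified illustration, not Convexly's implementation: it assumes each pool wallet already has a one-sided p-value from the per-wallet test (the hypothetical `pool_p` mapping), it skips the interval and concentration-screen stages, and it treats the skilled-rate as survivors divided by cohort size. The function names and the seed value are placeholders.

```python
import random


def bh_survivors(p_values, q=0.10):
    """Count Benjamini-Hochberg rejections at FDR level q (step-up rule)."""
    m = len(p_values)
    survivors = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            survivors = rank  # keep the largest rank that clears its threshold
    return survivors


def negative_control(pool_p, reviewed_ids, reviewed_survivors,
                     n_draws=500, seed=12345, q=0.10):
    """Size-matched random-cohort control.

    pool_p: dict of wallet id -> precomputed one-sided p-value (assumed input).
    seed: arbitrary example value; record it so the draws are reproducible.
    """
    reviewed = set(reviewed_ids)
    size = len(reviewed)
    # Exclude the reviewed wallets from the pool before drawing.
    candidates = sorted(w for w in pool_p if w not in reviewed)
    if len(candidates) < size:
        raise ValueError("random pool is smaller than the cohort")

    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        draw = rng.sample(candidates, size)  # without replacement
        counts.append(bh_survivors([pool_p[w] for w in draw], q))

    return {
        "seed": seed,
        # Mean fraction of FDR survivors per random cohort (assumed definition).
        "baseline_skilled_rate": sum(counts) / (n_draws * size),
        # Fraction of draws with at least as many survivors as the reviewed cohort.
        "empirical_p": sum(c >= reviewed_survivors for c in counts) / n_draws,
    }


# Synthetic demo: a zero-skill pool with uniform p-values.
demo_rng = random.Random(7)
pool = {f"w{i}": demo_rng.random() for i in range(2000)}
reviewed = [f"w{i}" for i in range(20)]
result = negative_control(pool, reviewed, reviewed_survivors=3)
print(result)
```

Because the seed is fixed and the candidate list is sorted before sampling, rerunning the function with the same inputs returns the same baseline and the same empirical p-value.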
One disclosure travels with the number. The random pool is restricted to data-rich, scoreable wallets (at least 30 resolved positions with a usable interval), because a wallet with too few resolved positions cannot be put through the same test. The baseline is therefore a baseline over scoreable wallets, not the full address space, and that restriction is conservative: it makes the random draws clear the test at least as often as a full-address-space draw would, so the reviewed cohort's separation gets harder to claim, never easier.
Worked example: the chance arithmetic on our own cohort
The simplest negative-control logic needs no simulation at all. In the frozen 2026-06-09 scan of our own published top-50 cohort, 35 wallets were testable at a 2.5 percent one-sided threshold, so under a null of zero skill the expected number of uncorrected positives is 35 × 0.025 = 0.875, call it about 0.9. Observed: exactly 1 of 35, which is what noise predicts, and it did not survive the correction at q = 0.10. The cohort's result matches its own chance baseline, and that is exactly what we published at /research/top50-skill-scan. The 500-draw control generalizes the same question to cohorts where the answer is less clean.
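The arithmetic can be checked directly. The sketch below reproduces the expected count from the figures in the text and adds one extra quantity, the chance of seeing at least one uncorrected positive. That second number assumes the 35 wallet tests are independent, which is a simplification made for illustration.

```python
from math import comb

n_testable = 35  # testable wallets in the frozen scan (from the text)
alpha = 0.025    # one-sided per-wallet threshold (from the text)

# Expected number of uncorrected positives under a zero-skill null.
expected_false_positives = n_testable * alpha


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance of at least one uncorrected positive, assuming independent tests.
p_at_least_one = 1 - binom_pmf(0, n_testable, alpha)

print(f"expected uncorrected positives: {expected_false_positives:.3f}")
print(f"P(at least one positive): {p_at_least_one:.3f}")
```

Under these assumptions a zero-skill cohort of 35 shows at least one raw positive in about 59 percent of cases, so a single uncorrected hit is unremarkable.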
The honest-null rule
When the control cannot be computed (no random pool was supplied to a run, or the pool is smaller than the cohort), it is reported as null with the reason stated, never fabricated. A synthesized baseline would quietly poison every number that leans on it, and a reader who cannot see which runs had a real control cannot trust any of them. The same rule governs the rest of the pipeline: an inconclusive result is a publishable result.
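One way to make the rule hard to violate is to let the result type carry the null explicitly. The sketch below is a hypothetical shape for such a record, not the audit's actual schema: `ControlReport`, `report_control`, and the reason strings are invented for illustration, and `compute` stands in for whatever function runs the real control.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ControlReport:
    baseline_skilled_rate: Optional[float]
    empirical_p: Optional[float]
    reason: Optional[str] = None  # set only when the control is null


def report_control(pool_size: Optional[int], cohort_size: int,
                   compute: Callable[[], Tuple[float, float]]) -> ControlReport:
    """Return real numbers when a control exists, otherwise null plus the reason."""
    if pool_size is None:
        return ControlReport(None, None, "no random pool supplied")
    if pool_size < cohort_size:
        return ControlReport(None, None, "random pool smaller than cohort")
    baseline, empirical_p = compute()
    return ControlReport(baseline, empirical_p)


# Usage: a run with no pool yields an explicit null, never a made-up baseline.
missing = report_control(None, 20, lambda: (0.0, 1.0))
print(missing)
```

Keeping the reason next to the null lets a reader see at a glance which runs had a real control and which did not.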
Where it is used
The size-matched negative control is a standing component of the enterprise cohort audit, where it anchors the skilled-rate and survivor count of every client-supplied cohort against chance. The underlying per-wallet statistic and the correction it pairs with are covered in the realized-edge and false-discovery-rate explainers.
Related explainers
- /learn/false-discovery-rate: the correction each control draw runs through
- /learn/realized-edge: the per-wallet statistic being controlled
- /learn/luck-share: the published cohort reading that matched its own chance baseline
Related reading
- Research: Negative results
- Learn: Brier score
- Learn: Calibration
- Learn: Concentration flag