Posture, not Calibration: aligning a pillar with its coefficient

The pillar that contradicted its own name

The Edge Score V3b composite is a weighted sum of three z-scored features. The V1 release referred to its three pillars as Calibration, Conviction, and Discipline. The feature underlying the Calibration pillar was z(-skill_brier), the negated standardized difference between a wallet's baseline Brier and its observed Brier.

Skill-Brier is positive when a wallet's realized Brier beats the Brier of always predicting the wallet's own marginal frequency. Higher skill-Brier is the marker of classical calibration: the trader assigns probabilities that match outcome frequencies more precisely than a trivial baseline. Low Brier and high skill-Brier are therefore what the word “calibration” labels in common usage.
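The definition above can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: the function names and the sample wallet are hypothetical, and the sketch takes per-position forecast probabilities as given because the text does not say how they are derived.

```python
from statistics import fmean

def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return fmean((p - o) ** 2 for p, o in zip(probs, outcomes))

def skill_brier(probs, outcomes):
    """Baseline Brier minus observed Brier; positive means the baseline was beaten."""
    base_rate = fmean(outcomes)  # the wallet's own marginal frequency
    baseline = brier([base_rate] * len(outcomes), outcomes)
    return baseline - brier(probs, outcomes)

# Hypothetical wallet: four resolved positions, two of which paid out.
outcomes = [1, 0, 1, 0]
sharp = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the realized outcomes
flat = [0.5, 0.5, 0.5, 0.5]   # always predicts the base rate

sharp_skill = skill_brier(sharp, outcomes)  # positive: beats the baseline
flat_skill = skill_brier(flat, outcomes)    # zero: identical to the baseline
```

A wallet that only ever predicts its own base rate lands at zero, which is why zero is the natural reference point for the sign of the feature.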

The OLS fit on 8,656 Polymarket wallets produced a coefficient of +0.7876 on z(-skill_brier). The negation is load-bearing: because the coefficient is positive on the negated feature, a higher composite contribution comes from a lower skill-Brier, not a higher one. Wallets whose realized profit is not driven by beating their own marginal frequency scored higher on the composite. With the other pillars held fixed, well-calibrated traders were actively pulled down.

The mistake the old label created

The V1 display logic flipped the sign of the Calibration pillar for “user intuition.” A well-calibrated user saw a high Calibration percentile. That display ran opposite to how the feature entered the composite. The net effect: a user could read a Calibration percentile of 85 next to an Edge Score in the bottom decile and correctly conclude that the labels contradicted each other. The product had inherited a pre-empirical prior about what calibration should do to a trader's score and left the display wired to that prior after the data said otherwise.

The single most defensible fix was to align the label with the direction of the effect. The new name, Posture, reads honestly. It captures how a trader positions relative to their own base-rate calibration without asserting that precision calibration is the mechanism by which the top of the leaderboard produces profit. On this cohort, it is not.

What the coefficient actually measures

Interpreted literally: a one-unit move in z(-skill_brier), which is one cohort standard deviation of -skill_brier, adds 0.7876 to a wallet's raw composite score. The raw score is then mapped to a 0-100 percentile against the frozen training cohort. The pillar is real and statistically nonzero across the five pre-registered out-of-sample experiments reported in the V1 methodology paper. It is simply weaker than Conviction (+2.7220) and opposite in direction to what most readers assume the word “calibration” implies for prediction-market profit.
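The two steps described above, a coefficient-weighted z-score followed by a percentile lookup against a frozen cohort, might look like the following. This is a sketch under stated assumptions: the function names, the use of the population standard deviation, the "at or below" percentile convention, and the cohort values are all hypothetical. Only the 0.7876 coefficient comes from the text.

```python
from bisect import bisect_right
from statistics import fmean, pstdev

COEF_POSTURE = 0.7876  # coefficient on z(-skill_brier), from the text

def posture_contribution(skill_brier, cohort_skill_briers):
    """Coefficient times the z-score of the negated skill-Brier vs. a frozen cohort."""
    negated = [-s for s in cohort_skill_briers]
    mu, sigma = fmean(negated), pstdev(negated)  # assumption: population std
    z = (-skill_brier - mu) / sigma
    return COEF_POSTURE * z

def percentile(raw_score, cohort_raw_scores):
    """Share of the frozen cohort scoring at or below raw_score, on a 0-100 scale."""
    ranked = sorted(cohort_raw_scores)
    return 100.0 * bisect_right(ranked, raw_score) / len(ranked)

# Hypothetical cohort skill-Brier values, for illustration only.
cohort = [-0.10, -0.05, 0.00, 0.05, 0.10]
low_skill = posture_contribution(-0.10, cohort)   # positive contribution
high_skill = posture_contribution(0.10, cohort)   # negative contribution
```

The wallet with the lower skill-Brier receives the positive contribution, which is the direction the positive coefficient on the negated feature implies.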

The operational read for a trader: precision calibration is not what separates the top of this leaderboard from the middle. Conviction and sizing do. A trader who optimizes purely for lower Brier is optimizing for a pillar whose weight is roughly 3.5 times smaller than Conviction's (2.7220 / 0.7876 ≈ 3.46) and whose sign rewards distance from classical calibration, not proximity to it. The finding does not say calibration is worthless. It says calibration is not the binding constraint on Polymarket profit among wallets with five or more resolved positions.

Why the rename and not a sign flip

Two reversible fixes were available. The first was to flip the sign of the composite coefficient and relabel the feature as z(skill_brier). That move preserves the word “calibration” on the display. The second was to keep the feature as z(-skill_brier), keep the positive coefficient, and rename the pillar. The second option was chosen because the paper's central finding is that calibration is not the profit mechanism on this cohort. Flipping the sign would have let the label survive at the cost of burying the empirical claim. Renaming kept the finding visible.
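The two options are numerically identical, because standardizing a negated feature gives the negated z-score, so the choice between them is purely about labeling. A small check on hypothetical skill-Brier values (the helper name and the data are illustrative, not from the engine):

```python
from statistics import fmean, pstdev

COEF_POSTURE = 0.7876  # coefficient on z(-skill_brier), from the text

def zscores(values):
    """Standardize values against their own mean and population std."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical skill-Brier values for four wallets.
skill = [-0.10, -0.02, 0.03, 0.12]

# Option chosen: keep z(-skill_brier) with a positive coefficient, rename the pillar.
renamed = [COEF_POSTURE * z for z in zscores([-s for s in skill])]

# Option rejected: use z(skill_brier) with a negative coefficient, keep the label.
flipped = [-COEF_POSTURE * z for z in zscores(skill)]

same = all(abs(a - b) < 1e-9 for a, b in zip(renamed, flipped))
```

Either way every wallet's contribution is unchanged. What differs is whether the reader sees a negative weight on something called calibration or a positive weight on something called Posture.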

The deprecated calibration_percentile and calibration_contribution fields remain in the API response envelope for backwards compatibility with the handful of callers that were already pinned to the old names. They are populated with the same values as posture_percentile and posture_contribution. The aliases will be removed in a later major API version. The engine module constant COEF_CALIBRATION is kept in the same way, as a backwards-compatibility alias for COEF_POSTURE.
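A deprecation shim of this kind could be as small as the sketch below. The four field names and the two constants are taken from the text. The function name, its signature, and the flat dictionary shape are assumptions made for illustration and do not describe the actual API envelope.

```python
COEF_POSTURE = 0.7876
COEF_CALIBRATION = COEF_POSTURE  # deprecated alias, kept for older importers

def build_pillar_fields(posture_percentile, posture_contribution):
    """Return the Posture fields plus their deprecated Calibration aliases."""
    return {
        "posture_percentile": posture_percentile,
        "posture_contribution": posture_contribution,
        # Deprecated aliases: same values, old names.
        "calibration_percentile": posture_percentile,
        "calibration_contribution": posture_contribution,
    }

# Hypothetical values for one wallet.
fields = build_pillar_fields(posture_percentile=62.0, posture_contribution=0.41)
```

Populating the aliases from the same variables, instead of recomputing them, keeps the old and new names from drifting apart while both exist.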

The honest generalization

If a product has a metric with a name that asserts what the metric means, and the data shows the metric does something else, the rename is usually cheaper than the sign flip. Names are load-bearing in a way that coefficients are not. A user who sees a metric behave opposite to its label will trust the next metric less. A user who sees a deliberate rename with a documented reason will trust the next rename more. The cost is a week of deprecation shims. The benefit is the next methodology paper gets read with less suspicion than the current one.

Related: Edge Score Methodology V1 (full paper) and 10,000 Polymarket Wallets Scored (empirical foundation).