
Market Trust v0.2 methodology contract.

Experimental

Prediction-market odds are becoming easier to quote than to vet. The Market Trust Card is Convexly's first public market-level rating surface. This page is the binding v0.2 math contract: weights, thresholds, evidence-depth gates, hard caps, intervals, evidence confidence, and robustness diagnostics.

Market Trust is experimental. It is not investment advice, not a compliance certification, not a claim that a market will resolve correctly, and not yet a calibrated rating model. v0.2 remains a canary-preview contract until outcome-ledger calibration and founder-approved promotion gates clear.

Experimental rating caveat

Market Trust Cards are experimental market-quality diagnostics, not compliance certifications or validated credit ratings.

What the score means

The card turns a market into a bounded quality judgment: odds are more useful when the market is coherent, liquid, deep, resolution-clean, supported by credible flow, low on integrity risk, and reproducible from artifacts. v0.2 separates the quality score from evidence confidence so the score cannot outrun the substrate.

Measured

Coherence violations from the Coherent Markets Engine daily snapshot.

Measured

Liquidity, volume, orderbook spread, top-of-book depth, and visible full-ladder depth from Gamma/CLOB artifacts.

Heuristic

Modeled depth at size from visible CLOB depth and the gamma liquidity proxy. v0.1 does not measure executable fill against a real-size trade and the score is not a tradable capacity claim.

Heuristic

Resolution reliability inverts the v0.1 resolution-risk score. Inputs are subjective wording, liquidity floor, time-to-resolution proximity, and category keyword priors; thresholds are MVP heuristics not yet calibrated against UMA dispute history.

Heuristic

Manipulation risk reads structural risk indicators (spread, volume, depth) and does not look at per-market wash trading or insider flow. A high score is the absence of structural flags, not a measurement that the market is manipulation-free.

Pending

Participant-quality rollups from wallet activity snapshots are not joined into public v0.1 cards yet, so the pillar renders unscored to keep the composite from implying informed-flow evidence we do not have.

Rating cutpoints

The composite is bucketed into four candidate ratings by three cutpoints: Use at composite ≥ 80, Use with caveats at ≥ 65, Discount at ≥ 45, and Do not cite below 45. These cutpoints are not yet calibrated; v1 requires outcome-ledger calibration and an approved methodology promotion.
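As a minimal sketch of the bucketing rule above, the function below maps a composite on a 0 to 100 scale to one of the four ratings. The function name `rating_bucket` is hypothetical and not part of any published Convexly API; only the thresholds and labels come from this page.

```python
def rating_bucket(composite: float) -> str:
    """Map a 0-100 composite to a candidate rating (thresholds from this page)."""
    if composite >= 80:
        return "Use"
    if composite >= 65:
        return "Use with caveats"
    if composite >= 45:
        return "Discount"
    return "Do not cite"
```

The bucket returned here is only the starting verdict; the gates and caps described later on this page can lower it.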

Measured-only score

Each card surfaces a secondary measured-only reading next to the headline composite. It is the same weighted formula but renormalized over only the pillars whose status is measured. When the headline composite and the measured-only reading disagree by more than 5 points, the heuristic mix is materially shaping the rating; the card surfaces the direction in plain language. The measured-only score is null whenever fewer than two measured pillars exist, because a one-pillar reading is not a composite.
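The measured-only reading can be sketched as a weight renormalization over measured pillars. The pillar dictionary shape (`status`, `score`) and both function names are assumptions made for illustration; the two-pillar minimum and the 5-point disagreement threshold are taken from the paragraph above.

```python
from typing import Optional

def measured_only_score(pillars: dict[str, dict], weights: dict[str, float]) -> Optional[float]:
    """Weighted mean over pillars whose status is 'measured', or None if fewer than two."""
    measured = {
        name: p["score"]
        for name, p in pillars.items()
        if p["status"] == "measured" and p.get("score") is not None
    }
    if len(measured) < 2:
        return None  # a one-pillar reading is not a composite
    total_weight = sum(weights[name] for name in measured)
    return sum(weights[name] * score for name, score in measured.items()) / total_weight

def heuristic_mix_is_material(headline: float, measured_only: Optional[float]) -> bool:
    """True when headline and measured-only readings differ by more than 5 points."""
    return measured_only is not None and abs(headline - measured_only) > 5.0
```

For example, with measured pillars scoring 90 (weight 20) and 60 (weight 15), the measured-only reading is (20·90 + 15·60) / 35, regardless of what any heuristic pillar scores.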

v0.2 candidate math envelope

v0.2 separates the market-quality point estimate from the evidence confidence behind it. This is still a canary contract, not a calibrated stable rating, but the public payload now carries the fields needed to inspect uncertainty, robustness, missingness, and promotion blockers.

Market quality

market_quality_score: weighted pillar composite, separate from evidence confidence.

Uncertainty

market_quality_interval: current candidate interval around the quality score.

Confidence

evidence_confidence_score: coverage, freshness, missingness, source health, and artifact depth.

Robustness

robustness: leave-one-pillar-out range, weight-perturbation range, and sensitivity verdict.

External context

external_context: non-scoring baseline families such as macro release calendars, polling averages, futures consensus, or sportsbook consensus. These are review targets, not score inputs.

Manipulation screens

manipulation_screen: diagnostic concentration, unknown-flow, and low-edge-flow flags. These are integrity screens, not proof that manipulation exists or is absent.

Caps

hard_caps_applied: fail-closed reasons such as dropped events, missing rule text, blocked participant data, or unstable sensitivity.

Promotion

promotion_blockers: the current reasons v0.2 stays canary instead of stable v1.
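For readers consuming the payload, the envelope fields listed above might be typed roughly as follows. Only the top-level field names come from this page; every value type and the sub-field names inside `Robustness` are assumptions for illustration, not the published schema.

```python
from typing import Optional, TypedDict

class Robustness(TypedDict):
    # Sub-field names are hypothetical; the page only names the three diagnostics.
    leave_one_pillar_out_range: tuple[float, float]
    weight_perturbation_range: tuple[float, float]
    sensitivity_verdict: str

class MarketTrustEnvelope(TypedDict):
    # Top-level field names are from this page; the types are assumed.
    market_quality_score: Optional[float]
    market_quality_interval: Optional[tuple[float, float]]
    evidence_confidence_score: Optional[float]
    robustness: Robustness
    external_context: list[str]          # non-scoring baseline families
    manipulation_screen: dict[str, bool]  # diagnostic flags, not findings
    hard_caps_applied: list[str]         # fail-closed reasons
    promotion_blockers: list[str]        # why v0.2 stays canary
```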

v0.2 pillar weights

Coherence: 20%
Liquidity: 15%
Depth: 15%
Resolution reliability: 15%
Participant quality: 20%
Integrity risk screen: 10%
Audit completeness: 5%

These weights sum to 100% and remain candidate weights until outcome-ledger calibration supports a v1 promotion.
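The weighted composite implied by these weights can be sketched as below. The snake_case pillar keys and the function name are hypothetical, and the sketch assumes every pillar carries a 0 to 100 score; this page does not specify how unscored pillars enter the headline composite, so the sketch refuses to guess and raises instead.

```python
# Candidate v0.2 weights from this page, in percentage points.
PILLAR_WEIGHTS = {
    "coherence": 20,
    "liquidity": 15,
    "depth": 15,
    "resolution_reliability": 15,
    "participant_quality": 20,
    "integrity_risk_screen": 10,
    "audit_completeness": 5,
}
assert sum(PILLAR_WEIGHTS.values()) == 100

def weighted_composite(scores: dict[str, float]) -> float:
    """Weighted mean of pillar scores; requires a score for every pillar."""
    missing = set(PILLAR_WEIGHTS) - set(scores)
    if missing:
        # Handling of unscored pillars is not defined here, so fail loudly.
        raise ValueError(f"unscored pillars: {sorted(missing)}")
    return sum(PILLAR_WEIGHTS[name] * scores[name] for name in PILLAR_WEIGHTS) / 100
```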

Evidence-depth gate

A card can only show Use when it has at least 4 measured pillars, at least 6 measured-or-heuristic pillars, and no more than 1 unresolved pending or data-rights-blocked pillar. When the point score reaches 80 but this gate fails, the verdict is capped at Use with caveats.
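The gate can be sketched as a count over pillar statuses. The status strings `measured`, `heuristic`, and `pending` follow the labels used on this page; `blocked` as the label for a data-rights-blocked pillar, and both function names, are assumptions.

```python
from collections import Counter

def passes_evidence_depth_gate(statuses: list[str]) -> bool:
    """Check the three evidence-depth conditions required for a Use verdict."""
    counts = Counter(statuses)
    measured = counts["measured"]
    measured_or_heuristic = measured + counts["heuristic"]
    unresolved = counts["pending"] + counts["blocked"]  # 'blocked' is a hypothetical label
    return measured >= 4 and measured_or_heuristic >= 6 and unresolved <= 1

def gated_verdict(composite: float, statuses: list[str]) -> str:
    """Bucket the composite, capping Use at Use with caveats when the gate fails."""
    if composite >= 80:
        return "Use" if passes_evidence_depth_gate(statuses) else "Use with caveats"
    if composite >= 65:
        return "Use with caveats"
    if composite >= 45:
        return "Discount"
    return "Do not cite"
```

Note that the gate only affects the top bucket in this sketch: a card scoring below 80 gets the same verdict whether or not the gate passes.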

Interval and confidence formulas

market_quality_interval starts from the composite, then widens downward for pending pillars, data-rights blocks, heuristic status, source-health failure, and leave-one-pillar-out sensitivity. The low end of the interval can cap the rating bucket.

evidence_confidence_score combines measured-pillar share, artifact completeness, source health, sample size, participant coverage, and freshness, then subtracts penalties for pending pillars, blocked data rights, and dropped source events. Labels are high at 80+, medium at 60+, low at 35+, and insufficient below 35.
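The label thresholds above translate directly into a small lookup. The function name is hypothetical; the sketch covers only the labeling step, since this page does not publish the component weights or penalty sizes behind evidence_confidence_score itself.

```python
def evidence_confidence_label(score: float) -> str:
    """Map a 0-100 evidence confidence score to its label (thresholds from this page)."""
    if score >= 80:
        return "high"
    if score >= 60:
        return "medium"
    if score >= 35:
        return "low"
    return "insufficient"
```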

Hard caps and robustness

Hard caps can only lower a verdict. Dropped source events or source-health failure cap at Do not cite; missing source health, missing resolution-rule text, high unknown flow, and low heuristic integrity scores cap lower buckets as appropriate. Robustness re-scores the card by dropping one pillar at a time and by perturbing each pillar weight ±20%. If the verdict changes under those sensitivity checks, the rating is capped at Use with caveats.
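A minimal sketch of the robustness check, under stated assumptions: after a pillar is dropped or a weight is scaled by 0.8 or 1.2, the remaining weights are renormalized to sum to one, and "the verdict changes" means any re-scored bucket differs from the baseline bucket. Neither detail is specified on this page, and all function names are hypothetical. The cap is applied so that it can only lower a verdict.

```python
ORDER = ["Do not cite", "Discount", "Use with caveats", "Use"]  # worst to best

def bucket(score: float) -> str:
    for threshold, label in ((80, "Use"), (65, "Use with caveats"), (45, "Discount")):
        if score >= threshold:
            return label
    return "Do not cite"

def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the pillars present in `scores` (renormalization assumed)."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total

def sensitivity_verdicts(scores: dict[str, float], weights: dict[str, float]) -> set[str]:
    """Verdicts seen under leave-one-pillar-out and per-pillar ±20% weight perturbation."""
    if len(scores) < 2:
        raise ValueError("need at least two scored pillars")
    verdicts = set()
    for name in scores:
        rest = {n: s for n, s in scores.items() if n != name}
        verdicts.add(bucket(composite(rest, weights)))
        for factor in (0.8, 1.2):
            perturbed = dict(weights)
            perturbed[name] = weights[name] * factor
            verdicts.add(bucket(composite(scores, perturbed)))
    return verdicts

def robust_verdict(scores: dict[str, float], weights: dict[str, float]) -> str:
    """Cap at Use with caveats when sensitivity checks change the verdict; never raise it."""
    base = bucket(composite(scores, weights))
    if sensitivity_verdicts(scores, weights) != {base}:
        return min(base, "Use with caveats", key=ORDER.index)
    return base
```

With two equally weighted pillars scoring 100 and 62, the baseline composite is 81 (Use), but dropping the first pillar leaves 62 (Discount), so the sketch caps the verdict at Use with caveats.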

Refresh cadence and near-real-time data

v0.1 cards are public daily snapshots. The build runs once per day and writes the canonical bundle at apps/web/public/research/market-trust/latest.json. Each card carries a snapshot_at_utc timestamp; the card view classifies the artifact as fresh, public-current, aging, or stale based on lag from that timestamp.
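The freshness classification could be computed from snapshot_at_utc along the following lines. The four labels come from this page, but the lag thresholds in `HYPOTHETICAL_LIMITS` are placeholders invented for illustration, as is the assumption that the timestamp is an ISO 8601 string; the actual card view may use different limits and formats.

```python
from datetime import datetime, timedelta, timezone

# Placeholder thresholds: NOT from the methodology, chosen only to make the sketch run.
HYPOTHETICAL_LIMITS = [
    (timedelta(hours=6), "fresh"),
    (timedelta(hours=36), "public-current"),
    (timedelta(days=3), "aging"),
]

def classify_freshness(snapshot_at_utc: str, now: datetime, limits=HYPOTHETICAL_LIMITS) -> str:
    """Label a snapshot by its lag from `now`; anything past the last limit is stale."""
    # Accept a trailing 'Z' on Python versions where fromisoformat rejects it.
    snapshot = datetime.fromisoformat(snapshot_at_utc.replace("Z", "+00:00"))
    if snapshot.tzinfo is None:
        snapshot = snapshot.replace(tzinfo=timezone.utc)  # assume UTC, per the field name
    lag = now - snapshot
    for limit, label in limits:
        if lag <= limit:
            return label
    return "stale"
```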

v0.1 is intentionally not a real-time rating product. Streaming inputs (CLOB websockets for orderbook depth, live wallet activity for participant rollups, live UMA resolution events) land in v0.2.x once the streaming pipeline ships, and the rating model that consumes them will be pre-registered before any "near-real-time" claim appears on the card. Until then, treat the snapshot age as authoritative and avoid citing the rating as a current live signal.

Research grounding

The scorecard model borrows from two literatures: prediction markets as information aggregators, and market microstructure as the discipline of whether quoted prices are deep, costly to move, and surveillable.

Prediction-market prices can aggregate dispersed information, but the usefulness of a quoted probability depends on market design, contract wording, and who supplies liquidity. See Wolfers and Zitzewitz (2004) and Arrow et al. (2008).

Depth and price impact belong in the card because thin books can make prices easy to move and hard to cite. The market microstructure anchor is Kyle (1985); the simple empirical intuition behind illiquidity and return impact is captured by Amihud (2002).

The regulatory framing is also concrete: CFTC materials for designated contract markets emphasize contracts not being readily susceptible to manipulation and ongoing market surveillance. FINRA surveillance guidance highlights wash sales, spoofing, layering, front-running, and correlated trading as manipulation patterns worth monitoring. The data path is feasible because Polymarket exposes public market, orderbook, spread, midpoint, trade, and websocket endpoints.

Current public cards live on the product surface

This page is the methodology note. The searchable public card archive lives at /market-trust so users do not have to read the research page before inspecting current market-quality ratings.


Claim discipline

The only public descriptions Market Trust allows today are: experimental market-quality card, independent market-quality reading with visible caveats, candidate v0.2 diagnostic, and source-linked Convexly artifact. The claim ceiling below is intentionally stricter than ordinary marketing copy because a market-quality score can be misread as authority if the caveats do not travel with it.

Not claiming: certified market
Not claiming: approved market
Not claiming: compliant market
Not claiming: investment advice
Not claiming: guaranteed outcome
Not claiming: proven manipulation-free
Not claiming: validated live rating
Not claiming: calibrated rating
Not claiming: best-in-class rating
Not claiming: predictive of resolution
Not claiming: real-time market trust outside the Trader pipeline
Not claiming: measured manipulation-detection finding

The next research step is the outcome ledger: freshness, stability, and coherence labels can accrue before market settlement, while final resolution labels arrive later. Until those outcome rows support calibration, Market Trust is a transparent product artifact and data-collection harness, not a finished ratings business.