Confidence Scoring

Every field in a ShopGraph response includes a confidence score from 0 to 1. This lets agents and applications make informed decisions about data quality.

How It Works

Confidence scoring happens in three layers:

  1. Tier baseline — The extraction method sets a starting confidence.
  2. Field modifiers — Individual fields are adjusted based on extraction signal strength.
  3. Threshold enforcement — The strict_confidence_threshold parameter filters low-confidence fields from the response.

Tier Baselines

Each extraction method has a different baseline confidence:

Extraction Method   Baseline   Description
schema_org          0.93       Structured data from the page (JSON-LD, Microdata). Highest reliability.
llm                 0.70       LLM extraction from raw HTML. Good coverage, variable precision.
hybrid              0.85       Auto-heal merge: Schema.org partial + LLM fills gaps.
playwright          0.75       Full browser rendering for JS-heavy pages, then LLM extraction.

Field Modifiers

Within each extraction, individual fields receive adjustments based on signal quality:

Signal                   Modifier   Example
Structured data match    +0.05      Price found in JSON-LD offers.price
Cross-validated          +0.03      Title matches both <title> and Schema.org
Single source only       +0.00      Description from meta tag only
LLM inferred             -0.10      Brand guessed from page context
Format mismatch          -0.15      Price extracted but currency ambiguous
Stale / missing signal   -0.20      Availability not found, defaulted
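
As a rough illustration of how the first two layers might combine, the sketch below adds a single field modifier to the tier baseline. The source does not say whether modifiers stack or how scores are bounded, so the additive rule, the clamp to [0, 1], the rounding, and the names (TIER_BASELINE, FIELD_MODIFIER, field_confidence, and the signal keys) are assumptions for illustration, not ShopGraph's implementation.

```python
# Numbers are copied from the two tables above.
# The signal key names are hypothetical labels invented for this sketch.
TIER_BASELINE = {
    "schema_org": 0.93,
    "llm": 0.70,
    "hybrid": 0.85,
    "playwright": 0.75,
}

FIELD_MODIFIER = {
    "structured_data_match": 0.05,
    "cross_validated": 0.03,
    "single_source_only": 0.00,
    "llm_inferred": -0.10,
    "format_mismatch": -0.15,
    "stale_or_missing_signal": -0.20,
}


def field_confidence(method: str, signal: str) -> float:
    """Assumed rule: tier baseline plus one modifier, clamped to [0, 1]."""
    score = TIER_BASELINE[method] + FIELD_MODIFIER[signal]
    # Round to two decimals to hide floating-point noise in the sum.
    return round(min(1.0, max(0.0, score)), 2)


print(field_confidence("schema_org", "structured_data_match"))
print(field_confidence("llm", "llm_inferred"))
```

Not every score in the response example on this page can be reproduced with this rule (price is 0.97, which is not the schema_org baseline plus any single modifier), so treat it as a simplification of whatever the service actually does.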

strict_confidence_threshold

Set this parameter to filter out fields below a given confidence level. Fields that do not meet the threshold are omitted from the response (not set to null).

Request with threshold
{
  "url": "https://www.allbirds.com/products/mens-tree-runners",
  "strict_confidence_threshold": 0.85
}

With strict_confidence_threshold: 0.85, any field with confidence below 0.85 is excluded from the response; a field scoring exactly 0.85 is kept. This is useful for applications that require high data quality and prefer missing data over uncertain data.
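
The same filtering can be reproduced on the client, for example to apply a stricter bar to a response that has already been fetched. The following is a minimal sketch that assumes the response shape documented on this page; apply_threshold and the sample data are hypothetical and not part of any ShopGraph SDK.

```python
from typing import Any


def apply_threshold(response: dict[str, Any], threshold: float) -> dict[str, Any]:
    """Return a copy of the response without fields scoring below threshold."""
    meta = response["_shopgraph"]
    scores = meta["field_confidence"]
    # A field at exactly the threshold is kept; only lower scores are dropped.
    kept = {name for name, score in scores.items() if score >= threshold}
    omitted = [name for name in scores if name not in kept]
    return {
        # Dropped fields are removed entirely rather than set to None.
        "product": {k: v for k, v in response["product"].items() if k in kept},
        "_shopgraph": {
            **meta,
            "field_confidence": {k: s for k, s in scores.items() if k in kept},
            "fields_omitted_by_threshold":
                list(meta.get("fields_omitted_by_threshold", [])) + omitted,
        },
    }


# Trimmed sample using values from the response example on this page.
sample = {
    "product": {"title": "Men's Tree Runners", "brand": "Allbirds",
                "availability": "InStock"},
    "_shopgraph": {
        "extraction_method": "schema_org",
        "confidence_score": 0.93,
        "field_confidence": {"title": 0.98, "brand": 0.94, "availability": 0.91},
        "fields_omitted_by_threshold": [],
    },
}

filtered = apply_threshold(sample, 0.94)
print(filtered["product"])
print(filtered["_shopgraph"]["fields_omitted_by_threshold"])
```

The function builds a new dictionary instead of mutating its input, so the unfiltered response stays available for logging or for retrying with a lower threshold.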

Response Example

Response with field_confidence
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD",
    "brand": "Allbirds",
    "availability": "InStock",
    "image": "https://cdn.allbirds.com/image/fetch/...",
    "description": "Lightweight, breathable sneakers made with FSC-certified..."
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95,
      "brand": 0.94,
      "availability": 0.91,
      "image": 0.93,
      "description": 0.88
    },
    "fields_omitted_by_threshold": []
  }
}
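
An agent consuming this response may want to know which fields to double-check first. The sketch below pairs each product value with its score from field_confidence and sorts the weakest first. The helper name rank_fields_by_confidence is hypothetical, and the sample is a trimmed copy of the response above.

```python
from typing import Any


def rank_fields_by_confidence(response: dict[str, Any]) -> list[tuple[str, Any, float]]:
    """Return (field, value, score) tuples, lowest confidence first."""
    product = response["product"]
    scores = response["_shopgraph"]["field_confidence"]
    # Only fields present in both sections are ranked.
    rows = [(name, product[name], scores[name]) for name in product if name in scores]
    return sorted(rows, key=lambda row: row[2])


# Trimmed copy of the response example; values are taken from it unchanged.
response = {
    "product": {
        "title": "Men's Tree Runners",
        "price": 98,
        "brand": "Allbirds",
        "availability": "InStock",
    },
    "_shopgraph": {
        "extraction_method": "schema_org",
        "confidence_score": 0.93,
        "field_confidence": {
            "title": 0.98,
            "price": 0.97,
            "brand": 0.94,
            "availability": 0.91,
        },
        "fields_omitted_by_threshold": [],
    },
}

ranked = rank_fields_by_confidence(response)
for name, value, score in ranked:
    print(f"{name}: {value!r} (confidence {score})")
```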

When fields are filtered

If the threshold is raised to 0.95, the four fields scoring below it are omitted and listed in fields_omitted_by_threshold:

Filtered response
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD"
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95
    },
    "fields_omitted_by_threshold": ["brand", "availability", "image", "description"]
  }
}

Tip: Start with a threshold of 0.7 and increase as needed. Most Schema.org extractions return fields above 0.9.
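
One way to pick a threshold is to count how many fields would survive at several candidate values before committing to one. This sketch uses the field_confidence scores from the response example on this page; fields_kept_at is a hypothetical helper and the candidate thresholds are arbitrary.

```python
def fields_kept_at(scores: dict[str, float], thresholds: list[float]) -> dict[float, int]:
    """Map each candidate threshold to the number of fields scoring at or above it."""
    return {t: sum(1 for s in scores.values() if s >= t) for t in thresholds}


# Scores from the unfiltered response example.
scores = {
    "title": 0.98,
    "price": 0.97,
    "currency": 0.95,
    "brand": 0.94,
    "availability": 0.91,
    "image": 0.93,
    "description": 0.88,
}

kept = fields_kept_at(scores, [0.7, 0.85, 0.9, 0.95])
for threshold, count in kept.items():
    print(f"threshold {threshold}: {count} of {len(scores)} fields kept")
```

For this response, nothing is lost until the threshold passes 0.88, and only three of the seven fields remain at 0.95.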