Confidence Scoring

Every field in a ShopGraph response includes a confidence score from 0 to 1. This lets agents and applications make informed decisions about data quality.

Design Principle: Transparent uncertainty. Every field tells you how confident the system is. You decide what to trust.

How It Works

Confidence scoring happens in three layers:

  1. Tier baseline — The extraction method sets a starting confidence.
  2. Field modifiers — Individual fields are adjusted based on extraction signal strength.
  3. Threshold enforcement — The strict_confidence_threshold parameter filters low-confidence fields from the response.

Tier Baselines

Each extraction method has a different baseline confidence:

Extraction Method   Baseline   Description
schema_org          0.93       Structured data from the page (JSON-LD, Microdata). Highest reliability.
llm                 0.70       LLM extraction from raw HTML. Good coverage, variable precision.
hybrid              0.85       Auto-heal merge: Schema.org partial + LLM fills gaps.
playwright          0.75       Full browser rendering for dynamic sites where JavaScript execution is required for data access, followed by LLM extraction.

Field Modifiers

Within each extraction, individual fields receive adjustments based on signal quality:

Signal                   Modifier   Example
Structured data match    +0.05      Price found in JSON-LD offers.price
Cross-validated          +0.03      Title matches both <title> and Schema.org
Single source only       +0.00      Description from meta tag only
LLM inferred             -0.10      Brand guessed from page context
Format mismatch          -0.15      Price extracted but currency ambiguous
Stale / missing signal   -0.20      Availability not found, defaulted
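
The first two layers can be sketched as a table lookup plus an adjustment. This is a minimal illustration, not ShopGraph's actual implementation. The fieldConfidence helper and the signal key names are hypothetical, and the sketch assumes that exactly one modifier is added to the tier baseline and that the result is clamped to the 0 to 1 range.

```javascript
// Baselines and modifiers copied from the two tables above.
const TIER_BASELINES = { schema_org: 0.93, llm: 0.70, hybrid: 0.85, playwright: 0.75 };
const FIELD_MODIFIERS = {
  structured_data_match: 0.05,
  cross_validated: 0.03,
  single_source_only: 0.0,
  llm_inferred: -0.10,
  format_mismatch: -0.15,
  stale_or_missing_signal: -0.20,
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: tier baseline plus one field modifier.
// Assumption: the sum is clamped to [0, 1] and rounded to two decimals.
function fieldConfidence(method, signal) {
  if (!hasOwn(TIER_BASELINES, method)) {
    throw new Error(`Unknown extraction method: ${method}`);
  }
  if (!hasOwn(FIELD_MODIFIERS, signal)) {
    throw new Error(`Unknown signal: ${signal}`);
  }
  const raw = TIER_BASELINES[method] + FIELD_MODIFIERS[signal];
  const clamped = Math.min(1, Math.max(0, raw));
  return Math.round(clamped * 100) / 100;
}

console.log(fieldConfidence("schema_org", "structured_data_match")); // 0.98
console.log(fieldConfidence("llm", "llm_inferred"));                 // 0.6
```

Under these assumptions, a price read directly from JSON-LD on a schema_org extraction scores 0.98, while a brand inferred by the LLM on an llm extraction scores 0.60.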

strict_confidence_threshold

Set this parameter to filter out fields below a given confidence level. Fields that do not meet the threshold are omitted from the response (not set to null).

Request with threshold
{
  "url": "https://www.allbirds.com/products/mens-tree-runners",
  "strict_confidence_threshold": 0.85
}

With strict_confidence_threshold: 0.85, any field with confidence below 0.85 will be excluded from the response. This is useful for applications that require high data quality and prefer missing data over uncertain data.
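
To make the omission semantics concrete, here is a small client-side model of what the filter does. It is a sketch only: applyThreshold is a hypothetical function, not part of ShopGraph, and it assumes a field is kept when its confidence is greater than or equal to the threshold. The sample scores are taken from the response example later on this page.

```javascript
// Hypothetical model of threshold enforcement.
// Assumption: a field is kept when confidence >= threshold.
function applyThreshold(product, confidences, threshold) {
  const keptProduct = {};
  const keptConfidence = {};
  const omitted = [];
  for (const [field, confidence] of Object.entries(confidences)) {
    if (confidence >= threshold) {
      keptProduct[field] = product[field];
      keptConfidence[field] = confidence;
    } else {
      omitted.push(field); // Dropped entirely, never set to null
    }
  }
  return {
    product: keptProduct,
    field_confidence: keptConfidence,
    fields_omitted_by_threshold: omitted,
  };
}

const sampleProduct = {
  title: "Men's Tree Runners",
  price: 98,
  currency: "USD",
  brand: "Allbirds",
  availability: "InStock",
};
const sampleConfidence = {
  title: 0.98,
  price: 0.97,
  currency: 0.95,
  brand: 0.94,
  availability: 0.91,
};

const strict = applyThreshold(sampleProduct, sampleConfidence, 0.95);
console.log(Object.keys(strict.product));        // [ 'title', 'price', 'currency' ]
console.log(strict.fields_omitted_by_threshold); // [ 'brand', 'availability' ]
```

Because omitted keys are absent rather than null, a check such as "brand" in strict.product is enough to tell whether a field survived the filter.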

Choosing Your Confidence Strategy

ShopGraph supports two patterns for handling data quality:

Threshold filtering

Set strict_confidence_threshold to omit fields below your quality bar. Best for autonomous pipelines where missing data is preferable to uncertain data. Omitted fields are excluded from the response entirely, and their names are listed in fields_omitted_by_threshold.

Confidence routing

Omit strict_confidence_threshold to receive all fields with their scores. Build routing logic in your application: high-confidence fields go to automation, low-confidence fields go to human review. Best for human-in-the-loop workflows where partial data is still actionable.

These patterns are mutually exclusive within a single request. If you set a threshold, fields below it do not appear in the response, so neither their values nor their scores are available for routing.

Response Example

Response with field_confidence
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD",
    "brand": "Allbirds",
    "availability": "InStock",
    "image": "https://cdn.allbirds.com/image/fetch/...",
    "description": "Lightweight, breathable sneakers made with FSC-certified..."
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95,
      "brand": 0.94,
      "availability": 0.91,
      "image": 0.93,
      "description": 0.88
    },
    "fields_omitted_by_threshold": []
  }
}

When fields are filtered

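If strict_confidence_threshold is set to 0.95, only fields scoring 0.95 or higher remain, and the names of the omitted fields are listed in fields_omitted_by_threshold: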

Filtered response
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD"
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95
    },
    "fields_omitted_by_threshold": ["brand", "availability", "image", "description"]
  }
}
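
A pipeline that depends on specific fields can check fields_omitted_by_threshold before acting on a filtered response. The sketch below is illustrative: missingRequiredFields is a hypothetical helper, and the list of required fields is an assumption about the calling application.

```javascript
// Hypothetical helper: which required fields did the threshold remove?
function missingRequiredFields(response, requiredFields) {
  const meta = response._shopgraph || {};
  const omitted = new Set(meta.fields_omitted_by_threshold || []);
  return requiredFields.filter((field) => omitted.has(field));
}

// Trimmed copy of the filtered response above.
const filteredResponse = {
  product: { title: "Men's Tree Runners", price: 98, currency: "USD" },
  _shopgraph: {
    extraction_method: "schema_org",
    confidence_score: 0.93,
    field_confidence: { title: 0.98, price: 0.97, currency: 0.95 },
    fields_omitted_by_threshold: ["brand", "availability", "image", "description"],
  },
};

// Assumption: this application needs title, price, and brand to proceed.
const missing = missingRequiredFields(filteredResponse, ["title", "price", "brand"]);
console.log(missing); // [ 'brand' ]
```

If the returned list is non-empty, the caller might retry with a lower threshold or hand the record to a reviewer.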

When using format: ucp, confidence data appears under _extensions.shopgraph.field_confidence. See UCP Output for the full mapping.
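
Code that handles both output formats can read the confidence map from whichever location is present. This is a sketch under stated assumptions: getFieldConfidence is a hypothetical helper, and it assumes _extensions sits at the top level of a UCP response, which the UCP Output page should confirm.

```javascript
// Hypothetical helper: per-field confidence map for either output format.
// Assumption: _extensions is a top-level key in UCP output.
function getFieldConfidence(response) {
  return (
    response?._shopgraph?.field_confidence ??
    response?._extensions?.shopgraph?.field_confidence ??
    {} // No confidence data found
  );
}

const defaultShape = { _shopgraph: { field_confidence: { title: 0.98 } } };
const ucpShape = { _extensions: { shopgraph: { field_confidence: { title: 0.98 } } } };

console.log(getFieldConfidence(defaultShape).title); // 0.98
console.log(getFieldConfidence(ucpShape).title);     // 0.98
```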

Example: Procurement Agent

An agent receives a purchase request with a supplier URL. It extracts product data, checks per-field confidence scores, auto-fills verified fields into the purchase order, and flags uncertain fields for human review.

JavaScript
// Assumes an initialized ShopGraph client (shopgraph), a supplier URL (url),
// and a purchase-order object (po) provided by your own application.
const result = await shopgraph.enrich(url, { include_score: true });

for (const [field, confidence] of Object.entries(
  result._shopgraph.field_confidence
)) {
  if (confidence >= 0.85) {
    po.setField(field, result.product[field]);       // Auto-fill
  } else if (confidence >= 0.60) {
    po.flagForReview(field, result.product[field], confidence); // Human review
  }
  // Below 0.60: omit from PO
}

Per-field confidence routing. The agent makes field-level trust decisions instead of accepting or rejecting the entire response.

What this unlocks: Autonomous action on high-confidence fields, human review of uncertain ones, and rejection of unreliable data.

Tip: Start with a threshold of 0.7 and raise it as needed. Most Schema.org extractions return fields above 0.9.