Confidence Scoring
Every field in a ShopGraph response includes a confidence score from 0 to 1. This lets agents and applications make informed decisions about data quality.
How It Works
Confidence scoring happens in three layers:
- Tier baseline — The extraction method sets a starting confidence.
- Field modifiers — Individual fields are adjusted based on extraction signal strength.
- Threshold enforcement — The strict_confidence_threshold parameter filters low-confidence fields from the response.
Tier Baselines
Each extraction method has a different baseline confidence:
| Extraction Method | Baseline | Description |
|---|---|---|
| schema_org | 0.93 | Structured data from the page (JSON-LD, Microdata). Highest reliability. |
| llm | 0.70 | LLM extraction from raw HTML. Good coverage, variable precision. |
| hybrid | 0.85 | Auto-heal merge: Schema.org partial + LLM fills gaps. |
| playwright | 0.75 | Full browser rendering for JS-heavy pages, then LLM extraction. |
Field Modifiers
Within each extraction, individual fields receive adjustments based on signal quality:
| Signal | Modifier | Example |
|---|---|---|
| Structured data match | +0.05 | Price found in JSON-LD offers.price |
| Cross-validated | +0.03 | Title matches both <title> and Schema.org |
| Single source only | +0.00 | Description from meta tag only |
| LLM inferred | -0.10 | Brand guessed from page context |
| Format mismatch | -0.15 | Price extracted but currency ambiguous |
| Stale / missing signal | -0.20 | Availability not found, defaulted |
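The way a baseline and a modifier combine can be sketched in a few lines of Python. This is an illustration only, not ShopGraph's implementation: the function name, the signal keys, the one-signal-per-field simplification, and the clamping to the 0 to 1 range are all assumptions. The numbers come from the two tables above.

```python
# Baselines and modifiers copied from the tables above.
TIER_BASELINES = {
    "schema_org": 0.93,
    "llm": 0.70,
    "hybrid": 0.85,
    "playwright": 0.75,
}

# Signal keys are hypothetical labels for the rows of the modifier table.
FIELD_MODIFIERS = {
    "structured_data_match": 0.05,
    "cross_validated": 0.03,
    "single_source_only": 0.00,
    "llm_inferred": -0.10,
    "format_mismatch": -0.15,
    "stale_or_missing_signal": -0.20,
}


def field_confidence(method: str, signal: str) -> float:
    """Add one field modifier to the tier baseline (assumed to be additive)."""
    score = TIER_BASELINES[method] + FIELD_MODIFIERS[signal]
    # Clamping to [0, 1] is an assumption; the source only says scores run from 0 to 1.
    return round(min(1.0, max(0.0, score)), 2)


print(field_confidence("schema_org", "structured_data_match"))  # 0.98
print(field_confidence("llm", "llm_inferred"))                  # 0.6
```

How several signals on the same field interact is not described here, so the sketch deliberately applies a single modifier per field.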
strict_confidence_threshold
Set this parameter to filter out fields below a given confidence level. Fields that do not meet the threshold are omitted from the response (not set to null).
```json
{
  "url": "https://www.allbirds.com/products/mens-tree-runners",
  "strict_confidence_threshold": 0.85
}
```
With strict_confidence_threshold: 0.85, any field with confidence below 0.85 will be excluded from the response. This is useful for applications that require high data quality and prefer missing data over uncertain data.
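A client might build this request body with a small helper like the one below. It is a sketch under stated assumptions: the helper name is hypothetical, the range check is a client-side convenience rather than documented API behavior, and treating the parameter as optional is inferred from the wording above. No endpoint is called, because none is specified here.

```python
import json
from typing import Optional


def build_request_body(url: str, strict_confidence_threshold: Optional[float] = None) -> str:
    """Serialize a request body, adding the threshold only when one is given."""
    body: dict = {"url": url}
    if strict_confidence_threshold is not None:
        # Confidence scores run from 0 to 1, so a threshold outside that range
        # is rejected locally (assumed guard, not a documented server rule).
        if not 0.0 <= strict_confidence_threshold <= 1.0:
            raise ValueError("strict_confidence_threshold must be between 0 and 1")
        body["strict_confidence_threshold"] = strict_confidence_threshold
    return json.dumps(body)


print(build_request_body("https://www.allbirds.com/products/mens-tree-runners", 0.85))
```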
Response Example
```json
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD",
    "brand": "Allbirds",
    "availability": "InStock",
    "image": "https://cdn.allbirds.com/image/fetch/...",
    "description": "Lightweight, breathable sneakers made with FSC-certified..."
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95,
      "brand": 0.94,
      "availability": 0.91,
      "image": 0.93,
      "description": 0.88
    },
    "fields_omitted_by_threshold": []
  }
}
```
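In this example every key in product has a matching entry in _shopgraph.field_confidence, so a consumer can pair each value with its score. The sketch below does that and sorts the weakest fields first. The function name is hypothetical and the sample is a shortened copy of the response above.

```python
def fields_by_confidence(response: dict) -> list:
    """Return (field, value, confidence) tuples, lowest confidence first."""
    product = response["product"]
    scores = response["_shopgraph"]["field_confidence"]
    rows = [(name, product[name], scores[name]) for name in product if name in scores]
    return sorted(rows, key=lambda row: row[2])


# Shortened copy of the response example above.
sample_response = {
    "product": {
        "title": "Men's Tree Runners",
        "price": 98,
        "brand": "Allbirds",
        "availability": "InStock",
    },
    "_shopgraph": {
        "extraction_method": "schema_org",
        "confidence_score": 0.93,
        "field_confidence": {
            "title": 0.98,
            "price": 0.97,
            "brand": 0.94,
            "availability": 0.91,
        },
        "fields_omitted_by_threshold": [],
    },
}

for name, value, score in fields_by_confidence(sample_response):
    print(f"{name}: {value!r} (confidence {score})")
```

Sorting ascending puts the fields most likely to need review or a fallback source at the top of the list.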
When fields are filtered
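If the threshold is set to 0.95, fields scoring below 0.95 are removed from both product and field_confidence and are listed in fields_omitted_by_threshold: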
```json
{
  "product": {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD"
  },
  "_shopgraph": {
    "extraction_method": "schema_org",
    "confidence_score": 0.93,
    "field_confidence": {
      "title": 0.98,
      "price": 0.97,
      "currency": 0.95
    },
    "fields_omitted_by_threshold": ["brand", "availability", "image", "description"]
  }
}
```
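The filtering rule can be reproduced locally to check expectations against a response. The sketch below is not ShopGraph's code: apply_threshold is a hypothetical helper that mirrors the documented behavior, where a field is dropped only when its confidence is strictly below the threshold. That is why currency, at exactly 0.95, survives.

```python
def apply_threshold(product: dict, field_confidence: dict, threshold: float):
    """Split product fields into kept values and omitted names."""
    # Keep fields scoring at or above the threshold, preserving field order.
    kept = {name: value for name, value in product.items()
            if field_confidence[name] >= threshold}
    # Fields strictly below the threshold are omitted, not set to null.
    omitted = [name for name in product if field_confidence[name] < threshold]
    return kept, omitted


# Values copied from the unfiltered response example.
product = {
    "title": "Men's Tree Runners",
    "price": 98,
    "currency": "USD",
    "brand": "Allbirds",
    "availability": "InStock",
    "image": "https://cdn.allbirds.com/image/fetch/...",
    "description": "Lightweight, breathable sneakers made with FSC-certified...",
}
confidence = {
    "title": 0.98, "price": 0.97, "currency": 0.95, "brand": 0.94,
    "availability": 0.91, "image": 0.93, "description": 0.88,
}

kept, omitted = apply_threshold(product, confidence, 0.95)
print(list(kept))  # ['title', 'price', 'currency']
print(omitted)     # ['brand', 'availability', 'image', 'description']
```

Because omitted fields are absent rather than null, consumers should read them with a lookup that tolerates missing keys, such as dict.get, and consult fields_omitted_by_threshold to tell a filtered field from one that was never extracted.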