Routing Engine
Intelligent extraction method selection that picks the cheapest path to quality data.
Pipeline
API REQUEST → CACHE CHECK → TOOL ROUTER → EXTRACTION → RESULT
Extraction Method Selection
The tool router evaluates each URL and selects the optimal extraction strategy:
- Cache check — If a recent extraction exists (within TTL), return it immediately.
- Schema.org probe — Fetch the page and check for JSON-LD / Microdata. If found, extract structured data (cheapest method).
- LLM extraction — If Schema.org is absent or incomplete, run LLM extraction on the HTML.
- Auto-heal merge — If Schema.org is partial (e.g. missing price), merge with LLM-extracted fields to create a hybrid result.
- Playwright fallback — For JS-heavy pages that return empty HTML, render with a headless browser first, then extract.
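The decision order above can be sketched as a small pure function. This is an illustration only, not the router's actual code: the function name `choose_method`, the `REQUIRED_FIELDS` tuple, and the rule that a page counts as "partial" when any required field is missing are all assumptions made for the example. The returned labels are the method names used in the cost table.

```python
from typing import Mapping, Sequence

# Hypothetical set of fields a "complete" result needs (assumption for this sketch).
REQUIRED_FIELDS = ("title", "price", "description")


def choose_method(
    has_fresh_cache: bool,
    html: str,
    schema_fields: Mapping[str, object],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> str:
    """Return the extraction method label for one URL."""
    # 1. A cached extraction within its TTL wins outright.
    if has_fresh_cache:
        return "cache"
    # 2. Empty HTML suggests a JS-heavy page: render in a headless browser first.
    if not html.strip():
        return "playwright"
    # 3. No structured data at all: fall back to LLM extraction.
    if not schema_fields:
        return "llm"
    # 4. Structured data present: complete means schema_org, partial means hybrid.
    missing = [name for name in required if name not in schema_fields]
    return "hybrid" if missing else "schema_org"


if __name__ == "__main__":
    page = "<html><body>Product page</body></html>"
    print(choose_method(False, page, {"title": "Mug", "price": "12.00"}))
```

In this sketch a page with a title and price but no description is routed to `hybrid`, which matches the auto-heal case where partial Schema.org data is merged with LLM-extracted fields.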
Cost per Method
| Method | Latency | Cost | Confidence Baseline |
|---|---|---|---|
| cache | ~50ms | $0.000 | Original score |
| schema_org | ~800ms | $0.001 | 0.93 |
| llm | ~3s | $0.008 | 0.70 |
| hybrid | ~3.5s | $0.009 | 0.85 |
| playwright | ~8s | $0.015 | 0.75 |
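To see how the per-method costs combine, the sketch below computes a blended cost for a traffic mix. The costs come from the table; the mix itself (50% cache, 30% schema_org, 10% llm, 5% hybrid, 5% playwright) is invented purely for illustration, as are the names `COST_USD` and `blended_cost`. With that mix the blended cost works out to roughly $0.0023 per request.

```python
import math

# Per-method cost in USD, copied from the table above.
COST_USD = {
    "cache": 0.000,
    "schema_org": 0.001,
    "llm": 0.008,
    "hybrid": 0.009,
    "playwright": 0.015,
}


def blended_cost(mix: dict) -> float:
    """Weighted average cost, where mix maps method name to its traffic share."""
    total_share = sum(mix.values())
    if not math.isclose(total_share, 1.0, abs_tol=1e-9):
        raise ValueError(f"shares must sum to 1.0, got {total_share}")
    return sum(COST_USD[method] * share for method, share in mix.items())


if __name__ == "__main__":
    # Hypothetical traffic mix, not measured data.
    example_mix = {
        "cache": 0.50,
        "schema_org": 0.30,
        "llm": 0.10,
        "hybrid": 0.05,
        "playwright": 0.05,
    }
    print(f"${blended_cost(example_mix):.4f} per request")
```

Note that the hybrid cost in the table ($0.009) equals the Schema.org cost plus the LLM cost, which is consistent with a hybrid result running both extractions.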
Auto-Heal Merge
When Schema.org extraction returns partial data (e.g. title and price but no description), the router automatically:
- Keeps all Schema.org fields at their high confidence baseline
- Runs LLM extraction on the same HTML
- Fills in missing fields from the LLM result at LLM confidence levels
- Marks the result as `hybrid` with per-field provenance
This gives you the best of both worlds: high-confidence structured data where available, with LLM-powered gap filling.
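A minimal sketch of the merge, under stated assumptions: the function name `auto_heal_merge`, the output shape (`value`, `source`, `confidence` per field), and the choice to treat `None` values as missing are hypothetical, not the service's documented response format. The confidence values are the baselines from the cost table.

```python
from typing import Any, Dict, Mapping

SCHEMA_ORG_CONFIDENCE = 0.93  # baseline from the cost table
LLM_CONFIDENCE = 0.70         # baseline from the cost table


def auto_heal_merge(
    schema_fields: Mapping[str, Any],
    llm_fields: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge partial Schema.org data with LLM output, recording provenance per field."""
    merged: Dict[str, Dict[str, Any]] = {}

    # Keep every Schema.org field that actually has a value, at its high baseline.
    for name, value in schema_fields.items():
        if value is not None:
            merged[name] = {
                "value": value,
                "source": "schema_org",
                "confidence": SCHEMA_ORG_CONFIDENCE,
            }

    # Fill only the gaps from the LLM result; never overwrite Schema.org values.
    for name, value in llm_fields.items():
        if name not in merged and value is not None:
            merged[name] = {
                "value": value,
                "source": "llm",
                "confidence": LLM_CONFIDENCE,
            }

    return {"method": "hybrid", "fields": merged}


if __name__ == "__main__":
    schema = {"title": "Mug", "price": "12.00"}
    llm = {"title": "Blue Mug", "description": "Ceramic mug"}
    print(auto_heal_merge(schema, llm))
```

Here the title stays as the Schema.org value even though the LLM proposed a different one, and only the description is taken from the LLM.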
The routing engine is automatic. You do not need to specify an extraction method. If you want to force a specific method, use the `method` parameter in the REST API.
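As a rough illustration of forcing a method, the sketch below builds a request URL that carries a `method` value. Only the parameter name `method` comes from this page. The base URL `https://api.example.com/extract`, the `url` parameter, passing `method` as a query string, and the assumption that every label in the cost table is an accepted value are all placeholders; check the REST API reference for the real endpoint and request shape.

```python
from typing import Optional
from urllib.parse import urlencode

# Method labels from the cost table; whether the API accepts all of them is an assumption.
KNOWN_METHODS = {"cache", "schema_org", "llm", "hybrid", "playwright"}

# Placeholder endpoint, not the real API URL.
HYPOTHETICAL_BASE_URL = "https://api.example.com/extract"


def build_extract_url(target_url: str, method: Optional[str] = None) -> str:
    """Build a request URL; omit `method` to let the routing engine decide."""
    params = {"url": target_url}
    if method is not None:
        if method not in KNOWN_METHODS:
            raise ValueError(f"unknown method: {method!r}")
        params["method"] = method
    return f"{HYPOTHETICAL_BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_url("https://shop.example/p/1"))                # automatic routing
    print(build_extract_url("https://shop.example/p/1", method="llm"))  # forced method
```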