Hungarian food product specification document
How does it work?
Extracts structured product data from Hungarian food specification PDFs using a vision language model (Qwen3-VL-32B) through a multi-step pipeline.
Load PDF
Convert each page into a high-resolution image (200 DPI) using PyMuPDF for the vision model to read.
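As a rough illustration of this step, the sketch below renders every page to PNG bytes with PyMuPDF. The function names `dpi_to_zoom` and `render_pages` are made up for this example and are not taken from the tool itself; the only facts assumed from the description are the 200 DPI target and the use of PyMuPDF.

```python
from typing import List

PDF_BASE_DPI = 72  # PDF user space is defined at 72 units per inch


def dpi_to_zoom(dpi: int) -> float:
    """Scale factor that renders a 72-DPI PDF page at the requested DPI."""
    return dpi / PDF_BASE_DPI


def render_pages(pdf_path: str, dpi: int = 200) -> List[bytes]:
    """Return one PNG byte string per page of the PDF (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom has no dependency

    zoom = dpi_to_zoom(dpi)
    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Scaling both axes by the same factor raises the raster resolution.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            images.append(pix.tobytes("png"))
    return images
```

At 200 DPI the zoom factor is about 2.78, so each page image has roughly 7.7 times as many pixels as a 72 DPI render, which is what makes small print on specification sheets legible to the model.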
Classification
The VLM reads the first page to determine the product category (dairy, meat, frozen, fish, dry goods, etc.). Low-confidence results trigger a re-classification using all pages.
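The fallback logic could look roughly like the following sketch. The `classify` callable stands in for the actual VLM call, and the `min_confidence` threshold of 0.7 is an assumption chosen for illustration; the description does not state the real cutoff.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: takes page images, returns (category, confidence in [0, 1]).
Classifier = Callable[[List[bytes]], Tuple[str, float]]


def classify_document(
    pages: List[bytes],
    classify: Classifier,
    min_confidence: float = 0.7,  # assumed threshold, not from the source
) -> Tuple[str, float]:
    """Classify from the first page; re-classify on all pages if confidence is low."""
    category, confidence = classify(pages[:1])
    if confidence < min_confidence and len(pages) > 1:
        # Low confidence: give the model the whole document instead.
        category, confidence = classify(pages)
    return category, confidence
```

Reading only the first page keeps the common case cheap, and the full-document pass is paid for only when the first answer is uncertain.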
Per-Field Extraction
For every page and every field relevant to the detected category, the VLM extracts a value, an evidence quote, and a confidence score. This is the most intensive step, since the number of model calls grows with pages × fields.
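A minimal sketch of this loop is shown below. The `FieldReading` record and the `extract` callable are hypothetical names introduced for the example; the real prompt and response format are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FieldReading:
    page: int             # zero-based page index
    name: str             # field name, e.g. "fat"
    value: Optional[str]  # None when the page does not mention the field
    evidence: str         # quote the model gave in support of the value
    confidence: float     # assumed to lie in [0, 1]


# Hypothetical extractor: (page_image, field_name) -> (value, evidence, confidence)
Extractor = Callable[[bytes, str], Tuple[Optional[str], str, float]]


def extract_all(
    pages: List[bytes], fields: List[str], extract: Extractor
) -> List[FieldReading]:
    """Run one extraction call per (page, field) pair and collect the readings."""
    readings = []
    for page_no, image in enumerate(pages):
        for name in fields:
            value, evidence, confidence = extract(image, name)
            readings.append(FieldReading(page_no, name, value, evidence, confidence))
    return readings
```

A 4-page document with 30 relevant fields would need 120 extraction calls under this scheme, which is why this step dominates the run time.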
Evidence Merge & Retry
Per-page values are merged: agreement yields a final value, disagreement flags a conflict, and missing fields get up to 3 retry attempts with focused prompts.
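The merge rule can be sketched as follows. The status labels, the dictionary layout, and the `retry` callable (which stands in for a focused re-prompt of the VLM) are assumptions made for this example; only the three outcomes and the limit of 3 retries come from the description.

```python
from typing import Callable, Dict, List, Optional

MAX_RETRIES = 3  # retry limit stated in the description


def merge_field(values: List[Optional[str]]) -> Dict[str, object]:
    """Merge the per-page values of one field into final / conflict / missing."""
    found = [v.strip() for v in values if v is not None and v.strip()]
    distinct = sorted(set(found))
    if not distinct:
        return {"status": "missing", "value": None, "candidates": []}
    if len(distinct) == 1:
        return {"status": "final", "value": distinct[0], "candidates": distinct}
    return {"status": "conflict", "value": None, "candidates": distinct}


def merge_with_retry(
    values: List[Optional[str]],
    retry: Callable[[int], Optional[str]],  # hypothetical: attempt number -> value
) -> Dict[str, object]:
    """Merge, then re-ask for a missing field up to MAX_RETRIES times."""
    result = merge_field(values)
    attempts = 0
    while result["status"] == "missing" and attempts < MAX_RETRIES:
        attempts += 1
        result = merge_field([retry(attempts)])
    result["retries"] = attempts
    return result
```

Note that this sketch compares values as exact strings after trimming whitespace; a real pipeline would likely normalize units and decimal separators before deciding that two pages disagree.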
Conflict Resolution
Fields with contradicting values are resolved in tiers: fields with no real evidence fall back to a default, values backed by a single source are accepted as-is, and conflicts between multiple sources are re-read by the VLM.
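One possible reading of these tiers is sketched below. It interprets "real evidence" as a non-empty evidence quote and "single source" as exactly one distinct value among the evidence-backed candidates; both interpretations, along with the `Candidate` type and the `reread` callable, are assumptions made for this example.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    value: str
    evidence: str  # empty string means the model gave no supporting quote
    page: int


def resolve_conflict(
    candidates: List[Candidate],
    default: Optional[str],
    reread: Callable[[List[Candidate]], str],  # hypothetical focused VLM re-read
) -> Tuple[Optional[str], str]:
    """Return (resolved value, name of the tier that resolved it)."""
    backed = [c for c in candidates if c.evidence.strip()]
    if not backed:
        return default, "default"                # tier 1: nothing is supported
    if len({c.value for c in backed}) == 1:
        return backed[0].value, "single_source"  # tier 2: one supported value
    return reread(backed), "reread"              # tier 3: ask the model again
```

Ordering the tiers this way means the expensive re-read happens only when two or more evidence-backed values genuinely disagree.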
Verification & Finalization
Resolutions are applied, defaults are set for Y/N fields, range validation is enforced (fat 0-100%, salt 0-50%), and the final structured result is built together with coverage statistics.
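The finalization step might be sketched like this. The range limits come from the description; the field keys `"fat"` and `"salt"`, the `"N"` default for Y/N fields, the choice to drop out-of-range values, and the definition of coverage as the share of non-empty fields are all assumptions for illustration.

```python
from typing import Dict, List, Optional

# Limits from the description; the dictionary keys are hypothetical field names.
RANGES = {"fat": (0.0, 100.0), "salt": (0.0, 50.0)}


def finalize(
    fields: Dict[str, Optional[object]],
    yes_no_fields: List[str],
    default: str = "N",  # assumed default for unanswered Y/N fields
) -> Dict[str, object]:
    """Apply Y/N defaults, enforce numeric ranges, and compute coverage."""
    result = dict(fields)
    errors = []
    for name in yes_no_fields:
        if result.get(name) is None:
            result[name] = default
    for name, (low, high) in RANGES.items():
        value = result.get(name)
        # Assumes range-checked values are numeric or numeric strings.
        if value is not None and not (low <= float(value) <= high):
            errors.append(f"{name}={value} outside {low}-{high}")
            result[name] = None  # drop the implausible value
    filled = sum(1 for v in result.values() if v is not None)
    coverage = filled / len(result) if result else 0.0
    return {"fields": result, "errors": errors, "coverage": coverage}
```

For example, a fat value of 120 would be rejected and reported, lowering the coverage figure, rather than being passed through as valid product data.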
Supported Categories