Food Spec Extraction

How does it work?

Extracts structured product data from Hungarian food specification PDFs using a vision-language model (VLM), Qwen3-VL-32B, through a multi-step pipeline.

PDF Upload → Queue → Load PDF → 5-Step Pipeline → Structured JSON

Load PDF

Convert each page into a high-resolution image (200 DPI) using PyMuPDF for the vision model to read.
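
A minimal sketch of this step, assuming PyMuPDF's `Page.get_pixmap(dpi=...)` call. The helper names `pixel_size` and `render_pages` are hypothetical and not taken from the tool itself.

```python
def pixel_size(width_pt: float, height_pt: float, dpi: int = 200) -> tuple[int, int]:
    """Approximate rendered size in pixels; PDF page units are points (72 per inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def render_pages(pdf_path: str, dpi: int = 200) -> list[bytes]:
    """Render every page of a PDF to PNG bytes (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so pixel_size works without it

    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the requested DPI
            images.append(pix.tobytes("png"))
    return images
```

At 200 DPI an A4 page (595 x 842 points) comes out at roughly 1653 x 2339 pixels, which gives a sense of the image size the model receives per page.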

Classification

The VLM reads the first page to determine the product category (dairy, meat, frozen, fish, dry goods, etc.). Low-confidence results trigger a re-classification using all pages.
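
The fallback logic could look roughly like the sketch below. The `classify` callable stands in for the VLM call, and the 0.7 threshold is an assumption; the actual cut-off for "low confidence" is not stated here.

```python
from typing import Callable, Sequence

# Hypothetical classifier: takes page images, returns (category, confidence in [0, 1]).
Classifier = Callable[[Sequence[bytes]], tuple[str, float]]


def classify_document(pages: Sequence[bytes], classify: Classifier,
                      min_confidence: float = 0.7) -> tuple[str, float]:
    """Classify from the first page; fall back to all pages when confidence is low."""
    category, confidence = classify(pages[:1])
    if confidence < min_confidence and len(pages) > 1:
        # Second pass with the whole document as context.
        category, confidence = classify(pages)
    return category, confidence
```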

Per-Field Extraction

For every page and every field relevant to the detected category, the VLM extracts a value, an evidence quote, and a confidence score. This is the most intensive step.
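
As an illustration of why this step dominates the run time, here is a sketch of the page-by-field loop. The `Evidence` record and the `extract` callable are hypothetical stand-ins for the tool's own types and its VLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evidence:
    field: str
    page: int
    value: Optional[str]   # None when the page does not mention the field
    quote: str             # text the model cites as evidence
    confidence: float      # assumed to lie in [0, 1]


# Hypothetical VLM call: (page image, field name) -> (value, quote, confidence)
ExtractFn = Callable[[bytes, str], tuple[Optional[str], str, float]]


def extract_all(pages: list[bytes], fields: list[str], extract: ExtractFn) -> list[Evidence]:
    """Query the model once per (page, field) pair, so cost grows as pages x fields."""
    results = []
    for page_no, image in enumerate(pages, start=1):
        for name in fields:
            value, quote, confidence = extract(image, name)
            results.append(Evidence(name, page_no, value, quote, confidence))
    return results
```

Under this reading, a 4-page document with 30 category fields would need 120 model queries.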

Evidence Merge & Retry

Per-page values are merged: agreement yields a final value, disagreement flags a conflict, and missing fields get up to 3 retry attempts with focused prompts.
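
The merge rule can be sketched as follows. The status labels, the dictionary layout, and the lowercase/strip normalization are assumptions made for the example; only the three outcomes and the limit of 3 retries come from the description.

```python
from collections import defaultdict
from typing import Optional

MAX_RETRIES = 3  # retry limit quoted in the description


def merge_evidence(observations: list[tuple[str, Optional[str]]],
                   fields: list[str]) -> dict[str, dict]:
    """observations holds one (field, value) pair per page; None means not found."""
    seen = defaultdict(set)
    for field, value in observations:
        if value is not None:
            seen[field].add(value.strip().lower())  # assumed normalization
    merged = {}
    for field in fields:
        values = seen[field]
        if not values:
            merged[field] = {"status": "missing", "retries_left": MAX_RETRIES}
        elif len(values) == 1:
            merged[field] = {"status": "final", "value": next(iter(values))}
        else:
            merged[field] = {"status": "conflict", "candidates": sorted(values)}
    return merged
```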

Conflict Resolution

Fields with contradicting values are resolved in tiers: fields with no real evidence fall back to a default, single-source values are accepted as-is, and multi-source conflicts are re-read by the VLM.
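
One possible reading of the three tiers, as a sketch. It assumes that "no real evidence" means an empty evidence quote, and the `reread` callable is a hypothetical stand-in for the VLM re-read.

```python
from typing import Callable, Optional

# Each candidate is (value, evidence_quote, page_number).
Candidate = tuple[str, str, int]


def resolve_conflict(candidates: list[Candidate], default: Optional[str],
                     reread: Callable[[list[Candidate]], str]) -> tuple[Optional[str], str]:
    """Return (resolved value, name of the tier that produced it)."""
    backed = [c for c in candidates if c[1].strip()]  # keep candidates that cite a quote
    if not backed:
        return default, "default"                     # tier 1: nothing is evidence-backed
    if len({c[0] for c in backed}) == 1:
        return backed[0][0], "single_source"          # tier 2: one backed value remains
    return reread(backed), "vlm_reread"               # tier 3: ask the model again
```

Ordering the tiers this way keeps the expensive model call for the cases that cannot be settled from the evidence already collected.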

Verification & Finalization

Resolutions are applied, defaults are set for Y/N fields, range validation is enforced (fat 0-100%, salt 0-50%), and the final structured result with coverage stats is built.
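
A sketch of the finalization checks, under stated assumptions: values arrive as numeric strings, the Y/N default is "N", and out-of-range values are dropped with an error recorded. None of these details is confirmed by the description; only the two ranges are taken from it.

```python
from typing import Optional

# Ranges quoted in the description, in percent.
RANGES = {"fat": (0.0, 100.0), "salt": (0.0, 50.0)}


def finalize(values: dict[str, Optional[str]], yn_fields: set[str],
             yn_default: str = "N") -> dict:
    """Apply Y/N defaults and range checks, then compute coverage stats."""
    result, errors = {}, []
    for field, value in values.items():
        if value is None and field in yn_fields:
            value = yn_default  # assumed default value
        if value is not None and field in RANGES:
            low, high = RANGES[field]
            if not low <= float(value) <= high:
                errors.append(f"{field}={value} outside {low}-{high}")
                value = None    # assumed handling: discard the invalid value
        result[field] = value
    found = sum(v is not None for v in result.values())
    return {"fields": result, "errors": errors,
            "coverage": {"found": found, "total": len(result)}}
```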

Supported Categories

Dairy, Butcher Products, Meat, Frozen Fruit/Veg, Frozen Mixtures, Frozen Breaded, Frozen Stuffed, Fish, Dry Goods
