Food Spec Extraction

Drop PDF here or click to browse

Hungarian food product specification document

How does it work?

Extracts structured product data from Hungarian food specification PDFs using a vision-language model (VLM), Qwen3-VL-32B, through a multi-step pipeline.

PDF Upload → Queue → 5-Step Pipeline → Structured JSON
0

Load PDF

Convert each page into a high-resolution image (200 DPI) using PyMuPDF for the vision model to read.
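
A minimal sketch of this step, assuming PyMuPDF is installed. The function names are illustrative and not taken from the app. PDF pages are measured in points at 72 per inch, so rendering at 200 DPI corresponds to a zoom factor of roughly 2.78.

```python
PDF_POINTS_PER_INCH = 72  # PDF user-space unit


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that renders a 72-points-per-inch page at the given DPI."""
    return dpi / PDF_POINTS_PER_INCH


def render_pages(pdf_path: str, dpi: int = 200) -> list[bytes]:
    """Return one PNG image (as bytes) per page of the PDF."""
    import fitz  # PyMuPDF; imported lazily so zoom_for_dpi works without it

    zoom = zoom_for_dpi(dpi)
    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Scale both axes equally to reach the target resolution.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            images.append(pix.tobytes("png"))
    return images
```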

1

Classification

The VLM reads the first page to determine the product category (dairy, meat, frozen, fish, dry goods, etc.). Low-confidence results trigger a re-classification using all pages.
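
The fallback logic can be sketched as follows. The `classify` callable stands in for the VLM call and is hypothetical, and the 0.6 confidence threshold is an assumed value, since the page does not state the real one.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical signature: page images in, (category, confidence in 0..1) out.
Classifier = Callable[[Sequence[bytes]], Tuple[str, float]]


def classify_document(
    pages: Sequence[bytes],
    classify: Classifier,
    min_confidence: float = 0.6,  # assumed threshold
) -> Tuple[str, float]:
    """Classify from the first page; re-classify on all pages if unsure."""
    category, confidence = classify(pages[:1])
    if confidence < min_confidence and len(pages) > 1:
        category, confidence = classify(pages)
    return category, confidence
```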

2

Per-Field Extraction

For every page and every field relevant to the detected category, the VLM extracts a value, evidence quote, and confidence score. This is the most intensive step.
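
As an illustration of why this step dominates the run time, the sketch below makes one model call per page and field pair. The `FieldEvidence` record and the `ask_vlm` callable are hypothetical names, not the app's own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class FieldEvidence:
    field: str
    page: int              # 1-based page number
    value: Optional[str]   # None when the field is not found on this page
    quote: str             # text the model cites as evidence
    confidence: float


# Hypothetical VLM call: (page image, field name) -> (value, quote, confidence)
AskVlm = Callable[[bytes, str], Tuple[Optional[str], str, float]]


def extract_all(
    pages: Sequence[bytes], fields: Sequence[str], ask_vlm: AskVlm
) -> List[FieldEvidence]:
    """Query every field on every page: len(pages) * len(fields) calls."""
    results = []
    for page_no, image in enumerate(pages, start=1):
        for field in fields:
            value, quote, confidence = ask_vlm(image, field)
            results.append(FieldEvidence(field, page_no, value, quote, confidence))
    return results
```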

3

Evidence Merge & Retry

Per-page values are merged: agreement yields a final value, disagreement flags a conflict, and missing fields get up to 3 retry attempts with focused prompts.
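
A rough sketch of the merge and retry rules, assuming values are compared by exact equality (the real pipeline may normalize them first). The `ask_focused` callable is a hypothetical stand-in for the focused VLM prompt.

```python
from typing import Callable, List, Optional, Tuple


def merge_field(values: List[Optional[str]]) -> Tuple[str, object]:
    """Merge one field's per-page values (None where the page had nothing)."""
    found = {v for v in values if v is not None}
    if not found:
        return ("missing", None)
    if len(found) == 1:
        return ("final", found.pop())      # all pages agree
    return ("conflict", sorted(found))     # pages disagree


def retry_missing(
    field: str,
    ask_focused: Callable[[str, int], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-ask for a missing field, stopping at the first non-empty answer."""
    for attempt in range(1, max_attempts + 1):
        value = ask_focused(field, attempt)
        if value is not None:
            return value
    return None
```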

4

Conflict Resolution

Fields with contradicting values are resolved in tiers: fields with no real evidence fall back to a default, single-source values are accepted, and multi-source conflicts are re-read by the VLM.
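
One possible reading of these tiers, as a sketch. It assumes "real evidence" means a non-empty evidence quote and "single-source" means only one distinct value survives that filter. The `reread` callable is a hypothetical stand-in for the second VLM pass.

```python
from typing import Callable, List, Optional, Tuple


def resolve_conflict(
    candidates: List[Tuple[str, str]],   # (value, evidence quote) pairs
    default: Optional[str],
    reread: Callable[[List[str]], str],
) -> Optional[str]:
    """Resolve one conflicting field using the three tiers."""
    # Keep only values backed by a non-empty quote (assumed meaning of evidence).
    backed = {value for value, quote in candidates if quote and quote.strip()}
    if not backed:
        return default                   # tier 1: no real evidence
    if len(backed) == 1:
        return backed.pop()              # tier 2: single-source value
    return reread(sorted(backed))        # tier 3: ask the VLM to re-read
```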

5

Verification & Finalization

Resolutions are applied, defaults are set for Y/N fields, range validation is enforced (fat 0-100%, salt 0-50%), and the final structured result with coverage stats is built.
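
The finalization rules might look like the sketch below. The field names, the "N" default, and the choice to drop out-of-range values are assumptions. Only the two ranges come from the description above.

```python
from typing import Dict, Iterable, Optional, Tuple

# Percent ranges taken from the step description.
RANGES = {"fat": (0.0, 100.0), "salt": (0.0, 50.0)}


def finalize(
    values: Dict[str, Optional[object]],
    yn_fields: Iterable[str],
    yn_default: str = "N",  # assumed default
) -> Tuple[Dict[str, Optional[object]], float]:
    """Apply Y/N defaults and range checks, then compute field coverage."""
    result = dict(values)
    for name in yn_fields:
        if result.get(name) is None:
            result[name] = yn_default
    for name, (low, high) in RANGES.items():
        value = result.get(name)
        if value is not None and not (low <= float(value) <= high):
            result[name] = None  # assumed handling: discard out-of-range values
    found = sum(1 for v in result.values() if v is not None)
    coverage = found / len(result) if result else 0.0
    return result, coverage
```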

Supported Categories

Dairy, Butcher Products, Meat, Frozen Fruit/Veg, Frozen Mixtures, Frozen Breaded, Frozen Stuffed, Fish, Dry Goods

Extraction Results

Category - -
File -
Pages -
Fields Found -
Conflicts -
Processing Time -

Extracted Fields

Field Value Confidence
No data