Every business that handles invoices, purchase orders, or quotes deals with the same pain: people manually typing numbers from PDFs into spreadsheets. It's slow, error-prone, and a terrible use of people's time. Existing document parsing tools are expensive, brittle on scanned documents, and inflexible when formats change.
01 · Case Study
[ LAB ]
DocExtract
AI · DATA ENGINEERING · DOCUMENT PROCESSING · 2026
Teams were burning hours manually transcribing invoices, purchase orders, and scanned PDFs into structured data.
TIMELINE
4 weeks
ROLE
Design + Engineering
STACK
Next.js, OCR pipeline, LLM extraction
OUTCOME
Live tool, iterative development
02 · The Problem
課題 Why this needed building.
03 · Approach
手法 How we built it.
We built a lightweight tool that combines OCR with LLM-based field extraction. Upload any document — scanned or digital — and get structured data out. The LLM handles format variation without needing templates for every supplier. Output goes straight to CSV or an API.
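The flow described above (OCR text in, LLM-extracted fields out, CSV at the end) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DocExtract's actual code: the field names, the prompt wording, and the `OcrFn` and `LlmFn` function types are hypothetical stand-ins for whichever OCR engine and LLM client the tool really uses.

```typescript
// Hypothetical field set: the case study does not list the fields DocExtract extracts.
interface InvoiceFields {
  supplier: string;
  invoiceNumber: string;
  total: string;
}

const FIELD_NAMES = ["supplier", "invoiceNumber", "total"] as const;

// Assumed adapter signatures; any OCR engine or LLM client could sit behind them.
type OcrFn = (file: Uint8Array) => Promise<string>;
type LlmFn = (prompt: string) => Promise<string>;

// Ask for the same JSON keys regardless of layout, so no per-supplier template is needed.
function buildPrompt(documentText: string): string {
  return (
    `Extract these fields as a JSON object with keys ${FIELD_NAMES.join(", ")}. ` +
    `Use an empty string for anything missing.\n\n${documentText}`
  );
}

// Parse the model's reply defensively: missing keys become empty strings.
function parseFields(raw: string): InvoiceFields {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object from the model");
  }
  const data = parsed as Record<string, unknown>;
  const out: InvoiceFields = { supplier: "", invoiceNumber: "", total: "" };
  for (const name of FIELD_NAMES) {
    const value = data[name];
    out[name] = value === null || value === undefined ? "" : String(value);
  }
  return out;
}

// Quote a CSV cell only when it contains a comma, quote, or line break.
function csvEscape(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(rows: InvoiceFields[]): string {
  const header = FIELD_NAMES.join(",");
  const lines = rows.map((row) =>
    FIELD_NAMES.map((name) => csvEscape(row[name])).join(",")
  );
  return [header, ...lines].join("\n");
}

// End-to-end: OCR the upload, have the LLM pick out the fields, return one structured row.
async function extractDocument(
  file: Uint8Array,
  ocr: OcrFn,
  llm: LlmFn
): Promise<InvoiceFields> {
  const text = await ocr(file);
  const reply = await llm(buildPrompt(text));
  return parseFields(reply);
}
```

Passing the OCR and LLM steps in as functions keeps the sketch independent of any particular vendor and makes the parsing and CSV steps easy to test on their own.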
“Every business has a pile of PDFs they wish were spreadsheets.”
04 · Outcome
成果 What it does now.
DocExtract is a lab project — live and usable, continuing to evolve. It's been tested across dozens of document types.