How to scrape table data from a PDF into JSON
- Step 1: Upload the PDF containing the tables — Drop the document into the table extractor.
- Step 2: Extract all tables to JSON — The tool identifies and extracts all tables in the document.
- Step 3: Review and select the relevant tables — Check the JSON output and filter to the tables you need.
- Step 4: Load into your database or analysis tool — Import the JSON into PostgreSQL, MongoDB, or a Python DataFrame for analysis.
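Step 4 can be sketched in a few lines of Python. This is a minimal example, not the extractor's actual output schema: the table shape (a `tables` list with `headers` and `rows`) is assumed for illustration, and SQLite stands in for PostgreSQL so the snippet runs with no external services.

```python
import json
import sqlite3

# Hypothetical extractor output: a list of tables, each with a
# header row and data rows (the real JSON schema may differ).
extracted = json.loads("""
{"tables": [
  {"headers": ["region", "sales"],
   "rows": [["North", 1200], ["South", 950]]}
]}
""")

table = extracted["tables"][0]
conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
cols = ", ".join(table["headers"])
conn.execute(f"CREATE TABLE report ({cols})")
conn.executemany("INSERT INTO report VALUES (?, ?)", table["rows"])

total = conn.execute("SELECT SUM(sales) FROM report").fetchone()[0]
print(total)  # 2150
```

For PostgreSQL the pattern is the same with `psycopg` in place of `sqlite3`; for analysis, `pandas.read_json` or `pandas.DataFrame(table["rows"], columns=table["headers"])` gets you a DataFrame directly.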
Frequently asked questions
Does this work for PDF files from government data portals?
Yes — standard digitally-created government PDFs extract well. Scanned or image-only PDFs from older government sources require OCR first.
Can I extract from password-protected PDFs?
Remove the password first using the PDF Remove Password tool, then extract the tables.
How do I handle tables with merged cells?
Merged cells are split into individual cells in the JSON output. You may need to post-process the JSON to reapply the intended value to all cells covered by the merge.
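One common post-processing step is a forward fill. This is a minimal sketch, assuming the extractor emits empty strings for the cells a merge covered; the sample rows are illustrative, not real output.

```python
# Rows as they might come out of the JSON: "" marks cells that were
# part of a vertical merge in the original table (an assumption).
rows = [
    ["Q1", "Jan", 100],
    ["",   "Feb", 120],  # "" left where "Q1" was merged downward
    ["",   "Mar", 90],
    ["Q2", "Apr", 110],
]

# Carry each column's last non-empty value down into empty cells.
filled = []
last = [None] * len(rows[0])
for row in rows:
    new_row = [last[i] if cell == "" else cell for i, cell in enumerate(row)]
    filled.append(new_row)
    last = new_row

print(filled[1][0])  # Q1
```

This only handles vertical merges; horizontally merged cells would need the same fill applied across a row instead of down a column.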
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.