How to chunk a PDF for ingestion into a knowledge base
- Step 1: Upload the PDF. Drop the document into the chunker.
- Step 2: Configure chunk parameters. Set the chunk size and overlap to suit your knowledge base retrieval strategy.
- Step 3: Download the chunks. Save the JSON chunk output.
- Step 4: Ingest into your knowledge base. Use LlamaIndex, LangChain, or OpenAI's file upload API to index the chunks.
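To make the chunk size and overlap settings concrete, here is a minimal sketch of fixed-size chunking with overlap in Python. It is not this tool's implementation: the function name `chunk_text`, the character-based sizing, and the output keys `index`, `start`, and `text` are assumptions chosen for illustration, and the tool's actual JSON output may use different fields.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into fixed-size character chunks that share `overlap` characters.

    Hypothetical helper for illustration; sizes are in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[dict] = []
    for start in range(0, len(text), step):
        chunks.append({
            "index": len(chunks),   # position of the chunk in the document
            "start": start,         # character offset where the chunk begins
            "text": text[start:start + chunk_size],
        })
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is emitted that only repeats overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more chunks to index; an overlap of 0 produces disjoint chunks.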
Frequently asked questions
Can I use this to populate an OpenAI Assistant's knowledge base?
Yes, but you may not need it. If you upload the PDF directly through the OpenAI Assistants API, OpenAI's file search handles chunking internally. Use this tool for custom pipelines where you need to control chunking yourself.
How should I handle tables in PDF chunks?
For tabular data, use the PDF Table to JSON tool to extract tables separately and ingest them as structured JSON alongside the text chunks.
What is the best retrieval strategy for chunked PDFs?
Hybrid search, which combines vector similarity with BM25 keyword search, often outperforms pure vector search for document retrieval.
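One common way to combine a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below assumes you already have two ranked lists of chunk IDs from your own vector index and keyword index; the function name `reciprocal_rank_fusion`, the chunk IDs, and the constant `k = 60` (a conventional default) are illustrative assumptions, and your retrieval stack may fuse scores differently.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores the sum of 1 / (k + rank) over the lists it appears in,
    with rank starting at 1. Hypothetical helper for illustration.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


if __name__ == "__main__":
    vector_hits = ["c1", "c2", "c3"]  # assumed output of a vector search
    bm25_hits = ["c3", "c1", "c4"]    # assumed output of a BM25 search
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
    # prints ['c1', 'c3', 'c2', 'c4']
```

RRF uses only rank positions, so it avoids having to normalise cosine similarities and BM25 scores onto a common scale.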
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.