How to chunk a PDF document to fit LLM context windows
- Step 1: Check your LLM's context window size. Note the token limit: GPT-4o is 128K tokens, Claude 3 is 200K, and Gemini 1.5 Pro is 1M.
- Step 2: Upload the PDF. Drop the document into the chunker.
- Step 3: Set the chunk size below the model's limit. Leave headroom for the system prompt and the response, e.g. 100K tokens for GPT-4o.
- Step 4: Process each chunk sequentially. Send each chunk to the LLM with appropriate instructions (summarise, extract, etc.) and aggregate the results; see the sketch after this list.
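As a concrete illustration of steps 3 and 4, here is a minimal TypeScript sketch that splits extracted text into chunks under a token budget and processes them in order. The `estimateTokens` heuristic (roughly 4 characters per token) and the `callLLM` client are assumptions standing in for a real tokenizer and a real provider SDK.

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is an assumption; a real tokenizer gives exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split text into chunks that each stay under maxTokens,
// breaking on paragraph boundaries where possible.
function chunkText(text: string, maxTokens: number): string[] {
  const paragraphs = text.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (estimateTokens(candidate) > maxTokens && current) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hypothetical LLM client; replace with your provider's SDK.
declare function callLLM(prompt: string): Promise<string>;

// Step 4: process each chunk sequentially and collect the results.
async function processChunks(text: string): Promise<string[]> {
  const chunks = chunkText(text, 100_000); // headroom below GPT-4o's 128K limit
  const results: string[] = [];
  for (const [i, chunk] of chunks.entries()) {
    results.push(
      await callLLM(`Summarise part ${i + 1} of ${chunks.length}:\n\n${chunk}`)
    );
  }
  return results;
}
```

Breaking on paragraph boundaries rather than at a fixed character offset keeps sentences intact, which generally improves the quality of per-chunk responses.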
Frequently asked questions
Should I include overlapping context between chunks?
Yes — include the last paragraph of the previous chunk at the start of the next to preserve narrative continuity across chunk boundaries.
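A minimal sketch of that overlap, assuming each chunk is a string of paragraphs separated by blank lines: prepend the tail of each chunk to the next.

```typescript
// Prepend the last paragraph of each chunk to the start of the next
// so context carries across chunk boundaries.
function addOverlap(chunks: string[]): string[] {
  return chunks.map((chunk, i) => {
    if (i === 0) return chunk;
    const prevParagraphs = chunks[i - 1].split(/\n{2,}/);
    const tail = prevParagraphs[prevParagraphs.length - 1];
    return `${tail}\n\n${chunk}`;
  });
}
```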
How do I aggregate LLM responses from multiple chunks?
Collect each chunk's response, then send a final prompt asking the LLM to synthesise all partial responses into a single coherent output.
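This is a two-pass (map, then reduce) pattern; a sketch of the reduce step follows, again using a hypothetical `callLLM` client.

```typescript
// Hypothetical LLM client; replace with your provider's SDK.
declare function callLLM(prompt: string): Promise<string>;

// Reduce step: ask the model to merge the per-chunk outputs.
async function synthesise(partials: string[]): Promise<string> {
  const joined = partials
    .map((p, i) => `--- Response for chunk ${i + 1} ---\n${p}`)
    .join("\n\n");
  return callLLM(
    "Below are partial analyses of one document, produced chunk by chunk.\n" +
      `Combine them into a single coherent output, removing repetition:\n\n${joined}`
  );
}
```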
Is there a risk of losing context when chunking?
Yes — information that spans chunk boundaries may be missed. Use generous overlap and include document title/section metadata in each chunk prompt.
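One way to attach that metadata, assuming you track the document title and a per-chunk section heading:

```typescript
// Wrap each chunk with document-level metadata so the model
// knows where the excerpt sits within the original PDF.
function buildChunkPrompt(
  docTitle: string,
  section: string,
  chunkIndex: number,
  totalChunks: number,
  chunk: string
): string {
  return [
    `Document: ${docTitle}`,
    `Section: ${section}`,
    `Chunk ${chunkIndex + 1} of ${totalChunks}`,
    "",
    chunk,
  ].join("\n");
}
```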
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.
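For reference, in-browser text extraction with pdf.js looks roughly like this. The import path and build setup vary by bundler, and pdf.js also needs its worker configured (omitted here), so treat this as a sketch rather than drop-in code.

```typescript
import * as pdfjsLib from "pdfjs-dist";

// Extract plain text from a PDF entirely in the browser.
// `data` is the file's bytes, e.g. read from a drag-and-drop input.
async function extractText(data: ArrayBuffer): Promise<string> {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages: string[] = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    const text = content.items
      .map((item) => ("str" in item ? item.str : ""))
      .join(" ");
    pages.push(text);
  }
  return pages.join("\n\n");
}
```

Because extraction happens client-side, the document's contents never leave the machine; only the resulting chunks you choose to send reach the LLM provider.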