Split a PDF into LLM-Ready Semantic Chunks

Break a PDF into overlapping semantic chunks optimized for RAG pipelines, vector databases, and LLM context windows.

How it Works

1. Upload the PDF to chunk
2. Set max chunk size (default 500 words)
3. Download chunks as JSON with page numbers and token estimates
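
Once the JSON is downloaded, a common next step is to pack chunks into a fixed context budget. The sketch below is illustrative only: the field names `text`, `page`, and `token_estimate` are assumptions about the export format, not the tool's documented schema, and `select_within_budget` is a hypothetical helper.

```python
import json

# Hypothetical sample of a downloaded file; the real field names may differ.
SAMPLE = """[
  {"text": "Intro paragraph", "page": 1, "token_estimate": 120},
  {"text": "Method details", "page": 2, "token_estimate": 300},
  {"text": "Results table", "page": 3, "token_estimate": 200}
]"""


def select_within_budget(chunks_json, token_budget):
    """Return the leading chunks whose combined token estimates fit the budget."""
    selected, used = [], 0
    for chunk in json.loads(chunks_json):
        # Stop at the first chunk that would push the total over the budget.
        if used + chunk["token_estimate"] > token_budget:
            break
        selected.append(chunk)
        used += chunk["token_estimate"]
    return selected


picked = select_within_budget(SAMPLE, token_budget=450)
print([c["page"] for c in picked])
```

With a 450-token budget the first two sample chunks fit (120 + 300) and the third is left out. Because the token counts are estimates, it may be sensible to leave some headroom below the model's real limit.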

PDF to Semantic Chunks limits by plan

Free is enough for most one-off jobs. Pro raises the file and batch caps; Pro + Media unlocks GB-scale streaming and unlimited duration.

Free

No signup needed

File size: 2 MB
Pages per PDF: 50
Files per batch: 1

Pro

£7/mo — 50× larger files

File size: 50 MB
Pages per PDF: 500
Files per batch: 5

Pro + Media

Stream multi-GB files

File size: 500 MB
Pages per PDF: 2,000
Files per batch: 50

Larger files are supported on Developer (5 GB CSV) and Enterprise (unlimited). All processing happens in your browser; files never reach a server.

Privacy Audit

0 bytes uploaded. PDF to Semantic Chunks runs entirely in your browser using pdf-lib and pdfjs-dist. Your file stays on your device at all times. No data is sent to any server.

Frequently Asked Questions

What is RAG?

RAG stands for Retrieval-Augmented Generation, a technique in which relevant document chunks are retrieved and passed to an LLM before it generates an answer.
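
To make the retrieve-then-generate idea concrete, here is a toy sketch. It ranks chunks by simple keyword overlap (Jaccard similarity) as a stand-in for the embedding search a real vector database would perform. The function names and prompt wording are hypothetical, and no LLM is actually called.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, top_k=2):
    """Rank chunks by Jaccard word overlap with the query; return the best top_k."""
    query_words = tokenize(query)

    def score(chunk):
        words = tokenize(chunk)
        union = query_words | words
        return len(query_words & words) / len(union) if union else 0.0

    return sorted(chunks, key=score, reverse=True)[:top_k]


def build_prompt(query, chunks):
    """Place the retrieved chunks ahead of the question, as a RAG prompt would."""
    context = "\n\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"


chunks = [
    "Invoices are due within 30 days.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2% fee.",
]
print(build_prompt("When are invoices due?", chunks))
```

In a production pipeline the scoring step would typically be replaced by vector similarity over embeddings, but the shape stays the same: score the chunks, keep the best few, and prepend them to the question.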

How are chunks sized?

Chunks are split on word boundaries and stay within the max size setting. Each chunk records its source page and a token estimate.
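
A minimal sketch of word-boundary chunking with overlap is shown below. It is not the tool's actual implementation: the overlap size, the rule of roughly four characters per token, and every name here (`Chunk`, `chunk_pages`, `estimate_tokens`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int            # page on which the chunk's first word appears
    token_estimate: int  # rough heuristic, not a real tokenizer count


def estimate_tokens(text):
    """Assumed heuristic: roughly four characters per token for English text."""
    return max(1, len(text) // 4)


def chunk_pages(pages, max_words=500, overlap_words=50):
    """Split per-page text into overlapping chunks that break only between words.

    pages: list of strings, one per PDF page (page 1 first).
    overlap_words is an assumed setting, not a documented option of the tool.
    """
    if not 0 <= overlap_words < max_words:
        raise ValueError("overlap_words must be >= 0 and smaller than max_words")

    # Flatten to (word, page_number) pairs so chunks can cross page breaks.
    words = [(w, n) for n, text in enumerate(pages, start=1) for w in text.split()]

    chunks = []
    step = max_words - overlap_words
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        text = " ".join(w for w, _ in window)
        chunks.append(Chunk(text, window[0][1], estimate_tokens(text)))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return chunks


pages = ["one two three four five", "six seven eight nine ten"]
for chunk in chunk_pages(pages, max_words=4, overlap_words=1):
    print(chunk.page, chunk.text)
```

Each chunk repeats the last `overlap_words` words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.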

Related PDF Tools

PDF to Markdown

PDF to Plain Text

PDF Table to JSON
