Break a PDF into overlapping semantic chunks optimized for RAG pipelines, vector databases, and LLM context windows.
Select the PDF to chunk
Set max chunk size (default 500 words)
Download chunks as JSON with page numbers and token estimates
Free is enough for most one-off jobs. Pro raises the file and batch caps; Pro + Media unlocks GB-scale streaming and unlimited duration.
Larger files are supported on Developer (5 GB CSV) and Enterprise (unlimited). All processing happens in your browser; files never reach a server.
0 bytes uploaded. PDF to Semantic Chunks runs entirely in your browser using pdf-lib and pdfjs-dist. Your file stays on your device at all times. No data is sent to any server.
RAG stands for Retrieval-Augmented Generation, a technique in which an LLM retrieves relevant document chunks before generating an answer.
Chunks split on word boundaries, respecting the max size setting. Each chunk includes page source and token estimate.
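The splitting described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the `PageText` and `Chunk` shapes, the `chunkPages` and `estimateTokens` names, the 50-word default overlap, and the 4-characters-per-token heuristic are all assumptions made for the example. It takes per-page text that has already been extracted, and does not call pdf-lib or pdfjs-dist itself.

```typescript
// Hypothetical input/output shapes; field names are assumptions, not the tool's real schema.
interface PageText {
  page: number; // 1-based page number the text came from
  text: string; // plain text already extracted from that page
}

interface Chunk {
  index: number;
  text: string;
  pages: number[]; // every page that contributed words to this chunk
  wordCount: number;
  tokenEstimate: number;
}

// Assumed heuristic: roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split page text into word-bounded chunks of at most `maxWords` words,
// where consecutive chunks share `overlapWords` words.
function chunkPages(pages: PageText[], maxWords = 500, overlapWords = 50): Chunk[] {
  if (maxWords <= 0 || overlapWords < 0 || overlapWords >= maxWords) {
    throw new RangeError("require maxWords > 0 and 0 <= overlapWords < maxWords");
  }

  // Flatten all pages into one word stream, remembering each word's page.
  const words: { word: string; page: number }[] = [];
  for (const { page, text } of pages) {
    for (const word of text.split(/\s+/).filter(Boolean)) {
      words.push({ word, page });
    }
  }

  const chunks: Chunk[] = [];
  const step = maxWords - overlapWords;
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + maxWords);

    // Words arrive in page order, so de-duplicating neighbours is enough.
    const chunkPageNumbers: number[] = [];
    for (const w of slice) {
      if (chunkPageNumbers[chunkPageNumbers.length - 1] !== w.page) {
        chunkPageNumbers.push(w.page);
      }
    }

    const text = slice.map((w) => w.word).join(" ");
    chunks.push({
      index: chunks.length,
      text,
      pages: chunkPageNumbers,
      wordCount: slice.length,
      tokenEstimate: estimateTokens(text),
    });

    // Stop once the current window has reached the end of the document.
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

Calling `JSON.stringify(chunkPages(pages), null, 2)` on the result would give a JSON array of chunks with page numbers and token estimates, similar in spirit to the download described above. A real tokenizer would give more accurate counts than the character-based estimate used here.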
Extract text from a PDF and format it as clean Markdown with page headers. Perfect for documentation workflows.
Extract all text content from a PDF file. Clean, page-separated plain text output ready for processing.
Detect and extract tables from PDF documents into structured JSON. First row becomes keys, subsequent rows become objects.