The hard part of RAG isn't finding the right chunk!
When you chunk a document for RAG, each chunk lands in the index on its own, disconnected from the section it came from. So two chunks with identical text, but from completely different parts of the document, look exactly the same to a flat index.
That becomes a problem the moment a question needs context from more than one section. The chunks come back, but how they relate to each other gets lost.
ADE Section fixes this. It reads the parsed document, builds the actual hierarchy, and figures out where every chunk falls within it. That position in the hierarchy gets attached to each chunk before it's embedded.
Once that's in place, a broad question can stay at the section level. A specific one can drop into a sub-chunk. You can scope a search to one part of a document instead of the whole thing.
Citations get more accurate too. The model knows exactly which section a fact came from.
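Here is a minimal Python sketch of the idea, not ADE's actual API or output format. The `Chunk` class, the `section_path` field, and both helper functions are hypothetical names chosen for illustration. It shows how attaching a section path makes two chunks with identical text distinguishable, and how a search can be scoped to one part of a document.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Chunk:
    text: str
    # Hypothetical field: the chunk's position in the document hierarchy,
    # from the top-level section down to the nearest heading.
    section_path: Tuple[str, ...]


def embedding_input(chunk: Chunk) -> str:
    """Prefix the chunk text with its section path before embedding."""
    return " > ".join(chunk.section_path) + "\n" + chunk.text


def in_section(chunks: Sequence[Chunk], prefix: Sequence[str]) -> List[Chunk]:
    """Keep only chunks whose section path starts with the given prefix."""
    wanted = tuple(prefix)
    return [c for c in chunks if c.section_path[: len(wanted)] == wanted]


chunks = [
    Chunk("Fees are due within 30 days.", ("Agreement", "Payment Terms")),
    Chunk("Fees are due within 30 days.", ("Appendix B", "Legacy Contracts")),
    Chunk("Either party may terminate with notice.", ("Agreement", "Termination")),
]

# Identical text, different sections: the strings to embed now differ.
print(embedding_input(chunks[0]))
print(embedding_input(chunks[1]))

# Scope a search to one part of the document, and keep the path for citations.
for c in in_section(chunks, ["Agreement"]):
    print(" > ".join(c.section_path), "|", c.text)
```

In a real pipeline the section path would more likely be stored as metadata in the vector index and applied as a filter at query time. The in-memory list here only stands in for that.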
Run ADE Parse, then run ADE Section.
Parse password-protected documents!
Agentic Document Extraction (ADE) accepts a password parameter directly in the Parse or Parse Jobs call. Pass the document's password, and ADE decrypts and parses it in the same request.
Works across PDF, DOC, DOCX, ODT, PPT, PPTX, and XLSX. Available in the REST API and in the Python and TypeScript libraries.
Requires Zero Data Retention (ZDR) to be enabled for your organization.