Document extraction for RAG pipelines. Loads PDF, DOCX, CSV, HTML, and web pages into a normalized Document format for chunking and embedding.
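As a rough illustration of what a "normalized Document format" could look like, here is a minimal sketch. It does not use this gem's API: the `Document` struct, its `content` and `metadata` fields, and the `rows_to_documents` helper are all hypothetical names chosen for the example. It assumes the rows have already been parsed (for instance from a CSV file) into hashes.

```ruby
# Hypothetical normalized document: plain text plus metadata about its origin.
# This is an assumption for illustration, not the gem's actual class.
Document = Struct.new(:content, :metadata, keyword_init: true)

# Convert tabular rows (e.g. parsed from a CSV file) into one Document per row.
# Each row is a Hash of column name => value.
def rows_to_documents(rows, source:)
  rows.each_with_index.map do |row, index|
    # Flatten the row into "column: value" lines so it can be chunked as text.
    text = row.map { |key, value| "#{key}: #{value}" }.join("\n")
    Document.new(content: text, metadata: { source: source, row: index })
  end
end

rows = [
  { "title" => "Intro", "body" => "Hello" },
  { "title" => "Usage", "body" => "World" }
]
docs = rows_to_documents(rows, source: "example.csv")
puts docs.first.content
```

Keeping the source and row index in `metadata` means a chunk can be traced back to where it came from after embedding, whatever the input format was.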
Required Ruby Version
>= 3.0.0
Authors
Johannes Dwi Cahyo