Document extraction for RAG pipelines. Loads PDF, DOCX, CSV, HTML, and web pages into a normalized Document format for chunking and embedding.
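
As a rough illustration of what a "normalized Document format" can look like, here is a minimal sketch in plain Ruby. It is not this gem's API: the `Document` struct and the `normalize` and `chunk` helpers are hypothetical names chosen for the example. It assumes a document is reduced to cleaned-up text plus a metadata hash before being split into chunks.

```ruby
# Hypothetical sketch, not this gem's API. All names here are illustrative.
Document = Struct.new(:content, :metadata, keyword_init: true)

# Normalize raw text from any source into a Document:
# collapse runs of whitespace and record where the text came from.
def normalize(raw_text, source:, format:)
  Document.new(
    content: raw_text.gsub(/\s+/, " ").strip,
    metadata: { source: source, format: format }
  )
end

# Split a Document into fixed-size character chunks. Each chunk is itself
# a Document that carries the parent's metadata plus its chunk index.
def chunk(document, size: 40)
  document.content.scan(/.{1,#{size}}/m).each_with_index.map do |text, index|
    Document.new(content: text, metadata: document.metadata.merge(chunk: index))
  end
end

doc = normalize("Title\n\n  Some   body text\tfrom a page.",
                source: "example.html", format: :html)
chunks = chunk(doc, size: 16)
chunks.each { |c| puts "#{c.metadata[:chunk]}: #{c.content.inspect}" }
```

Keeping the source and format in metadata means every chunk can be traced back to its origin after embedding, whatever the original file type was. Fixed-size character chunks are the simplest possible strategy; a real pipeline would more likely split on sentence or paragraph boundaries.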

Required Ruby Version

>= 3.0.0

Authors

Johannes Dwi Cahyo

Versions

  1. 0.2.0 (March 17, 2026, 12.5 KB)
  2. 0.1.1 (March 10, 2026, 10 KB)

SHA 256 checksum