Kreuzberg is a high-performance document intelligence library with a Rust core and native Ruby bindings via Magnus. It extracts text, metadata, and structured data from 75+ file formats, including PDF, DOCX, PPTX, XLSX, HTML, RTF, images (with OCR), email, and archives. It also provides async and sync APIs, text chunking, language detection, and keyword extraction.
Required Ruby Version
>= 3.2.0, < 5.0
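The constraint above can be checked against a running interpreter using RubyGems' own version classes. This is a minimal sketch: `REQUIRED_RUBY` and `ruby_supported?` are hypothetical names introduced for illustration, not part of Kreuzberg.

```ruby
require "rubygems"

# Constraint copied from the "Required Ruby Version" field above.
REQUIRED_RUBY = Gem::Requirement.new(">= 3.2.0", "< 5.0")

# Hypothetical helper: true when the given Ruby version string
# satisfies the constraint. Defaults to the running interpreter.
def ruby_supported?(version = RUBY_VERSION)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version))
end

puts "Ruby #{RUBY_VERSION} supported: #{ruby_supported?}"
```

RubyGems performs an equivalent check at install time, so a script like this is mainly useful for diagnosing a failed install.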
Authors
Na'aman Hirschfeld
Versions
- 5.0.0.pre.rc.35 June 24, 2026 (44.6 MB)
- 5.0.0.pre.rc.34 June 24, 2026 (44.8 MB)
- 5.0.0.pre.rc.32 June 23, 2026 (43.7 MB)
- 5.0.0.pre.rc.30 June 23, 2026 (44.7 MB)
- 5.0.0.pre.rc.29 June 22, 2026 (44.7 MB)
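Every version listed above contains letter segments (`pre`, `rc`), which RubyGems treats as prerelease markers. Prereleases sort below the corresponding final release, and `gem install` skips them unless `--pre` or an explicit version is given. The sketch below shows how RubyGems classifies and orders these version strings. The constant and variable names are illustrative only.

```ruby
require "rubygems"

# Version strings copied from the list above, deliberately shuffled.
LISTED_VERSIONS = %w[
  5.0.0.pre.rc.29
  5.0.0.pre.rc.35
  5.0.0.pre.rc.30
  5.0.0.pre.rc.34
  5.0.0.pre.rc.32
].map { |v| Gem::Version.new(v) }

# Numeric segments compare as integers, so rc.35 outranks rc.34.
latest = LISTED_VERSIONS.max

puts "Latest listed version: #{latest}"
puts "All prereleases: #{LISTED_VERSIONS.all?(&:prerelease?)}"
puts "Sorts below 5.0.0: #{latest < Gem::Version.new('5.0.0')}"
```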