Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Updated May 21, 2026 (Python)
A parser turns its input (often text in the form of a file) into a more useful representation (usually an in-memory data structure) suited to a specific task.
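As a minimal sketch of that idea, the snippet below turns plain text into an in-memory dictionary. The `key = value` line format, the `#` comment rule, and the name `parse_config` are assumptions made for illustration; they are not taken from any of the projects listed here.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse 'key = value' lines into a dict, skipping blank lines and # comments."""
    result: dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # nothing to parse on this line
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected 'key = value', got {raw!r}")
        result[key.strip()] = value.strip()
    return result


# Hypothetical input text, as it might be read from a file.
sample = """
# connection settings
name = demo
retries = 3
"""

config = parse_config(sample)
print(config)  # {'name': 'demo', 'retries': '3'}
```

The values stay as strings here; converting them to numbers or booleans would be a separate step after parsing, which keeps the parser itself small.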
Common examples include:
A markdown parser and compiler. Built for speed.
Rust-based platform for the Web
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
An incremental parsing system for programming tools
⚓ A collection of high-performance JavaScript tools.
A high-performance 100% compatible drop-in replacement of "encoding/json"
jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.
Rust parser combinator framework
🗜 JavaScript parser, mangler and compressor toolkit for ES6+
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Unified querying, transformation, and modification of JSON, TOML, YAML, XML, INI, HCL, KDL and CSV.
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.