Tag
A deep dive into the limitations of regex for parsing HTML, inspired by the famous Stack Overflow answer. It covers formal language theory and the power of industrial regex engines.
Parse-Flow is an open-source visual workflow designer that composes document intelligence primitives (parsing, extraction, classification, splitting) into reusable pipelines, backed by LlamaIndex and a Python worker.
A discussion about parsing XML EXIF data from AVIF files, including a technical rant on the topic.
LlamaIndex released ParseBench, a comprehensive benchmark for evaluating document understanding in AI agents, covering complex enterprise documents with tables, charts, and layouts. A live webinar will discuss the benchmark methodology and results.
This blog post details the implementation of .tres file parsing and resource graph walking in Rust for the Asset Hoard asset manager, enabling external dependency resolution and drag-and-drop export for Godot projects.
Infinity releases two open-weight models, Infinity-Parser2-Pro (35B) and Infinity-Parser2-Flash (2B), which top the ParseBench leaderboard for document understanding, leveraging a synthetic data engine and a novel joint RL algorithm.
The article explains why Tree-sitter is unsuitable for deep program analysis, highlighting how it discards critical tokens such as operators and keywords. It advocates the Cubix framework as a more robust alternative for building semantic analysis and refactoring tools.
Gecko is a new embeddable C library that delivers GLR parsing for any context-free grammar with automatic syntax-error recovery and YACC-level speed.
DSPy 3.2.0 improves dspy.RLM parsing, tool execution, and failure recovery, plus ongoing work to decouple from LiteLLM.