Tag
Canonical's dqlite team used TLA+ to model and understand a 16-year-old SQLite bug in WAL checkpointing that can cause database corruption, then verified whether dqlite is affected.
This paper formalizes four concurrency anomalies in multi-agent LLM systems, mechanically verifies a consistency hierarchy, and provides verified Rust runtimes with bounded prevention costs, including a fix for ByteDance's deer-flow and tool-effect reordering in LangGraph.
Discusses two failure modes in multi-agent systems with shared state—concurrent lost updates and zombie writers—and presents a solution with fenced writers and model-checked guarantees.
Researchers from the Specula team created SysMoBench, a benchmark evaluating whether LLMs can faithfully model real-world computing systems in TLA+ or merely recite textbook specifications. The benchmark tests 11 systems across four phases and reveals systematic gaps in current LLMs' ability to accurately model system implementations versus reference papers.
TLA+ semantics guarantee nonordered updates, but the TLC model checker breaks these guarantees by requiring ordered assignments and adding effectful operators like PrintT, causing confusion for beginners.
Hillel Wayne discusses how LLMs, while popular for writing formal specifications like TLA+ and Alloy, often produce shallow, tautological properties that fail to capture subtle bugs, based on analysis of community projects.