formal-verification

#formal-verification

Precise Verification of Transformers through ReLU-Catalyzed Abstraction Refinement

arXiv cs.AI · 6h ago

This paper proposes a novel transformer verification approach that uses ReLU to represent precise but non-linear bounds for dot products, enabling precise and efficient verification. The method outperforms state-of-the-art baselines on sentiment analysis models.

MathAtlas: A Benchmark for Autoformalization in the Wild

arXiv cs.AI · 6h ago

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics, containing ~52k theorems and definitions extracted from 103 textbooks, with a mathematical dependency graph of ~178k relations. In the reported experiments, state-of-the-art models achieve at most 9.8% correctness, which shows how difficult the task remains.

@logic_int: Aleph, our fully autonomous AI agent system for formal verification, aced all major theorem proving benchmarks includin…

X AI KOLs Following · 18h ago

Aleph, a fully autonomous AI agent system for formal verification, achieved top performance on major theorem proving benchmarks including PutnamBench, VeriSoftBench, and Verina.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

arXiv cs.LG · yesterday

This paper proposes Embedding Temporal Logic (ETL), a temporal logic that monitors perception-based autonomous systems directly in learned embedding spaces, enabling specification of high-level perceptual concepts and achieving strong empirical agreement with ground-truth semantics.

Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization

arXiv cs.LG · 2d ago

This paper introduces Vertex-Softmax, a method for tight Transformer verification built on a proof that the exact optimum of softmax over interval constraints is attained at vertices of the constraint box. It improves certified accuracy and efficiency in CROWN-style verifiers for attention models on standard datasets.
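
The vertex property mentioned in this summary can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes the objective is a linear function of the softmax output, brute-forces all 2^n vertices of a small box (exponential in the dimension, so only suitable as a toy), and compares the result with random interior samples. The function names, coefficients, and bounds are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def softmax(x):
    """Numerically stable softmax of a sequence of floats."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def objective(c, x):
    """Linear function of the softmax output: sum_i c[i] * softmax(x)[i]."""
    return sum(ci * si for ci, si in zip(c, softmax(x)))


def vertex_max(c, lower, upper):
    """Maximum of the objective over the 2^n vertices of the box [lower, upper]."""
    # zip(lower, upper) yields one (lo, hi) pair per coordinate;
    # itertools.product enumerates every combination of endpoints.
    return max(objective(c, v) for v in itertools.product(*zip(lower, upper)))


def sampled_max(c, lower, upper, n_samples=2000, seed=0):
    """Best objective value found among random points inside the box."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        best = max(best, objective(c, x))
    return best


# Hypothetical coefficients and interval bounds, for illustration only.
c = [1.0, -2.0, 0.5]
lower = [-1.0, 0.0, -0.5]
upper = [0.5, 1.0, 2.0]

v_max = vertex_max(c, lower, upper)
s_max = sampled_max(c, lower, upper)
print("vertex maximum:", v_max)
print("sampled maximum:", s_max)
```

With the other coordinates held fixed, this objective is monotone in each single coordinate, so no interior sample should exceed the best vertex value. How the paper exploits the property inside CROWN-style bound propagation is not described in the summary and is not modeled here.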

Synergistic Simplex: Cooperative Runtime Assurance for Safety-Critical Autonomous Systems

arXiv cs.LG · 3d ago

This paper introduces Synergistic Simplex, a new runtime assurance architecture for autonomous systems that allows safety monitors to use ML outputs while preserving formal safety guarantees. The authors demonstrate its effectiveness in improving performance for obstacle detection in autonomous vehicles.

@AnimaAnandkumar: TorchLean codebase is now available! TorchLean is a Lean 4 framework for verified neural-network software. It supports …

X AI KOLs Following · 3d ago

TorchLean is a newly released Lean 4 framework that enables formal verification of neural network software, featuring typed tensors, verified autograd, PyTorch interoperability, and GPU execution. The release expands support to modern architectures like diffusion models, GPT-style transformers, and state-space models, bridging practical ML workflows with mathematical proof checking.

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

Hugging Face Daily Papers · 4d ago

This paper introduces Shepherd, a functional programming model and runtime substrate for meta-agents that formalizes operations using Lean and records interactions in a Git-like execution trace. It demonstrates significant performance improvements in runtime intervention, counterfactual optimization, and RL training by enabling fast forking and replay of agent states.

Formalizing statistical learning theory in Lean 4 [R]

Reddit r/MachineLearning · 6d ago

FormalSLT is a Lean 4 library that formally proves finite-sample statistical learning theory results (ERM, VC bounds, Rademacher bounds, PAC-Bayes, etc.) with explicit assumptions and zero sorry statements, providing a machine-checked foundation for ML theory.

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

arXiv cs.CL · 2026-05-08

The article introduces MANTRA, a framework for automatically synthesizing SMT-validated compliance benchmarks for tool-using LLM agents from natural language manuals. It demonstrates that this approach enables scalable and reliable evaluation of agent adherence to complex procedural rules.

LemmaScript: A Verification Toolchain for TypeScript via Dafny

Lobsters Hottest · 2026-04-22

LemmaScript is a new toolchain that compiles TypeScript to Dafny for formal verification without altering the runtime, demonstrated by proving a CVE fix in the Hono framework.

Types and Neural Networks

Hacker News Top · 2026-04-21

This article explores the theoretical and practical challenges of training LLMs to produce typed outputs natively, rather than relying on post-hoc typechecking, with a focus on formally typed languages like Idris, Lean, and Agda. It analyzes current ad-hoc approaches to enforcing types during inference and proposes rebuilding LLMs from the ground up to generate inherently typed outputs.

Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification

arXiv cs.CL · 2026-04-21

Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell for formal verification to train LLMs on semantic equivalence reasoning. They release the OpInstruct-HSx dataset (28k programs) and report an accuracy gain of 13.3 percentage points on EquiBench.

Signal Shot: a project to verify the Signal protocol and its Rust implementation using Lean

Lobsters Hottest · 2026-04-21

Signal Shot is a major formal verification initiative to verify the Signal protocol and its Rust implementation using Lean, combining advances in Rust-to-Lean translation (Aeneas), mathematical foundations (Mathlib/CSLib), automated tactics (grind/SymM), and AI-assisted formalization. This represents a significant test of whether Lean can scale from pure mathematics to deployed real-world software systems.

Verus is a tool for verifying the correctness of code written in Rust

Hacker News Top · 2026-04-20

Verus is a static verification tool for Rust that uses SMT solving to prove full functional correctness of low-level systems code without runtime checks.

Creusot 0.11.0: VerifyThis winner

Lobsters Hottest · 2026-04-20

Creusot 0.11.0 has been released, and the Creusot team won the VerifyThis 2026 program verification competition. The release includes minor features such as explicit binders for result variables and support for weak memory atomics; major features are still in development.

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

arXiv cs.CL · 2026-04-20

This paper introduces Discover and Prove (DAP), an open-source agentic framework for automated theorem proving in Lean 4 that tackles 'Hard Mode' problems where the answer must be discovered independently before formal proof construction. The work releases new Hard Mode benchmark variants and achieves state-of-the-art results while revealing a significant gap between LLM answer accuracy (>80%) and formal prover success (<10%).
