@geoffreyirving: New paper with Gopal Sarma, Rachel Steratore, and Sunny Bhatt, and me surveying formal methods folk about importance an…
Summary
A new paper surveying formal methods practitioners on the importance and tractability of applications to AI safety, accompanied by a broader plea for ambitious software verification.
View Cached Full Text
Cached at: 06/09/26, 08:59 AM
New paper with Gopal Sarma, Rachel Steratore, and Sunny Bhatt, and me surveying formal methods folk about importance and tractability of applications to AI safety. I’m excited this is out!
Here is a broader plea for people to be very ambitious about verifying software!
AI-assisted formal proofs (in particular in Lean) are getting very good! A worry I have is that people will insufficiently update about how powerful this stuff can be, and thus fail to tackle sufficiently big projects.
So, here are some predictions! By the end of 2027, we will have formal proofs* of all of
- The correctness of clang and gcc
- Lack of memory errors in Linux
- Internally, within at least one major hardware company (Intel, Apple, or Nvidia, say), correctness of an entire chip
*Of course, these proofs will be against specs, and there is insufficient time to write these specs via human effort alone. So the specs will be partially AI generated, and thus provide only partial confidence about actual correctness of the software.
A further subtlety is that why I didn’t say “correctness of Linux”. Two more detailed claims:
- Linux is sparsely close to a version of Linux with no memory errors
- Linux is sparsely far from any versions with no denial of service attacks
That is, correctness of an OS against certain bugs is unachievable without big switch to something like seL4 or the like (designed from the ground up to be much harder). For memory errors, we can find and fix them locally during proof construction; DDOS is more global.
Correct clang + correct gcc + memory-safe Linux would be very big! And there are certainly many other huge bets that would be valuable to take! Formal methods is beautifully defence-dominant; let’s scale it up!
Oops, failed to specifically probabilities on those predictions: I would put >80% on each of 1, 2, 3.
Similar Articles
@paulg: Interesting. AI will in effect increase both supply and demand for formal methods. You need them more, but you also hav…
Jane Street, previously skeptical about formal methods, is now building a team to use them, driven by AI and agentic coding that reduce costs and increase benefits for software verification.
Formal methods and the future of programming
Jane Street, previously skeptical of formal methods, announces a shift in perspective and plans to build a team focused on formal methods, driven by the emergence of agentic coding which changes the cost/benefit calculus by reducing the cost of verification and increasing the need for reliable code.
Characterizing initial human-AI proof formalization workflows
Researchers from Oxford, Cambridge, MIT, CMU and other institutions conduct a mixed-methods study examining how people integrate AI tools into mathematical proof formalization workflows, finding that participants generally achieve higher formalization accuracy with AI assistance while preferring to maintain high-level human control over the proof discovery process.
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
This paper proposes techniques that combine formal methods (Linear Temporal Logic) with LLMs for auditing, monitoring, and intervening in AI systems to ensure compliance with behavioral constraints, showing that even small-model labelers can match frontier LLM judges in detecting violations.
Improving verifiability in AI development
OpenAI publishes a report on mechanisms to improve verifiability in AI development, addressing how stakeholders can verify organizations' claims about AI system properties and safety practices.