@geoffreyirving: New paper with Gopal Sarma, Rachel Steratore, and Sunny Bhatt, and me surveying formal methods folk about importance an…

X AI KOLs Following 06/08/26, 09:46 AM Papers

Summary

A new paper surveying formal methods practitioners on the importance and tractability of applications to AI safety, accompanied by a broader plea for ambitious software verification.

New paper from Gopal Sarma, Rachel Steratore, Sunny Bhatt, and me, surveying formal methods folk about the importance and tractability of applications to AI safety. I'm excited this is out! Here is a broader plea for people to be very ambitious about verifying software! 🧵 https://t.co/jZZ5N8ALbl

Original Article


Cached at: 06/09/26, 08:59 AM

New paper from Gopal Sarma, Rachel Steratore, Sunny Bhatt, and me, surveying formal methods folk about the importance and tractability of applications to AI safety. I’m excited this is out!

Here is a broader plea for people to be very ambitious about verifying software!

AI-assisted formal proofs (in particular in Lean) are getting very good! A worry I have is that people will insufficiently update about how powerful this stuff can be, and thus fail to tackle sufficiently big projects.

So, here are some predictions! By the end of 2027, we will have formal proofs* of all of the following:

1. The correctness of clang and gcc
2. Lack of memory errors in Linux
3. Internally, within at least one major hardware company (Intel, Apple, or Nvidia, say), correctness of an entire chip

*Of course, these proofs will be against specs, and there is insufficient time to write these specs via human effort alone. So the specs will be partially AI-generated, and thus the proofs will provide only partial confidence about actual correctness of the software.

A further subtlety is why I didn’t say “correctness of Linux”. Two more detailed claims:

Linux is sparsely close to a version of Linux with no memory errors
Linux is sparsely far from any version with no denial of service attacks

That is, correctness of an OS against certain bugs is unachievable without a big switch to something like seL4 (designed from the ground up to be much harder). For memory errors, we can find and fix them locally during proof construction; denial of service is more global.

Correct clang + correct gcc + memory-safe Linux would be very big! And there are certainly many other huge bets that would be valuable to take! Formal methods is beautifully defence-dominant; let’s scale it up!

Oops, I failed to specify probabilities on those predictions: I would put >80% on each of 1, 2, and 3.
