Formal Methods: The strictest, most powerful kind of Static Analysis

Reading Time: 10 minutes

Have you ever wished that you could find every bug in your software, fix each of them, and have a perfect piece of software? This article explains how this is possible with Formal Methods, why you might already use them without knowing it, and where they have limitations.

It’s time to talk about Formal Methods, a rather polarizing topic. You either love it or you hate it. It is worth knowing what they are, so you are not missing out on one of the sharpest tools in the belt of a developer (you can probably guess my opinion now). If you feel intimidated by the word “formal” or you only locate Formal Methods in purely academic settings, please bear with me. The times when you needed a PhD just to use them are long over. Today, many developers use Formal Methods routinely, on complex, real-world software.

Since I have discovered Formal Methods, I use them almost daily. And if I can’t use them (we will get to the limitations later), I feel insecure. They have truly changed my developer life.

Why are they important?

It is extremely hard to find bugs in software. In this blog, we have talked about Static Analysis, Dynamic Test, and Code Reviews. While they do find bugs, they cannot guarantee that you are not missing a test case or have False Negatives. Hence, even endlessly tested software can still fail.

Avoidable: Not your typical fireworks display – this is the Ariane 5 rocket a few seconds after a wrong type cast.

I would like to highlight one example, because I have a personal connection to it: On its maiden flight, the Ariane 5 rocket was practically self-destructing, just 37 seconds after liftoff. The reason turned out to be a small numeric overflow that caused a run-time exception. Then, a chain effect caused the rocket to destroy itself. A subtle bug that evaded extensive testing.

This is where Formal Methods come in — the software equivalent of engineering safety margins in bridges. They would have prevented this.

What are Formal Methods?

Formal Methods have the same goal as Static Analysis: Finding bugs by analyzing a model or code, or even requirements. We can see them as a special type of Static Analysis, since they don’t run the code, but analyze it statically. However, the big difference is how Formal Methods analyze things and what guarantees they can give us. In a nutshell:

Formal Methods find “all” bugs.

We also call that “sound” in Computer Science, or “Zero False Negatives”. In other words, if there is a bug in your software and Formal Methods are told to look for it, they will never miss it. And that is a game changer.

Are they new?

Not at all. Formal Methods have existed since the Babylonians built astronomical databases. In the world of computer science, we can perhaps blame Alan Turing for formalizing what computation means. But it was only between the 1970s and the 2000s, when researchers made Formal Methods scalable, and started using them on hardware and software designs.

When the Ariane disaster happened in 1996, some of the pioneers of Formal Methods were able to apply them to the code, and they found the bug. Today, nearly 30 years later, formal methods have grown up. Robust commercial tools exist, and they made their way into the open source community (good things just persist!). And I am humbled and lucky and enough to work with exactly those people who found the bug in the Ariane 5 with Formal Methods 🎉.

How do they work?

Like with Static Analysis, there is more than one way to implement a formal analysis method (hence the plural form in the name). All formal methods give a “zero bugs missed” guarantee by considering every possible input, every possible program state, and every possible execution path. Yes, every. And sometimes even more (I am not joking…we will come to that).

This already tells you that Formal Methods cannot be just pattern matching or linting++, but require a solid mathematical foundation, much deeper than ordinary Static Analysis. And like we discussed in earlier posts, it needs to be clever, because even 10 lines of code can have billions of internal execution paths.

Logical approach of Formal Methods: Everything starts as a bug; over time, they try to prove that there is no bug.

Assuming the worst

Regardless of method, all formal methods start with the assumption that everything is buggy. Then, step by step, they analyze the code with mathematical algorithms that try to prove that nothing bad happens. If the proof does not succeed for a certain operation (e.g., an array access), then we stay with the initial assumption of buggy. In other words, we err on the safe side. However, if the proof succeeds, we get an entirely new category of result: proven safe.

And that’s cool because it means “at this location, nothing can go wrong, whatever the inputs are, whenever they may come”. This allows us to fully eliminate certain types of bugs and reduces robustness testing (if something is mathematically proven robust, why try to crash it?).

Formal Methods consider every possible input, every possible software state, and every possible execution path.

For more details about the methods, see the bonus section further down. For everyone else, let’s continue here.

You are already using Formal Methods

If you think “this sounds rather theoretical, I don’t need that”, then it may be too late: I am 99% sure that Formal Methods are already in your development toolchain. Compilers typically use “lightweight” formal methods — e.g., the Rust compiler checks for memory safety, Java’s uses it for nullability analysis. Moreover, most Static Analyzers borrow from formal methods for detecting null dereferences.

However, in most applications, Formal Methods are used in a very limited way, and often they are allowed to be imprecise (e.g., if the Rust compiler cannot deduce that something is safe, it’s not a big deal; it will just add a run-time check). Let’s have a look at what Formal Methods, in their unleashed form, can do for our source code.

What kind of bugs can they find?

Formal Methods verify well-defined properties. By default, most formal methods automatically check standard properties like “there is no out-of-bounds array access” or “there is no numeric overflow” without requiring any work from you. But you can also use them to find custom types of bugs, as long as you find a way to describe them to the algorithm

Standard Properties – Automatic

Example from the Ariane 5: Here is the piece of code that produced the catastrophic overflow:

P_M_DERIVE(T_ALG.E_BH) := UC_16S_EN_16NS (TDB.T_ENTIER_16S
                                   ((1.0/C_M_LSB_BH) *
                                   G_M_INFO_DERIVE(T_ALG.E_BH)))

This converts a 64bit float number (”Horizontal Bias” E_BH, at right-hand side of the assignment) into a 16bit signed integer (left side). And this is fine as long as the variable E_BH was less than roundabout 64k. For the first few seconds of the flight, and during all lab tests, this worked flawlessly.

Unfortunately, the Ariane 5 had much more powerful engines than its predecessor (from which this piece of code was reused), so that the value became simply too big to fit into a 16bit integer. As a result, the integer variable overflowed, causing an exception. This eventually triggered a full nozzle deflection, which set the rocket on a course correction so rapid that aerodynamic forces have torn the rocket apart. More details this ESA report, if you insist.

Hence, one of the standard properties worth checking (regardless if you build rockets or not) is that numeric operations do not overflow. Others are correct memory access (read after write, within bounds), concurrency aspects (race conditions, deadlocks…), and correct use of libraries. Formal methods will automatically try to prove these properties for you.

Custom Properties – expert needed

Some formal methods allow going deeper by letting users write their own properties in special specification languages. With that, we can also verify functional properties, for example, is this algorithm actually a PID controller, or rather a bubble sort?

This is very powerful but requires describing the properties in a specific form, like, for example, first-order logic in ACSL. This is indeed tricky, but also rarely used. Most of my clients stick with the standard properties.

In summary, you only get what you ask for (but you still can’t always get what you want).

Formal Methods only find what you ask for – the verification properties.

When are Formal Methods used?

Formal Methods are typically used in safety-critical software: Aviation, rail, automotive, and medical. Such systems must follow rigorous safety standards, which mention Formal Methods under the keywords “Formal Verification”, “Abstract Interpretation”, and the like. A large part of my day job is to help people apply Formal Methods successfully to cars, airplanes, industrial automation devices, and sometimes even household appliances.

You can use them everywhere where bugs are too costly. Or if you want to challenge yourself to see how good your coding is. For example, I have used Formal Methods to prove that a certain nasty bug in openSSL is definitely fixed, which gave me a good night’s sleep. I have used it to prove that a custom-made electronic clock shows the time and date correctly. And I have used it to prove that my flight control software does not have overflows, unlike the Ariane 5. It is a really good feeling when you go testing, and you already know that robustness will not be an issue (unfortunately, other issues happened, but at least it was not a software error anymore).

Other examples where FM are used:

High-assurance security systems: cryptographic protocol verification.
Microsoft used model checking to verify the Windows file system’s consistency guarantees.
Banking: Ensuring transactions can’t lose money.
Aerospace: Proving control software won’t enter unsafe states.
Medical devices: Guaranteeing correct dosage computations.

Limitations and common objections

Formal Methods have weaknesses. Let’s have a look at frequent objections, and debunk some of them, but also admit where criticism is valid:

“The method/tool itself could be buggy” — This is a classic. First of all, these methods have a solid mathematical base. But typically, people mean “what if it is implemented incorrectly?”. Luckily, Formal Methods have mature tools which are 25+ years of “proven in use” (pun intended). Moreover, some of these tools have been used to verify themselves 🐕‍🦺. And I argue that it is better than what humans can do.

“It’s too academic, requires experts” – Seen as a niche for PhDs in math, not for “real-world” coding deadlines. But as we discussed above, this has changed. You don’t necessarily have to do any manual work. And most methods behave just like Static Code analysis.

“Takes too much time” – Perception that it slows delivery compared to writing tests and shipping. Any verification step takes time. The real question is, how much time does it save you if a bug has slipped through to production, and you need to fix it?

“Doesn’t scale to our systems” — Indeed, large software sometimes can take days to formally analyze. But crucially, this can be automated! Throw it on a big machine, and wait. If you need faster response times, then here is what you should do anyway: modularize your software, create well-defined interfaces, and give these chunks to the analysis.

“Too many False Positives” — While Formal Methods must have needless warnings by construction, it isn’t that bad. The problem is when you apply them too late, e.g., one day before your software release. Then it’s too late to change the code, and everything looks like a False Positives. And as we discussed before, humans are simply bad at deciding that is a true or false warning. If you want to get precise results, then you have to adapt your code and analysis settings, too.

How does it look in practice?

Not much different than Static Code Analysis. In fact, Formal Methods are hardly separable from Static Analysis if you don’t know what’s under the hood. Commercial static analyzer tools sometimes use Formal Methods, but at other times they go overboard with marketing.

Here is how you can identify a tool that uses Formal Methods properly:

It claims zero False Negatives – otherwise it is not sound, and will miss bugs.
It must have False Positives – otherwise, you are a victim of marketing, as we discussed here.
It shows you proven properties, not just bugs. Traditionally, this is green highlighting in the code for operations like multiply, array access, pointer derefs, and so on.
It takes longer to analyze – because it gives you a guarantee.

If in doubt, talk to the authors or vendor, and let them explain to you how the Formal Method works. If you get a lot of hand-waving, it could be a tool that only dresses in a formal gown, but is actually a pattern matcher. Next, I will give you some buzzwords that should come up in that conversation, as a marker of true formality.

Bonus: Types of Formal Methods and Buzzwords

Want more details? Here are the Big Three of Formal Methods, and some buzzwords to help you recognize them in the wild. If many of you are interested in a deep dive, let me know, and I might expand on them in separate articles:

Model Checking

Model checking explores all possible states of a system to ensure properties hold (e.g., no deadlocks). It expresses bad properties as a reachability problem on a graph, and then tests reachability with graph algorithms. Because graphs can be huge, it often can only give a bounded proof, i.e., it only goes to a certain depth. The 2007 “Nobel Prize of Computer Science” went to the people who invented it.

Properties:

very precise
fully automatable
really low scalability (”state-space explosion”, i.e., the graph can get too huge to store in memory, especially on loops)
typically bounded proof

Theorem Proving / Deductive Proof

Theorem proving uses computational logic to prove properties from specifications. It leverages techniques like Weakest Precondition Calculus to identify which preconditions must hold true for bad properties to NOT happen, and subsequently uses SMT solvers to check if the concrete inputs are fully included in this set – otherwise there is a bug.

Properties:

very precise
cannot be fully automated (needs human interaction, especially on loops)
low scalability

Abstract Interpretation

Abstract Interpretation is approximating program behavior to find possible runtime errors by modeling each program operation in an abstract space (that means focusing on the essential features of something, while ignoring the less relevant details). For example, instead of calculating the result of “a+b” for every combination of the two operands and predicting precise results, we could just calculate with the min/max ranges. We lose the precise number, but if we do it right, all possible numbers are still contained in the range, and we can prove/disprove many bugs. Advanced tools use set abstractions and relations in a polynomial space.

Properties:

loss of precision (especially on loops)
can be fully automated
highly scalable

Interestingly, they all have a common enemy: Loops! But that is a topic for another day.

Conclusion

We can’t (yet) remove humans from software development, but we can remove some of the risks that come with being human. If you are a user of Static Analysis, there is no reason why you should ignore Formal Methods. They give you the bragging rights to tell your co-workers that your software is perfect. Well, if there is such a thing as perfect.

Yes, Formal Methods have limitations, like everything in life. But we can work around that, and benefit from the “zero bugs missed” guarantee. For me, Formal Methods have not only saved thousands of debugging hours, but they have also changed the way I think about my own mistakes. Humans make more mistakes than they realize, but Formal Methods find them “all”.