Reducing the Noise: Teaching Static Analysis new facts

In this article, I share a simple trick that reduces warnings in your Static Analysis tool, without compromising safety. You will learn how to inject new facts into your Static Analyzer, which can save you many hours of review work, as well as the pitfalls that must be avoided.

Static Analysis can create a lot of warnings, and some of them are unavoidable. This creates a common challenge in practice: Code review can cost a lot of time, and it is difficult to find the real bugs in the flood of warnings. But there are solutions.

In the previous article, we discussed path pruning, a.k.a. blocking semantics or error absorption. It is a “filtering effect” that can hide bugs, but also reduces redundant warnings. This opens up a possibility: We can leverage that to help Static Analysis rule out certain execution paths, and thereby remove warnings that we know are not true. This removes clutter and allows us to see the real bugs.

Here is how to make your life easier, and it works regardless of what static analysis tool you are using.

The Trick

As discussed, path pruning means that Static Analysis stops analyzing data flows that resulted in errors. Example:

int foo(int x) {
    // x is in range [INT_MIN, INT_MAX]
    int z = 1 / x;  ///< warning "division by zero"
    return z;
}

Here, Static Analysis knows nothing about x and will therefore warn of a division by zero. Maybe it didn’t find a caller, or precision was lost. Suppose, however, we have more information about x. In that case, we can “inject” this information via path pruning:

#include <assert.h>

int foo(int x) {
    // x is in range [INT_MIN, INT_MAX]
    assert(x > 5);  ///< new: explicit path pruning to inject information
    // x is in range [6, INT_MAX]
    int z = 1 / x;  ///< no warning anymore
    return z;
}

We have inserted an assertion with the information x > 5. This assertion will fail for values less than or equal to five, and hence these paths are filtered out. After that point, Static Analysis “knows” that x is at least six and will no longer report a division by zero. Pretty neat!

Assertions can be used to inject knowledge into your Static Analysis.

Trading warnings and controlling propagation

Of course, now we get a new warning on the potentially failing assert. The good thing is that we have stopped wrong assumptions from propagating further into our program, where nobody can clearly tell anymore what value x is supposed to have. Additionally, there might be other variables that depend on x, which are now more precise.

The cool part: This trick often allows us to eliminate several warnings in exchange for one. Even if not, at least it gives us the power to provide information to the analysis at strategic locations that we know best. For example, function parameters are a good place to inject knowledge, since developers typically have a mental model of what the inputs should be.

Side note: assertions on function parameters are called “preconditions”, and are a stepping stone to contract-based programming. Many static analysis tools have features to apply preconditions to function calls, so that you actually don’t have to write assertions in the code.
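
As a minimal sketch of trading several warnings for one, consider the following function. The name scale_sample, the table, and its size are assumptions made up for illustration. Without the two preconditions, an analyzer that knows nothing about the callers could warn at the array accesses and at both divisions; with them, the risk is concentrated at the assertions on function entry.

```c
#include <assert.h>

#define TABLE_SIZE 8  /* hypothetical size, chosen for illustration */

static const int table[TABLE_SIZE] = {10, 20, 30, 40, 50, 60, 70, 80};

/* Hypothetical example: look up a sample and divide it. */
int scale_sample(int idx, int divisor) {
    /* Preconditions: paths that violate them are pruned by the analysis. */
    assert(idx >= 0 && idx < TABLE_SIZE);
    assert(divisor > 0);

    int sample = table[idx];        /* idx is in [0, TABLE_SIZE - 1] here */
    int scaled = sample / divisor;  /* divisor is in [1, INT_MAX] here */

    /* Second array access and second division: same facts still hold. */
    return scaled + table[idx] % divisor;
}
```

Whether a given tool actually drops the later warnings depends on its precision; the sketch only shows where the information is injected.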

It also helps to understand your Static Analyzer

Here is a tip from my daily work: You can use assertions not only to inject knowledge into your Static Analysis, but also to “probe” what it knows. For example, if you have a complex function in which you get warnings that you cannot explain, you can add intermediate assertions to see where the information is lost:

extern char buffer[]; // global buffer of length 10, defined somewhere else

void function(unsigned idx, char b) {
    buffer[idx] = b; ///< no warning
    ...
    assert(idx <= 9);  ///< no warning
    ...
    log_info("new sample");
    assert(idx <= 9);  ///< warning "assertion may fail"
    ...
    buffer[idx] = 0; ///< unexplainable warning: out of bounds array access
}

Assume you don’t know why the second array access is reported as a bug, although the first one passes. After adding assertions, we can see that Static Analysis initially “knows” the value of idx, but somehow lost the value in between. In this case, it happens after the call to log_info, which suggests that this function does something to our buffer. Instead of concluding that the analysis is wrong, we will follow that hint and see if there is a bug in the logging function.

This can make the difference between wrongly assuming a False Positive and finding a real bug. This is particularly helpful for Static Analysis based on Formal Methods, which can not only find possible bugs, but also prove their absence. If an assertion is proven correct, then you can be sure that your analyzer already knows this fact, and look for other explanations.

Use assertions to “probe” what your analyzer knows.

The Dangers

First, humans make mistakes. If you place assertions just to silence the tool, without really knowing if your information is correct, you may hide actual bugs. It is notoriously difficult for the human brain to consider all corner cases.

Second, don’t go overboard and exploit path pruning in its more general form, where any detected error (not just a failed assertion) filters out paths. Don’t do this:

// don't do this:
void foo(int x) {
    int ret;
    ret = 1 / x; ///< this is to remove the zero for static analysis
    ret = 1 / x;
}

In the previous article, I described the difference between run-time behavior and analysis behavior, and why you should aim to align them. Otherwise, you may hide bugs.

Third, don’t believe that this can eliminate all False Positives. While it is a powerful trick, Static Analysis cannot be perfectly precise, as we have discussed earlier. Know when to stop, so that you don’t waste too much time.

Lastly, each assertion is also seen by your compiler, which may make assumptions similar to those of Static Analysis. If you inject wrong information, your compiler may create new bugs for you. I recommend this excellent post from Herb Sutter to brush up on your knowledge of assertions.

Don’t assert what you don’t know.

How to get it right

Seven tips to avoid shooting yourself in the foot, and to really fight False Positives and find bugs:

1. Use only assertions for path pruning. This ensures that run-time behavior and analysis agree. Alternatively, if your Static Analysis tool has ways to specify constraints or contracts, you can use them too. But make sure that you have a process to verify these constraints later. Otherwise, they remain unchecked assumptions, which might silently hide bugs.
2. Avoid side effects in assertions. Otherwise, the program behavior might change when assertions are disabled, and new bugs might emerge or be hidden.
3. Apply assertions at strategic locations, where you have solid information about your program. Function interfaces, public class methods, and return values are excellent places. Don’t use them in intermediate places, except for “probing” analyzer knowledge.
4. Don’t suppress warnings on assertions. If Static Analysis warns about them, keep the warning. It documents that there is a risk in your software, and allows other people to help you (e.g., with more dynamic tests).
5. Leave assertions enabled during testing, so that you exercise them. If you have solid test cases with high coverage and the assertion never fails, it is a good sign. Note that some toolchains silently turn assertions off in release builds (in C, by defining NDEBUG).
6. You can safely remove assertions if you use a Static Analysis tool based on Formal Methods, and the assertion is proven safe. It has no benefit for Static Analysis or execution anymore and only adds to your program’s execution time. If all assertions are proven correct, you may globally turn off assertions in your toolchain.
7. Turn assertions into defensive code to fully eliminate the risks. By explicitly handling the error cases, your software becomes more robust.
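
To illustrate the last tip, here is a rough sketch of what a defensive variant of the earlier foo example could look like. The function name foo_checked, the output parameter, and the numerator 1000 are assumptions for illustration only. Instead of asserting x > 5, the function checks the condition at run time and reports failure to the caller, so the analysis and the run-time behavior agree even when assertions are disabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical defensive variant of the earlier foo(): the condition x > 5
 * is checked at run time and reported to the caller instead of asserted. */
bool foo_checked(int x, int *out) {
    if (out == NULL || x <= 5) {
        return false;  /* error case handled explicitly, *out is untouched */
    }
    *out = 1000 / x;   /* x is in [6, INT_MAX] here; 1000 is an arbitrary numerator */
    return true;
}
```

The caller now has to handle the false case, which moves the question "can this happen?" from a warning in the tool into explicit, testable code.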

Bonus: Faster Static Analysis

If you apply assertions specifically on function entry and exit (“contracts”), then you can chunk your analysis into smaller pieces and get your results faster. In an extreme case, you can then analyze each function separately. The analysis context comes from the assertion on the arguments (“precondition”). There is no need to actually analyze the caller.

Whenever you see a call to a function, you can simply replace the actual function call with the assertion on its outputs (“postcondition”). There is no need to dive into the callee.
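
The idea can be sketched with plain assertions as follows. The functions clamp_percent and bar_width, as well as the chosen ranges, are hypothetical and only serve as an illustration; a real tool would usually offer its own contract syntax. The precondition provides the context for analyzing clamp_percent in isolation, and the postcondition is all a modular analysis would need to know at the call site in bar_width.

```c
#include <assert.h>

/* Hypothetical function with a contract: clamps a value to a percentage. */
int clamp_percent(int value) {
    /* Precondition: enough context to analyze this body without any caller. */
    assert(value >= -1000 && value <= 1000);

    int result = value;
    if (result < 0) {
        result = 0;
    }
    if (result > 100) {
        result = 100;
    }

    /* Postcondition: what callers may rely on. */
    assert(result >= 0 && result <= 100);
    return result;
}

/* Hypothetical caller. */
int bar_width(int value) {
    int percent = clamp_percent(value);
    /* A modular analysis may replace the call above by its postcondition:
     * percent is in [0, 100], so the divisor below is in [1, 101]. */
    return 200 / (percent + 1);
}
```

Neither function touches global state here, which is what makes the cut so simple; with side effects, the postcondition would also have to describe them.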

Some Static Analysis tools work that way, but typically not function-by-function. Each assertion (or, again, other filtering means that are provided by your tool) can become a potential cutting point.

If you are thinking about doing this, make sure that you don’t forget about side effects. This is definitely territory for experts, but neat to keep in mind.

Conclusion

Path pruning has helped me a lot to reduce False Positives, but even more so to better understand Static Analysis. Instead of justifying False Positives, hoping that our limited human brain has considered all possible data flows, we can reach higher confidence in our analysis and learn to love our tool. Often, while doing this, I have realized that it was not the tool that was wrong, but the fool who used it.

Now that you know about this trick, you are close to building a new skill: contract-based programming. It is a slightly more formal approach to what we did here, and a perfect match for Static Analysis.