Since there is “Static” Analysis, does “Dynamic” Analysis also exist? Yes, it does. However, Static Analysis is much better suited to helping you write better software. It is truly independent of human limitations, and can easily be used during early development. But Dynamic Analysis has its place, too…
Let’s start with a concrete task. Assume I give you the following code…
function foo (value : Integer) return Integer is
   res : Integer := 0;
begin
   for i in 1 .. 100 loop
      if random() > 10 then
         res := res + 1000;
      end if;
   end loop;
   if 42 = value then
      shutdown;
   end if;
   return res;
end foo;
… and now I am asking these questions (“verification properties”):
- Can my program have a numeric overflow?
- Can my program return negative values?
- Is the function “shutdown” ever called?
- How fast is my program?
The Dynamic Approach
Dynamic means that we will run the program with certain inputs, and closely observe its behavior. If we want to be precise, we could differentiate between Dynamic Testing and Dynamic Analysis. Testing focuses on functional behavior and thus on the outputs of your program compared to a specification. In contrast, Analysis does not need a specification, and focuses on internal behavior like memory leaks and buffer overflows (the kind of thing for which people don’t write tests).
Crucially, all forms of Dynamic Analysis or Testing work with concrete inputs, and that’s all we need to know for this discussion. More specifically, they need a great many inputs…
Dynamic analysis observes how your program behaves when running with specific inputs, and you have to choose these inputs.
Some questions are difficult to answer
To get started, let’s be naive and just call the above program with all possible inputs. Maybe something like this (“test harness”):
procedure test is
   ret : Integer;
begin
   -- for all inputs ...
   for value in Integer'First .. Integer'Last loop
      -- ... call our function ...
      ret := foo(value);
   end loop;
end test;
How do we answer our questions?
- Can my program return negative values? Easy, we simply check the output of our function after each call.
- Is the function “shutdown” ever called? Not so easy. With additional tools like coverage profilers, we can see how often that function has been called. However, if that function really shuts down our machine, it would interrupt our tests. In that case, we must replace it with something else (“mock”), for the sake of testing.
- Can my program have a numeric overflow? Not so easy. Some languages detect it (e.g., Ada and sometimes Rust) and throw exceptions. Others, like C++, don’t have built-in mechanisms. For such languages, we can use compiler extensions, like sanitizers, to achieve the same.
- How fast is my program? Not so easy. We can use profilers to answer this question. The answer will not be terribly precise, but at least we can identify our bottlenecks. If we need to know the precise timing, we either need to deploy hardware like external tracers, or use performance counters in the CPU.
Hence, our test program could look like this:
procedure shutdown is
begin
   -- replaces the real shutdown
   print("I am shutting down");
end shutdown;

procedure test is
   ret : Integer;
begin
   -- for all inputs ...
   for value in Integer'First .. Integer'Last loop
      -- ... call our function ...
      ret := foo(value);
      -- ... and check the outputs:
      if ret < 0 then
         error("Negative number returned");
      end if;
   end loop;
end test;
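Ada also helps with the overflow question: if the addition in foo overflows, the run-time check raises a Constraint_Error exception (unless checks are suppressed). A minimal sketch of how our harness could catch it, reusing the hypothetical error helper from above:

   -- inside the loop of our test harness:
   begin
      ret := foo(value);
      if ret < 0 then
         error("Negative number returned");
      end if;
   exception
      when Constraint_Error =>
         error("Numeric overflow inside foo");
   end;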
In summary, we will get our answers, but depending on the questions we are asking, we need to write additional code, modify our program, and use additional tools. None of our questions can be answered without additional work.
Moreover, our answers are only “safe” if we were testing with the right inputs. We will come to that later…
It is quick and precise
The good thing: Whenever Dynamic Analysis finds a bug, we can be sure (unless the test itself is wrong, which is of course unheard of) that our program really has a bug. In other words, the results from Dynamic Analysis are precise and have no false alarms (we don’t talk about the exceptions here).
Additionally, we get really nice data to debug our program, for example a core dump:
- stack trace and call context,
- variable values,
- …
Another advantage is that we can write and run tests for pretty much every software and scenario, regardless of complexity and property. As long as we can express it in code, we can test it.
It is biased by human opinion
However, there is a big problem. Humans cannot write perfect tests. Here is why: If we write test cases for our own software, then we tend to write tests that confirm that our software works as we expect it. In other words, we don’t think enough about opposing facts that might challenge our opinion. This is a well-studied human weakness, called Confirmation Bias.
As a real-life example: You are considering buying a new car, and you already think that a certain brand makes the best cars. Confirmation Bias will lead you to:
- Actively seek out positive reviews that confirm your belief.
- If you find negative reviews (e.g., poor build quality, high repair costs), you dismiss them as biased or unimportant.
- You pay more attention to friends who own and love this car and ignore those who had bad experiences.
- You interpret neutral facts (e.g., “range is lower in cold weather”) in a way that supports your view (“It’s still better than gas cars!”).
Another human weakness is the False Consensus effect: When thinking about test cases, you implicitly assume that users will interact with the software in the same way as you do. Guess what, they will not.

For that reason, safety standards for critical systems (e.g., ISO 26262 for cars and IEC 61508 for industrial controls) require independence between developers and testers, especially for critical software.
Does it solve the human condition? No, humans have invented many more ways to mess up. For example, there is also the Availability Heuristic: To estimate risk, we use a mental shortcut. If we can remember something easily, we consider it likely. Real-life example: If you see a news report about shark attacks, it is suddenly on your mind. If you see a second one in the same week, then your brain will probably overrate the risk and decide to skip swimming in the sea. In reality, it is more likely that you get hit by lightning. But that doesn’t bother us so much, since we don’t see much news coverage on electrified humans.
Similarly, for software testing, we tend to write tests about bugs that we have encountered recently or repeatedly, and we may easily miss what really matters.
You are wasting time – The complexity challenge
There is more bad news. If you use dynamic analysis to find robustness bugs, you are wasting time. On a 32-bit system, our naive test loop would run 2^32 times. And only one input (value 42) answers the question “is shutdown ever called?”. That means 99.9999999767169% of all test cases have been useless.
Now imagine a real software project with thousands of classes, hundreds of files, and millions of lines of code. How many tests do you have to write in order to find only 1% of the bugs? And in how many different ways can your program run? To answer this question, there exist various coverage metrics. For example, function coverage (how many functions have been called), statement coverage (how many lines have been reached) and decision coverage (whether each decision has evaluated to both true and false).
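A tiny, hypothetical fragment (all identifiers made up) to illustrate the difference between the last two metrics:

   if speed > limit and then not manual_override then
      apply_brake;
   end if;
   -- Statement coverage: one test where the whole condition is True already
   -- reaches the call to apply_brake.
   -- Decision coverage: we additionally need a test where the condition
   -- evaluates to False, so that the "skip over it" path is exercised as well.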
Reaching sufficient coverage can be mind-numbing and requires a lot of tests. In avionics software, it is not uncommon to have several thousand test cases per 10,000 lines of code. The Boeing 777 has around 2 million lines of code, so its developers should have written about 200,000 tests. Sounds like a busy Sunday afternoon.
Yes, there are powerful test frameworks which can instrument your code to automatically check for things like overflows and memory errors. And some even generate “interesting” inputs to test only corner cases, a technique known as Boundary Value Analysis. Moreover, there exist really advanced fuzz testing tools (e.g., this one) which use genetic algorithms. They are really good for robustness testing and worth a try, but there is another problem…
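Before we come to that problem, here is a minimal sketch of the Boundary Value idea, reusing our harness from above: instead of looping over every possible input, we hand-pick a small set of “interesting” values (the value 42 only shows up because we happen to know the code):

procedure test_boundaries is
   type Input_List is array (Positive range <>) of Integer;
   -- boundary values plus a few hand-picked "special" inputs
   Interesting : constant Input_List :=
     (Integer'First, -1, 0, 1, 41, 42, 43, Integer'Last);
   ret : Integer;
begin
   for i in Interesting'Range loop
      ret := foo(Interesting(i));
      -- ... same output checks as in the harness above ...
   end loop;
end test_boundaries;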
Complexity and Non-Determinism kill Testing
Did you see the random() thing in the original code?
...
for i in 1 .. 100 loop
   if random() > 10 then -- attention here, please!
      ...
   end if;
end loop;
...
This for-loop has 100 repetitions, and in every iteration we have two possibilities – either we enter the if-statement, or not, depending on the value that the function returns. Hence, this simple loop has 2^100 possible ways to run. That is about a thousand octillion tests, a number with 31 digits that hopefully none of us will ever have to write on paper. Since I like to help out, here is how many test cases that would be:
1,267,650,600,228,229,401,496,703,205,376
And this assumes that you can somehow control the randomness, so that you never get the same input twice.
While you may not have random() in your code, you likely have some data source like a sensor that can return different values every time you read it. You would end up with a similarly high number of combinations. And this is one loop, and one input. Typical programs are larger. Hence, it is practically impossible to find all bugs by testing. If you still don’t believe me, maybe you believe Edsger Dijkstra, who won the “Nobel Prize of Computer Science” and backs me up on this.
The Static Approach
Static Analysis means we give the source code to a magic tool which answers the questions above. It does not require specific inputs, accepts non-determinism, and also finds convoluted bugs. How the magic works is explained in an earlier post.
Static Analysis considers all possible inputs, and predicts how your program will behave. Mis-predictions included.
Most questions are easy to answer
As with Dynamic Analysis, we cannot expect answers entirely without work. But amazingly, we get pretty close to automatic answers:
- Can my program have a numeric overflow? Easy. Most Static Analysis tools check this automatically. Typically, they check for a bunch of standard properties like read-before-write, invalid pointers, incorrect use of library functions etc.
- Can my program return negative values? It depends on your Static Analysis tool. Some tools give you additional data about variable values, others do not. For the latter case, you can try to put an assertion in your code and see if the tool checks it for you (see the sketch after this list).
- Is the function “shutdown” ever called? Easy. This is also a standard check for Static Analysis. Most tools report on functions that are never called, or show you the callers.
- How fast is my program? This question typically remains open. Most Static Analysis tools focus on source code. While they have some knowledge of the target, they don’t model the processor, its caches and all the other things that influence timing. There are some Static Analysis tools for timing, but they require a lot of user information, and generally only give pessimistic answers (wild memories from my PhD thesis…).
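As promised in the second bullet, here is a sketch of the assertion idea: in Ada, we can state the property directly in the body of foo with pragma Assert, and then check whether the analyzer picks it up (that depends on the tool):

function foo (value : Integer) return Integer is
   res : Integer := 0;
begin
   -- ... same body as before ...
   pragma Assert (res >= 0);  -- "foo never returns a negative value"
   return res;
end foo;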
Low Effort and Unbiased
The good news first: Unlike with Dynamic Analysis, we don’t have to write any tests. Instead, we simply press a button, and Static Analysis does its job. That means it is free from human biases and shortcomings, and we get better answers. And because we don’t have to write test cases, it is lower effort.
Even if we change our code structure, rename functions, add more arguments, etc., there is no additional work. Yes, some adaptations in the setup may be needed, like setting the entry point. But we don’t have to rewrite any test cases, or add new ones as we add new code.
It sees more than Dynamic Analysis
The biggest difference between Static and Dynamic is how many execution scenarios are tested. While Dynamic only runs the inputs that we specify, Static Analysis tries to consider all possible inputs. This way, it can even find bugs in rarely executed paths, for which humans forgot to write tests.
Static Analysis goes even deeper. While Dynamic only analyzes the behavior, Static can also judge your coding style. This is important because your style defines how error-prone your programs are, and how easily they can be updated without breaking. So for example, if your preferred C++ coding style is “be brave and avoid containers”, you are much more prone to memory errors. And if somebody is trying to contribute to your messy pointer project, then they are also more likely to make mistakes. Coding style matters.

No Problem: Non-Determinism and Missing Code
One more good thing: With Static Analysis, we don’t have to fear sensors, or randomness anymore. As discussed earlier, it is very difficult to cover all possible scenarios when we cannot control all inputs for our program. Static Analysis has a trick up its sleeve…
Whenever Static Analysis encounters a function for which it cannot predict the output (e.g., no code is available, or we read a sensor), it will assume all possibilities at once. Our for loop with the random value becomes something like this:
for i in 1 .. 100 loop
   if maybe() then
      ...
   else
      ...
   end if;
end loop;
Now in every loop iteration, Static Analysis considers both possibilities for the control flow. At once. This sounds a bit like Schrödinger’s Cat, but it’s more animal-friendly and more useful.
How does Static Analysis know what the possibilities are? Well, that comes down to two things:
- A good value analysis – see earlier post.
- Knowing the function declaration. For example, if we look at the Ada standard library, we can see that random() has the following signature:
subtype Uniformly_Distributed is Float range 0.0 .. 1.0;
function Random (Gen : Generator) return Uniformly_Distributed;
Hence, Static Analysis knows that the returned value is always between zero and one, simply from the data types in the declaration of the function. Ironically, this means that I have made a mistake in my original program: the condition > 10 is never true. A case in point that I am a bad developer without Static Analysis.
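If the branch was ever meant to be taken, a possible repair is to compare against a threshold inside the valid range. A hypothetical fix, using the real Ada API and assuming a Generator object Gen is declared:

   -- assumes: with Ada.Numerics.Float_Random; use Ada.Numerics.Float_Random;
   --          Gen : Generator;
   for i in 1 .. 100 loop
      if Random (Gen) > 0.1 then  -- 0.1 lies inside 0.0 .. 1.0, so both branches are reachable
         res := res + 1000;
      end if;
   end loop;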
Slow response time and sometimes imprecise
Now some bad news about Static Analysis: The complexity problem which exists for Dynamic, also applies here. If a program has many internal states, a lot of sensors, and complex algorithms, it can take a long time to analyze it. Even worse, most Static Analysis tools only provide the results when the analysis is completed. In practice, that means that we sometimes wait hours or even days to get the result. And if we are “unlucky”, the result is “nothing found”.
In other words, Static Analysis is challenging for large, monolithic software. And when I say “large”, I mean 1 million lines of code and more. Yes, it can be done, but typically the analysis has to take shortcuts. For example, it may stop separating call contexts, or it may revert to a simplified value analysis. If you want to learn more about safe shortcuts, have a look at this great website. As a consequence, precision can get lost. And that means, unlike with Dynamic Analysis, that we may get false alarms.
Therefore, we had better use Static Analysis on small parts of our software.
Doesn’t know your intention
We have discussed this limitation before. Static Analysis simply doesn’t “know” what your program is supposed to do. It will happily check your code and find all bugs, but it cannot tell if your code does the right thing. That is something which is better done with Dynamic Testing, where we express our expectations explicitly in the test cases, albeit not necessarily correctly or completely.
Notably, there are some types of Static Analysis that can also do functional verification. But it boils down to you telling the analysis what your program is supposed to do, often in a cryptic language, which in itself carries the risk of having bugs (that happened to me… I successfully verified both the robustness of a program and its functional behavior, but my functional specification was wrong).
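For illustration, in Ada such an expectation can be written as a contract aspect on the declaration, which tools from the SPARK family can then attempt to prove. A hypothetical specification for our foo:

-- hypothetical functional specification, written as an Ada 2012 contract
function foo (value : Integer) return Integer with
   Post => foo'Result in 0 .. 100_000;  -- the result stays in the range we expect from the loop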
When to use which?
Both Static and Dynamic approaches have their strengths and weaknesses. Here is a summary of what I have elaborated at length above:
| | Dynamic | Static |
|---|---|---|
| suitable for | check behavior & performance | check robustness & style/guidelines |
| unsuitable for | check robustness & style/guidelines | check performance & function |
| misses bugs | yes, many | yes, some |
| false warnings | no | yes |
| supports non-determinism | no | yes |
| human effort | high | low |
| human bias | yes | no |
| results latency | low | high |
| results per second | low | high |
| Martin likes it | yes | YES |
Looking at the table above, it is almost as subtle as a sledgehammer: Static Analysis is best used early, especially when your code is not complete and when you have non-determinism or hardware-related code. It is an excellent tool to find robustness issues and to overcome human weaknesses. Dynamic Analysis is complementary. It is best used late, to check that the code is doing the right thing with good performance. It requires human input.
If you try doing it the other way around, then good luck. This is another good recipe to annoy your colleagues. Static Analysis will punish you with many findings, at a time when it’s too late to change your code. Meanwhile, early Dynamic Analysis will punish you with writing a lot of mocks, harnesses and maybe even simulations, because your software is not complete, lacks context, and you need to control non-determinism.
Now you may have objections like “I am using test-driven development” or “I always analyze my program on release day”. Please keep doing that, both are most excellent. But in both of these situations you are not doing it to find bugs. You do early Dynamic Testing to verify that your program does the right thing (correctness), and you do late Static Analysis to confirm that you have no bugs (which, ironically, most Static Analysis tools cannot do).
Static Analysis is best used early, to find and fix bugs, even when your software is incomplete. Dynamic Analysis is best used late, to check the functionality and performance of your software.
What’s next?
It is time to talk about False Positives, a.k.a. needless warnings. The Achilles Heel of Static Analysis.
