What is Fuzzing?

The IEEE Standard Glossary of Software Engineering Terminology defines robustness, the very property that fuzzing puts to the test, as:

The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.

Breaking it down into simpler terms, fuzzing is a testing technique in which we pass random, invalid input to the target application and then monitor it for unexpected behavior. That behavior could be a crash, a memory leak, or anything else triggered by previously unknown niche test cases that go beyond the scope of manual testing.

One thing to keep in mind is that the invalid inputs are supposed to be valid enough that the target application accepts them for processing rather than rejecting them or crashing right away. Their task is to help us surface the exceptions still hiding within the application.

Fuzzers

Fuzzers are tools that aid in automating the fuzzing process and allow us to fine-tune various parameters according to the application being targeted.

Fuzzers can be broadly classified into the following types:

  • Mutational
    Mutational Fuzzers are often called dumb fuzzers because they generate new inputs simply by randomly mutating (changing) the seed input cases provided for fuzzing. A toy example of this idea is sketched right after this list.

  • Grammar
    Grammar Fuzzers let us control how the input is changed by specifying rules (a grammar) for mutating the seed input.

  • Feedback-based
    Feedback-based Fuzzers are smart fuzzers that observe how a particular input affects the target binary and then mutate the seed input accordingly to make fuzzing more effective.
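
To make the mutational approach concrete, here is a minimal, deliberately dumb sketch of it as a bash loop. The names ./target and seed.bin are placeholders for a binary that takes a file argument and a seed input; real mutational fuzzers are far more sophisticated, but the core loop is the same: mutate the seed, run the target, and watch for abnormal exits.

for i in $(seq 1 1000); do
    cp seed.bin mutated.bin
    size=$(stat -c%s seed.bin)          # size of the seed file in bytes
    offset=$((RANDOM % size))           # pick a random byte position to mutate
    # overwrite that byte with a random value taken from /dev/urandom
    dd if=/dev/urandom of=mutated.bin bs=1 count=1 seek=$offset conv=notrunc 2>/dev/null
    # run the target on the mutated input and report abnormal exits
    ./target mutated.bin > /dev/null 2>&1 || echo "iteration $i: target exited abnormally"
done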

Why Fuzz Applications?

Fuzzing, as mentioned earlier, is a testing process to find bugs in an application. Hence, the first goal of fuzzing is to find bugs in the target application that lie outside the scope of manual testing by a human.

Fuzzing also helps make development more robust: development practices can be adjusted to avoid classes of niche bugs that fuzzing has already uncovered elsewhere, for example in a previous iteration of the same application. This saves the time and effort of correcting an issue that could have been avoided entirely.

A lot of software vendors also run responsible disclosure programs where one can report previously unknown bugs and get rewarded. So, spending some time running automated fuzzers against such applications can also make you money.

American Fuzzy Lop

American Fuzzy Lop, or AFL for short, is a smart fuzzer. It mutates the seed input, given at the start of fuzzing, to generate new test cases which it thinks will lead to the discovery of new paths.

Before I explain the above statement, let me introduce you to two terms - code coverage and path coverage. Code coverage refers to the amount of code that was triggered by a particular test case. Path coverage refers to the number of distinct sequences of code statements (or paths) that were triggered by a test case.

Let’s take an example. Refer to the pseudo-code below:

if <condition 1>:
    # Statements

if <condition 2>:
    # Statements

In the above pseudo-code, code coverage could reach 100% even when different test cases trigger the two conditional statements separately. Path coverage, however, measures how many of the possible execution paths were covered. For this example, the conditional statements can be exercised in three ways: condition 1 alone, condition 2 alone, and both in the same test run (ignoring the path where neither condition holds). That gives us a total of 3 paths. Assuming our test cases only trigger the two conditions individually, we get 100% code coverage but only 2/3 path coverage, since no test case triggers both conditional statements in the same test run.

Coming back to the statement we began this section with, AFL takes a set of files (or test cases) that serve as the seed input to start fuzzing the target. AFL then interacts with the target binary while it’s processing the input passed to it and monitors which segments of code were triggered in which sequence, i.e. it keeps track of the code paths being exercised. Based on the paths observed, it mutates the seed files to trigger new code paths, thus increasing path coverage.

The creator of American Fuzzy Lop, Michal Zalewski, wrote a blog post describing just how smart AFL is. He starts off by creating a file containing only the word “hello” and passes it as the seed input to an application that expects a JPEG image. After the initial crashes, AFL figures out what the application is expecting and starts mutating the seed input until it produces valid JPEG image files from what was initially a plain text file. Do give Michal’s blog a read, as it explains in great detail how AFL ended up pulling JPEGs out of thin air.

Fuzzing with AFL

AFL is extremely easy to use, as we shall see. There’s a set of steps that we need to go through before unleashing AFL on an application.

Here’s AFL’s workflow in brief:

  • Compiling the binary for the target application with AFL’s compilers to instrument it.
  • Building a test corpus (seed test cases) to start the fuzzing process.
  • Running AFL on the instrumented binary of the target application.
  • Lastly, analyzing results.

Installing AFL

Installing AFL is quite straightforward, but before we install it, we need to have some prerequisites installed on our system.

Note: This setup was tested with Ubuntu 16.04

Let’s start with installing the prerequisites. Follow the commands below to install gcc and clang:

sudo apt install gcc
sudo apt install clang

Now we’ll install AFL with the following commands:

wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
tar -xzvf afl-latest.tgz
cd afl-2.52b/
make
sudo make install

AFL comes with multiple compilers to instrument binaries (we’ll talk about what instrumenting a binary means later), including wrappers around the traditional gcc as well as clang. So although we’re technically ready to fuzz applications after the previous step, the default gcc-based compiler that comes with AFL is slower than the clang-based ones. AFL can leverage LLVM capabilities to make the fuzzing process faster, and we can enable LLVM mode with the following commands:

cd llvm_mode/
sudo apt-get install llvm-dev llvm
make
cd ..
make
sudo make install
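
At this point both the base AFL tools and the LLVM-mode compiler should be installed. A quick sanity check is to confirm the binaries ended up on your PATH:

which afl-fuzz afl-clang-fast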

Setting up a Target

Now we’re ready to fuzz applications; all we need is a target. We’ll use fuzzgoat, which is an intentionally vulnerable application written to demonstrate fuzzing. We can clone fuzzgoat from its repository as follows:

git clone https://github.com/fuzzstati0n/fuzzgoat
cd fuzzgoat

Compiling with AFL

We start off by compiling the binary for the target application with AFL’s compilers. This is necessary because it lets AFL add some additional code to the compiled binary, which allows AFL to talk to the binary while it’s running and generate new inputs that discover new code paths. The process of including AFL’s additional code in the binary while compiling the application is called instrumentation, the term I promised I’d explain.

Note: The following commands are being run inside the fuzzgoat’s root directory

So, to compile the application with AFL’s compilers, we have to explicitly mention which one to use. Generally, it’s best to stick to afl-clang-fast but one can also use afl-clang-fast++, afl-gcc or afl-g++ depending on the use case. We’ll use afl-clang-fast:

export CC=afl-clang-fast
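
If the target also contains C++ code, the C++ compiler can be pointed at AFL’s wrapper in the same way (this isn’t needed for fuzzgoat, which is written in C):

export CXX=afl-clang-fast++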

Now, depending on the application, we need to compile it into a binary. For fuzzgoat we just run make, but some applications may require running ./configure before make can build the binary (a sketch of that flow appears right after the command below). Let’s compile fuzzgoat into a binary:

make
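
As a side note, for a target that uses an autotools-style build system, the flow would typically look something like the sketch below; since CC is exported in the environment, ./configure picks the AFL compiler up automatically (the exact steps vary from project to project):

./configure
make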

Building Test Corpus

The test corpus is what I’ve been talking about since the beginning of this blog: the ‘seed input files’. It is a set of files (possibly just a single file) used as the initial input to test the binary. It also serves as the starting point that AFL mutates to generate new test files as it sees fit to discover new code paths.

Although AFL is smart enough to do a lot of heavy lifting for us, including figuring out what good test inputs look like (as we saw in Michal’s blog), one should still build a good test corpus simply because it makes the whole fuzzing process faster. By giving AFL good initial test cases, it starts off at, say, level ‘X’. AFL could very well have reached ‘X’ starting from a blank text file, but the time it took to get there could have been saved. Hence, always try to build good test cases suited to the target application.

That being said, I’ll still be using a not-so-good test case, created from some random binary data, because:

  • This blog is to learn how to work with AFL to fuzz applications
  • Building test cases based on the characteristics of an application goes beyond the scope of this blog
  • Since fuzzgoat is intentionally buggy, even bad test cases will yield crashes in a reasonably short period of time

We’ll first make a directory to keep all our test cases. You can name it anything you like. I’ll name it afl_in:

mkdir afl_in

Now, let’s add a test case by copying some random, garbage data (the ps binary will do) into the directory we made above:

cp /bin/ps afl_in/
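
If you’d prefer a slightly more meaningful seed, fuzzgoat is, as far as I can tell, built around a JSON parser, so a small valid JSON document also makes a reasonable starting point (the file name and contents below are arbitrary):

echo '{"name": "fuzzgoat", "values": [1, 2, 3]}' > afl_in/seed.json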

Running AFL on Target

One last thing before we start fuzzing: we need to make a directory for AFL to store the files that resulted in a crash or a hang. AFL creates three subdirectories inside this folder - crashes holds the test cases that made the application crash, hangs holds the test cases that made the application hang, and queue holds the test cases that AFL has queued up for fuzzing. I’ll name the directory afl_out but, again, it can be named anything:

mkdir afl_out

Finally, to fuzz the application we use the following command:

afl-fuzz -i afl_in -o afl_out -- ./fuzzgoat @@

Breaking the above command into parts:

  • -i afl_in specifies the directory to take the seed test cases from
  • -o afl_out specifies the directory where AFL can store all result files for crashes, hangs and queue
  • -- separates AFL’s own options from the target’s command line. Everything to its left is passed to AFL and everything to its right is the command used to run the target, in this case ./fuzzgoat
  • @@ defines the position where AFL is supposed to insert the test file in the target application’s command structure

Note: Using @@ is not mandatory. AFL can also pass input to the target through STDIN.
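
For a target that reads its input from STDIN instead of taking a file path as an argument, the invocation would simply omit the @@ placeholder; ./some_target below is a placeholder for such a binary:

afl-fuzz -i afl_in -o afl_out -- ./some_target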

Running AFL should yield the following interface on your terminal:

AFL Main Screen

The interface and the information it shows are mostly self-explanatory, but here are a few segments you should definitely keep an eye on:

  • Process Timing Section -
    • If last new path shows a long time since the last discovery, AFL is struggling to find new paths. In that case, make sure you’re running the instrumented binary and not just a normally compiled one.
  • Overall Results Section -
    • Let the fuzzer run until it has completed at least 50 cycles.

Analyzing Results

All that’s left is looking at the results. Let’s navigate to the directory where AFL has kept all the test cases that resulted in crashes or hangs:

cd afl_out

Looking inside the crashes or hangs directories, you should see files with names resembling (but not identical to) the ones depicted below:

~/afl_out
    |- /crashes
        |- id:000000,sig:11,src:000000,op:int32,pos:52,val:be:+32
        |- id:000001,sig:11,src:000000,op:int32,pos:52,val:be:+64
        |- id:000003,sig:11,src:000000,op:havoc,rep:32

    --- snipped ---

    |- /hangs
    |- /queue

Now we can take a look at these files to see what exactly AFL mutated the input into, and then figure out why it made the application crash or hang. Finally, it’s up to us what we want to do with the bugs we found with AFL.
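
A natural first step is to feed a crash file back to the binary by hand, optionally under a debugger, to watch the failure happen. Replace <crash-file> below with one of the actual file names from your crashes directory:

./fuzzgoat afl_out/crashes/<crash-file>
gdb --args ./fuzzgoat afl_out/crashes/<crash-file>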

Conclusion

In essence, fuzzing is a way to discover undiscovered bugs, and AFL makes it as easy as it can be.

AFL is a very powerful tool while remaining almost effortless to use. It makes the fuzzing process a matter of a few steps while it takes care of everything in the background.

Since this blog was meant to be a quick-start guide to using AFL, a lot of the customization that AFL provides wasn’t covered here. AFL also provides various mechanisms for optimizing the fuzzing process, like parallel fuzzing and optimizing test cases both as a set and individually. I’ll write a follow-up blog on how to optimize fuzzing with AFL and also talk about some of the built-in tools that AFL comes pre-loaded with.

References