Battling the dynamic linker with lazy bindings and the AFL++ fuzzer

Notes on fuzzing with AFL when shared libraries can’t resolve symbols

💡
This is a very specific solution to a very specific issue with AFL++ that few people will come across. I’ve written it up to help others who may stumble upon it by chance. It also covers some details of how dynamic linking generally works, so the content may still be useful outside of the AFL++ issue.

TL;DR: Set LD_BIND_LAZY=1 when running afl-fuzz against a target containing symbols that can’t be resolved (and aren’t needed) at runtime. By default, afl-fuzz forces LD_BIND_NOW, which tells the dynamic linker to perform all relocations at initialization.

Dynamic woes

AFL (American Fuzzy Lop) is a coverage-guided mutational fuzzer that relies on instrumented target binaries. It’s recommended that the instrumented binaries you throw at it are compiled statically:

Also - if possible - you should always configure the build system in such way that the target is compiled statically and not dynamically. How to do this is described below. The #1 rule when instrumenting a target is: avoid instrumenting shared libraries at all cost. Always compile libraries you want to have instrumented as static and link these to the target program!

Now what if the libraries you want to fuzz have a significant number of dependencies, such as a complex codebase with dozens of shared objects? What if some symbols (functions) that are never called can’t be resolved by the dynamic linker?

This blog post is meant to help others who might be facing the issue described below.

I recently ran into afl-fuzz refusing to run because of a linking error that only occurred when the instrumented program was run through afl-fuzz, specifically "error while loading shared libraries":

$ ./main.o
$ echo $?
0

$ AFL_DEBUG=1 afl-fuzz -i i -o o ./main.o

<.. cut ..>
./main.o: error while loading shared libraries: liba.so: cannot open shared object file: No such file or directory
<.. cut ..>
[-] PROGRAM ABORT : Fork server handshake failed
         Location : afl_fsrv_start(), src/afl-forkserver.c:1422

(Note that the original AFL doesn’t support AFL_DEBUG, so afl-showmap can be used instead to surface the linker errors.)

Notably, when ./main.o is run without afl-fuzz, there are no runtime linking errors. The symptom only occurs within afl-fuzz.

In the case of the software libraries I was instrumenting, the respective Makefiles passed --allow-shlib-undefined to the linker. Without this flag the build would fail due to undefined symbol references. These undefined references were exactly the ones the dynamic linker complained about when the instrumented binary ran within afl-fuzz.

Background

Suppose we have two libraries, liba and libb. liba calls funcB, which is implemented in libb:

liba.c:

#include "libb.h"

void funcA(void) {
    funcB();
    return;
}

libb.c:

void funcB(void) {
    return;
}

Compiling both libraries as loadable shared objects:

$ gcc -shared -Wall -Werror -fpic liba.c -o liba.so
$ gcc -shared -Wall -Werror -fpic libb.c -o libb.so 

Notably, funcB is undefined (U) in liba.so:

$ nm liba.so  | grep func
0000000000001139 T funcA
                 U funcB

If we link liba into a normal executable, the linker complains that it can’t find funcB. main.c:

#include "liba.h"

int main(void) {
    funcA();
    return 0;   /* main returns int, so a bare "return;" is invalid here */
}

$ gcc -Wall -o main.o main.c -L. -la
/usr/bin/ld: ./liba.so: undefined reference to `funcB'
collect2: error: ld returned 1 exit status

That’s good! It means we also need to link libb, e.g. by adding -lb to the command above.

Or do we?

Enter: --allow-shlib-undefined.

By default, undefined symbols are allowed when linking a shared library. That is why we could build liba.so without it knowing anything about libb's function funcB(). The ld man page offers an explanation of this default behaviour:

The reasons for allowing undefined symbol references in shared libraries specified at link time are that:
  • A shared library specified at link time may not be the same as the one that is available at load time, so the symbol might actually be resolvable at load time.
  • There are some operating systems, eg BeOS and HPPA, where undefined symbols in shared libraries are normal.

This default does not apply when linking a normal executable (our main.c) against a shared library, so we have to pass the flag explicitly:

$ gcc -Wl,--allow-shlib-undefined -Wall -o main.o main.c -L. -la

(note the omission of -lb above)

The link now succeeds, but the relocation of funcB is deferred to run time, and the program bails out when funcB is called:

$ export LD_LIBRARY_PATH=$(pwd)
$ ./main.o
in libA: funcA()
        calling funcB()
./main.o: symbol lookup error: ./liba.so: undefined symbol: funcB

Setting the environment variable LD_DEBUG=symbols shows what’s happening:

$ LD_DEBUG=symbols ./main.o

      5503:     symbol=funcB;  lookup in file=./main.o [0]
      5503:     symbol=funcB;  lookup in file=./liba.so [0]
      5503:     symbol=funcB;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
      5503:     symbol=funcB;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
      5503:     ./liba.so: error: symbol lookup error: undefined symbol: funcB (fatal)
./main.o: symbol lookup error: ./liba.so: undefined symbol: funcB
💡
On a side note, libb.so can be preloaded at run time, which makes the dynamic linker happy:
LD_PRELOAD=./libb.so ./main.o

Now we come to the crux of the issue: what if funcB() is only reached from main() under a branch condition? In the revised main.c below, funcA() (and therefore funcB()) is only called if the argument ‘1’ is passed in.

#include <string.h>
#include <stdio.h>
#include "liba.h"

int main(int argc, char **argv) {
    if (argc > 1 && strcmp(argv[1],"1") == 0) {
        funcA();
    }
    printf("Exiting cleanly\n");
    return 0;
}

Running with no parameter, no errors are returned:

$ ./main.o
Exiting cleanly

Running with 1 as a parameter, an error is returned:

$ ./main.o 1
./main.o: symbol lookup error: /home/user/link_test/liba.so: undefined symbol: funcB

This is lazy binding in action: the symbol funcB is only resolved when it’s first needed. As a consequence, we have no way of knowing whether the symbol can be found until it’s actually called at run time.
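Since the lookup is deferred, a program can also ask the dynamic linker ahead of time whether a symbol is resolvable, without calling it. The sketch below is a minimal illustration assuming glibc’s dlopen()/dlsym(); symbol_resolvable() is a hypothetical helper, not something liba or AFL++ provides. dlopen(NULL, ...) returns a handle for the main program, and dlsym() on that handle searches the main program and the libraries loaded at startup, much like the lookup order in the LD_DEBUG output above.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the dynamic linker can currently
 * resolve `name` in the global scope (main program, libraries loaded at
 * startup, and anything dlopen()ed with RTLD_GLOBAL). Nothing is called. */
static bool symbol_resolvable(const char *name) {
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */
    if (self == NULL)
        return false;
    dlerror();                              /* clear any stale error */
    void *addr = dlsym(self, name);
    bool ok = (dlerror() == NULL) && addr != NULL;
    dlclose(self);
    return ok;
}
```

With libb.so missing, symbol_resolvable("funcB") should return false inside our main.o, while a libc function such as printf resolves. On glibc older than 2.34, link with -ldl.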

So what does all this have to do with AFL / AFL++ ?

afl-fuzz

If we run afl-fuzz against our program, it fails to start even though funcB() is never called (the 1 parameter is never passed).

$ AFL_DEBUG=1 ~/AFLplusplus/afl-fuzz -i i -o o ./main.o
[+] Enabled environment variable AFL_DEBUG with value 1

<.. cut ..>

[*] Validating target binary...
[*] Spinning up the fork server...
./main.o: symbol lookup error: /home/user/link_test/liba.so: undefined symbol: funcB

For some reason, afl-fuzz is forcing the dynamic linker to try and resolve all symbols on initial execution.

And the reason is here:

/AFLplusplus/src/afl-forkserver.c :

    /* This should improve performance a bit, since it stops the linker from
       doing extra work post-fork(). */

    if (!getenv("LD_BIND_LAZY")) { setenv("LD_BIND_NOW", "1", 1); }

If the environment variable LD_BIND_LAZY is not set, afl-fuzz sets LD_BIND_NOW. The ld.so man page explains what LD_BIND_NOW does:

If set to a nonempty string, causes the dynamic linker to resolve all symbols at program startup instead of deferring function call resolution to the point when they are first referenced. This is useful when using a debugger.
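The afl-forkserver.c rule and the "nonempty string" condition above can be modelled in a few lines of C. This is a sketch for illustration only: apply_afl_bind_policy() and child_binds_now() are hypothetical names, and the real fork server does much more before it execs the target. One detail worth noticing is that AFL++ only checks whether LD_BIND_LAZY exists, so even an empty value (LD_BIND_LAZY=) is enough to stop it from setting LD_BIND_NOW.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mirror of the afl-forkserver.c line quoted above: unless the
 * user opted in to lazy binding, force eager binding for the target. */
static void apply_afl_bind_policy(void) {
    if (!getenv("LD_BIND_LAZY")) {          /* only existence is checked */
        setenv("LD_BIND_NOW", "1", 1);      /* overwrite any existing value */
    }
}

/* Mirrors the ld.so rule: any nonempty LD_BIND_NOW value means eager
 * binding for a process started with the current environment. */
static bool child_binds_now(void) {
    const char *v = getenv("LD_BIND_NOW");
    return v != NULL && v[0] != '\0';
}
```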

Further explanation is provided in the System V ABI specification, page 75:

If its value is non-null, the dynamic linker evaluates procedure linkage table entries before transferring control to the program. That is, the dynamic linker processes relocation entries of type R_X86_64_JUMP_SLOT during process initialization. Otherwise, the dynamic linker evaluates procedure linkage table entries lazily, delaying symbol resolution and relocation until the first execution of a table entry.
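To make the difference concrete, here is a toy model of the two strategies described in the ABI text: eager binding walks every JUMP_SLOT-style entry at startup, while lazy binding resolves an entry on its first call. It is a deliberately simplified sketch, not how ld.so is written; plt_slot, exported, bind_now() and call_lazy() are all hypothetical, and the exported list stands in for a process where libb.so was never loaded.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of PLT binding (not the real ld.so): each slot names an
 * imported function and records whether it has been filled in yet. */
typedef struct {
    const char *name;   /* imported symbol, e.g. "funcB" */
    int resolved;       /* 1 once the slot holds a real address */
} plt_slot;

/* Hypothetical symbols exported by the loaded libraries.
 * libb.so is not loaded, so funcB is missing. */
static const char *const exported[] = { "funcA", "printf" };

static int lookup(const char *name) {
    for (size_t i = 0; i < sizeof exported / sizeof exported[0]; i++) {
        if (strcmp(exported[i], name) == 0)
            return 1;
    }
    return 0;
}

/* LD_BIND_NOW: every slot is resolved during process initialization.
 * Returns the first name that cannot be resolved, or NULL if all bound. */
static const char *bind_now(plt_slot *slots, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!lookup(slots[i].name))
            return slots[i].name;   /* startup fails even if never called */
        slots[i].resolved = 1;
    }
    return NULL;
}

/* Lazy binding: a slot is resolved on its first call only.
 * Returns 0 on success, -1 if the symbol is missing (fails at call time). */
static int call_lazy(plt_slot *slot) {
    if (!slot->resolved) {
        if (!lookup(slot->name))
            return -1;
        slot->resolved = 1;
    }
    return 0;
}
```

With both funcA and funcB imported, bind_now() fails on funcB before main() would ever run, which is what afl-fuzz triggers, whereas call_lazy() only fails if the funcB slot is actually called.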

Let’s use this as an opportunity for a very quick revisit of how this works, with a mashup of GDB and readelf output on our liba.so library.

funcA calls funcB, although the address of funcB() isn’t known at link time. The call instead targets code at offset 0x1050 (funcB@plt) in the Procedure Linkage Table:

  1. The code at funcB@plt jumps through a pointer stored at an address relative to the next instruction (0x105b + 0x2fbd = 0x4018). That address is a slot in the Global Offset Table (.got.plt), and it is the offset readelf -r lists for funcB’s relocation entry. This code was generated by the linker when we ran gcc.
  2. The entry at 0x4018 is marked as type R_X86_64_JUMP_SLOT. As noted earlier, the dynamic linker performs this relocation by writing the address of funcB into that slot. With lazy binding, this is done the first time funcB is called when the program runs.


Now back to the issue at hand. We have four options:

  • Fix up the application toolchain to ensure that all symbols can be resolved during build
  • Figure out every unresolved symbol and the corresponding library, then LD_PRELOAD them before running afl-fuzz
  • Strip down as much as we can and remove any problematic functions from the source code
  • Set the environment variable LD_BIND_LAZY when running afl-fuzz
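For the second option, one way to find unresolved symbols library by library is to load each one with dlopen() and RTLD_NOW. Per the dlopen(3) man page, RTLD_NOW resolves all of the object’s undefined symbols before dlopen() returns and fails otherwise, mirroring what LD_BIND_NOW does at startup. A minimal sketch, assuming glibc; check_resolves_now() is a hypothetical helper:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared object with RTLD_NOW so the dynamic
 * linker resolves all of its undefined symbols immediately. Returns 0 if
 * everything resolved, otherwise -1 with dlerror()'s message in `err`. */
static int check_resolves_now(const char *path, char *err, size_t errlen) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen error");
        return -1;
    }
    dlclose(handle);
    if (errlen > 0)
        err[0] = '\0';
    return 0;
}
```

Pointing it at ./liba.so without libb.so available should fail with an error mentioning "undefined symbol: funcB", while the same check succeeds under LD_PRELOAD=./libb.so. Be aware that dlopen() runs the library’s constructors, so only use this on libraries you are happy to load into the checking process.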

If you’re dealing with a massively complex piece of software whose toolchain lacks good hygiene, then LD_BIND_LAZY could save you many frustrating hours of fixing up someone else’s build scripts.

Running afl-fuzz again, this time allowing lazy binding (note that LD_LIBRARY_PATH must also be set):

LD_LIBRARY_PATH=$(pwd) LD_BIND_LAZY=1 ~/AFLplusplus/afl-fuzz -i i -o o ./main.o

And the fuzzer is up and running!

Note:

 If a function that can’t be found is called, AFL won’t report it, because it’s not a recorded crash. So a big disclaimer: you may be losing coverage if essential libraries are missing, e.g. if LD_LIBRARY_PATH/LD_PRELOAD is not set correctly. For this reason it’s a good idea to log a sample of the execution, e.g.:

AFL_DEBUG=1 LD_BIND_LAZY=1 ./afl-fuzz -i i -o o ./main.o 2>stderr.log
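Once the sample log exists, the unresolved symbols can be pulled out of it by scanning for glibc’s "undefined symbol: NAME" messages, as in the output shown earlier. A small sketch; undefined_symbol_in() and report_undefined_symbols() are hypothetical helpers, and the parsing assumes the message format seen in this post (a name, optionally followed by ", version ...").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the symbol name out of an
 * "undefined symbol: NAME" line into `out`. Returns 1 if found, 0 if not. */
static int undefined_symbol_in(const char *line, char *out, size_t outlen) {
    static const char marker[] = "undefined symbol: ";
    const char *p = strstr(line, marker);
    if (p == NULL || outlen == 0)
        return 0;
    p += sizeof marker - 1;                 /* skip past the marker text */
    size_t len = strcspn(p, " \t\r\n,");    /* name ends at space or ',' */
    if (len >= outlen)
        len = outlen - 1;                   /* truncate to fit the buffer */
    memcpy(out, p, len);
    out[len] = '\0';
    return len > 0;
}

/* Prints every unresolved symbol reported in a log such as stderr.log. */
void report_undefined_symbols(FILE *log) {
    char line[4096], name[256];
    while (fgets(line, sizeof line, log) != NULL) {
        if (undefined_symbol_in(line, name, sizeof name))
            printf("unresolved: %s\n", name);
    }
}
```

Usage: open stderr.log with fopen() and pass the FILE * to report_undefined_symbols(). Repeated lookups of the same symbol are printed once per occurrence.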

Note in the original AFL and AFL++ documentation the following is stated:

By default, LD_BIND_NOW is set to speed up fuzzing by forcing the linker to do all the work before the fork server kicks in. You can override this by setting LD_BIND_LAZY beforehand, but it is almost certainly pointless.

I would argue that it is not “certainly pointless” when you’re faced with a particularly complex codebase that doesn’t follow best practices and lands you in library dependency hell.

AFL vs AFL++

I discovered that the original afl behaves differently to afl++ when AFL_NO_FORKSRV is set.

  • afl will not set LD_BIND_NOW if AFL_NO_FORKSRV is set.
  • afl++ will always set LD_BIND_NOW, regardless of whether AFL_NO_FORKSRV is set

We really want to avoid AFL_NO_FORKSRV: it drastically slows down instrumented fuzzing, and there is no need to set it. When AFL_NO_FORKSRV is set, the whole dynamic linking process is repeated every time afl-fuzz reruns the program, which can increase the execution time significantly, by a factor of 3 or more.