Atheris: A Coverage-Guided, Native Python Fuzzer
Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing Python code as well as native extensions written for CPython. Atheris is based on libFuzzer. When fuzzing native code, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.
Atheris supports Linux (32- and 64-bit) and Mac OS X, Python versions 3.6-3.10.
You can install prebuilt versions of Atheris with pip:
pip3 install atheris
These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.
Building from Source
Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang on your path, installation from source is as simple as:
# Build latest release from source
pip3 install --no-binary atheris atheris
# Build development code from source
git clone https://github.com/google/atheris.git
cd atheris
pip3 install .
If you don't have clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.
Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.
Installing Against New LLVM
# Building LLVM
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" ../llvm
make -j 10  # This step is very slow
# Installing Atheris
CLANG_BIN="$(pwd)/bin/clang" pip3 install <whatever>
#!/usr/bin/python3
import atheris

with atheris.instrument_imports():
    import some_library
    import sys

def TestOneInput(data):
    some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
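Because any uncaught exception counts as a failure, a harness usually catches the exceptions that the code under test is expected to raise on malformed input, so that only unexpected ones stop the fuzzer. A minimal sketch, assuming the standard-library json module as a stand-in target (it is not part of the original example):

```python
import json


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: feed raw bytes to json.loads (a stand-in target)."""
    try:
        json.loads(data)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError
        # subclasses and are the expected failure mode for bad input,
        # so they are not interesting findings.
        return
    # Any other exception type propagates, and Atheris would report it.
```

To fuzz with it, pass TestOneInput to atheris.Setup() and call atheris.Fuzz() as in the example above.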
Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:
You can instrument the libraries you import:
with atheris.instrument_imports():
    import foo
    from bar import baz
This will cause instrumentation to be added to foo and bar, as well as any libraries they import.
Or, you can instrument individual functions:
@atheris.instrument_func
def my_function(foo, bar):
    print("instrumented")
Or finally, you can instrument everything by calling atheris.instrument_all().
Put this call right before atheris.Setup(). This will find every Python function currently loaded in the interpreter, and instrument it. This might take a while.
Atheris can additionally instrument regular expression checks, e.g.
To enable this feature, you will need to add:
To your script before your code calls
Internally this will import the re module and instrument the necessary functions.
This is currently an experimental feature.
Why am I getting "No interesting inputs were found"?
You might see this error:
ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.
You'll get this error if the first 2 calls to TestOneInput didn't produce any coverage events. Even if you have instrumented some Python code, this can happen if the instrumentation isn't reached in those first 2 calls (for example, because you have a nontrivial TestOneInput). You can resolve this by adding an atheris.instrument_func decorator to TestOneInput, using atheris.instrument_all(), or moving your TestOneInput function into an instrumented module.
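One way to make sure the entry point itself produces coverage events is to instrument it just before starting the fuzzer. The sketch below assumes a hypothetical harness whose branching logic lives directly in TestOneInput; atheris is imported inside main() only so that the harness logic can be exercised without starting a fuzzing run.

```python
import sys


def TestOneInput(data: bytes) -> None:
    # Hypothetical nontrivial harness: the branches below live in the
    # entry point itself, so they only produce coverage events if
    # TestOneInput is instrumented.
    if len(data) < 4:
        return
    if data[:4] == b"FUZZ":
        raise RuntimeError("reached the guarded branch")


def main() -> None:
    import atheris  # requires Atheris to be installed

    # instrument_func works in place, so calling it here has the same
    # effect as decorating TestOneInput with @atheris.instrument_func.
    atheris.instrument_func(TestOneInput)
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Call main() from the script's `if __name__ == "__main__":` guard to start fuzzing.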
Visualizing Python code coverage
Examining which lines are executed is helpful for understanding the effectiveness of your fuzzer. Atheris is compatible with coverage.py: you can run your fuzzer using the coverage.py module as you would for any other Python program. Here's an example:
python3 -m coverage run your_fuzzer.py -atheris_runs=10000  # Times to run
python3 -m coverage html
(cd htmlcov && python3 -m http.server 8000)
Coverage reports are only generated when your fuzzer exits gracefully. This happens if:
- you specify -atheris_runs=<number>, and that many runs have elapsed.
- your fuzzer exits by Python exception.
- your fuzzer exits by
No coverage report will be generated if your fuzzer exits due to a crash in native code, or due to libFuzzer's -runs flag (use -atheris_runs instead). If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will attempt to generate a report but may be unable to (depending on your code). For consistent reports, we recommend always using -atheris_runs=<number>.
If you'd like to examine coverage when running with your corpus, you can do that with the following command:
python3 -m coverage run your_fuzzer.py corpus_dir/* -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))
This will cause Atheris to run on each file in <corpus-dir>, then exit.
Note: Atheris uses an empty input as the first input even if there is no empty file in <corpus-dir>, which is why the run count is one more than the number of files.
Importantly, if you leave off the -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) )) argument, no coverage report will be generated.
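The run count in the command above can also be computed from Python, for instance when launching the coverage run from a script. A minimal sketch; the helper name coverage_command and the file names are hypothetical:

```python
import os
import sys


def coverage_command(fuzzer_path: str, corpus_dir: str) -> list:
    """Build the argument list for a coverage run over every corpus entry."""
    # Mirror the shell glob corpus_dir/*, which skips hidden entries.
    entries = sorted(
        name for name in os.listdir(corpus_dir) if not name.startswith(".")
    )
    # One extra run for the empty input that Atheris tries first.
    runs = 1 + len(entries)
    return (
        [sys.executable, "-m", "coverage", "run", fuzzer_path]
        + [os.path.join(corpus_dir, name) for name in entries]
        + [f"-atheris_runs={runs}"]
    )
```

The resulting list can be passed to subprocess.run().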
Using coverage.py will significantly slow down your fuzzer, so only use it for visualizing coverage; don't use it all the time.
Fuzzing Native Extensions
In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.
Atheris is based on a coverage-guided, mutation-based fuzzer (libFuzzer). This has the advantage of not requiring any grammar definition for generating inputs, which makes setup easier. The disadvantage is that it is harder for the fuzzer to generate inputs for code that parses complex data types. Often the inputs will be rejected early, resulting in low coverage.
Atheris supports custom mutators (as offered by libFuzzer) to produce grammar-aware inputs.
Example (Atheris-equivalent of the example in the libFuzzer docs):
@atheris.instrument_func
def TestOneInput(data):
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        return

    if len(decompressed) < 2:
        return

    try:
        if decompressed.decode() == 'FU':
            raise RuntimeError('Boom')
    except UnicodeDecodeError:
        pass
To reach the RuntimeError crash, the fuzzer needs to be able to produce inputs that are valid compressed data and satisfy the checks after decompression. It is very unlikely that Atheris will be able to produce such inputs: mutations on the input data will most probably result in invalid data that will fail at decompression.
To overcome this issue, you can define a custom mutator function (equivalent to LLVMFuzzerCustomMutator). This example produces valid compressed data. To enable Atheris to make use of it, pass the custom mutator function to the invocation of atheris.Setup:
def CustomMutator(data, max_size, seed):
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        decompressed = b'Hi'
    else:
        decompressed = atheris.Mutate(decompressed, len(decompressed))
    return zlib.compress(decompressed)

atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
atheris.Fuzz()
As seen in the example, the custom mutator may request Atheris to mutate data using atheris.Mutate() (this is equivalent to LLVMFuzzerMutate).
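The property that makes this mutator useful is that its output always decompresses. The sketch below checks that property without running the fuzzer by swapping atheris.Mutate for a hypothetical stand-in, flip_one_bit, which is not part of Atheris and only imitates a single kind of mutation:

```python
import random
import zlib


def flip_one_bit(data: bytes, max_size: int) -> bytes:
    """Hypothetical stand-in for atheris.Mutate: flips one random bit."""
    buf = bytearray(data[:max_size])
    if not buf:
        return bytes(buf)
    index = random.randrange(len(buf))
    buf[index] ^= 1 << random.randrange(8)
    return bytes(buf)


def CustomMutator(data: bytes, max_size: int, seed: int, mutate=flip_one_bit) -> bytes:
    # Same structure as the mutator above, with the mutation function
    # injectable so that it can run without Atheris.
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        decompressed = b"Hi"
    else:
        decompressed = mutate(decompressed, len(decompressed))
    return zlib.compress(decompressed)


def check_always_valid(rounds: int = 100) -> bool:
    """Chain the mutator on its own output and verify each result."""
    data = b"not zlib data"
    for seed in range(rounds):
        data = CustomMutator(data, 4096, seed)
        zlib.decompress(data)  # raises zlib.error if the output is invalid
    return True
```

A plain byte-level mutation of the compressed data would fail this check almost immediately, which is why the unassisted fuzzer rarely gets past decompression.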
You can experiment with custom_mutator_example.py and see that without the mutator Atheris would not be able to find the crash, while with the mutator this is achieved in a matter of seconds.
$ python3 example_fuzzers/custom_mutator_example.py --no_mutator
[...]
#2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#524288 pulse cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 262144 rss: 37Mb
#1048576 pulse cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 349525 rss: 37Mb
#2097152 pulse cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 299593 rss: 37Mb
#4194304 pulse cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 279620 rss: 37Mb
[...]

$ python3 example_fuzzers/custom_mutator_example.py
[...]
INFO: found LLVMFuzzerCustomMutator (0x7f9c989fb0d0). Disabling -len_control by default.
[...]
#2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#3 NEW cov: 4 ft: 4 corp: 2/11b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 1 Custom-
#12 NEW cov: 5 ft: 5 corp: 3/21b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 7 Custom-CrossOver-Custom-CrossOver-Custom-ChangeBit-Custom-
=== Uncaught Python exception: ===
RuntimeError: Boom
Traceback (most recent call last):
  File "example_fuzzers/custom_mutator_example.py", line 62, in TestOneInput
    raise RuntimeError('Boom')
[...]
Custom crossover functions (equivalent to LLVMFuzzerCustomCrossOver) are also supported. You can pass the custom crossover function to the invocation of atheris.Setup. See its usage in custom_crossover_fuzz_test.py.
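For the zlib example, a crossover function could combine the decompressed contents of two corpus entries and recompress the result. This is a sketch only: the four-parameter signature (data1, data2, max_out_size, seed) and the custom_crossover keyword mentioned below are assumed by analogy with LLVMFuzzerCustomCrossOver and should be checked against custom_crossover_fuzz_test.py.

```python
import random
import zlib


def _decompress_or_default(data: bytes) -> bytes:
    """Return the decompressed payload, or a fixed default if invalid."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return b"Hi"


def CustomCrossOver(data1: bytes, data2: bytes, max_out_size: int, seed: int) -> bytes:
    """Splice two compressed inputs at the plaintext level (assumed signature)."""
    rng = random.Random(seed)  # derive all randomness from the given seed
    left = _decompress_or_default(data1)
    right = _decompress_or_default(data2)
    # Keep a prefix of the first payload and a suffix of the second.
    spliced = left[: rng.randint(0, len(left))] + right[rng.randint(0, len(right)):]
    result = zlib.compress(spliced)
    # Fall back to the first input if the result would exceed the size limit.
    return result if len(result) <= max_out_size else data1
```

If the signature assumption holds, it would be registered with atheris.Setup(sys.argv, TestOneInput, custom_crossover=CustomCrossOver).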
Structure-aware Fuzzing with Protocol Buffers
libprotobuf-mutator has bindings to use it together with Atheris to perform structure-aware fuzzing using protocol buffers.
See the documentation for atheris_libprotobuf_mutator.
Integration with OSS-Fuzz
Atheris is fully supported by OSS-Fuzz, Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang.
The atheris module provides three key functions: instrument_imports(), Setup(), and Fuzz().
In your source file, import all libraries you wish to fuzz inside a with atheris.instrument_imports(): block, like this:
# library_a will not get instrumented
import library_a

with atheris.instrument_imports():
    # library_b will get instrumented
    import library_b
Generally, it's best to import atheris first and then import all other libraries inside of a with atheris.instrument_imports() block.
Next, define a fuzzer entry point function and pass it to atheris.Setup() along with the fuzzer's arguments (typically sys.argv). Finally, call atheris.Fuzz() to start fuzzing. You must call atheris.Setup() before atheris.Fuzz().
include: A list of fully-qualified module names that shall be instrumented.
exclude: A list of fully-qualified module names that shall NOT be instrumented.
This should be used together with a with-statement. All modules imported in said statement will be instrumented. However, because Python imports all modules only once, this cannot be used to instrument any previously imported module, including modules required by Atheris. To add coverage to those modules, use instrument_all() instead.
A full list of unsupported modules can be retrieved as follows:
import sys
import atheris
print(sys.modules.keys())
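Since instrument_imports() only affects modules that are imported for the first time inside the with block, it can help to check up front which of your target modules are already loaded. A small sketch; the helper name already_imported is hypothetical:

```python
import sys


def already_imported(module_names):
    """Return the given module names that are already in sys.modules.

    Modules in the returned list were imported earlier, so importing them
    again inside `with atheris.instrument_imports():` will not add coverage
    instrumentation to them.
    """
    return [name for name in module_names if name in sys.modules]
```

Calling it after `import atheris` and before the with block shows which targets would need a different instrumentation approach.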
func: The function to instrument.
This will instrument the specified Python function and then return it. This is typically used as a decorator, but can be used to instrument individual functions too. Note that func is instrumented in-place, so this will affect all call points of the function.
This cannot be called on a bound method - call it on the unbound version.
This will scan over all objects in the interpreter and call instrument_func on every Python function. This works even on core Python interpreter functions, which instrument_imports cannot do.
This function is experimental.
Setup(args, test_one_input, internal_libfuzzer=None)
args: A list of strings: the process arguments to pass to the fuzzer, typically sys.argv. This argument list may be modified in-place, to remove arguments consumed by the fuzzer. See the LibFuzzer docs for a list of such options.
test_one_input: your fuzzer's entry point. Must take a single bytes argument. This will be repeatedly invoked with a single bytes container.
internal_libfuzzer: Indicates whether libfuzzer will be provided by atheris or by an external library (see native_extension_fuzzing.md). If unspecified, Atheris will determine this automatically. If fuzzing pure Python, leave this as
This starts the fuzzer. You must have called
Setup() before calling this function. This function does not return.
In many cases Setup() and Fuzz() could be combined into a single function, but they are separated because you may want the fuzzer to consume the command-line arguments it handles before passing any remaining arguments to another setup function.
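For example, a harness with its own options can call Setup() first and parse whatever is left afterwards. This is a sketch under assumptions: the --strict flag and the parse_harness_args helper are hypothetical, and it relies on the in-place modification of the argument list described for args above. atheris is imported inside main() so that the parsing logic can be tested on its own.

```python
import argparse
import sys


def parse_harness_args(argv):
    """Parse the harness's own options from an argv-style list."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--strict", action="store_true")
    # parse_known_args tolerates leftovers such as corpus directories.
    options, remaining = parser.parse_known_args(argv[1:])
    return options, remaining


def TestOneInput(data: bytes) -> None:
    pass  # placeholder entry point


def main() -> None:
    import atheris  # requires Atheris to be installed

    atheris.Setup(sys.argv, TestOneInput)  # may strip the options it consumed
    options, _ = parse_harness_args(sys.argv)
    print("strict mode:", options.strict)
    atheris.Fuzz()  # does not return
```

Call main() from the script's `if __name__ == "__main__":` guard to start fuzzing.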
Often, a bytes object is not convenient input to your code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.
You can construct the FuzzedDataProvider with:
fdp = atheris.FuzzedDataProvider(input_bytes)
The FuzzedDataProvider then supports the following functions:
def ConsumeBytes(count: int)
def ConsumeUnicode(count: int)
Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to too.
def ConsumeUnicodeNoSurrogates(count: int)
Consume unicode characters, but never generate surrogate pair characters.
def ConsumeString(count: int)
Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.
def ConsumeInt(bytes: int)
Consume a signed integer of the specified size (when written in two's complement notation).
def ConsumeUInt(bytes: int)
Consume an unsigned integer of the specified size.
def ConsumeIntInRange(min: int, max: int)
Consume an integer in the range [min, max].
def ConsumeIntList(count: int, bytes: int)
Consume a list of count integers of bytes bytes each.
def ConsumeIntListInRange(count: int, min: int, max: int)
Consume a list of count integers in the range [min, max].
def ConsumeFloat()
Consume an arbitrary floating-point value. Might produce weird values like NaN and Inf.
def ConsumeRegularFloat()
Consume an arbitrary numeric floating-point value; never produces a special type like NaN or Inf.
def ConsumeProbability()
Consume a floating-point value in the range [0, 1].
def ConsumeFloatInRange(min: float, max: float)
Consume a floating-point value in the range [min, max].
def ConsumeFloatList(count: int)
Consume a list of count arbitrary floating-point values. Might produce weird values like NaN and Inf.
def ConsumeRegularFloatList(count: int)
Consume a list of count arbitrary numeric floating-point values; never produces special types like NaN or Inf.
def ConsumeProbabilityList(count: int)
Consume a list of count floats in the range [0, 1].
def ConsumeFloatListInRange(count: int, min: float, max: float)
Consume a list of count floats in the range [min, max].
def PickValueInList(l: list)
Given a list, pick a random value.
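As an illustration, a harness might use a FuzzedDataProvider to derive typed arguments for a function that does not take raw bytes. In this sketch, describe_rectangle is a hypothetical function under test; the provider calls are the ones listed above. The guarded import is only there so that the target function can be exercised where Atheris is not installed.

```python
import sys

try:
    import atheris
except ImportError:  # the harness below then cannot run, but the target can
    atheris = None


def describe_rectangle(width: int, height: int, label: str) -> str:
    """Hypothetical code under test: takes typed arguments, not bytes."""
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    return f"{label}: {width * height}"


def TestOneInput(data: bytes) -> None:
    # Translate the raw fuzzer bytes into the argument types the target expects.
    fdp = atheris.FuzzedDataProvider(data)
    width = fdp.ConsumeIntInRange(0, 1000)
    height = fdp.ConsumeIntInRange(0, 1000)
    label = fdp.ConsumeUnicodeNoSurrogates(16)
    describe_rectangle(width, height, label)


def main() -> None:
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Constraining the integers with ConsumeIntInRange keeps the fuzzer from spending its time on inputs that the target would reject immediately.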