NAME
App::Test::Generator::CoverageGuidedFuzzer - AFL-style coverage-guided fuzzing for App::Test::Generator
VERSION
Version 0.29
SYNOPSIS
use App::Test::Generator::CoverageGuidedFuzzer;
my $fuzzer = App::Test::Generator::CoverageGuidedFuzzer->new(
    schema     => $yaml_schema,    # your existing parsed YAML schema
    target_sub => \&My::Module::validate,
    iterations => 200,
    seed       => 42,
);
my $report = $fuzzer->run();
$fuzzer->save_corpus('t/corpus/validate.json');
DESCRIPTION
Implements coverage-guided fuzzing on top of App::Test::Generator's existing schema-driven input generation. Instead of purely random generation, it:
1. Generates or mutates a structured input
2. Runs the target sub under Devel::Cover to capture branch hits
3. Keeps inputs that discover *new* branches in a corpus
4. Preferentially mutates corpus entries in future iterations
This is the Perl equivalent of what AFL/libFuzzer do at the byte level, but operating on typed, schema-validated Perl data structures.
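The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the module's implementation: toy_target and fuzz_sketch are hypothetical names, inputs are plain integers rather than schema-generated structures, and the target's return label stands in for the branch data that Devel::Cover would supply.

```perl
use strict;
use warnings;

# Hypothetical target: each distinct return value stands in for one branch.
sub toy_target {
    my ($n) = @_;
    return 'negative' if $n < 0;
    return 'zero'     if $n == 0;
    return 'small'    if $n < 100;
    return 'large';
}

# Minimal coverage-guided loop: keep only inputs that reach an unseen branch.
sub fuzz_sketch {
    my (%args) = @_;
    srand($args{seed});
    my (%seen, @corpus);
    for (1 .. $args{iterations}) {
        my $input;
        if (@corpus && rand() < 0.7) {
            # Step 1 (mutate): nudge a known-interesting input by -10 .. +10
            my $base = $corpus[ int(rand(@corpus)) ];
            $input = $base + int(rand(21)) - 10;
        } else {
            # Step 1 (generate): fresh random integer in -1000 .. 1000
            $input = int(rand(2001)) - 1000;
        }
        my $branch = toy_target($input);    # Step 2: run and observe "coverage"
        next if $seen{$branch}++;           # nothing new: discard
        push @corpus, $input;               # Step 3: new branch, keep the input
    }                                       # Step 4 happens via the 70% path
    return { corpus => \@corpus, branches => [ sort keys %seen ] };
}

my $report = fuzz_sketch(iterations => 200, seed => 42);
printf "%d corpus entries covering: %s\n",
    scalar @{ $report->{corpus} }, join(', ', @{ $report->{branches} });
```

Because an input is kept only when it reaches a branch for the first time, the sketch's corpus never holds more entries than the target has branches.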
HOW CORPUS FILES ARE USED
Overview
Each time extract-schemas --fuzz runs, it creates or updates one JSON file per fuzzed method under schemas/corpus/ (or --corpus-dir if specified). For example:
schemas/corpus/translate.json
schemas/corpus/lookup.json
These files are the fuzzer's memory. Without them, every run starts from scratch. With them, each run builds on the discoveries of every previous run.
What is stored in a corpus file
Each file is a JSON object with three keys:
{
  "seed": 1234567890,
  "corpus": [
    { "input": "en" },
    { "input": "12345678901" },
    ...
  ],
  "bugs": [
    { "input": "...", "error": "..." }
  ]
}
The corpus array contains every input that was judged "interesting" during past runs. An input is interesting if it triggered at least one branch in the target code that no previous input had reached. These are the inputs that proved useful for exploring the method's behaviour - not just random values, but ones that actually exercised distinct paths through the code.
The bugs array records every input that caused the target method to die or throw an exception, along with the error message. This is preserved across runs so you have a permanent record of discovered failure cases even after fixing them.
The seed records the random seed of the run that created the file. This is informational only and is not reused on subsequent runs.
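To inspect a corpus file outside the fuzzer, something like the following may help. It is a sketch that assumes only the three keys described above; summarise_corpus is a hypothetical helper, the example values are made up, and JSON::PP is used because it ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: count the entries under each documented key.
sub summarise_corpus {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return {
        seed   => $data->{seed},
        corpus => scalar @{ $data->{corpus} // [] },
        bugs   => scalar @{ $data->{bugs}   // [] },
    };
}

# Example document in the layout shown above (values are invented)
my $example = <<'JSON';
{
  "seed": 1234567890,
  "corpus": [ { "input": "en" }, { "input": "12345678901" } ],
  "bugs": [ { "input": "", "error": "input must not be empty" } ]
}
JSON

my $summary = summarise_corpus($example);
print "seed $summary->{seed}: $summary->{corpus} corpus entries, $summary->{bugs} bugs\n";
```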
How the corpus is used at the start of a run
When extract-schemas --fuzz runs and finds an existing corpus file for a method, it calls load_corpus() before starting the fuzzing loop. This pre-populates the fuzzer's internal corpus with all the previously interesting inputs. They are loaded with an empty coverage hash (coverage => {}) because coverage state from a previous process cannot be restored - only the inputs themselves are persisted.
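The loading step might look roughly like this. It is an illustrative sketch rather than the body of load_corpus(): load_entries is a hypothetical function, and the only behaviour taken from the text is that each persisted input becomes an entry with an empty coverage hash.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw(tempfile);

# Hypothetical loader: returns in-memory corpus entries for $path,
# or an empty list when no corpus file exists yet (a first run).
sub load_entries {
    my ($path) = @_;
    return [] unless -e $path;
    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    my $data = decode_json(do { local $/; <$fh> });
    close($fh);
    # Only inputs are persisted, so every entry starts with empty coverage
    return [ map { +{ input => $_->{input}, coverage => {} } }
             @{ $data->{corpus} // [] } ];
}

# Usage: write a tiny corpus file, then load it back
my ($out, $path) = tempfile(SUFFIX => '.json', UNLINK => 1);
print {$out} '{"seed":1,"corpus":[{"input":"en"}],"bugs":[]}';
close($out);
my $entries = load_entries($path);
print scalar(@$entries), " entry loaded\n";
```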
How the corpus influences the fuzzing loop
During the main fuzzing loop, on each of the --fuzz-iters iterations, the fuzzer makes a weighted random choice:
- 70% of iterations: mutate a corpus entry
A random entry is picked from the corpus (which now includes both the loaded entries from previous runs and any new entries discovered in this run so far). That entry's input is mutated - characters are flipped, numbers are nudged, strings are truncated or extended, array elements are duplicated or deleted - and the mutated value is run against the target method.
The key property here is that mutations are applied to inputs that are already known to reach interesting parts of the code. Rather than generating a fresh random string that will probably hit the same early conditional as everything else, the fuzzer is specifically probing the neighbourhood of inputs that previously pushed into new territory.
- 30% of iterations: fresh random generation
A completely new input is generated from the schema. This is the exploration budget - it ensures the fuzzer does not get permanently stuck mutating a narrow slice of the input space and occasionally tries something entirely new.
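The weighted choice and the kinds of mutation listed above could be sketched as follows. All function names here are hypothetical, and the specific operators (one random printable character, a nudge of at most 10, one element duplicated or deleted) are assumptions chosen for illustration; only the 70/30 split comes from the text.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Strings: flip one character, truncate, or extend by one letter.
sub mutate_string {
    my ($s) = @_;
    my $op = int(rand(3));
    if ($op == 0 && length($s)) {
        substr($s, int(rand(length($s))), 1) = chr(32 + int(rand(95)));
    } elsif ($op == 1 && length($s)) {
        $s = substr($s, 0, int(rand(length($s))));
    } else {
        $s .= chr(97 + int(rand(26)));
    }
    return $s;
}

# Numbers: nudge by an integer in -10 .. +10.
sub mutate_number {
    my ($n) = @_;
    return $n + int(rand(21)) - 10;
}

# Arrays: duplicate or delete one element (returns a new arrayref).
sub mutate_array {
    my @items = @{ $_[0] };
    return \@items unless @items;
    my $i = int(rand(@items));
    if (rand() < 0.5) { splice(@items, $i, 0, $items[$i]) }
    else              { splice(@items, $i, 1) }
    return \@items;
}

# Pick a mutator based on the shape of the input.
sub mutate {
    my ($input) = @_;
    return mutate_array($input)  if ref($input) eq 'ARRAY';
    return mutate_number($input) if looks_like_number($input);
    return mutate_string($input);
}

# 70% of the time mutate a random corpus entry, otherwise generate afresh.
sub next_input {
    my ($corpus, $generate) = @_;
    if (@$corpus && rand() < 0.7) {
        my $entry = $corpus->[ int(rand(@$corpus)) ];
        return mutate($entry->{input});
    }
    return $generate->();
}
```

With an empty corpus, next_input always falls through to the generator, which is why the sketch needs no special case for a first run.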
How new entries are added to the corpus during a run
After each input is run against the target method, the fuzzer checks whether it was interesting. With Devel::Cover available, an input is interesting if it hit at least one branch that no previous input in this session had hit. Without Devel::Cover, 20% of inputs are kept at random so the corpus continues to grow even without branch feedback.
Interesting inputs are appended to the in-memory corpus immediately, so they can be selected and mutated within the same run. They are also written to the JSON file at the end of the run via save_corpus().
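The "interesting" test can be pictured as a set difference between the branches an input hit and the branches seen so far. The sketch below assumes branch hits arrive as a hash keyed by an opaque branch ID; is_interesting and the ID format are hypothetical, and how the module extracts hits from Devel::Cover is not shown.

```perl
use strict;
use warnings;

# Returns true if %$hits contains a branch ID not yet in %$seen.
# As a side effect, all of this input's hits are recorded in %$seen.
sub is_interesting {
    my ($hits, $seen, $have_coverage) = @_;
    # Fallback from the text: without coverage data keep about 20% at random
    return rand() < 0.2 unless $have_coverage;
    my @new = grep { !$seen->{$_} } keys %$hits;
    $seen->{$_} = 1 for keys %$hits;
    return scalar @new;
}

my %seen;
for my $hits (
    { 'L10:true' => 1 },
    { 'L10:true' => 1 },
    { 'L10:true' => 1, 'L14:false' => 1 },
) {
    print is_interesting($hits, \%seen, 1) ? "keep\n" : "discard\n";
}
# prints keep, discard, keep
```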
How the corpus grows across multiple runs
On the first run, the corpus file does not exist. The fuzzer seeds itself with five randomly generated inputs, runs all iterations, and saves the interesting ones. A typical first run might produce a corpus of 15-30 entries.
On the second run, those 15-30 entries are loaded before any iteration begins. The fuzzer immediately starts mutating inputs that are already known to reach interesting branches, rather than spending iterations rediscovering them from scratch. It finds new interesting inputs on top of the existing ones, and the corpus grows further.
By the third, fourth and subsequent runs the corpus has stabilised for the easy-to-reach branches, and the fuzzer is increasingly focused on harder-to-reach ones. New branches are discovered more slowly on each run, which is the expected behaviour: the fuzzer is spending its budget on genuinely new territory.
Practical implications
- Running once gives limited value; running repeatedly gives compounding value.
The first run with 100 iterations is roughly equivalent to App::Test::Generator with 100 random iterations. By the fifth run, the corpus is directed at branches that purely random generation would almost never reach.
- The corpus is human-readable and editable.
Because inputs are stored as plain JSON values, you can open a corpus file and add your own known-tricky inputs by hand. They will be picked up on the next run and mutated like any other corpus entry.
- Deleting a corpus file resets the fuzzer for that method.
If you significantly change a method's implementation, the old corpus may be less useful. Delete the relevant schemas/corpus/method.json and the fuzzer will start fresh with the new code.
- The bugs array is a regression record.
Even after you fix a bug that was found by fuzzing, the input that triggered it remains in the bugs array of the corpus file. You can use these as the basis for specific regression tests to ensure the fix holds.
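One way to turn the bugs array into a regression check is to replay each recorded input against the fixed method and report any that still die. This is a sketch under assumptions: still_failing is a hypothetical helper, the target is a stand-in closure, and the corpus document is inlined rather than read from schemas/corpus/.

```perl
use strict;
use warnings;
use JSON::PP;

# Replay every recorded bug input; return the inputs that still make
# the target die.
sub still_failing {
    my ($json_text, $target) = @_;
    my $data = decode_json($json_text);
    my @failing;
    for my $bug (@{ $data->{bugs} // [] }) {
        my $ok = eval { $target->($bug->{input}); 1 };
        push @failing, $bug->{input} unless $ok;
    }
    return \@failing;
}

# Stand-in for a method that used to die on empty input and has been fixed
my $fixed = sub { my ($in) = @_; return length($in) ? 'ok' : 'rejected' };

my $corpus_json = '{"seed":1,"corpus":[],"bugs":[{"input":"","error":"empty input"}]}';
my $failing = still_failing($corpus_json, $fixed);
print @$failing ? "regressions: @$failing\n" : "all recorded bugs stay fixed\n";
```

In a test file the returned list could be checked with Test::More, for example by asserting that it is empty.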
Corpus file location
By default corpus files are written to schemas/corpus/, one file per method, named method_name.json. This can be changed with the --corpus-dir option:
extract-schemas --fuzz --corpus-dir t/corpus lib/MyModule.pm
It is recommended to commit the corpus directory to version control. This means every developer and every CI run benefits from the accumulated discoveries of all previous runs rather than starting from scratch each time.
run
Run the coverage-guided fuzzing loop. Returns a hashref summary report.
corpus
Returns the accumulated corpus as an arrayref of hashrefs with keys input and coverage.
bugs
Returns bugs found as an arrayref of hashrefs with keys input and error.
save_corpus( $path )
Serialises the corpus to a JSON file so it can be replayed or extended. Requires JSON::MaybeXS or JSON.
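As a rough picture of the serialisation, the sketch below builds the three-key layout described under "What is stored in a corpus file" from in-memory entries. corpus_to_json is a hypothetical function, not the method itself; it uses JSON::PP for portability, and dropping the coverage hashes follows the statement that only inputs are persisted.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical serialiser: keep inputs (and bug errors), drop coverage.
sub corpus_to_json {
    my (%args) = @_;
    my %doc = (
        seed   => $args{seed},
        corpus => [ map { +{ input => $_->{input} } } @{ $args{corpus} } ],
        bugs   => [ map { +{ input => $_->{input}, error => $_->{error} } }
                    @{ $args{bugs} } ],
    );
    # canonical() sorts keys so repeated saves produce stable diffs
    return JSON::PP->new->canonical->pretty->encode(\%doc);
}

my $json = corpus_to_json(
    seed   => 42,
    corpus => [ { input => 'en', coverage => { 'L10:true' => 1 } } ],
    bugs   => [ { input => '', error => 'empty input' } ],
);
print $json;
```

Sorted keys are a design choice for this sketch: they keep version-control diffs small when the corpus directory is committed.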
load_corpus( $path )
Loads a previously saved corpus JSON file, pre-seeding the fuzzer so it continues from where it left off.
AUTHOR
Nigel Horne, <njh at nigelhorne.com>
Portions of this module's initial design and documentation were created with the assistance of AI.
LICENCE AND COPYRIGHT
Copyright 2026 Nigel Horne.
Usage is subject to licence terms.
The licence terms of this software are as follows:
Personal single user, single computer use: GPL2
All other users (including Commercial, Charity, Educational, Government) must apply in writing for a licence for use from Nigel Horne at the above e-mail.