NAME

extract-schemas - Extract test schemas from Perl modules

SYNOPSIS

extract-schemas [options] <module.pm>

Options:
  --output-dir DIR    Output directory for schema files (default: schemas/)
  --strict-pod=off|warn|fatal
                      POD validation level (default: warn)
  --verbose           Show detailed analysis
  --fuzz              Run coverage-guided fuzzing on extracted schemas
  --fuzz-iters N      Iterations per method when fuzzing (default: 100)
                      (no short form, to avoid conflict with --fuzz/-f)
  --fuzz-all          Fuzz all methods, including those with no input schema
  --corpus-dir DIR    Directory to persist fuzz corpora (default: schemas/corpus/)
  --help              Show this help message
  --man               Show full documentation

Examples:
  extract-schemas lib/MyModule.pm
  extract-schemas --output-dir my_schemas --verbose lib/MyModule.pm
  extract-schemas --fuzz lib/MyModule.pm
  extract-schemas --fuzz --fuzz-iters 300 --corpus-dir t/corpus lib/MyModule.pm
  extract-schemas --fuzz --fuzz-all lib/MyModule.pm

QUICK START

Run extract-schemas --strict-pod=warn -v --fuzz lib/MyModule.pm to analyse your module and automatically probe each method with fuzzed inputs (100 iterations per method by default), looking for crashes caused by inputs that should be valid. Anything suspicious is saved to schemas/corpus/.

If genuine bugs are found, run fuzz-harness-generator --replay-corpus schemas/corpus/ -o t/fuzz_replay.t to turn them into regression tests. These tests fail until you fix the underlying code and keep passing after that. Run extract-schemas --fuzz regularly: each run builds on the corpus saved by the previous one, so it can probe deeper into your code.

Otherwise, for each function in MyModule.pm, run fuzz-harness-generator -r schemas/function.yml, replacing function with the function's name.
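If the module has many functions, the commands can be generated rather than typed. The sketch below assumes one .yml schema file per function in the schema directory, as in the command above. harness_commands is a hypothetical helper, and the script only prints the commands (a dry run), so nothing is executed until you run them yourself.

```perl
use strict;
use warnings;
use File::Spec;

# Build one fuzz-harness-generator command per .yml schema file.
# Paths are not shell-quoted, so avoid spaces in directory names.
sub harness_commands {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @files = sort { $a cmp $b } grep { /\.yml\z/ } readdir $dh;
    closedir $dh;
    return map { 'fuzz-harness-generator -r ' . File::Spec->catfile($dir, $_) } @files;
}

# Usage: perl harness_commands.pl schemas
if (@ARGV) {
    print "$_\n" for harness_commands($ARGV[0]);
}
```

Subdirectories such as schemas/corpus/ are skipped because their names do not end in .yml.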

DESCRIPTION

This tool analyzes a Perl module and generates a YAML schema file for each method. The schemas are intended for App::Test::Generator, whose fuzz-harness-generator program turns each schema into a .t file that you can run with prove.

The extractor uses three sources of information:

1. POD Documentation

Parses parameter descriptions from POD to extract types and constraints.

2. Code Analysis

Analyzes validation patterns in the code, such as ref checks and length checks.

3. Method Signatures

Extracts parameter names from method signatures.
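As an illustration, the hypothetical module below carries all three kinds of evidence for one parameter: a POD description, a signature-style unpacking of @_, and explicit validation code. Which phrasings and checks the extractor actually recognizes is not specified here, so treat this as an assumption about the kind of code it works best with, not as a description of its parser.

```perl
package My::Greeter;    # hypothetical example module
use strict;
use warnings;
use Carp qw(croak);

=head2 greet

Returns a greeting.

=over 4

=item name

A string, 1 to 50 characters long. Required.

=back

=cut

sub greet {
    my ($self, $name) = @_;    # parameter names come from this unpacking

    # Validation patterns of the kind code analysis looks for
    croak 'name is required' unless defined $name;
    croak 'name must be a plain string' if ref $name;
    croak 'name must be 1-50 characters'
        if length($name) < 1 || length($name) > 50;

    return "Hello, $name";
}

sub new { return bless {}, shift }

1;
```

Here the POD, the unpacking, and the checks all agree that name is a required string of 1 to 50 characters, which is the kind of agreement the high confidence level describes.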

The tool assigns a confidence level (high/medium/low) to each schema based on how much information it could infer.

FUZZING

When --fuzz is specified, the tool will additionally run App::Test::Generator::CoverageGuidedFuzzer against each method after schema extraction.

By default, all methods with at least one known input parameter are fuzzed, regardless of confidence level. Use --fuzz-all to also fuzz methods with no input schema; these use purely random input generation.

The fuzzer will:

  • Load the target module at runtime with require

  • Run coverage-guided fuzzing using the extracted schema as the input specification

  • Report any crashes or unexpected errors found

  • Persist a corpus to --corpus-dir for incremental improvement across runs
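The loop below is a much simplified sketch of the crash-detection step only. It is plain random fuzzing, not coverage-guided: it does not measure coverage or keep inputs that reach new code, and it does not use the App::Test::Generator::CoverageGuidedFuzzer API. The method scale, its assumed schema, and gen_inputs are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical method under test. Its (assumed) schema says n is an
# integer with min 0 and max 100, so 0 is a valid input.
sub scale {
    my ($n) = @_;
    return 100 / $n;    # bug: dies with "Illegal division by zero" for 0
}

# Simplified input spec, mirroring the SCHEMA FORMAT keys (assumption)
my %spec = (type => 'integer', min => 0, max => 100);

# Boundary values first, then random integers inside the valid range
sub gen_inputs {
    my ($spec, $iters) = @_;
    my @inputs = ($spec->{min}, $spec->{max});
    push @inputs, $spec->{min} + int(rand($spec->{max} - $spec->{min} + 1))
        for 1 .. $iters;
    return @inputs;
}

srand(42);    # fixed seed so runs are reproducible
my %crashes;  # input => error message
for my $input (gen_inputs(\%spec, 100)) {
    my $ok = eval { scale($input); 1 };
    $crashes{$input} = $@ unless $ok;
}

printf "crash on %s: %s", $_, $crashes{$_} for sort { $a <=> $b } keys %crashes;
```

A coverage-guided fuzzer adds what this loop lacks: it keeps inputs that exercise new code paths and mutates them in later iterations instead of sampling blindly.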

Corpus files are named <corpus-dir>/<method>.json and are automatically loaded on subsequent runs, so each run builds on the last.
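The incremental behaviour amounts to load, extend, save. The sketch below assumes, purely for illustration, that each <method>.json file holds a JSON array of inputs; the real corpus format is defined by App::Test::Generator::CoverageGuidedFuzzer and may differ. corpus_path, load_corpus, and save_corpus are hypothetical helpers built only on core modules.

```perl
use strict;
use warnings;
use JSON::PP ();
use File::Spec;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Path follows the documented <corpus-dir>/<method>.json naming
sub corpus_path {
    my ($dir, $method) = @_;
    return File::Spec->catfile($dir, "$method.json");
}

# Load a previous run's inputs, or start empty (assumed format: JSON array)
sub load_corpus {
    my ($path) = @_;
    return [] unless -e $path;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;    # slurp the whole file
    return JSON::PP->new->decode(scalar <$fh>);
}

# Append new inputs, skipping duplicates, and write the corpus back
sub save_corpus {
    my ($path, $corpus, @new) = @_;
    my $json = JSON::PP->new->canonical;
    my %seen = map { $json->encode([$_]) => 1 } @$corpus;
    push @$corpus, grep { !$seen{ $json->encode([$_]) }++ } @new;
    my $dir = dirname($path);
    make_path($dir) unless -d $dir;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $json->encode($corpus);
    close $fh or die "Cannot close $path: $!";
    return $corpus;
}
```

Deduplicating on a canonical JSON encoding stops repeated copies of the same input from inflating the corpus across runs.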

SCHEMA FORMAT

The generated YAML files have the following structure:

method: method_name
confidence: high|medium|low
notes:
  - Any warnings or suggestions
input:
  param_name:
    type: string|integer|number|boolean|arrayref|hashref|object
    min: 5
    max: 100
    optional: 0
    matches: /pattern/
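To make the keys concrete, here is a sketch of how one input entry might be checked. It assumes that min and max bound the length of a string and the value of a number, and that matches holds a regular expression; that is a common convention for schemas of this kind, but check the App::Test::Generator documentation for the exact semantics. The entry is written as a Perl hash rather than parsed from YAML to stay within core modules, and check_param is a hypothetical helper.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# One parameter entry, as it might look after loading the YAML
my %param = (type => 'string', min => 5, max => 100, optional => 0,
             matches => qr/^[a-z]+$/);

# Returns undef if $value satisfies $spec, otherwise a reason string.
sub check_param {
    my ($spec, $value) = @_;
    return $spec->{optional} ? undef : 'missing required value'
        unless defined $value;

    my $type = $spec->{type};
    my $size;
    if ($type eq 'string') {
        return 'not a string' if ref $value;
        $size = length $value;    # assumed: min/max bound the length
    } elsif ($type eq 'integer' || $type eq 'number') {
        return "not of type $type" unless looks_like_number($value)
            && ($type eq 'number' || $value =~ /^-?\d+$/);
        $size = $value;           # assumed: min/max bound the value
    } else {
        return;    # other types (arrayref, hashref, ...) treated as passing here
    }
    return "below min $spec->{min}" if defined $spec->{min} && $size < $spec->{min};
    return "above max $spec->{max}" if defined $spec->{max} && $size > $spec->{max};
    return "does not match $spec->{matches}"
        if defined $spec->{matches} && $value !~ $spec->{matches};
    return;
}
```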

CONFIDENCE LEVELS

high

Strong evidence from POD and code analysis. Schema should be accurate.

medium

Partial information available. Review recommended.

low

Limited information. Manual review required.
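Because low and medium schemas need attention, a small script can list them. This sketch assumes the schema files sit in one directory with a .yml or .yaml extension and that the confidence: line appears at the top level exactly as shown under SCHEMA FORMAT. It reads that line with a regular expression instead of a full YAML parser; needs_review is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Spec;

# Returns [file, level] pairs for schemas whose confidence is not 'high',
# least confident first.
sub needs_review {
    my ($dir) = @_;
    my @out;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    for my $file (sort { $a cmp $b } grep { /\.ya?ml\z/ } readdir $dh) {
        my $path = File::Spec->catfile($dir, $file);
        open my $fh, '<', $path or die "Cannot read $path: $!";
        while (my $line = <$fh>) {
            # Top-level key, as in the SCHEMA FORMAT example
            next unless $line =~ /^confidence:\s*(\w+)/;
            push @out, [$file, $1] if $1 ne 'high';
            last;
        }
        close $fh;
    }
    closedir $dh;
    my %rank = (low => 0, medium => 1);
    return sort { ($rank{$a->[1]} // 2) <=> ($rank{$b->[1]} // 2)
                  or $a->[0] cmp $b->[0] } @out;
}

# Usage: perl needs_review.pl schemas
if (@ARGV) {
    print "$_->[1]\t$_->[0]\n" for needs_review($ARGV[0]);
}
```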

EXAMPLES

Basic Usage

extract-schemas lib/MyModule.pm

Fuzz methods with known inputs

extract-schemas --fuzz lib/MyModule.pm

Fuzz everything, 300 iterations, custom corpus dir

extract-schemas --fuzz --fuzz-all --fuzz-iters 300 --corpus-dir t/corpus lib/MyModule.pm

Incremental fuzzing (corpus grows across runs)

# First run: builds initial corpus
extract-schemas --fuzz lib/MyModule.pm

# Subsequent runs: loads corpus and extends it
extract-schemas --fuzz lib/MyModule.pm

Verbose Mode

extract-schemas --verbose lib/MyModule.pm

POD Checking

--strict-pod=LEVEL
  off    - do not validate POD
  warn   - warn on mismatches (default)
  fatal  - abort on mismatches
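The three levels differ only in what happens once a mismatch is found. The sketch below shows that dispatch, paired with a hypothetical check that compares the parameter names documented in POD with those in the signature; the checks --strict-pod actually performs may cover more than names. pod_mismatches and apply_strict_pod are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical check: names documented in POD vs names in the signature.
sub pod_mismatches {
    my ($pod_params, $sig_params) = @_;
    my %in_pod = map { $_ => 1 } @$pod_params;
    my %in_sig = map { $_ => 1 } @$sig_params;
    my @problems;
    push @problems, "'$_' is in the signature but not documented"
        for grep { !$in_pod{$_} } @$sig_params;
    push @problems, "'$_' is documented but not in the signature"
        for grep { !$in_sig{$_} } @$pod_params;
    return @problems;
}

# Apply a --strict-pod level to a list of mismatch messages.
# Returns the number of problems reported (0 when off or clean).
sub apply_strict_pod {
    my ($level, $method, @problems) = @_;
    return 0 if $level eq 'off' || !@problems;
    my $msg = join '', map { "$method: $_\n" } @problems;
    die $msg if $level eq 'fatal';
    warn $msg;    # 'warn', the default level
    return scalar @problems;
}
```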

NEXT STEPS

After extracting schemas:

1. Review the generated YAML files, especially those marked low confidence.

2. Edit the schemas to add missing information or correct errors.

3. Use the schemas with App::Test::Generator:

fuzz-harness-generator -r schemas/my_method.yml

SEE ALSO

App::Test::Generator, App::Test::Generator::CoverageGuidedFuzzer, PPI, Pod::Simple

AUTHOR

Nigel Horne