NAME

App::Test::Generator::SchemaExtractor - Extract test schemas from Perl modules

SYNOPSIS

use App::Test::Generator::SchemaExtractor;

my $extractor = App::Test::Generator::SchemaExtractor->new(
	input_file => 'lib/MyModule.pm',
	output_dir => 'schemas/',
	verbose    => 1,
);

my $schemas = $extractor->extract_all();

DESCRIPTION

App::Test::Generator::SchemaExtractor automatically analyzes Perl modules and generates structured YAML schema files suitable for automated test generation. This module employs static analysis techniques to infer parameter types, constraints, and method behaviors directly from your source code.

Analysis Methods

The extractor combines multiple analysis approaches for comprehensive schema generation:

  • POD Documentation Analysis

    Parses embedded documentation to extract:

    - Parameter names, types, and descriptions from =head2 sections
    - Method signatures with positional parameters
    - Return value specifications from "Returns:" sections
    - Constraints (ranges, patterns, required/optional status)
    - Semantic type detection (email, URL, filename)

  • Code Pattern Detection

    Analyzes source code using PPI to identify:

    - Method signatures and parameter extraction patterns
    - Type validation (ref(), isa(), blessed())
    - Constraint patterns (length checks, numeric comparisons, regex matches)
    - Return statement analysis and value type inference
    - Object instantiation requirements and accessor methods

  • Signature Analysis

    Examines method declarations for:

    - Parameter names and positional information
    - Instance vs. class method detection
    - Method modifiers (Moose-style before/after/around)
    - Various parameter declaration styles (shift, @_ assignment)

  • Heuristic Inference

    Applies Perl-specific domain knowledge:

    - Boolean return detection from method names (is_*, has_*, can_*)
    - Common Perl idioms and coding patterns
    - Context awareness (scalar vs list, wantarray usage)
    - Object-oriented patterns (constructors, accessors, chaining)
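As a rough illustration of the name-based heuristic, the sketch below guesses a boolean return type from the is_*, has_*, and can_* prefixes. The helper looks_boolean is a hypothetical name used only for this example; it is not part of the module's API, and the module may combine this signal with others.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): guess a boolean
# return type from the method name alone.
sub looks_boolean {
    my ($method_name) = @_;
    return $method_name =~ /^(?:is|has|can)_/ ? 1 : 0;
}

# Report the guess for a few sample method names.
for my $name (qw(is_valid has_children can_write get_name)) {
    printf "%-12s %s\n", $name, looks_boolean($name) ? 'boolean' : 'unknown';
}
```

A name match on its own is weak evidence, so a result like this would normally be cross-checked against the return statements and the POD.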

Generated Schema Structure

The extracted schemas follow this YAML structure:

function: method_name
module: Package::Name
input:
  param1:
    type: string
    min: 3
    max: 50
    optional: 0
    position: 0
  param2:
    type: integer
    min: 0
    max: 100
    optional: 1
    position: 1
output:
  type: boolean
  value: 1
new: Package::Name # if object instantiation required
config:
  test_empty: 1
  test_nuls: 0
  test_undef: 0
  test_non_ascii: 0

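For readers who prefer to think in Perl data, the same schema can be pictured as a nested hashref. This is a sketch that assumes the in-memory form mirrors the YAML keys one for one; the documentation only states that extract_all returns a hashref of method_name => schema, so treat the exact layout as an assumption.

```perl
use strict;
use warnings;

# Assumed in-memory equivalent of the YAML above (key for key).
my $schema = {
    function => 'method_name',
    module   => 'Package::Name',
    input    => {
        param1 => { type => 'string',  min => 3, max => 50,  optional => 0, position => 0 },
        param2 => { type => 'integer', min => 0, max => 100, optional => 1, position => 1 },
    },
    output => { type => 'boolean', value => 1 },
    new    => 'Package::Name',
    config => { test_empty => 1, test_nuls => 0, test_undef => 0, test_non_ascii => 0 },
};

# Order the parameters by their declared position.
my @ordered = sort {
    $schema->{input}{$a}{position} <=> $schema->{input}{$b}{position}
} keys %{ $schema->{input} };

# Print each parameter with its type and whether it is required.
for my $name (@ordered) {
    my $p = $schema->{input}{$name};
    printf "%d: %s (%s, %s)\n", $p->{position}, $name, $p->{type},
        $p->{optional} ? 'optional' : 'required';
}
```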
Advanced Detection Capabilities

  • Accessor Method Detection

    Automatically identifies getter, setter, and combined accessor methods by analyzing common patterns like return $self->{field} and $self->{field} = $value.

  • Boolean Return Inference

    Detects boolean-returning methods through multiple signals:

    - Method name patterns (is_*, has_*, can_*)
    - Return patterns (consistent 1/0 returns)
    - POD descriptions ("returns true on success")
    - Ternary operators with boolean results

  • Context Awareness

    Identifies methods that use wantarray and can return different values in scalar vs list context.

  • Object Lifecycle Management

    Detects instance methods requiring object instantiation and automatically adds the new field to schemas.
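The accessor patterns mentioned above can be approximated with two regular expressions. The following is a minimal sketch, not the module's implementation (which is described as using PPI); classify_accessor and its return labels are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical classifier: label a sub body as a getter, setter,
# combined accessor, or neither, based on two textual patterns.
sub classify_accessor {
    my ($body) = @_;
    # A read looks like: return $self->{field}
    my $reads  = $body =~ /return\s+\$self->\{\s*\w+\s*\}/;
    # A write looks like: $self->{field} = ... (but not == or =~)
    my $writes = $body =~ /\$self->\{\s*\w+\s*\}\s*=[^=~]/;
    return 'getset' if $reads && $writes;
    return 'getter' if $reads;
    return 'setter' if $writes;
    return 'none';
}

# Classify three sample sub bodies.
my @samples = (
    'return $self->{name};',
    '$self->{name} = $value; return $self;',
    '$self->{name} = shift if @_; return $self->{name};',
);
print classify_accessor($_), "\n" for @samples;
```

Text matching like this is easy to fool (for example by comments or strings containing the same patterns), which is one reason a parser-based approach is preferable in practice.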

Confidence Scoring

Each generated schema includes a confidence assessment, rated at one of the following levels:

  • High Confidence

    Multiple independent analysis sources converge on consistent, well-constrained parameters with explicit validation logic and comprehensive documentation.

  • Medium Confidence

    Reasonable evidence from code patterns or partial documentation, but may lack comprehensive constraints or have some ambiguities.

  • Low Confidence

    Minimal evidence, based primarily on naming conventions, default assumptions, or single-source analysis.

  • Very Low Confidence

    Barely any detectable signals; the schema should be reviewed thoroughly before use in test generation.

Use Cases

  • Automated Test Generation

    Generate comprehensive test suites with App::Test::Generator using extracted schemas as input. The schemas provide the necessary structure for generating both positive and negative test cases.

  • API Documentation Generation

    Supplement existing documentation with automatically inferred interface specifications, parameter requirements, and return types.

  • Code Quality Assessment

    Identify methods with poor documentation, inconsistent parameter handling, or unclear interfaces that may benefit from refactoring.

  • Refactoring Assistance

    Detect method dependencies, object instantiation requirements, and parameter usage patterns to inform refactoring decisions.

  • Legacy Code Analysis

    Quickly understand the interface contracts of legacy Perl codebases without extensive manual code reading.

Integration with Testing Ecosystem

The generated schemas are specifically designed to work with the App::Test::Generator ecosystem:

# Extract schemas from your module
my $extractor = App::Test::Generator::SchemaExtractor->new(...);
my $schemas = $extractor->extract_all();

# Use with test generator (typically as separate steps)
# fuzz-harness-generator -r schemas/method_name.yaml

Limitations and Considerations

  • Dynamic Code Patterns

    Highly dynamic code (string evals, AUTOLOAD, symbolic references) may not be fully detected by static analysis.

  • Complex Validation Logic

    Sophisticated validation involving multiple parameters or external dependencies may require manual schema refinement.

  • Confidence Heuristics

    Confidence scores are based on heuristics and should be reviewed by developers familiar with the codebase.

  • Perl Idiom Recognition

    Some Perl-specific idioms may require custom pattern recognition beyond the built-in detectors.

  • Documentation Dependency

    Analysis quality improves significantly with comprehensive POD documentation following consistent patterns.

Best Practices for Optimal Results

  • Comprehensive POD Documentation

    Write detailed POD with explicit parameter documentation using consistent patterns like $param - type (constraints), description.

  • Consistent Coding Patterns

    Use consistent parameter validation patterns and method signatures throughout your codebase.

  • Schema Review Process

    Review and refine automatically generated schemas, particularly those with low confidence scores.

  • Descriptive Naming

    Use descriptive method and parameter names that clearly indicate purpose and expected types.

  • Progressive Enhancement

    Start with automatically generated schemas and progressively refine them based on test results and code understanding.

The module is particularly valuable for large codebases where manual schema creation would be prohibitively time-consuming, and for maintaining test coverage as code evolves through continuous integration pipelines.

METHODS

new

Creates a new extractor. Private methods are excluded from extraction unless include_private is set to a true value.

The extractor supports several configuration parameters:

my $extractor = App::Test::Generator::SchemaExtractor->new(
    input_file           => 'lib/MyModule.pm',  # Required
    output_dir           => 'schemas/',         # Default: 'schemas'
    verbose              => 1,                  # Default: 0
    include_private      => 1,                  # Default: 0
    max_parameters       => 50,                 # Default: 20
    confidence_threshold => 0.7,                # Default: 0.5
);

extract_all

Extract schemas for all methods in the module.

Returns a hashref of method_name => schema.

Pseudo Code

  FOREACH method
  DO
	analyze the method
	write a schema file for that method
  END

_extract_package_name

Extract the package name from the document.

_find_methods

Find all subroutines/methods in the document.

Returns an arrayref of hashrefs with the structure: { name => $name, node => $ppi_node, body => $code_text }

_extract_pod_before

Extract POD documentation that appears before a subroutine.

_analyze_method

Analyze a method and generate its schema.

Combines POD analysis, code pattern analysis, and signature analysis.

_analyze_pod

Parse POD documentation to extract parameter information.

Looks for patterns like:

  $name - string (3-50 chars), username
  $age - integer, must be positive
  $email - string, matches /\@/
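A single regular expression is enough to pull apart parameter lines of this shape. The sketch below is only an approximation of what such parsing might look like; parse_pod_param and the returned keys are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical parser for POD parameter lines of the form
#   $name - type (constraints), description
# Returns a hashref, or nothing if the line does not match.
sub parse_pod_param {
    my ($line) = @_;
    return unless $line =~ /^\s*\$(\w+)\s+-\s+(\w+)\s*(?:\(([^)]*)\))?\s*(?:,\s*(.*))?$/;
    return {
        name        => $1,
        type        => $2,
        constraints => $3,    # undef when no parenthesised constraint is given
        description => $4,    # undef when nothing follows the comma
    };
}

# Parse two of the example lines and print the extracted fields.
for my $line ('$name - string (3-50 chars), username', '$age - integer, must be positive') {
    my $p = parse_pod_param($line) or next;
    printf "%s: %s [%s] %s\n", $p->{name}, $p->{type},
        $p->{constraints} // 'none', $p->{description} // '';
}
```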

_analyze_output

Analyze return values from POD and code.

Looks for:

- Returns: section in POD
- return statements in code
- Common patterns like "returns 1 on success"

_parse_constraints

Parse constraint strings like "3-50 chars" or "positive" or "1-100".
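To make the three example strings concrete, here is a small sketch of a constraint parser. The name parse_constraints, the min/max keys, and the reading of "positive" as a minimum of 1 are all assumptions made for this illustration; the module's actual rules may differ.

```perl
use strict;
use warnings;

# Hypothetical constraint parser covering the three example strings.
sub parse_constraints {
    my ($text) = @_;
    my %c;
    if ($text =~ /^\s*(\d+)\s*-\s*(\d+)(?:\s+chars?)?\s*$/) {
        # A range such as "3-50 chars" or "1-100"
        @c{qw(min max)} = ($1 + 0, $2 + 0);
    } elsif ($text =~ /^\s*positive\s*$/i) {
        # Assumption: "positive" is read as min => 1 (integer semantics)
        $c{min} = 1;
    }
    return \%c;
}

# Show the parsed limits for each example string.
for my $text ('3-50 chars', 'positive', '1-100') {
    my $c = parse_constraints($text);
    print "$text => ", join(', ', map { "$_=$c->{$_}" } sort keys %$c), "\n";
}
```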

_analyze_code

Analyze code patterns to infer parameter types and constraints.

Looks for common validation patterns:

- defined checks
- ref() checks
- regex matches
- length checks
- numeric comparisons
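The idea behind these checks can be shown with plain regular expressions applied to a method body. This is a deliberately simplified sketch: the module is described as using PPI, whereas detect_validations below is a hypothetical text-matching helper written only for illustration.

```perl
use strict;
use warnings;

# Hypothetical regex-based detector: report which kinds of validation
# a chunk of code applies to the named parameter.
sub detect_validations {
    my ($code, $param) = @_;
    my $var = quotemeta('$' . $param);    # literal "$param" for use in a regex
    my %found;
    $found{defined} = 1 if $code =~ /defined\s*\(?\s*$var\b/;
    $found{ref}     = 1 if $code =~ /\bref\s*\(?\s*$var\b/;
    $found{regex}   = 1 if $code =~ /$var\s*[=!]~/;
    $found{length}  = 1 if $code =~ /length\s*\(?\s*$var\b/;
    $found{numeric} = 1 if $code =~ /$var\s*(?:[<>]=?|==|!=)\s*-?\d/;
    return [ sort keys %found ];
}

# A sample method body with three validation checks on $name.
my $code = 'die "name required" unless defined $name; '
         . 'die "too short" if length($name) < 3; '
         . 'die "bad" unless $name =~ /^\w+$/;';

print join(', ', @{ detect_validations($code, 'name') }), "\n";
```

Each detected pattern would then map to a schema hint, for instance a length check suggesting a string type with a minimum length.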

_analyze_signature

Analyze method signature to extract parameter names.
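The two declaration styles mentioned earlier (list assignment from @_ and repeated shift) can both be recognised textually. The sketch below assumes that an invocant named $self or $class should be dropped before numbering positions; extract_params is a hypothetical helper, not the module's own routine.

```perl
use strict;
use warnings;

# Hypothetical extractor for the two common declaration styles:
#   my ($self, $a, $b) = @_;    and    my $a = shift;
sub extract_params {
    my ($body) = @_;
    my @names;
    if ($body =~ /my\s*\(([^)]*)\)\s*=\s*\@_/) {
        my $list = $1;    # copy before running another match
        @names = $list =~ /\$(\w+)/g;
    } else {
        @names = $body =~ /my\s+\$(\w+)\s*=\s*shift\b/g;
    }
    # Drop the invocant so positions start at the first real parameter.
    shift @names if @names && $names[0] =~ /^(?:self|class)$/;
    return [ map { +{ name => $names[$_], position => $_ } } 0 .. $#names ];
}

# Extract and print the parameters from a sample signature.
my $params = extract_params('my ($self, $name, $age) = @_;');
print "$_->{position}: $_->{name}\n" for @$params;
```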

_merge_parameter_analyses

Merge parameter information from multiple sources.

Priority: POD > Code > Signature
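One straightforward way to realise this priority is to apply the sources from lowest to highest priority, letting later values overwrite earlier ones field by field. The following is a sketch under that assumption; merge_params and the per-field merge granularity are hypothetical choices for this example.

```perl
use strict;
use warnings;

# Hypothetical merge: apply sources from lowest to highest priority,
# so POD values overwrite code values, which overwrite signature values.
sub merge_params {
    my ($pod, $code, $sig) = @_;
    my %merged;
    for my $source ($sig, $code, $pod) {
        for my $name (keys %$source) {
            $merged{$name} = { %{ $merged{$name} || {} }, %{ $source->{$name} } };
        }
    }
    return \%merged;
}

# Merge three partial descriptions of the same parameter.
my $merged = merge_params(
    { name => { type => 'string', min => 3 } },     # from POD
    { name => { type => 'scalar', max => 50 } },    # from code
    { name => { position => 0 } },                  # from signature
);

# Print the merged fields for the parameter.
my $p = $merged->{name};
print join(', ', map { "$_=$p->{$_}" } sort keys %$p), "\n";
```

Merging per field rather than per parameter keeps information that only a lower-priority source knows, such as the position from the signature.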

_calculate_confidence

Calculate confidence score for parameter analysis.

Returns: 'high', 'medium', 'low'

_generate_notes

Generate helpful notes about the analysis.

_write_schema

Write a schema to a YAML file.

_needs_object_instantiation

Determine if a method needs object instantiation and return the class name.

Returns the package name if this is an instance method, undef if it's a class method or constructor.
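A simple approximation of this decision treats a leading $self as the sign of an instance method. The sketch below is illustrative only: needs_object_instantiation here is a standalone hypothetical function with an assumed argument list, and the real method may rely on other signals.

```perl
use strict;
use warnings;

# Hypothetical check: return the class to instantiate, or undef when
# no object is needed (constructor or class method).
sub needs_object_instantiation {
    my ($method_name, $body, $package) = @_;
    return if $method_name eq 'new';    # constructors create the object themselves
    # Treat a leading $self (via shift or list assignment from @_)
    # as the instance-method signal.
    return $package
        if $body =~ /my\s+\$self\s*=\s*shift\b/
        || $body =~ /my\s*\(\s*\$self\b[^)]*\)\s*=\s*\@_/;
    return;
}

# An instance method yields the class that must be instantiated first.
my $class = needs_object_instantiation(
    'name', 'my $self = shift; return $self->{name};', 'My::Module');
print defined $class ? "needs $class\n" : "no object needed\n";
```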

_log

Log a message if verbose mode is on.

NOTES

This is pre-pre-alpha proof-of-concept code. Nevertheless, it is useful for creating a template that you can modify into a working schema to pass to App::Test::Generator.

SEE ALSO

  • App::Test::Generator - Generate fuzz and corpus-driven test harnesses

    Output from this module serves as input to that module, so with well-documented code you can create your tests automatically.

  • App::Test::Generator::Template - Template for the test file created by App::Test::Generator

AUTHOR

Nigel Horne, <njh at nigelhorne.com>

Portions of this module's initial design and documentation were created with the assistance of AI.