NAME

App::Test::Generator::SchemaExtractor - Extract test schemas from Perl modules

SYNOPSIS

use App::Test::Generator::SchemaExtractor;

my $extractor = App::Test::Generator::SchemaExtractor->new(
	input_file => 'lib/MyModule.pm',
	output_dir => 'schemas/',
	verbose	=> 1,
);

my $schemas = $extractor->extract_all();

DESCRIPTION

App::Test::Generator::SchemaExtractor automatically analyzes Perl modules and generates structured YAML schema files suitable for automated test generation. This module employs static analysis techniques to infer parameter types, constraints, and method behaviors directly from your source code.

Analysis Methods

The extractor combines multiple analysis approaches for comprehensive schema generation:

POD Documentation Analysis

Parses embedded documentation to extract:
- Parameter names, types, and descriptions from =head2 sections
- Method signatures with positional parameters
- Return value specifications from "Returns:" sections
- Constraints (ranges, patterns, required/optional status)
- Semantic type detection (email, URL, filename)
Code Pattern Detection

Analyzes source code using PPI to identify:
- Method signatures and parameter extraction patterns
- Type validation (ref(), isa(), blessed())
- Constraint patterns (length checks, numeric comparisons, regex matches)
- Return statement analysis and value type inference
- Object instantiation requirements and accessor methods
Signature Analysis

Examines method declarations for:
- Parameter names and positional information
- Instance vs. class method detection
- Method modifiers (Moose-style before/after/around)
- Various parameter declaration styles (shift, @_ assignment)
Heuristic Inference

Applies Perl-specific domain knowledge:
- Boolean return detection from method names (is_*, has_*, can_*)
- Common Perl idioms and coding patterns
- Context awareness (scalar vs list, wantarray usage)
- Object-oriented patterns (constructors, accessors, chaining)

Generated Schema Structure

The extracted schemas follow this YAML structure:

function: method_name
module: Package::Name
input:
  param1:
    type: string
    min: 3
    max: 50
    optional: 0
    position: 0
  param2:
    type: integer
    min: 0
    max: 100
    optional: 1
    position: 1
output:
  type: boolean
  value: 1
new: Package::Name # if object instantiation required
config:
  test_empty: 1
  test_nuls: 0
  test_undef: 0
  test_non_ascii: 0
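
To make the input section concrete, here is a minimal sketch of how a consumer might check a value against one parameter entry. It assumes that min and max mean string length for type string and numeric range for type integer. That reading is an assumption, not something the schema format above states, and check_value is a hypothetical helper rather than part of this module.

```perl
use strict;
use warnings;

# The 'input' section from the example schema, written as a Perl structure
my %input = (
    param1 => { type => 'string',  min => 3, max => 50,  optional => 0, position => 0 },
    param2 => { type => 'integer', min => 0, max => 100, optional => 1, position => 1 },
);

# Hypothetical helper: returns 1 if $value satisfies $spec, 0 otherwise.
# Assumption: min/max bound the length of strings and the value of integers.
sub check_value {
    my ($spec, $value) = @_;

    # A missing value is acceptable only for optional parameters
    return $spec->{optional} ? 1 : 0 unless defined $value;

    if ($spec->{type} eq 'integer') {
        return 0 unless $value =~ /^-?\d+$/;
        return ($value >= $spec->{min} && $value <= $spec->{max}) ? 1 : 0;
    }

    my $len = length $value;
    return ($len >= $spec->{min} && $len <= $spec->{max}) ? 1 : 0;
}

for my $candidate ('ab', 'alice') {
    printf "param1 %-8s %s\n", "'$candidate'",
        check_value($input{param1}, $candidate) ? 'accepted' : 'rejected';
}
```
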

Advanced Detection Capabilities

Accessor Method Detection

Automatically identifies getter, setter, and combined accessor methods by analyzing common patterns like return $self->{field} and $self->{field} = $value.
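
As a rough illustration of the two patterns named above, the following sketch classifies a method body with plain regular expressions. The module itself works on a PPI parse tree, so this is only a simplified stand-in, and classify_accessor is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a method body as getter, setter, both or neither.
# A regex stand-in for what would normally be a PPI-based analysis.
sub classify_accessor {
    my ($body) = @_;

    # return $self->{field}
    my $gets = $body =~ /return\s+\$self->\{\w+\}/ ? 1 : 0;

    # $self->{field} = ...   (but not == or =~)
    my $sets = $body =~ /\$self->\{\w+\}\s*=[^=~]/ ? 1 : 0;

    return 'getter_setter' if $gets && $sets;
    return 'getter'        if $gets;
    return 'setter'        if $sets;
    return 'none';
}

my %examples = (
    name     => 'my $self = shift; return $self->{name};',
    set_name => 'my ($self, $v) = @_; $self->{name} = $v; return $self;',
    title    => 'my $self = shift; $self->{title} = shift if @_; return $self->{title};',
);

for my $method (sort keys %examples) {
    printf "%-9s %s\n", $method, classify_accessor($examples{$method});
}
```
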
Boolean Return Inference

Detects boolean-returning methods through multiple signals:
- Method name patterns (is_*, has_*, can_*)
- Return patterns (consistent 1/0 returns)
- POD descriptions ("returns true on success")
- Ternary operators with boolean results
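
Two of these signals, the method name pattern and consistent 1/0 returns, can be sketched with regular expressions as follows. Both helper names are hypothetical, and the real detector may weigh the signals differently.

```perl
use strict;
use warnings;

# Hypothetical helper: does the method name suggest a boolean return?
sub looks_boolean_by_name {
    my ($method_name) = @_;
    return $method_name =~ /^(?:is|has|can)_\w+/ ? 1 : 0;
}

# Hypothetical helper: does every "return VALUE;" in the body return a literal 0 or 1?
sub returns_only_flags {
    my ($body) = @_;
    my @values = $body =~ /\breturn\s+([^;]+);/g;
    return 0 unless @values;

    for my $value (@values) {
        $value =~ s/\s+(?:if|unless)\b.*//s;    # drop statement modifiers
        $value =~ s/\s+$//;
        return 0 unless $value =~ /^[01]$/;
    }
    return 1;
}

my $body = 'my ($self, $x) = @_; return 0 unless defined $x; return 1;';
printf "name signal:   %d\n", looks_boolean_by_name('is_valid');
printf "return signal: %d\n", returns_only_flags($body);
```
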
Context Awareness

Identifies methods that use wantarray and can return different values in scalar vs list context.
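
For reference, this is the kind of context-sensitive code that description refers to. The items function is a hypothetical example, not taken from this module.

```perl
use strict;
use warnings;

# Returns the full list in list context and the number of items in scalar context
sub items {
    my @items = qw(apple banana cherry);
    return wantarray ? @items : scalar @items;
}

my @all   = items();    # list context
my $count = items();    # scalar context

print "list:   @all\n";
print "scalar: $count\n";
```
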
Object Lifecycle Management

Detects instance methods requiring object instantiation and automatically adds the new field to schemas.

Confidence Scoring

Each generated schema includes detailed confidence assessments:

High Confidence

Multiple independent analysis sources converge on consistent, well-constrained parameters with explicit validation logic and comprehensive documentation.
Medium Confidence

Reasonable evidence from code patterns or partial documentation, but may lack comprehensive constraints or have some ambiguities.
Low Confidence

Minimal evidence - primarily based on naming conventions, default assumptions, or single-source analysis.
Very Low Confidence

Barely any detectable signals - schema should be thoroughly reviewed before use in test generation.

Use Cases

Automated Test Generation

Generate comprehensive test suites with App::Test::Generator using extracted schemas as input. The schemas provide the necessary structure for generating both positive and negative test cases.
API Documentation Generation

Supplement existing documentation with automatically inferred interface specifications, parameter requirements, and return types.
Code Quality Assessment

Identify methods with poor documentation, inconsistent parameter handling, or unclear interfaces that may benefit from refactoring.
Refactoring Assistance

Detect method dependencies, object instantiation requirements, and parameter usage patterns to inform refactoring decisions.
Legacy Code Analysis

Quickly understand the interface contracts of legacy Perl codebases without extensive manual code reading.

Integration with Testing Ecosystem

The generated schemas are specifically designed to work with the App::Test::Generator ecosystem:

# Extract schemas from your module
my $extractor = App::Test::Generator::SchemaExtractor->new(...);
my $schemas = $extractor->extract_all();

# Use with test generator (typically as separate steps)
# fuzz-harness-generator -r schemas/method_name.yaml

Limitations and Considerations

Dynamic Code Patterns

Highly dynamic code (string evals, AUTOLOAD, symbolic references) may not be fully detected by static analysis.
Complex Validation Logic

Sophisticated validation involving multiple parameters or external dependencies may require manual schema refinement.
Confidence Heuristics

Confidence scores are based on heuristics and should be reviewed by developers familiar with the codebase.
Perl Idiom Recognition

Some Perl-specific idioms may require custom pattern recognition beyond the built-in detectors.
Documentation Dependency

Analysis quality improves significantly with comprehensive POD documentation following consistent patterns.

Best Practices for Optimal Results

Comprehensive POD Documentation

Write detailed POD with explicit parameter documentation using consistent patterns like $param - type (constraints), description.
Consistent Coding Patterns

Use consistent parameter validation patterns and method signatures throughout your codebase.
Schema Review Process

Review and refine automatically generated schemas, particularly those with low confidence scores.
Descriptive Naming

Use descriptive method and parameter names that clearly indicate purpose and expected types.
Progressive Enhancement

Start with automatically generated schemas and progressively refine them based on test results and code understanding.

The module is particularly valuable for large codebases where manual schema creation would be prohibitively time-consuming, and for maintaining test coverage as code evolves through continuous integration pipelines.

METHODS

new

Create a new extractor object. Private methods are excluded from extraction unless include_private is set to a true value in the call to new().

The extractor supports several configuration parameters:

my $extractor = App::Test::Generator::SchemaExtractor->new(
    input_file           => 'lib/MyModule.pm',  # Required
    output_dir           => 'schemas/',         # Default: 'schemas'
    verbose              => 1,                  # Default: 0
    include_private      => 1,                  # Default: 0
    max_parameters       => 50,                 # Default: 20
    confidence_threshold => 0.7,                # Default: 0.5
);

extract_all

Extract schemas for all methods in the module.

Returns a hashref of method_name => schema.

Pseudo Code

  FOREACH method
  DO
      analyze the method
      write a schema file for that method
  END

_extract_package_name

Extract the package name from the document.

_find_methods

Find all subroutines/methods in the document.

Returns an arrayref of hashrefs with the structure: { name => $name, node => $ppi_node, body => $code_text }

_extract_pod_before

Extract POD documentation that appears before a subroutine.

_analyze_method

Analyze a method and generate its schema.

Combines POD analysis, code pattern analysis, and signature analysis.

_analyze_pod

Parse POD documentation to extract parameter information.

Looks for patterns like:
  $name - string (3-50 chars), username
  $age - integer, must be positive
  $email - string, matches /\@/
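
A minimal sketch of how lines in that style could be split into fields is shown below. The regular expression and the parse_pod_param name are illustrative assumptions. The actual parser may accept more variations.

```perl
use strict;
use warnings;

# Hypothetical helper: split "$name - type (constraint), description" into fields.
# Returns a hashref, or undef if the line does not match.
sub parse_pod_param {
    my ($line) = @_;
    return undef
        unless $line =~ /^\s*\$(\w+)\s+-\s+(\w+)\s*(?:\(([^)]*)\))?\s*(?:,\s*(.*))?$/;
    return { name => $1, type => $2, constraint => $3, description => $4 };
}

my @lines = (
    '$name - string (3-50 chars), username',
    '$age - integer, must be positive',
    '$email - string, matches /\@/',
);

for my $line (@lines) {
    my $p = parse_pod_param($line) or next;
    printf "%-6s type=%-8s constraint=%-11s description=%s\n",
        $p->{name}, $p->{type}, $p->{constraint} // 'none', $p->{description} // 'none';
}
```
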

_analyze_output

Analyze return values from POD and code.

Looks for:
- Returns: section in POD
- return statements in code
- Common patterns like "returns 1 on success"

_parse_constraints

Parse constraint strings like "3-50 chars" or "positive" or "1-100".
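
The three example strings could be handled along these lines. This is a sketch only: the returned keys (min, max, unit) and the choice to map "positive" to a minimum of 1 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a constraint string into a hashref of limits
sub parse_constraint {
    my ($text) = @_;
    my %constraint;

    if ($text =~ /^\s*(\d+)\s*-\s*(\d+)(?:\s+(\w+))?\s*$/) {
        # "3-50 chars" or "1-100"
        @constraint{qw(min max)} = ($1, $2);
        $constraint{unit} = $3 if defined $3;
    } elsif ($text =~ /^\s*positive\s*$/i) {
        # Assumption: "positive" on an integer means at least 1
        $constraint{min} = 1;
    }

    return \%constraint;
}

for my $text ('3-50 chars', 'positive', '1-100') {
    my $c = parse_constraint($text);
    print "$text => ", join(', ', map { "$_=$c->{$_}" } sort keys %$c), "\n";
}
```
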

_analyze_code

Analyze code patterns to infer parameter types and constraints.

Looks for common validation patterns:
- defined checks
- ref() checks
- regex matches
- length checks
- numeric comparisons

_analyze_signature

Analyze method signature to extract parameter names.
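
The two declaration styles mentioned earlier (shift and @_ assignment) can be recognised roughly as follows. This regex-based sketch stands in for the PPI-based analysis, and signature_params is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: list parameter names declared in a method body,
# skipping the invocant ($self or $class)
sub signature_params {
    my ($body) = @_;
    my @params;

    if ($body =~ /my\s*\(([^)]*)\)\s*=\s*\@_\s*;/) {
        # my ($self, $name, $age) = @_;
        my $list = $1;
        @params = $list =~ /\$(\w+)/g;
    } else {
        # my $self = shift; my $name = shift;
        @params = $body =~ /my\s+\$(\w+)\s*=\s*shift\b/g;
    }

    return [ grep { $_ ne 'self' && $_ ne 'class' } @params ];
}

print join(', ', @{ signature_params('my ($self, $name, $age) = @_;') }), "\n";
print join(', ', @{ signature_params('my $self = shift; my $email = shift;') }), "\n";
```
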

_merge_parameter_analyses

Merge parameter information from multiple sources.

Priority: POD > Code > Signature
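
One simple way to realise that priority order is to overlay the hashes from lowest to highest priority, so that later sources overwrite earlier ones. This is a sketch under that assumption. The real merge may treat individual fields differently, and merge_param is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: merge one parameter's details from three sources.
# Hashes listed later overwrite earlier ones, so the lowest priority comes first.
sub merge_param {
    my ($pod, $code, $signature) = @_;
    my %merged = (
        %{ $signature || {} },
        %{ $code      || {} },
        %{ $pod       || {} },
    );
    return \%merged;
}

my $merged = merge_param(
    { type => 'string', max => 50 },    # from POD
    { type => 'scalar', min => 3 },     # from code
    { position => 0 },                  # from signature
);

print join(', ', map { "$_=$merged->{$_}" } sort keys %$merged), "\n";
```
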

_calculate_confidence

Calculate confidence score for parameter analysis.

Returns one of: 'high', 'medium', 'low'

_generate_notes

Generate helpful notes about the analysis.

_write_schema

Write a schema to a YAML file.

_needs_object_instantiation

Determine if a method needs object instantiation and return the class name.

Returns the package name if this is an instance method, undef if it's a class method or constructor.

_log

Log a message if verbose mode is on.

NOTES

This is pre-pre-alpha, proof-of-concept code. Nevertheless, it is useful for creating a template that you can modify into a working schema to pass to App::Test::Generator.

AUTHOR

Nigel Horne, <njh at nigelhorne.com>

Portions of this module's initial design and documentation were created with the assistance of AI.
