NAME
App::Test::Generator::SchemaExtractor - Extract test schemas from Perl modules
VERSION
Version 0.20
SYNOPSIS
    use App::Test::Generator::SchemaExtractor;

    my $extractor = App::Test::Generator::SchemaExtractor->new(
        input_file => 'lib/MyModule.pm',
        output_dir => 'schemas/',
        verbose    => 1,
    );

    my $schemas = $extractor->extract_all();
DESCRIPTION
App::Test::Generator::SchemaExtractor automatically analyzes Perl modules and generates structured YAML schema files suitable for automated test generation. This module employs static analysis techniques to infer parameter types, constraints, and method behaviors directly from your source code.
Analysis Methods
The extractor combines multiple analysis approaches for comprehensive schema generation:
POD Documentation Analysis
Parses embedded documentation to extract:
- Parameter names, types, and descriptions from =head2 sections
- Method signatures with positional parameters
- Return value specifications from "Returns:" sections
- Constraints (ranges, patterns, required/optional status)
- Semantic type detection (email, URL, filename)
Code Pattern Detection
Analyzes source code using PPI to identify:
- Method signatures and parameter extraction patterns
- Type validation (ref(), isa(), blessed())
- Constraint patterns (length checks, numeric comparisons, regex matches)
- Return statement analysis and value type inference
- Object instantiation requirements and accessor methods
Signature Analysis
Examines method declarations for:
- Parameter names and positional information
- Instance vs. class method detection
- Method modifiers (Moose-style before/after/around)
- Various parameter declaration styles (shift, @_ assignment)
Heuristic Inference
Applies Perl-specific domain knowledge:
- Boolean return detection from method names (is_*, has_*, can_*)
- Common Perl idioms and coding patterns
- Context awareness (scalar vs list, wantarray usage)
- Object-oriented patterns (constructors, accessors, chaining)
Generated Schema Structure
The extracted schemas follow this YAML structure:
    function: method_name
    module: Package::Name
    input:
      param1:
        type: string
        min: 3
        max: 50
        optional: 0
        position: 0
      param2:
        type: integer
        min: 0
        max: 100
        optional: 1
        position: 1
    output:
      type: boolean
      value: 1
    new: Package::Name # if object instantiation required
    config:
      test_empty: 1
      test_nuls: 0
      test_undef: 0
      test_non_ascii: 0
Advanced Detection Capabilities
Accessor Method Detection
Automatically identifies getter, setter, and combined accessor methods by analyzing common patterns like return $self->{field} and $self->{field} = $value.
Boolean Return Inference
Detects boolean-returning methods through multiple signals:
- Method name patterns (is_*, has_*, can_*)
- Return patterns (consistent 1/0 returns)
- POD descriptions ("returns true on success")
- Ternary operators with boolean results
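As a rough illustration of how three of these signals might be combined, here is a minimal sketch. The function name looks_boolean and the idea of simply counting signals are assumptions made for this example; the module's own detector works on a parsed PPI document and may weigh the evidence differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's API: counts how many
# boolean-return signals are present for a method.
sub looks_boolean {
    my ($name, $body, $pod) = @_;
    my $signals = 0;

    # Signal 1: predicate-style method names
    $signals++ if $name =~ /^(?:is|has|can)_/;

    # Signal 2: the first token after every "return" is a literal 1 or 0.
    # This is deliberately crude; "return 1 + $x" would also be counted.
    my @returns = $body =~ /\breturn\s+([^\s;]+)/g;
    $signals++ if @returns && !grep { !/^[01]$/ } @returns;

    # Signal 3: POD wording such as "returns true on success"
    $signals++ if defined $pod && $pod =~ /returns?\s+(?:true|false)\b/i;

    return $signals;
}

my $body = 'my $self = shift; return 0 unless $self->{id}; return 1;';
print looks_boolean('is_valid', $body, 'Returns true on success'), "\n";
print looks_boolean('get_name', 'return $self->{name};', undef), "\n";
```

A higher count means more independent signals agree; where to draw the line between "boolean" and "unknown" is a design choice this sketch leaves open.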
Context Awareness
Identifies methods that use wantarray and can return different values in scalar vs list context.
Object Lifecycle Management
Detects instance methods requiring object instantiation and automatically adds the new field to schemas.
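The accessor patterns described under Accessor Method Detection can be sketched with two regular expressions. This is only an approximation that assumes hash-based objects held in $self; the name classify_accessor and its return values are hypothetical and not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical classifier, assuming hash-based objects stored in $self.
# Returns 'getter', 'setter', 'accessor' (combined), or undef.
sub classify_accessor {
    my ($body) = @_;

    # Setter pattern: $self->{field} = ...   (but not == or =~)
    my $sets = $body =~ /\$self->\{\w+\}\s*=(?![=~])/;

    # Getter pattern: return $self->{field};
    my $gets = $body =~ /\breturn\s+\$self->\{\w+\}\s*;/;

    return 'accessor' if $sets && $gets;
    return 'setter'   if $sets;
    return 'getter'   if $gets;
    return undef;
}

my %examples = (
    getter   => 'my $self = shift; return $self->{name};',
    setter   => 'my ($self, $v) = @_; $self->{name} = $v; return $self;',
    combined => 'my $self = shift; $self->{name} = shift if @_; return $self->{name};',
);
for my $label (sort keys %examples) {
    my $kind = classify_accessor($examples{$label}) // 'none';
    print "$label: $kind\n";
}
```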
Confidence Scoring
Each generated schema includes detailed confidence assessments:
High Confidence
Multiple independent analysis sources converge on consistent, well-constrained parameters with explicit validation logic and comprehensive documentation.
Medium Confidence
Reasonable evidence from code patterns or partial documentation, but may lack comprehensive constraints or have some ambiguities.
Low Confidence
Minimal evidence - primarily based on naming conventions, default assumptions, or single-source analysis.
Very Low Confidence
Barely any detectable signals - schema should be thoroughly reviewed before use in test generation.
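To make the four levels concrete, the sketch below maps two simple counts onto them. The thresholds, the evidence keys (sources, constraints) and the function name confidence_level are all assumptions invented for illustration; the module's real scoring rules are not documented here and may be quite different.

```perl
use strict;
use warnings;

# Hypothetical scoring: the thresholds below are illustrative assumptions,
# not the module's actual rules.
sub confidence_level {
    my (%evidence) = @_;
    my $sources     = $evidence{sources}     // 0;   # independent analyses that agree
    my $constraints = $evidence{constraints} // 0;   # explicit min/max/pattern checks found

    return 'high'     if $sources >= 2 && $constraints >= 1;
    return 'medium'   if $sources >= 1 && $constraints >= 1;
    return 'low'      if $sources >= 1;
    return 'very_low';
}

print confidence_level(sources => 3, constraints => 2), "\n";
print confidence_level(sources => 1), "\n";
print confidence_level(), "\n";
```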
Use Cases
Automated Test Generation
Generate comprehensive test suites with App::Test::Generator using extracted schemas as input. The schemas provide the necessary structure for generating both positive and negative test cases.
API Documentation Generation
Supplement existing documentation with automatically inferred interface specifications, parameter requirements, and return types.
Code Quality Assessment
Identify methods with poor documentation, inconsistent parameter handling, or unclear interfaces that may benefit from refactoring.
Refactoring Assistance
Detect method dependencies, object instantiation requirements, and parameter usage patterns to inform refactoring decisions.
Legacy Code Analysis
Quickly understand the interface contracts of legacy Perl codebases without extensive manual code reading.
Integration with Testing Ecosystem
The generated schemas are specifically designed to work with the App::Test::Generator ecosystem:
    # Extract schemas from your module
    my $extractor = App::Test::Generator::SchemaExtractor->new(...);
    my $schemas = $extractor->extract_all();

    # Use with test generator (typically as separate steps)
    # fuzz-harness-generator -r schemas/method_name.yaml
Limitations and Considerations
Dynamic Code Patterns
Highly dynamic code (string evals, AUTOLOAD, symbolic references) may not be fully detected by static analysis.
Complex Validation Logic
Sophisticated validation involving multiple parameters or external dependencies may require manual schema refinement.
Confidence Heuristics
Confidence scores are based on heuristics and should be reviewed by developers familiar with the codebase.
Perl Idiom Recognition
Some Perl-specific idioms may require custom pattern recognition beyond the built-in detectors.
Documentation Dependency
Analysis quality improves significantly with comprehensive POD documentation following consistent patterns.
Best Practices for Optimal Results
Comprehensive POD Documentation
Write detailed POD with explicit parameter documentation using consistent patterns like $param - type (constraints), description.
Consistent Coding Patterns
Use consistent parameter validation patterns and method signatures throughout your codebase.
Schema Review Process
Review and refine automatically generated schemas, particularly those with low confidence scores.
Descriptive Naming
Use descriptive method and parameter names that clearly indicate purpose and expected types.
Progressive Enhancement
Start with automatically generated schemas and progressively refine them based on test results and code understanding.
The module is particularly valuable for large codebases where manual schema creation would be prohibitively time-consuming, and for maintaining test coverage as code evolves through continuous integration pipelines.
METHODS
new
Creates a new extractor object. Private methods are excluded from the generated schemas unless include_private is passed to new().
The extractor supports several configuration parameters:
    my $extractor = App::Test::Generator::SchemaExtractor->new(
        input_file           => 'lib/MyModule.pm', # Required
        output_dir           => 'schemas/',        # Default: 'schemas'
        verbose              => 1,                 # Default: 0
        include_private      => 1,                 # Default: 0
        max_parameters       => 50,                # Default: 20
        confidence_threshold => 0.7,               # Default: 0.5
    );
extract_all
Extract schemas for all methods in the module.
Returns a hashref of method_name => schema.
Pseudo Code
    FOREACH method
    DO
        analyze the method
        write a schema file for that method
    END
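Since extract_all returns a hashref keyed by method name, a caller can walk the result directly. The sketch below uses a hand-built stand-in for that hashref so it runs without the module installed. The data and the helper name summarize are invented for illustration, and the assumption is that each value mirrors the YAML structure shown earlier.

```perl
use strict;
use warnings;

# Stand-in for the hashref returned by extract_all(); the keys mirror the
# YAML structure shown earlier, but this data is made up for illustration.
my $schemas = {
    set_name => {
        function => 'set_name',
        input    => {
            name => { type => 'string', min => 3, max => 50, optional => 0, position => 0 },
        },
        output   => { type => 'boolean' },
    },
};

# Hypothetical helper: one "method(param: type, ...)" line per schema,
# with parameters listed in positional order.
sub summarize {
    my ($all) = @_;
    my @lines;
    for my $method (sort keys %$all) {
        my $input  = $all->{$method}{input} // {};
        my @params = sort { $input->{$a}{position} <=> $input->{$b}{position} } keys %$input;
        my @sig    = map { "$_: $input->{$_}{type}" } @params;
        push @lines, sprintf('%s(%s)', $method, join(', ', @sig));
    }
    return @lines;
}

print "$_\n" for summarize($schemas);   # prints: set_name(name: string)
```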
_extract_package_name
Extract the package name from the document.
_find_methods
Find all subroutines/methods in the document.
Returns an arrayref of hashrefs with the structure: { name => $name, node => $ppi_node, body => $code_text }
_extract_pod_before
Extract POD documentation that appears before a subroutine.
_analyze_method
Analyze a method and generate its schema.
Combines POD analysis, code pattern analysis, and signature analysis.
_analyze_pod
Parse POD documentation to extract parameter information.
Looks for patterns like:
    $name - string (3-50 chars), username
    $age - integer, must be positive
    $email - string, matches /\@/
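A single regular expression is enough to split lines of this shape into their parts. The sketch below is an assumption about how such parsing could look, not the module's actual implementation; parse_pod_param and the field names it returns are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser for lines of the form:
#   $name - type (constraints), description
# The regex and the returned field names are assumptions for illustration.
sub parse_pod_param {
    my ($line) = @_;
    return undef
        unless $line =~ /^\s*\$(\w+)\s+-\s+(\w+)\s*(?:\(([^)]*)\))?\s*,?\s*(.*)$/;
    return {
        name        => $1,
        type        => $2,
        constraints => $3,   # undef when there is no parenthesised part
        description => $4,
    };
}

for my $line ('$name - string (3-50 chars), username',
              '$age - integer, must be positive') {
    my $p = parse_pod_param($line);
    printf "%s: type=%s constraints=%s\n",
        $p->{name}, $p->{type}, $p->{constraints} // '(none)';
}
```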
_analyze_output
Analyze return values from POD and code.
Looks for:
- Returns: section in POD
- return statements in code
- Common patterns like "returns 1 on success"
_parse_constraints
Parse constraint strings like "3-50 chars" or "positive" or "1-100".
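A minimal sketch of such a parser, covering only the three example strings above, might look like this. The name parse_constraint is hypothetical, and mapping "positive" to a minimum of 1 is an assumption that only makes sense for integers.

```perl
use strict;
use warnings;

# Hypothetical constraint parser covering the three example strings above.
# Returns a hashref with min and/or max keys where they can be determined.
sub parse_constraint {
    my ($text) = @_;
    my %c;
    if ($text =~ /^\s*(\d+)\s*-\s*(\d+)/) {    # "3-50 chars", "1-100"
        @c{qw(min max)} = ($1, $2);
    }
    elsif ($text =~ /\bpositive\b/i) {         # "positive"
        $c{min} = 1;                           # assumption: integer, so > 0 means >= 1
    }
    return \%c;
}

for my $text ('3-50 chars', 'positive', '1-100') {
    my $c = parse_constraint($text);
    print "$text => ", join(', ', map { "$_=$c->{$_}" } sort keys %$c), "\n";
}
```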
_analyze_code
Analyze code patterns to infer parameter types and constraints.
Looks for common validation patterns:
- defined checks
- ref() checks
- regex matches
- length checks
- numeric comparisons
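The sketch below approximates several of these checks with plain regular expressions over the method body. The module itself works on a PPI document, so treat this as an illustration of the idea only. The function name infer_from_code, the type names it emits, and the assumption that a "length(...) < N" test guards an error path (so N is the minimum) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical regex-based detector; an approximation, not the module's code.
sub infer_from_code {
    my ($body, $param) = @_;
    my $var = quotemeta('$' . $param);   # e.g. \$name, safe to embed in a regex
    my %info;

    # defined check => the parameter is probably required
    $info{optional} = 0
        if $body =~ /unless\s+defined\s*\(?\s*$var\b/;

    # ref() check => reference type (type names here are assumptions)
    $info{type} = lc($1) . 'ref'
        if $body =~ /\bref\s*\(\s*$var\s*\)\s*eq\s*['"](ARRAY|HASH|CODE)['"]/;

    # length checks => string; assumes "die if length(...) < N" style guards
    if ($body =~ /\blength\s*\(\s*$var\s*\)\s*<\s*(\d+)/) {
        $info{type} //= 'string';
        $info{min} = $1;
    }
    if ($body =~ /\blength\s*\(\s*$var\s*\)\s*>\s*(\d+)/) {
        $info{type} //= 'string';
        $info{max} = $1;
    }

    # numeric comparison against a literal => number
    $info{type} //= 'number'
        if $body =~ /$var\s*(?:<=?|>=?|==)\s*-?\d/;

    return \%info;
}

my $body = 'die "name required" unless defined $name; '
         . 'die "too short" if length($name) < 3; '
         . 'die "too long" if length($name) > 50;';
my $info = infer_from_code($body, 'name');
print join(', ', map { "$_=$info->{$_}" } sort keys %$info), "\n";
```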
_analyze_signature
Analyze method signature to extract parameter names.
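For the two declaration styles mentioned earlier (list assignment from @_ and successive shift calls), parameter names and positions could be recovered roughly as follows. This is a regex-based sketch under the assumption that the invocant is named $self or $class; signature_params is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical extractor for:
#   my ($self, $x, $y) = @_;      and      my $x = shift;
sub signature_params {
    my ($body) = @_;
    my @names;

    if ($body =~ /my\s*\(([^)]*)\)\s*=\s*\@_/) {
        my $list = $1;                             # list assignment from @_
        @names = $list =~ /\$(\w+)/g;
    }
    else {
        @names = $body =~ /my\s+\$(\w+)\s*=\s*shift\b/g;   # successive shifts
    }

    # Drop the invocant so positions start at the first real argument
    shift @names if @names && $names[0] =~ /^(?:self|class)$/;

    my $pos = 0;
    return [ map { +{ name => $_, position => $pos++ } } @names ];
}

my $params = signature_params('my ($self, $name, $age) = @_;');
print "$_->{position}: $_->{name}\n" for @$params;
```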
_merge_parameter_analyses
Merge parameter information from multiple sources.
Priority: POD > Code > Signature
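One simple way to honour this priority is to visit the sources from highest to lowest priority and let each one fill in only the keys that are still undefined. The sketch below assumes each source is a hashref of parameter name to attribute hashref; merge_param_info is a hypothetical name and the module's real merge may treat conflicts differently.

```perl
use strict;
use warnings;

# Hypothetical merge: later sources only fill in keys that earlier
# (higher-priority) sources left undefined.
sub merge_param_info {
    my (@sources) = @_;   # pass in priority order: POD, code, signature
    my %merged;
    for my $src (@sources) {
        for my $param (keys %$src) {
            for my $key (keys %{ $src->{$param} }) {
                $merged{$param}{$key} //= $src->{$param}{$key};
            }
        }
    }
    return \%merged;
}

my $pod  = { name => { type => 'string', min => 3 } };
my $code = { name => { type => 'scalar', max => 50 } };
my $sig  = { name => { position => 0 } };

my $m = merge_param_info($pod, $code, $sig)->{name};
print join(', ', map { "$_=$m->{$_}" } sort keys %$m), "\n";
# prints: max=50, min=3, position=0, type=string
```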
_calculate_confidence
Calculate confidence score for parameter analysis.
Returns: 'high', 'medium', 'low'
_generate_notes
Generate helpful notes about the analysis.
_write_schema
Write a schema to a YAML file.
_needs_object_instantiation
Determine if a method needs object instantiation and return the class name.
Returns the package name if this is an instance method, undef if it's a class method or constructor.
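The decision described above can be approximated by checking whether the method takes $self as its first argument. The sketch below is an assumption about one way to do this with regular expressions; needs_instance is a hypothetical name, and the real method may rely on other evidence as well.

```perl
use strict;
use warnings;

# Hypothetical check: returns the package name for instance methods,
# undef for constructors, class methods and plain functions.
sub needs_instance {
    my ($method_name, $body, $package) = @_;

    return undef if $method_name eq 'new';    # the constructor itself

    return $package
        if $body =~ /my\s+\$self\s*=\s*shift\b/                  # my $self = shift;
        || $body =~ /my\s*\(\s*\$self\b[^)]*\)\s*=\s*\@_/;       # my ($self, ...) = @_;

    return undef;
}

my $class = needs_instance('get_name', 'my $self = shift; return $self->{name};', 'My::Pkg');
print defined $class ? "needs $class->new\n" : "no object needed\n";
```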
_log
Log a message if verbose mode is on.
NOTES
This is pre-pre-alpha proof of concept code. Nevertheless, it is useful for creating a template which you can modify to create a working schema to pass into App::Test::Generator.
SEE ALSO
App::Test::Generator - Generate fuzz and corpus-driven test harnesses
The output of this module serves as input to App::Test::Generator, so well-documented code can have its tests generated automatically.
App::Test::Generator::Template - Template of the file of tests created by App::Test::Generator
AUTHOR
Nigel Horne, <njh at nigelhorne.com>
Portions of this module's initial design and documentation were created with the assistance of AI.