NAME

Sidef::Parser - Parser for the Sidef programming language

SYNOPSIS

use Sidef::Parser;

my $parser = Sidef::Parser->new(
    file_name   => 'script.sf',
    script_name => 'script.sf',
);

my $code = 'say "Hello, World!"';
my $ast = $parser->parse_script(code => \$code);

DESCRIPTION

Sidef::Parser is the main parser for the Sidef programming language. It performs lexical analysis and syntactic parsing of Sidef source code, generating an Abstract Syntax Tree (AST) that can be executed or compiled.

The parser handles:

  • Variable declarations and scoping

  • Function and method definitions

  • Class and module declarations

  • Operators and expressions

  • Control flow structures

  • String interpolation and special literals

  • Regex patterns

  • Block constructs

METHODS

Constructor

new

my $parser = Sidef::Parser->new(%options);

Creates a new parser instance. Accepts the following optional parameters:

  • line - Starting line number (default: 1)

  • inc - Array reference of include paths

  • class - Current namespace (default: 'main')

  • file_name - Name of file being parsed (default: '-')

  • script_name - Name of main script (default: '-')

  • interactive - Boolean flag for interactive mode

  • eval_mode - Boolean flag for eval mode

Core Parsing Methods

parse_script

my $ast = $parser->parse_script(code => \$code);

Parses a complete Sidef script and returns the Abstract Syntax Tree. This is the main entry point for parsing.

Parameters:

  • code - Reference to string containing Sidef code

Returns: AST structure (typically a hash reference)

parse_expr

my $expr = $parser->parse_expr(code => \$code);

Parses a single expression. Handles literals, variables, operators, function calls, and other expression forms.

parse_obj

my $obj = $parser->parse_obj(code => \$code, %options);

Parses an object or value with optional method calls and operators.

Options:

  • multiline - Allow multiline expressions

parse_block

my $block = $parser->parse_block(code => \$code, %options);

Parses a code block enclosed in braces {...}.

Options:

  • with_vars - Include variable declarations

  • topic_var - Create topic variable (_)

  • is_module - Block is a module definition

  • prev_class - Previous class context

parse_arg

my $arg = $parser->parse_arg(code => \$code);

Parses arguments enclosed in parentheses (...).

parse_array

my $array = $parser->parse_array(code => \$code);

Parses array literals enclosed in brackets [...].

Variable and Declaration Parsing

parse_init_vars

my $vars = $parser->parse_init_vars(code => \$code, %options);

Parses variable declarations with optional initialization and type annotations.

Options:

  • type - Declaration type ('var', 'global', 'const', 'static', 'del', 'has')

  • private - Private declaration (not added to symbol table)

  • params - Parsing function/method parameters

  • callback - Callback function for each variable

  • ignore_delim - Hash of delimiters to ignore

get_init_vars

my $vars = $parser->get_init_vars(code => \$code, %options);

Similar to parse_init_vars but returns string representations instead of objects.

Options:

  • with_vals - Include values in output

  • type - Declaration type

find_var

my $var = $parser->find_var($var_name, $class_name);
my ($var, $is_lexical) = $parser->find_var($var_name, $class_name);

Looks up a variable in the symbol table by name and class.

In scalar context, returns the variable hash, or undef if no matching variable is found. In list context, returns the variable hash and a boolean indicating whether the variable is lexical.
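The difference between the two calling contexts can be illustrated with a small stand-alone sketch. It does not use Sidef::Parser itself: the lookup_var function, the %table layout, and the 'lexical' field are all hypothetical, chosen only to show how one Perl routine can return either a single entry or an (entry, flag) pair depending on context.

```perl
use strict;
use warnings;

# Hypothetical symbol table: namespace => list of variable entries.
# The 'lexical' field and its values are assumptions for this sketch only.
my %table = (
    main => [
        {name => 'x', type => 'var',    lexical => 1},
        {name => 'N', type => 'global', lexical => 0},
    ],
);

# Illustrative lookup (NOT Sidef::Parser's find_var).
sub lookup_var {
    my ($name, $class) = @_;
    foreach my $entry (@{$table{$class} // []}) {
        next if $entry->{name} ne $name;
        # List context: entry plus flag; scalar context: entry only.
        return wantarray ? ($entry, $entry->{lexical}) : $entry;
    }
    return;    # undef in scalar context, empty list in list context
}

my $var = lookup_var('x', 'main');                     # scalar context
my ($entry, $is_lexical) = lookup_var('N', 'main');    # list context
my $missing = lookup_var('nope', 'main');              # no such variable

print "found $var->{name}\n";
print "N is ", ($is_lexical ? "lexical" : "not lexical"), "\n";
print "missing is undefined\n" if !defined $missing;
```

How the real parser decides whether a variable is lexical is not described here, so the flag above is simply stored with each entry.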

String and Literal Parsing

get_quoted_string

my $string = $parser->get_quoted_string(code => \$code, %options);

Extracts a quoted string with support for various delimiter pairs.

Options:

  • no_count_line - Don't count newlines in the string

Supports delimiters: '...', "...", (...), [...], {...}, and many Unicode paired delimiters.

get_quoted_words

my $words = $parser->get_quoted_words(code => \$code);

Parses space-separated quoted words, returning an array reference.

get_method_name

my ($method, $takes_arg, $type) = $parser->get_method_name(code => \$code);

Extracts a method or operator name from the input.

Returns:

1. Method/operator name (or hashref for expression-based names)
2. Boolean indicating if operator requires an argument
3. Operator type (from hyper_ops hash, or 'op', or empty string)

Whitespace and Comment Handling

parse_whitespace

my $found = $parser->parse_whitespace(code => \$code);

Skips whitespace and comments, and handles here-documents. Returns true if any whitespace was found.

Handles:

  • Horizontal and vertical whitespace

  • Single-line comments (#...)

  • Multi-line C-style comments (/* ... */)

  • Embedded comments (#`(...))

  • Here-documents (<<EOF, <<'EOF', <<-EOF)

  • Zero-width spaces

backtrack_whitespace

$parser->backtrack_whitespace(code => \$code);

Moves the position backwards past any trailing whitespace that was just parsed.
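The following stand-alone sketch shows one way position-based scanning and backtracking can work on a string passed by reference. It assumes the position is tracked with Perl's pos() and \G-anchored matches. That is an assumption made for illustration, and skip_whitespace and backtrack_ws are hypothetical helpers, not the parser's methods.

```perl
use strict;
use warnings;

# Advance the match position past whitespace, if any is present.
sub skip_whitespace {
    my ($code_ref) = @_;
    return $$code_ref =~ /\G\s+/gc ? 1 : 0;
}

# Move the match position back over whitespace immediately before it.
sub backtrack_ws {
    my ($code_ref) = @_;
    my $pos = pos($$code_ref) // 0;
    $pos-- while $pos > 0 && substr($$code_ref, $pos - 1, 1) =~ /\s/;
    pos($$code_ref) = $pos;
    return $pos;
}

my $code = "foo   bar";
$code =~ /\Gfoo/gc;                       # consume the first token
my $found = skip_whitespace(\$code);      # position is now just before "bar"
my $after_skip = pos($code);
my $after_back = backtrack_ws(\$code);    # position is back at the end of "foo"

print "skipped from 3 to $after_skip, backtracked to $after_back\n";
```

Passing a reference keeps the match position attached to the caller's string, so every helper sees and updates the same position.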

Helper Methods

parse_delim

my $end_delim = $parser->parse_delim(code => \$code, %options);

Parses a delimiter and returns its corresponding closing delimiter.

Options:

  • ignore_delim - Hash of delimiters to ignore

get_name_and_class

my ($name, $class) = $parser->get_name_and_class($var_name);

Splits a potentially qualified variable name into name and class components.

Examples:

'foo'       => ('foo', 'main')
'Foo::bar'  => ('bar', 'Foo')
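The mapping in these examples can be reproduced with a short stand-alone sketch. The split_name_and_class function is hypothetical, not the parser's method. Two things are assumptions here: the default class is passed in explicitly (falling back to 'main'), and names with several '::' separators are split at the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: split a possibly qualified name at the last '::'.
sub split_name_and_class {
    my ($var_name, $default_class) = @_;
    $default_class //= 'main';

    my $sep = rindex($var_name, '::');
    return ($var_name, $default_class) if $sep == -1;    # unqualified name

    my $class = substr($var_name, 0, $sep);
    my $name  = substr($var_name, $sep + 2);
    return ($name, $class);
}

for my $input ('foo', 'Foo::bar') {
    my ($name, $class) = split_name_and_class($input);
    print "'$input' => ('$name', '$class')\n";
}
```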

check_declarations

$parser->check_declarations($vars_hash);

Checks variable declarations for unused variables and generates warnings (except in interactive/eval mode).

Error Handling

fatal_error

$parser->fatal_error(
    error  => "Error message",
    reason => "Additional context",
    code   => $code,
    pos    => $position,
    line   => $line_number,
    var    => $var_name,
);

Throws a fatal parsing error with detailed context information including:

  • File name and line number

  • Error position with visual indicator

  • Error message and reason

  • Suggestions for similar variable names (if var provided)

PARSER CONFIGURATION

The parser maintains several configuration hashes:

postfix_ops

Hash of postfix operators that can appear after an expression:

'--', '++', '...', '!', '!!'

hyper_ops

Hash of hyper/meta operators that transform other operators:

map     => [1, 'map_operator']
pam     => [1, 'pam_operator']
zip     => [1, 'zip_operator']
wise    => [1, 'wise_operator']
scalar  => [1, 'scalar_operator']
rscalar => [1, 'rscalar_operator']
cross   => [1, 'cross_operator']
unroll  => [1, 'unroll_operator']
reduce  => [0, 'reduce_operator']
lmap    => [0, 'map_operator']

Format: [takes_args, method_name]
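As a reading aid, the sketch below unpacks entries of this table into named fields. The %hyper_ops hash is copied from the hyper_ops listing in this section. The hyper_op_info helper is hypothetical and only demonstrates the [takes_args, method_name] layout, not how the parser consumes it.

```perl
use strict;
use warnings;

# Table in the documented format: type => [takes_args, method_name].
my %hyper_ops = (
    map     => [1, 'map_operator'],
    pam     => [1, 'pam_operator'],
    zip     => [1, 'zip_operator'],
    wise    => [1, 'wise_operator'],
    scalar  => [1, 'scalar_operator'],
    rscalar => [1, 'rscalar_operator'],
    cross   => [1, 'cross_operator'],
    unroll  => [1, 'unroll_operator'],
    reduce  => [0, 'reduce_operator'],
    lmap    => [0, 'map_operator'],
);

# Hypothetical helper: return one entry as a hash reference, or nothing.
sub hyper_op_info {
    my ($type) = @_;
    my $spec = $hyper_ops{$type} or return;
    my ($takes_args, $method_name) = @$spec;
    my %info = (takes_args => $takes_args, method_name => $method_name);
    return \%info;
}

for my $type (qw(zip reduce lmap)) {
    my $info = hyper_op_info($type);
    printf "%-7s -> %s (%s)\n", $type, $info->{method_name},
      $info->{takes_args} ? 'takes an argument' : 'no argument';
}
```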

built_in_classes

Hash of built-in class names, such as:

File, Array, String, Number, Hash, Regex, etc.

keywords

Hash of reserved keywords:

if, elsif, else, while, for, foreach, func, class, module,
return, break, next, var, const, static, import, include, etc.

Delimiters

The parser supports extensive delimiter pairs for strings and grouping:

( )   [ ]   { }   < >
« »   ‹ ›   " "   ' '
And many more Unicode paired delimiters

SPECIAL FEATURES

Here-Documents

Here-documents are supported, with optional indentation stripping:

<<EOF       # Basic here-doc
<<'EOF'     # Non-interpolating
<<"EOF"     # Interpolating (default)
<<-EOF      # With indentation stripping

Quote Operators

Quote operators for constructing different kinds of objects:

%q/.../     # String (non-interpolating)
%Q/.../     # String (interpolating)
%w/.../     # Word array
%i/.../     # Integer array
%r/.../     # Regex
%f/.../     # File object
%x/.../     # Backtick command

Magic Variables

Support for Perl-compatible magic variables:

$.   $?   $$   $!   $@   $/   etc.

Number Formats

Support for various number literal formats:

123         # Decimal
0b1010      # Binary
0o755       # Octal
0xFF        # Hexadecimal
3.14        # Float
1.5e10      # Scientific notation
42i         # Imaginary
1.23f       # Explicit float
¹²³         # Superscript (for exponents)
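To make the prefixed forms concrete, the sketch below converts a few of these literal spellings to numeric values in plain Perl. It is only an illustration: literal_value is a hypothetical helper, the patterns are assumptions rather than the parser's own regular expressions, and the imaginary, 'f'-suffixed, and superscript forms are deliberately left out.

```perl
use strict;
use warnings;

# Illustrative converter for the binary, octal, hex and decimal forms.
sub literal_value {
    my ($lit) = @_;
    return oct($lit) if $lit =~ /\A0b[01]+\z/;             # oct() understands the 0b prefix
    return oct($1)   if $lit =~ /\A0o([0-7]+)\z/;          # strip 0o, read digits as octal
    return hex($1)   if $lit =~ /\A0x([0-9A-Fa-f]+)\z/;    # strip 0x, read digits as hex
    return $lit + 0  if $lit =~ /\A[0-9]+(?:\.[0-9]+)?(?:e[+-]?[0-9]+)?\z/;
    return;    # anything else is not handled by this sketch
}

for my $lit (qw(123 0b1010 0o755 0xFF 3.14 1.5e10)) {
    print "$lit = ", literal_value($lit), "\n";
}
```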

REGULAR EXPRESSIONS

The parser uses several compiled regular expressions for efficiency:

  • static_obj_re - Matches static objects like true, false, nil, built-in types

  • prefix_obj_re - Matches prefix keywords like if, while, return

  • quote_operators_re - Matches quote-like operators

  • operators_re - Matches all operators including symbolic and Unicode

  • var_name_re - Matches valid variable names

  • method_name_re - Matches valid method names

  • match_flags_re - Matches regex modifier flags

SYMBOL TABLE

The parser maintains a hierarchical symbol table with:

  • vars - Hash of arrays containing variable information per namespace

  • ref_vars_refs - Referenced variables from outer scopes

  • class - Current namespace/class context

Each variable entry contains:

{
    obj   => $variable_object,
    name  => $variable_name,
    count => $usage_count,
    type  => $declaration_type,
    line  => $declaration_line,
}
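The sketch below shows how a table of such entries could be scanned for unused variables, in the spirit of check_declarations. It is a stand-alone illustration: the %vars contents and the unused_vars helper are hypothetical, the obj field is omitted for brevity, and treating a count of zero as "unused" is an assumption based on the field name.

```perl
use strict;
use warnings;

# Hypothetical symbol table: namespace => array of variable entries.
my %vars = (
    main => [
        {name => 'total', count => 3, type => 'var', line => 1},
        {name => 'tmp',   count => 0, type => 'var', line => 4},
    ],
    Foo => [
        {name => 'limit', count => 0, type => 'const', line => 9},
    ],
);

# Collect a description of every entry whose usage count is zero.
sub unused_vars {
    my ($table) = @_;
    my @unused;
    foreach my $class (sort keys %$table) {
        foreach my $entry (@{$table->{$class}}) {
            push @unused, "${class}::$entry->{name} (line $entry->{line})"
              if $entry->{count} == 0;
        }
    }
    return @unused;
}

print "unused: $_\n" for unused_vars(\%vars);
```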

AUTHOR

Daniel "Trizen" Șuteu

LICENSE

This module is free software; you can redistribute it and/or modify it under the same terms as Sidef itself.

SEE ALSO