NAME

LTSV::LINQ - LINQ-style query interface for LTSV files

VERSION

Version 1.05

SYNOPSIS

use LTSV::LINQ;

# Read LTSV file and query
my @results = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{status} eq '200' })
    ->Select(sub { $_[0]{url} })
    ->Distinct()
    ->ToArray();

# DSL syntax for simple filtering
my @errors = LTSV::LINQ->FromLTSV("access.log")
    ->Where(status => '404')
    ->ToArray();

# Grouping and aggregation
my @stats = LTSV::LINQ->FromLTSV("access.log")
    ->GroupBy(sub { $_[0]{status} })
    ->Select(sub {
        my $g = shift;
        return {
            Status => $g->{Key},
            Count => scalar(@{$g->{Elements}})
        };
    })
    ->OrderByDescending(sub { $_[0]{Count} })
    ->ToArray();

DESCRIPTION

LTSV::LINQ provides a LINQ-style query interface for LTSV (Labeled Tab-Separated Values) files. It offers a fluent, chainable API for filtering, transforming, and aggregating LTSV data.

Key features:

  • Lazy evaluation - O(1) memory usage for most operations

  • Method chaining - Fluent, readable query composition

  • DSL syntax - Simple key-value filtering

  • 60 LINQ methods - Comprehensive query capabilities

  • Pure Perl - No XS dependencies

  • Perl 5.005_03+ - Works on ancient and modern Perl

What is LTSV?

LTSV (Labeled Tab-Separated Values) is a text format for structured logs and data records. Each line consists of tab-separated fields, where each field is a label:value pair. A single LTSV record occupies exactly one line.

Format example:

time:2026-02-13T10:00:00	host:192.0.2.1	status:200	url:/index.html	bytes:1024

LTSV Characteristics

  • One record per line

    A complete record is always a single newline-terminated line. This makes streaming processing trivial: read a line, parse it, process it, discard it. There is no multi-line quoting problem, no block parser required.

  • Tab as field delimiter

    Fields are separated by a single horizontal tab character (0x09). The tab is a C0 control character in the ASCII range (0x00-0x7F), which has an important consequence for multibyte character encodings.

  • Colon as label-value separator

    Within each field, the label and value are separated by a single colon (0x3A, US-ASCII :). This is also a plain ASCII character with the same multibyte-safety guarantees as the tab.
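The three rules above are enough to parse a record by hand. The following is a minimal sketch in plain Perl, not the parser used by LTSV::LINQ; parse_ltsv_line is a hypothetical helper name. It splits the line on tabs, then splits each field at its first colon only, so a value that itself contains colons (such as a timestamp) survives intact.

```perl
use strict;
use warnings;

# Parse one LTSV line into a hash reference.
# parse_ltsv_line is a hypothetical helper, not part of LTSV::LINQ.
sub parse_ltsv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # drop the line terminator
    my %record;
    for my $field (split /\t/, $line) {
        # Limit of 2: split at the first colon only, so colons
        # inside the value are preserved.
        my ($label, $value) = split /:/, $field, 2;
        next unless defined $label && length $label;
        $record{$label} = defined $value ? $value : '';
    }
    return \%record;
}

my $rec = parse_ltsv_line(
    "time:2026-02-13T10:00:00\thost:192.0.2.1\tstatus:200\turl:/index.html\tbytes:1024\n"
);
print "$rec->{time} $rec->{status} $rec->{url}\n";
```

The time value keeps its colons because only the first colon of each field is treated as the label-value separator.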

LTSV Advantages

  • Multibyte-safe delimiters (Tab and Colon)

    This is perhaps the most important technical advantage of LTSV over formats such as CSV (comma-delimited) or TSV without labels.

    In many multibyte character encodings used across Asia and beyond, a single logical character is represented by a sequence of two or more bytes. The danger in older encodings is that a byte within a multibyte sequence can coincidentally equal the byte value of an ASCII delimiter, causing a naive byte-level parser to split the field in the wrong place.

    The following table shows well-known encodings and their byte ranges:

    Encoding     First byte range       Following byte range
    ----------   --------------------   -------------------------------
    Big5         0x81-0xFE              0x40-0x7E, 0xA1-0xFE
    Big5-HKSCS   0x81-0xFE              0x40-0x7E, 0xA1-0xFE
    CP932X       0x81-0x9F, 0xE0-0xFC   0x40-0x7E, 0x80-0xFC
    EUC-JP       0x8E-0x8F, 0xA1-0xFE   0xA1-0xFE
    GB 18030     0x81-0xFE              0x30-0x39, 0x40-0x7E, 0x80-0xFE
    GBK          0x81-0xFE              0x40-0x7E, 0x80-0xFE
    Shift_JIS    0x81-0x9F, 0xE0-0xFC   0x40-0x7E, 0x80-0xFC
    RFC 2279     0xC2-0xF4              0x80-0xBF
    UHC          0x81-0xFE              0x41-0x5A, 0x61-0x7A, 0x81-0xFE
    UTF-8        0xC2-0xF4              0x80-0xBF
    WTF-8        0xC2-0xF4              0x80-0xBF

    The tab character is 0x09. The colon is 0x3A. Neither value falls inside any following-byte range in the table: the lowest range, GB 18030's 0x30-0x39, excludes both, and every other range starts at 0x40 or higher. Neither 0x09 nor 0x3A appears anywhere as a first byte either. Therefore:

    TAB  (0x09) never appears as a byte within any multibyte character
                in Big5, Big5-HKSCS, CP932X, EUC-JP, GB 18030, GBK, Shift_JIS,
                RFC 2279, UHC, UTF-8, or WTF-8.
    ':'  (0x3A) never appears as a byte within any multibyte character
                in the same set of encodings.

    This means that LTSV files containing values in any of those encodings can be parsed correctly by a simple byte-level split on tab and colon, with no knowledge of the encoding whatsoever. There is no need to decode the text before parsing, and no risk of a misidentified delimiter.

    By contrast, CSV has encoding problems of a different kind. The comma (0x2C) and the double-quote (0x22) do not appear as following bytes in Shift_JIS or Big5, so they are not directly confused with multibyte character content. However, the backslash (0x5C) does appear as a valid following byte in both Shift_JIS (following byte range 0x40-0x7E includes 0x5C) and Big5 (same range). Many CSV parsers and the C runtime on Windows use backslash or backslash-like sequences for escaping, so a naive byte-level search for the escape character can be misled by a multibyte character whose second byte is 0x5C. Beyond this, CSV's quoting rules are underspecified (RFC 4180 vs. Excel vs. custom dialects differ), which makes writing a correct, encoding-aware CSV parser considerably harder than parsing LTSV. LTSV sidesteps all of these issues by choosing delimiters (tab and colon) that fall below 0x40, outside every following-byte range of every traditional multibyte encoding.

    UTF-8 is safe for all ASCII delimiters because continuation bytes are always in the range 0x80-0xBF, never overlapping ASCII. But LTSV's choice of tab and colon also makes it safe for the traditional multibyte encodings that predate Unicode, which is critical for systems that still operate on data stored in those encodings.

  • Self-describing fields

    Every field carries its own label. A record is human-readable without a separate schema or header line. Fields can appear in any order, and optional fields can simply be omitted. Adding a new field to some records does not break parsers that do not know about it.

  • Streaming-friendly

    Because each record is one line, LTSV files can be processed with line-by-line streaming. Memory usage is proportional to the longest single record, not the total file size. This is why FromLTSV in this module uses a lazy iterator rather than loading the whole file.

  • Grep- and awk-friendly

    Standard Unix text tools (grep, awk, sed, sort, cut) work naturally on LTSV files. A field can be located with a pattern like status:5[0-9][0-9] without any special parser. This makes ad-hoc analysis and shell scripting straightforward.

  • No quoting rules

    CSV requires quoting fields that contain commas or newlines, and the quoting rules differ between implementations (RFC 4180 vs. Microsoft Excel vs. others). LTSV has no quoting and no escaping mechanism. A value may not contain a tab or a newline, and a parser splits each field at its first colon, so colons inside a value (as in the time field of the format example above) need no special treatment. The multibyte-safety argument above guarantees that no tab or colon byte is hidden inside a multibyte character.

  • Wide adoption in server logging

    LTSV originated in the Japanese web industry as a structured log format for HTTP access logs. Many web servers (Apache, Nginx) and log aggregation tools support LTSV output or parsing. The format is particularly popular for application and infrastructure logging where grep-ability and streaming analysis matter.
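The multibyte-safety argument made earlier in this list can be checked at the byte level. The sketch below assumes the Shift_JIS encoding of the character 表, which is the two bytes 0x95 0x5C; its second byte equals the ASCII backslash. The variable names are illustrative only, and no part of LTSV::LINQ is used.

```perl
use strict;
use warnings;

# Shift_JIS bytes of one kanji: 0x95 0x5C.
# The second byte is 0x5C, the same value as the ASCII backslash.
my $value = "\x95\x5C";
my $line  = "title:$value\tstatus:200";

# Byte-level parse: no decoding step, no knowledge of the encoding.
my %record = map { split /:/, $_, 2 } split /\t/, $line;

my $has_backslash = index($value, "\x5C") >= 0 ? 1 : 0;  # collides with backslash
my $has_tab       = index($value, "\x09") >= 0 ? 1 : 0;  # no tab byte inside
my $has_colon     = index($value, "\x3A") >= 0 ? 1 : 0;  # no colon byte inside
my $intact        = $record{title} eq $value   ? 1 : 0;  # bytes came through unchanged

print "backslash=$has_backslash tab=$has_tab colon=$has_colon intact=$intact\n";
```

A parser that treated 0x5C as an escape character would be misled by this value, while the split on tab and colon is unaffected.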

For the formal LTSV specification, see http://ltsv.org/.

What is LINQ?

LINQ (Language Integrated Query) is a set of query capabilities introduced in the .NET Framework 3.5 (C# 3.0, 2007) by Microsoft. It defines a unified model for querying and transforming data from diverse sources -- in-memory collections, relational databases (LINQ to SQL), XML documents (LINQ to XML), and more -- using a single, consistent API.

This module brings LINQ-style querying to Perl, applied specifically to LTSV data sources.

LINQ Characteristics

  • Unified query model

    LINQ provides a single set of operators that works uniformly across data sources. Whether the source is an array, a file, or a database, the same Where, Select, OrderBy, GroupBy methods apply. LTSV::LINQ follows this principle: the same methods work on in-memory arrays (From) and LTSV files (FromLTSV) alike.

  • Declarative style

    LINQ queries express what to retrieve, not how to retrieve it. A query like ->Where(sub { $_[0]{status} >= 400 })->Select(...) describes the intent clearly, without explicit loop management. This reduces cognitive overhead and makes queries easier to read and verify.

  • Composability

    Each LINQ operator takes a sequence and returns a new sequence (or a scalar result for terminal operators). Because operators are ordinary method calls that return objects, they compose naturally:

    $query->Where(...)->Select(...)->OrderBy(...)->GroupBy(...)->ToArray()

    Any intermediate result is itself a valid query object, ready for further transformation or immediate consumption.

  • Lazy evaluation (deferred execution)

    Intermediate operators (Where, Select, Take, etc.) do not execute immediately. They construct a chain of iterator closures. Evaluation is deferred until a terminal operator (ToArray, Count, First, Sum, ForEach, etc.) pulls items through the chain. This means:

    - Memory usage is bounded by the window of data in flight, not by the total data size. A Where->Select->Take(10) over a million-line file stops reading as soon as the tenth matching record has been produced.
    - Short-circuiting is free. First stops at the first match. Any stops as soon as one match is found.
    - Pipelines can be built without executing them, and executed multiple times by wrapping in a factory (see _from_snapshot).

  • Method chaining (fluent interface)

    LINQ's design makes chaining natural. In C# this is supported by extension methods; in Perl it is supported by returning $self-class objects from every intermediate operator. The result is readable, left-to-right query expressions.

  • Separation of query definition from execution

    A LINQ query object is a description of a computation, not its result. You can pass query objects around, inspect them, extend them, and decide later when to execute them. This separation is valuable in library and framework code.
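Deferred execution as described in this list can be pictured as a chain of closures. The sketch below is a hypothetical model, not the module's internal code: from_handle, where_iter, select_iter and take_iter are invented names, and an iterator is assumed to be a code reference that returns one element per call or an empty list when exhausted. A counter records how many lines are actually read.

```perl
use strict;
use warnings;

my $lines_read = 0;   # how many source lines were actually pulled

sub from_handle {     # source: one parsed record per call
    my ($fh) = @_;
    return sub {
        my $line = readline($fh);
        return unless defined $line;
        $lines_read++;
        chomp $line;
        my %rec = map { split /:/, $_, 2 } split /\t/, $line;
        return \%rec;
    };
}

sub where_iter {      # pass through only elements accepted by the predicate
    my ($upstream, $predicate) = @_;
    return sub {
        while (my @next = $upstream->()) {
            return $next[0] if $predicate->($next[0]);
        }
        return;
    };
}

sub select_iter {     # transform each element
    my ($upstream, $selector) = @_;
    return sub {
        my @next = $upstream->();
        return unless @next;
        return $selector->($next[0]);
    };
}

sub take_iter {       # stop pulling once $count elements were delivered
    my ($upstream, $count) = @_;
    my $taken = 0;
    return sub {
        return if $taken >= $count;
        my @next = $upstream->();
        return unless @next;
        $taken++;
        return $next[0];
    };
}

my $log = "status:200\turl:/a\n" . "status:404\turl:/b\n"
        . "status:200\turl:/c\n" . "status:500\turl:/d\n"
        . "status:200\turl:/e\n" . "status:503\turl:/f\n";
open my $fh, '<', \$log or die "cannot open in-memory log: $!";

# Building the pipeline reads nothing.
my $query = take_iter(
    select_iter(
        where_iter(from_handle($fh), sub { $_[0]{status} >= 400 }),
        sub { $_[0]{url} },
    ),
    2,
);
my $read_before = $lines_read;

# The terminal loop pulls elements through the chain.
my @urls;
while (my @next = $query->()) { push @urls, $next[0] }
print "@urls (read $lines_read of 6 lines)\n";
```

In this model the last two lines of the log are never read, because the take stage stops asking its upstream for more once two elements have been delivered.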

LINQ Advantages for LTSV Processing

  • Readable log analysis

    LTSV log analysis often involves the same logical steps: filter records by a condition, extract a field, aggregate. LINQ methods map directly onto these steps, making the code read like a description of the analysis.

  • Memory-efficient processing of large log files

    Web server access logs can be gigabytes in size. LTSV::LINQ's lazy FromLTSV iterator reads one line at a time. Combined with Where and Take, only the needed records are ever in memory simultaneously.

  • No new language syntax required

    Unlike C# LINQ (which has query comprehension syntax from x in xs where ... select ...), LTSV::LINQ works with ordinary Perl method calls and anonymous subroutines. There is no source filter, no parser extension, and no dependency on modern Perl features. The same code runs on Perl 5.005_03 and Perl 5.40.

  • Composable, reusable query fragments

    A Where clause stored in a variable can be applied to multiple data sources. Query logic can be parameterized and reused across scripts.

For the original LINQ documentation, see https://learn.microsoft.com/en-us/dotnet/csharp/linq/.

METHODS

Complete Method Reference

This module implements 60 LINQ-style methods organized into 15 categories:

  • Data Sources (5): From, FromLTSV, Range, Empty, Repeat

  • Filtering (1): Where (with DSL)

  • Projection (2): Select, SelectMany

  • Concatenation (2): Concat, Zip

  • Partitioning (4): Take, Skip, TakeWhile, SkipWhile

  • Ordering (13): OrderBy, OrderByDescending, OrderByStr, OrderByStrDescending, OrderByNum, OrderByNumDescending, Reverse, ThenBy, ThenByDescending, ThenByStr, ThenByStrDescending, ThenByNum, ThenByNumDescending

  • Grouping (1): GroupBy

  • Set Operations (4): Distinct, Union, Intersect, Except

  • Join (2): Join, GroupJoin

  • Quantifiers (3): All, Any, Contains

  • Comparison (1): SequenceEqual

  • Element Access (8): First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault

  • Aggregation (7): Count, Sum, Min, Max, Average, AverageOrDefault, Aggregate

  • Conversion (6): ToArray, ToList, ToDictionary, ToLookup, ToLTSV, DefaultIfEmpty

  • Utility (1): ForEach

Method Summary Table:

Method                 Category        Lazy?  Returns
=====================  ==============  =====  ================
From                   Data Source     Yes    Query
FromLTSV               Data Source     Yes    Query
Range                  Data Source     Yes    Query
Empty                  Data Source     Yes    Query
Repeat                 Data Source     Yes    Query
Where                  Filtering       Yes    Query
Select                 Projection      Yes    Query
SelectMany             Projection      Yes    Query
Concat                 Concatenation   Yes    Query
Zip                    Concatenation   Yes    Query
Take                   Partitioning    Yes    Query
Skip                   Partitioning    Yes    Query
TakeWhile              Partitioning    Yes    Query
SkipWhile              Partitioning    Yes    Query
OrderBy                Ordering        No*    OrderedQuery
OrderByDescending      Ordering        No*    OrderedQuery
OrderByStr             Ordering        No*    OrderedQuery
OrderByStrDescending   Ordering        No*    OrderedQuery
OrderByNum             Ordering        No*    OrderedQuery
OrderByNumDescending   Ordering        No*    OrderedQuery
Reverse                Ordering        No*    Query
ThenBy                 Ordering        No*    OrderedQuery
ThenByDescending       Ordering        No*    OrderedQuery
ThenByStr              Ordering        No*    OrderedQuery
ThenByStrDescending    Ordering        No*    OrderedQuery
ThenByNum              Ordering        No*    OrderedQuery
ThenByNumDescending    Ordering        No*    OrderedQuery
GroupBy                Grouping        No*    Query
Distinct               Set Operation   Yes    Query
Union                  Set Operation   No*    Query
Intersect              Set Operation   No*    Query
Except                 Set Operation   No*    Query
Join                   Join            No*    Query
GroupJoin              Join            No*    Query
All                    Quantifier      No     Boolean
Any                    Quantifier      No     Boolean
Contains               Quantifier      No     Boolean
SequenceEqual          Comparison      No     Boolean
First                  Element Access  No     Element
FirstOrDefault         Element Access  No     Element
Last                   Element Access  No*    Element
LastOrDefault          Element Access  No*    Element or undef
Single                 Element Access  No*    Element
SingleOrDefault        Element Access  No*    Element or undef
ElementAt              Element Access  No*    Element
ElementAtOrDefault     Element Access  No*    Element or undef
Count                  Aggregation     No     Integer
Sum                    Aggregation     No     Number
Min                    Aggregation     No     Number
Max                    Aggregation     No     Number
Average                Aggregation     No     Number
AverageOrDefault       Aggregation     No     Number or undef
Aggregate              Aggregation     No     Any
DefaultIfEmpty         Conversion      Yes    Query
ToArray                Conversion      No     Array
ToList                 Conversion      No     ArrayRef
ToDictionary           Conversion      No     HashRef
ToLookup               Conversion      No     HashRef
ToLTSV                 Conversion      No     Boolean
ForEach                Utility         No     Void

* Materializing operation (loads all data into memory)
OrderedQuery = LTSV::LINQ::Ordered (subclass of LTSV::LINQ;
               all LTSV::LINQ methods available plus ThenBy* methods)

Data Source Methods

From(\@array)

Create a query from an array.

my $query = LTSV::LINQ->From([{name => 'Alice'}, {name => 'Bob'}]);

FromLTSV($filename)

Create a query from an LTSV file.

my $query = LTSV::LINQ->FromLTSV("access.log");

File handle management: FromLTSV opens the file immediately and holds the file handle open until the iterator reaches end-of-file. If the query is not fully consumed (e.g. you call First or Take and stop early), the file handle remains open until the query object is garbage collected.

This is harmless for a small number of files, but if you open many LTSV files concurrently without consuming them fully, you may exhaust the OS file descriptor limit. In such cases, consume the query fully or use ToArray() to materialise the data and close the file immediately:

# File closed as soon as all records are loaded
my @records = LTSV::LINQ->FromLTSV("access.log")->ToArray();

Range($start, $count)

Generate a sequence of integers.

my $query = LTSV::LINQ->Range(1, 10);  # 1, 2, ..., 10

Empty()

Create an empty sequence.

Returns: Empty LTSV::LINQ query

Examples:

my $empty = LTSV::LINQ->Empty();
$empty->Count();  # 0

# Conditional empty sequence
my $result = $condition ? $query : LTSV::LINQ->Empty();

Note: Equivalent to From([]) but more explicit.

Repeat($element, $count)

Repeat the same element a specified number of times.

Parameters:

  • $element - Element to repeat

  • $count - Number of times to repeat

Returns: LTSV::LINQ query with repeated elements

Examples:

# Repeat scalar
LTSV::LINQ->Repeat('x', 5)->ToArray();  # ('x', 'x', 'x', 'x', 'x')

# Repeat reference (same reference repeated)
my $item = {id => 1};
LTSV::LINQ->Repeat($item, 3)->ToArray();  # ($item, $item, $item)

# Generate default values
LTSV::LINQ->Repeat(0, 10)->ToArray();  # (0, 0, 0, ..., 0)

Note: The element reference is repeated, not cloned.

Filtering Methods

Where($predicate)
Where(key => value, ...)

Filter elements. Accepts either a code reference or DSL form.

Code Reference Form:

->Where(sub { $_[0]{status} == 200 })
->Where(sub { $_[0]{status} >= 400 && $_[0]{bytes} > 1000 })

The code reference receives each element as $_[0] and should return true to include the element, false to exclude it.

DSL Form:

The DSL (Domain Specific Language) form provides a concise syntax for simple equality comparisons. All conditions are combined with AND logic.

# Single condition
->Where(status => '200')

# Multiple conditions (AND)
->Where(status => '200', method => 'GET')

# Equivalent to:
->Where(sub {
    $_[0]{status} eq '200' && $_[0]{method} eq 'GET'
})

DSL Specification:

  • Arguments must be an even number of key => value pairs

    The DSL form interprets its arguments as a flat list of key-value pairs. Passing an odd number of arguments produces a Perl warning (Odd number of elements in hash assignment) and the unpaired key receives undef as its value, which will never match. Always use complete pairs:

    ->Where(status => '200')              # correct: 1 pair
    ->Where(status => '200', method => 'GET')  # correct: 2 pairs
    ->Where(status => '200', 'method')    # wrong: 3 args, Perl warning

  • All comparisons are string equality (eq)

  • All conditions are combined with AND

  • A record whose field is undefined or missing fails the condition

  • For numeric or OR logic, use code reference form
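The rules in this list can be modelled with a small predicate builder. This is a sketch under the assumption that the DSL behaves exactly as listed above; make_dsl_predicate is a hypothetical helper and is not how LTSV::LINQ is necessarily implemented.

```perl
use strict;
use warnings;

# Build a predicate from key => value pairs:
# string equality, AND across all pairs, undefined fields never match.
sub make_dsl_predicate {
    my %cond = @_;
    return sub {
        my ($record) = @_;
        for my $key (keys %cond) {
            return 0 unless defined $cond{$key};
            return 0 unless defined $record->{$key};
            return 0 unless $record->{$key} eq $cond{$key};
        }
        return 1;
    };
}

my $is_ok_get = make_dsl_predicate(status => '200', method => 'GET');

my @records = (
    { status => '200',   method => 'GET'  },   # both pairs match
    { status => '200',   method => 'POST' },   # second pair fails
    { status => '200.0', method => 'GET'  },   # equal as a number, not as a string
    { status => '200' },                       # method is missing
);
my @matches = map { $is_ok_get->($_) } @records;
print "@matches\n";
```

The third record shows why numeric conditions belong in the code reference form: '200.0' == 200 is true, but '200.0' eq '200' is not.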

Examples:

# DSL: Simple and readable
->Where(status => '200')
->Where(user => 'alice', role => 'admin')

# Code ref: Complex logic
->Where(sub { $_[0]{status} >= 400 && $_[0]{status} < 500 })
->Where(sub { $_[0]{user} eq 'alice' || $_[0]{user} eq 'bob' })

Projection Methods

Select($selector)

Transform each element using the provided selector function.

The selector receives each element as $_[0] and should return the transformed value.

Parameters:

  • $selector - Code reference that transforms each element

Returns: New query with transformed elements (lazy)

Examples:

# Extract single field
->Select(sub { $_[0]{url} })

# Transform to new structure
->Select(sub {
    {
        path => $_[0]{url},
        code => $_[0]{status}
    }
})

# Calculate derived values
->Select(sub { $_[0]{bytes} * 8 })  # bytes to bits

Note: Select preserves one-to-one mapping. For one-to-many, use SelectMany.

SelectMany($selector)

Flatten nested sequences into a single sequence.

The selector should return an array reference. All arrays are flattened into a single sequence.

Parameters:

  • $selector - Code reference returning array reference

Returns: New query with flattened elements (lazy)

Examples:

# Flatten array of arrays
my @nested = ([1, 2], [3, 4], [5]);
LTSV::LINQ->From(\@nested)
    ->SelectMany(sub { $_[0] })
    ->ToArray();  # (1, 2, 3, 4, 5)

# Expand related records
->SelectMany(sub {
    my $user = shift;
    return [ map {
        { user => $user->{name}, role => $_ }
    } @{$user->{roles}} ];
})

Use Cases:

  • Flattening nested arrays

  • Expanding one-to-many relationships

  • Generating multiple outputs per input

Important: The selector must return an ARRAY reference. If it returns any other value (e.g. a hashref or scalar), this method throws an exception:

die "SelectMany: selector must return an ARRAY reference"

This matches the behaviour of .NET LINQ's SelectMany, which requires the selector to return an IEnumerable. Always wrap results in [...]:

->SelectMany(sub { [ $_[0]{items} ] })   # items is a single value: wrap it in [...]
->SelectMany(sub {   $_[0]{items}   })   # dies at runtime unless items is already an arrayref
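The ARRAY-reference contract can be illustrated without the module. The helper below, select_many, is a hypothetical and eager stand-in for the lazy method; it applies the same check and shows when wrapping in [...] is needed and when it is not.

```perl
use strict;
use warnings;

# Eager stand-in for SelectMany: flatten the arrayrefs returned by the selector.
sub select_many {
    my ($selector, @items) = @_;
    my @out;
    for my $item (@items) {
        my $result = $selector->($item);
        die "SelectMany: selector must return an ARRAY reference\n"
            unless ref($result) eq 'ARRAY';
        push @out, @$result;    # splice the inner list into the output
    }
    return @out;
}

my @users = (
    { name => 'alice', roles => ['admin', 'dev'] },
    { name => 'bob',   roles => ['dev'] },
);

# roles is already an arrayref, so it can be returned as-is
my @roles = select_many(sub { $_[0]{roles} }, @users);

# name is a plain scalar, so it has to be wrapped in [...]
my @names = select_many(sub { [ $_[0]{name} ] }, @users);

# returning the bare scalar violates the contract and dies
my $ok    = eval { select_many(sub { $_[0]{name} }, @users); 1 };
my $error = $ok ? '' : $@;

print "@roles | @names | $error";
```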

Concatenation Methods

Concat($second)

Concatenate two sequences into one.

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

Returns: New query with both sequences concatenated (lazy)

Examples:

# Combine two data sources
my $q1 = LTSV::LINQ->From([1, 2, 3]);
my $q2 = LTSV::LINQ->From([4, 5, 6]);
$q1->Concat($q2)->ToArray();  # (1, 2, 3, 4, 5, 6)

# Merge LTSV files
LTSV::LINQ->FromLTSV("jan.log")
    ->Concat(LTSV::LINQ->FromLTSV("feb.log"))
    ->Where(status => '500')

Note: This operation is lazy - sequences are read on-demand.

Zip($second, $result_selector)

Combine two sequences element-wise using a result selector function.

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

  • $result_selector - Function to combine elements: ($first, $second) -> $result

Returns: New query with combined elements (lazy)

Examples:

# Combine numbers
my $numbers = LTSV::LINQ->From([1, 2, 3]);
my $letters = LTSV::LINQ->From(['a', 'b', 'c']);
$numbers->Zip($letters, sub {
    my($num, $letter) = @_;
    return "$num-$letter";
})->ToArray();  # ('1-a', '2-b', '3-c')

# Create key-value pairs
my $keys = LTSV::LINQ->From(['name', 'age', 'city']);
my $values = LTSV::LINQ->From(['Alice', 30, 'NYC']);
$keys->Zip($values, sub {
    return {$_[0] => $_[1]};
})->ToArray();

# Stops at shorter sequence
LTSV::LINQ->From([1, 2, 3, 4])
    ->Zip(LTSV::LINQ->From(['a', 'b']), sub { [$_[0], $_[1]] })
    ->ToArray();  # ([1, 'a'], [2, 'b'])

Note: Iteration stops when either sequence ends.

Partitioning Methods

Take($count)

Take the first N elements from the sequence.

Parameters:

  • $count - Number of elements to take (integer >= 0)

Returns: New query limited to first N elements (lazy)

Examples:

# Top 10 results
->OrderByDescending(sub { $_[0]{score} })
  ->Take(10)

# First record only
->Take(1)->ToArray()

# Limit large file processing
LTSV::LINQ->FromLTSV("huge.log")->Take(1000)

Note: Take(0) returns an empty sequence. Negative values are treated as 0.

Skip($count)

Skip the first N elements, return the rest.

Parameters:

  • $count - Number of elements to skip (integer >= 0)

Returns: New query skipping first N elements (lazy)

Examples:

# Skip header row
->Skip(1)

# Pagination: page 3, size 20
->Skip(40)->Take(20)

# Skip first batch
->Skip(1000)->ForEach(sub { ... })

Use Cases:

  • Pagination

  • Skipping header rows

  • Processing in batches

TakeWhile($predicate)

Take elements while the predicate is true. Stops at first false.

Parameters:

  • $predicate - Code reference returning boolean

Returns: New query taking elements while predicate holds (lazy)

Examples:

# Take while value is small
->TakeWhile(sub { $_[0]{count} < 100 })

# Take while timestamp is in range
->TakeWhile(sub { $_[0]{time} lt '2026-02-01' })

# Process until error
->TakeWhile(sub { $_[0]{status} < 400 })

Important: TakeWhile stops immediately when predicate returns false. It does NOT filter - it terminates the sequence.

# Different from Where, given the input (1, 2, 7, 3, 4):
->TakeWhile(sub { $_[0] < 5 })  # (1, 2): stops at 7
->Where(sub { $_[0] < 5 })      # (1, 2, 3, 4): checks every element

SkipWhile($predicate)

Skip elements while the predicate is true. Returns rest after first false.

Parameters:

  • $predicate - Code reference returning boolean

Returns: New query skipping initial elements (lazy)

Examples:

# Skip header lines
->SkipWhile(sub { $_[0]{line} =~ /^#/ })

# Skip while value is small
->SkipWhile(sub { $_[0]{count} < 100 })

# Process after certain timestamp
->SkipWhile(sub { $_[0]{time} lt '2026-02-01' })

Important: SkipWhile only skips initial elements. Once predicate is false, all remaining elements are included.

LTSV::LINQ->From([1,2,3,4,5,2,1])->SkipWhile(sub { $_[0] < 4 })->ToArray();  # (4,5,2,1)
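The difference between terminating, skipping and filtering is easiest to see side by side on the same input. The sketch below uses plain Perl lists; take_while and skip_while are hypothetical helpers written eagerly for brevity, whereas the module's own methods are lazy.

```perl
use strict;
use warnings;

# Keep elements until the predicate first fails, then stop.
sub take_while {
    my ($predicate, @items) = @_;
    my @out;
    for my $item (@items) {
        last unless $predicate->($item);
        push @out, $item;
    }
    return @out;
}

# Drop elements until the predicate first fails, then keep everything.
sub skip_while {
    my ($predicate, @items) = @_;
    my $skipping = 1;
    my @out;
    for my $item (@items) {
        $skipping = 0 if $skipping && !$predicate->($item);
        push @out, $item unless $skipping;
    }
    return @out;
}

my @data  = (1, 2, 3, 4, 5, 2, 1);
my $small = sub { $_[0] < 4 };

my @taken    = take_while($small, @data);      # 1 2 3
my @skipped  = skip_while($small, @data);      # 4 5 2 1
my @filtered = grep { $small->($_) } @data;    # 1 2 3 2 1

print "take: @taken\nskip: @skipped\ngrep: @filtered\n";
```

Only the filter revisits the trailing 2 and 1; the other two decide once, at the first element that fails the predicate.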

Ordering Methods

Sort stability: OrderBy* and ThenBy* use a Schwartzian-Transform decorated-array technique that appends the original element index as a final tie-breaker. This guarantees completely stable multi-key sorting on every Perl version including 5.005_03, where built-in sort stability is not guaranteed.
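The decorated-array idea can be sketched in a few lines. stable_sort_by below is a hypothetical helper that illustrates the technique with a single string key; it is not the module's sorting code.

```perl
use strict;
use warnings;

# Decorate each element with its key and original index, sort, undecorate.
sub stable_sort_by {
    my ($key_of, @items) = @_;
    my $index = 0;
    my @decorated = map { [ $key_of->($_), $index++, $_ ] } @items;
    my @sorted = sort {
        $a->[0] cmp $b->[0]      # primary: the extracted key
            ||
        $a->[1] <=> $b->[1]      # tie-breaker: original position
    } @decorated;
    return map { $_->[2] } @sorted;
}

my @staff = (
    { dept => 'sales', name => 'Carol' },
    { dept => 'dev',   name => 'Alice' },
    { dept => 'sales', name => 'Bob'   },
    { dept => 'dev',   name => 'Dave'  },
);
my @by_dept = stable_sort_by(sub { $_[0]{dept} }, @staff);
my @names   = map { $_->{name} } @by_dept;
print "@names\n";   # Alice Dave Carol Bob
```

Because the index comparison never returns 0 for two different elements, the result does not depend on whether the underlying sort is stable: Carol stays ahead of Bob within sales, as in the input.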

Comparison type: LTSV::LINQ provides three families:

  • OrderBy / OrderByDescending / ThenBy / ThenByDescending

    Smart comparison: numeric (<=>) when both keys look numeric, string (cmp) otherwise. Convenient for LTSV data where field values are always strings but commonly hold numbers.

  • OrderByStr / OrderByStrDescending / ThenByStr / ThenByStrDescending

    Unconditional string comparison (cmp). Use when keys must sort lexicographically regardless of content (e.g. version strings, codes).

  • OrderByNum / OrderByNumDescending / ThenByNum / ThenByNumDescending

    Unconditional numeric comparison (<=>). Use when keys are always numeric. Undefined or empty values are treated as 0.
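The three families differ only in the comparison applied to each pair of keys. The sketch below contrasts them on typical LTSV field values. The pattern used to decide whether a key "looks numeric" is an assumption made for illustration; the test actually used by the module may differ.

```perl
use strict;
use warnings;

# Assumed "looks numeric" test: optional sign, integer or decimal,
# optional exponent.
my $NUMERIC = qr/^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/;

# Smart comparison: numeric when both keys look numeric, string otherwise.
sub smart_cmp {
    my ($x, $y) = @_;
    return ($x =~ $NUMERIC && $y =~ $NUMERIC) ? $x <=> $y : $x cmp $y;
}

my @bytes = ('1024', '256', '9');

my @smart   = sort { smart_cmp($a, $b) } @bytes;   # 9 256 1024
my @string  = sort { $a cmp $b } @bytes;           # 1024 256 9
my @numeric = sort { $a <=> $b } @bytes;           # 9 256 1024

print "smart: @smart\nstr: @string\nnum: @numeric\n";
```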

IOrderedEnumerable: OrderBy* methods return a LTSV::LINQ::Ordered object (a subclass of LTSV::LINQ). This mirrors the way .NET LINQ's OrderBy returns IOrderedEnumerable<T>, which exposes ThenBy and ThenByDescending. All LTSV::LINQ methods (Where, Select, Take, etc.) are available on the returned object through inheritance. ThenBy* methods are only available on LTSV::LINQ::Ordered objects, not on plain LTSV::LINQ objects.

Non-destructive: ThenBy* always returns a new LTSV::LINQ::Ordered object; the original is unchanged. Branching sort chains work correctly:

my $by_dept   = LTSV::LINQ->From(\@data)->OrderBy(sub { $_[0]{dept} });
my $by_name   = $by_dept->ThenBy(sub    { $_[0]{name}   });
my $by_salary = $by_dept->ThenByNum(sub { $_[0]{salary} });
# $by_name and $by_salary are completely independent queries

OrderBy($key_selector)

Sort in ascending order using smart comparison: if both keys look like numbers (integers, decimals, negative, or exponential notation), numeric comparison (<=>) is used; otherwise string comparison (cmp) is used. Returns a LTSV::LINQ::Ordered object.

->OrderBy(sub { $_[0]{timestamp} })   # string keys: lexicographic
->OrderBy(sub { $_[0]{bytes} })       # "1024", "256" -> numeric (256, 1024)

Note: When you need explicit control over the comparison type, use OrderByStr (always cmp) or OrderByNum (always <=>).

OrderByDescending($key_selector)

Sort in descending order using the same smart comparison as OrderBy. Returns a LTSV::LINQ::Ordered object.

->OrderByDescending(sub { $_[0]{count} })

OrderByStr($key_selector)

Sort in ascending order using string comparison (cmp) unconditionally. Returns a LTSV::LINQ::Ordered object.

->OrderByStr(sub { $_[0]{code} })    # "10" lt "9" (lexicographic)

OrderByStrDescending($key_selector)

Sort in descending order using string comparison (cmp) unconditionally. Returns a LTSV::LINQ::Ordered object.

->OrderByStrDescending(sub { $_[0]{name} })

OrderByNum($key_selector)

Sort in ascending order using numeric comparison (<=>) unconditionally. Returns a LTSV::LINQ::Ordered object.

->OrderByNum(sub { $_[0]{bytes} })   # 9 < 10 (numeric)

Note: Undefined or empty values are treated as 0.

OrderByNumDescending($key_selector)

Sort in descending order using numeric comparison (<=>) unconditionally. Returns a LTSV::LINQ::Ordered object.

->OrderByNumDescending(sub { $_[0]{response_time} })

Reverse()

Reverse the order.

->Reverse()

ThenBy($key_selector)

Add an ascending secondary sort key using smart comparison. Must be called on a LTSV::LINQ::Ordered object (i.e., after OrderBy*). Returns a new LTSV::LINQ::Ordered object; the original is unchanged.

->OrderBy(sub { $_[0]{dept} })->ThenBy(sub { $_[0]{name} })

ThenByDescending($key_selector)

Add a descending secondary sort key using smart comparison.

->OrderBy(sub { $_[0]{dept} })->ThenByDescending(sub { $_[0]{salary} })

ThenByStr($key_selector)

Add an ascending secondary sort key using string comparison (cmp).

->OrderByStr(sub { $_[0]{dept} })->ThenByStr(sub { $_[0]{code} })

ThenByStrDescending($key_selector)

Add a descending secondary sort key using string comparison (cmp).

->OrderByStr(sub { $_[0]{dept} })->ThenByStrDescending(sub { $_[0]{name} })

ThenByNum($key_selector)

Add an ascending secondary sort key using numeric comparison (<=>).

->OrderByStr(sub { $_[0]{dept} })->ThenByNum(sub { $_[0]{salary} })

ThenByNumDescending($key_selector)

Add a descending secondary sort key using numeric comparison (<=>). Undefined or empty values are treated as 0.

->OrderByStr(sub { $_[0]{host} })->ThenByNumDescending(sub { $_[0]{bytes} })

Grouping Methods

GroupBy($key_selector [, $element_selector])

Group elements by key.

Returns: New query where each element is a hashref with two fields:

  • Key - The group key (string)

  • Elements - Array reference of elements in the group

Note: This operation is eager - the entire sequence is loaded into memory immediately. Groups are returned in the order their keys first appear in the source sequence, matching the behaviour of .NET LINQ's GroupBy.

Examples:

# Group access log by status code
my @groups = LTSV::LINQ->FromLTSV('access.log')
    ->GroupBy(sub { $_[0]{status} })
    ->ToArray();

for my $g (@groups) {
    printf "status=%s count=%d\n", $g->{Key}, scalar @{$g->{Elements}};
}

# With element selector
->GroupBy(sub { $_[0]{status} }, sub { $_[0]{path} })

Note: Elements is a plain array reference, not a LTSV::LINQ object. To apply further LINQ operations on a group, wrap it with From:

for my $g (@groups) {
    my $total = LTSV::LINQ->From($g->{Elements})
        ->Sum(sub { $_[0]{bytes} });
    printf "status=%s total_bytes=%d\n", $g->{Key}, $total;
}
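The shape of the GroupBy result, including the first-appearance ordering of keys, can be reproduced in plain Perl. group_by below is a hypothetical helper written for illustration; it returns the same Key/Elements hashrefs described above but is not the module's code.

```perl
use strict;
use warnings;

# Group items by key, keeping groups in the order their keys first appear.
sub group_by {
    my ($key_of, @items) = @_;
    my (%group_for, @groups);
    for my $item (@items) {
        my $key = $key_of->($item);
        unless ($group_for{$key}) {
            $group_for{$key} = { Key => $key, Elements => [] };
            push @groups, $group_for{$key};   # remember first-appearance order
        }
        push @{ $group_for{$key}{Elements} }, $item;
    }
    return @groups;
}

my @log = (
    { status => '200', bytes => 100 },
    { status => '404', bytes => 10  },
    { status => '200', bytes => 300 },
    { status => '500', bytes => 5   },
    { status => '404', bytes => 20  },
);

my @summary;
for my $g (group_by(sub { $_[0]{status} }, @log)) {
    my $total = 0;
    $total += $_->{bytes} for @{ $g->{Elements} };
    push @summary, "$g->{Key}:" . scalar(@{ $g->{Elements} }) . ":$total";
}
print "@summary\n";   # 200:2:400 404:2:30 500:1:5
```

A hash alone would lose the ordering, which is why the sketch keeps a separate array of groups alongside the lookup hash.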

Set Operations

Evaluation model:

  • Distinct is fully lazy: elements are tested one by one as the output sequence is consumed.

  • Union, Intersect, Except are partially eager: when the method is called, the second sequence is consumed in full and stored in an in-memory hash for O(1) lookup. The first sequence is then iterated lazily. This matches the behaviour of .NET LINQ, which also buffers the second (hash-side) sequence up front.
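The partially eager model can be sketched with closures, using Except as the example. This is an illustration under the assumptions stated in the list above, with hypothetical helper names (list_iter, except_iter); an iterator here is a code reference that returns one element per call or an empty list at the end.

```perl
use strict;
use warnings;

my $first_pulled = 0;   # how many elements of the first sequence were read

sub list_iter {
    my @items = @_;
    return sub {
        return unless @items;
        $first_pulled++;
        return shift @items;
    };
}

sub except_iter {
    my ($first_iter, $second_items, $key_of) = @_;
    $key_of ||= sub { $_[0] };

    # Eager part: index the whole second sequence up front.
    my %seen;
    $seen{ $key_of->($_) } = 1 for @$second_items;

    # Lazy part: the first sequence is consumed one element per request.
    return sub {
        while (my @next = $first_iter->()) {
            next if $seen{ $key_of->($next[0]) }++;   # in second, or already emitted
            return $next[0];
        }
        return;
    };
}

my $query = except_iter(list_iter(1, 1, 2, 3, 5), [2, 4]);
my $pulled_at_build = $first_pulled;

my @result;
while (my @next = $query->()) { push @result, $next[0] }
print "@result\n";   # 1 3 5
```

The same hash serves two purposes in this sketch: it excludes keys found in the second sequence and suppresses duplicates from the first.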

Distinct([$key_selector])

Remove duplicate elements.

Parameters:

  • $key_selector - (Optional) Code ref: ($element) -> $key. Extracts a comparison key from each element. This is a single-argument function (unlike Perl's sort comparator), and is not a two-argument comparison function.

->Distinct()
->Distinct(sub { lc($_[0]) })          # case-insensitive strings
->Distinct(sub { $_[0]{id} })          # hashref: dedupe by field

Union($second [, $key_selector])

Produce set union of two sequences (no duplicates).

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

  • $key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).

Returns: New query with elements from both sequences (distinct)

Evaluation: Partially eager. The first sequence is iterated lazily; the second is fully consumed at call time and stored in memory.

Examples:

# Simple union
my $q1 = LTSV::LINQ->From([1, 2, 3]);
my $q2 = LTSV::LINQ->From([3, 4, 5]);
$q1->Union($q2)->ToArray();  # (1, 2, 3, 4, 5)

# Case-insensitive union
->Union($other, sub { lc($_[0]) })

Note: Equivalent to Concat($second)->Distinct(), so duplicates are removed automatically.

Intersect($second [, $key_selector])

Produce set intersection of two sequences.

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

  • $key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).

Returns: New query with common elements only (distinct)

Evaluation: Partially eager. The second sequence is fully consumed at call time and stored in a hash; the first is iterated lazily.

Examples:

# Common elements
LTSV::LINQ->From([1, 2, 3])
    ->Intersect(LTSV::LINQ->From([2, 3, 4]))
    ->ToArray();  # (2, 3)

# Find users in both lists
$users1->Intersect($users2, sub { $_[0]{id} })

Note: Only includes elements present in both sequences.

Except($second [, $key_selector])

Produce set difference (elements in first but not in second).

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

  • $key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).

Returns: New query with elements only in first sequence (distinct)

Evaluation: Partially eager. The second sequence is fully consumed at call time and stored in a hash; the first is iterated lazily.

Examples:

# Set difference
LTSV::LINQ->From([1, 2, 3])
    ->Except(LTSV::LINQ->From([2, 3, 4]))
    ->ToArray();  # (1)

# Find users in first list but not second
$all_users->Except($inactive_users, sub { $_[0]{id} })

Note: Returns elements from first sequence not present in second.

Join Operations

Evaluation model: Both Join and GroupJoin are partially eager: when the method is called, the inner sequence is consumed in full and stored in an in-memory lookup table (hash of arrays, keyed by inner key). The outer sequence is then iterated lazily, producing results on demand.

This matches the behaviour of .NET LINQ's hash-join implementation. The memory cost is O(inner size); for very large inner sequences, consider reversing the join or pre-filtering the inner sequence before passing it.
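As a rough illustration of this evaluation model, the following plain-Perl sketch builds the lookup table from the inner records up front and then produces joined results one at a time. hash_join is a hypothetical helper working on array references; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: inner join using a lookup table built from the
# inner records (hash of arrays, keyed by inner key).
sub hash_join {
    my ($outer, $inner, $outer_key, $inner_key, $result) = @_;

    my %lookup;
    for my $in (@$inner) {                      # eager: whole inner side
        push @{ $lookup{ $inner_key->($in) } }, $in;
    }

    my @pending;   # results already computed for the current outer element
    my $i = 0;
    return sub {                                # lazy: one result per call
        while (!@pending) {
            return if $i >= @$outer;            # empty list: no more results
            my $out     = $outer->[$i++];
            my $matches = $lookup{ $outer_key->($out) } || [];
            @pending = map { $result->($out, $_) } @$matches;
        }
        return shift @pending;
    };
}

my $next = hash_join(
    [ { id => 1, name => 'Alice' }, { id => 2, name => 'Bob' },
      { id => 3, name => 'Carol' } ],
    [ { user_id => 1, product => 'Book' }, { user_id => 1, product => 'Pen' },
      { user_id => 2, product => 'Notebook' } ],
    sub { $_[0]{id} },
    sub { $_[0]{user_id} },
    sub { "$_[0]{name}:$_[1]{product}" },
);

my @rows;
while (my ($row) = $next->()) { push @rows, $row }
print join(", ", @rows), "\n";
# Alice:Book, Alice:Pen, Bob:Notebook
```

Carol has no matching inner record, so an inner join drops her. The memory held by %lookup is proportional to the inner side only.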

Join($inner, $outer_key_selector, $inner_key_selector, $result_selector)

Correlate elements of two sequences based on matching keys (inner join).

Parameters:

  • $inner - Inner sequence (LTSV::LINQ object)

  • $outer_key_selector - Function to extract key from outer element

  • $inner_key_selector - Function to extract key from inner element

  • $result_selector - Function to create result: ($outer_item, $inner_item) -> $result

Returns: Query with joined results

Examples:

# Join users with their orders
my $users = LTSV::LINQ->From([
    {id => 1, name => 'Alice'},
    {id => 2, name => 'Bob'}
]);

my $orders = LTSV::LINQ->From([
    {user_id => 1, product => 'Book'},
    {user_id => 1, product => 'Pen'},
    {user_id => 2, product => 'Notebook'}
]);

$users->Join(
    $orders,
    sub { $_[0]{id} },          # outer key
    sub { $_[0]{user_id} },     # inner key
    sub {
        my($user, $order) = @_;
        return {
            name => $user->{name},
            product => $order->{product}
        };
    }
)->ToArray();
# [{name => 'Alice', product => 'Book'},
#  {name => 'Alice', product => 'Pen'},
#  {name => 'Bob', product => 'Notebook'}]

# Join LTSV files by request ID
LTSV::LINQ->FromLTSV('access.log')->Join(
    LTSV::LINQ->FromLTSV('error.log'),
    sub { $_[0]{request_id} },
    sub { $_[0]{request_id} },
    sub {
        my($access, $error) = @_;
        return {
            url => $access->{url},
            error => $error->{message}
        };
    }
)

Note: This is an inner join - only matching elements are returned. The inner sequence is fully loaded into memory.

GroupJoin($inner, $outer_key_selector, $inner_key_selector, $result_selector)

Correlate elements of two sequences and group the matches (similar to a LEFT OUTER JOIN). Each outer element is matched with a group of inner elements (possibly empty).

Parameters:

  • $inner - Inner sequence (LTSV::LINQ object)

  • $outer_key_selector - Function to extract key from outer element

  • $inner_key_selector - Function to extract key from inner element

  • $result_selector - Function: ($outer_item, $inner_group) -> $result. The $inner_group is a LTSV::LINQ object containing matched inner elements (empty sequence if no matches).

Returns: New query with one result per outer element (lazy)

Examples:

# Order count per user (including users with no orders)
my $users = LTSV::LINQ->From([
    {id => 1, name => 'Alice'},
    {id => 2, name => 'Bob'},
    {id => 3, name => 'Carol'}
]);

my $orders = LTSV::LINQ->From([
    {user_id => 1, product => 'Book', amount => 10},
    {user_id => 1, product => 'Pen', amount => 5},
    {user_id => 2, product => 'Notebook', amount => 15}
]);

$users->GroupJoin(
    $orders,
    sub { $_[0]{id} },
    sub { $_[0]{user_id} },
    sub {
        my($user, $user_orders) = @_;
        return {
            name  => $user->{name},
            count => $user_orders->Count(),
            total => $user_orders->Sum(sub { $_[0]{amount} })
        };
    }
)->ToArray();
# [
#   {name => 'Alice', count => 2, total => 15},
#   {name => 'Bob', count => 1, total => 15},
#   {name => 'Carol', count => 0, total => 0},  # no orders
# ]

# Flat list with no-match rows included (LEFT OUTER JOIN, cf. Join for inner join)
$users->GroupJoin(
    $orders,
    sub { $_[0]{id} },
    sub { $_[0]{user_id} },
    sub {
        my($user, $user_orders) = @_;
        my @rows = $user_orders->ToArray();
        return @rows
            ? [ map { {name => $user->{name}, product => $_->{product}} } @rows ]
            : [ {name => $user->{name}, product => 'none'} ];
    }
)->SelectMany(sub { $_[0] }) # Flatten the array references
 ->ToArray();

Note: Unlike Join, every outer element appears in the result even when there are no matching inner elements (LEFT OUTER JOIN semantics). The inner sequence is fully loaded into memory.

Important: The $inner_group LTSV::LINQ object can be iterated more than once within the result selector (for example, calling Count() followed by Sum()), because it generates a fresh iterator for every terminal operation.

Quantifier Methods

All($predicate)

Test if all elements satisfy condition.

->All(sub { $_[0]{status} == 200 })

Any([$predicate])

Test if any element satisfies condition.

->Any(sub { $_[0]{status} >= 400 })
->Any()  # Test if sequence is non-empty

Contains($value [, $comparer])

Check if sequence contains specified element.

Parameters:

  • $value - Value to search for

  • $comparer - (Optional) Custom comparison function

Returns: Boolean (1 or 0)

Examples:

# Simple search
->Contains(5)  # 1 if found, 0 otherwise

# Case-insensitive search
->Contains('foo', sub { lc($_[0]) eq lc($_[1]) })

# Check for undef
->Contains(undef)

SequenceEqual($second [, $comparer])

Determine if two sequences are equal (same elements in same order).

Parameters:

  • $second - Second sequence (LTSV::LINQ object)

  • $comparer - (Optional) Comparison function ($a, $b) -> boolean

Returns: Boolean (1 if equal, 0 otherwise)

Examples:

# Same sequences
LTSV::LINQ->From([1, 2, 3])
    ->SequenceEqual(LTSV::LINQ->From([1, 2, 3]))  # 1 (true)

# Different elements
LTSV::LINQ->From([1, 2, 3])
    ->SequenceEqual(LTSV::LINQ->From([1, 2, 4]))  # 0 (false)

# Different lengths
LTSV::LINQ->From([1, 2])
    ->SequenceEqual(LTSV::LINQ->From([1, 2, 3]))  # 0 (false)

# Case-insensitive comparison
$seq1->SequenceEqual($seq2, sub { lc($_[0]) eq lc($_[1]) })

Note: Order matters. Both content AND order must match.
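The lockstep comparison can be sketched in plain Perl like this. sequences_equal and iter are hypothetical helpers, and the default comparer (string equality) is an assumption made for the sketch, not a statement about the module.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two iterators element by element and
# stop at the first difference.
sub sequences_equal {
    my ($first, $second, $comparer) = @_;
    $comparer ||= sub { $_[0] eq $_[1] };   # assumption: string equality by default
    while (1) {
        my @a = $first->();
        my @b = $second->();
        return 1 if !@a && !@b;             # both ended together
        return 0 if !@a || !@b;             # different lengths
        return 0 unless $comparer->($a[0], $b[0]);
    }
}

# Hypothetical helper: iterator over a list; empty list marks the end.
sub iter { my @items = @_; return sub { @items ? shift @items : () } }

print sequences_equal(iter(1, 2, 3), iter(1, 2, 3)), "\n";    # 1
print sequences_equal(iter(1, 2),    iter(1, 2, 3)), "\n";    # 0
print sequences_equal(iter('Foo'),   iter('foo'),
                      sub { lc($_[0]) eq lc($_[1]) }), "\n";  # 1
```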

Element Access Methods

First([$predicate])

Get first element. Dies if empty.

->First()
->First(sub { $_[0]{status} == 404 })

FirstOrDefault([$predicate,] $default)

Get first element or default value.

->FirstOrDefault(undef, {})

Last([$predicate])

Get last element. Dies if empty.

->Last()

LastOrDefault([$predicate,] $default)

Get last element or default value. Never throws exceptions.

Parameters:

  • $predicate - (Optional) Condition

  • $default - (Optional) Value to return when no element is found. Defaults to undef when omitted.

Returns: Last element or $default

Examples:

# Get last element (undef if empty)
->LastOrDefault()

# Specify a default value
LTSV::LINQ->From([])->LastOrDefault(undef, 0)  # 0

# With predicate and default
->LastOrDefault(sub { $_[0] % 2 == 0 }, -1)  # Last even, or -1

Single([$predicate])

Get the only element. Dies if sequence has zero or more than one element.

Parameters:

  • $predicate - (Optional) Condition

Returns: Single element

Exceptions:

  • Dies with "Sequence contains no elements" if empty

  • Dies with "Sequence contains more than one element" if multiple elements

.NET LINQ Compatibility: Exception messages match .NET LINQ behavior exactly.

Performance: Uses lazy evaluation. Stops iterating immediately when second element is found (does not load entire sequence).

Examples:

# Exactly one element
LTSV::LINQ->From([5])->Single()  # 5

# With predicate
->Single(sub { $_[0] > 10 })

# Memory-efficient: stops at the 2nd match (reads to the end if there is only one)
LTSV::LINQ->FromLTSV("huge.log")->Single(sub { $_[0]{id} eq '999' })
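The early stop can be illustrated with a plain-Perl sketch. single_match is a hypothetical helper that reuses the two exception messages quoted above; it is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the only element matching $predicate,
# giving up as soon as a second match is seen.
sub single_match {
    my ($iter, $predicate) = @_;
    my ($found, $value) = (0, undef);
    while (my ($item) = $iter->()) {
        next unless $predicate->($item);
        die "Sequence contains more than one element\n" if $found;
        ($found, $value) = (1, $item);
    }
    die "Sequence contains no elements\n" unless $found;
    return $value;
}

my @ids  = (5, 999, 7, 999, 8, 9);
my $read = 0;                                  # how many elements were pulled
my $iter = sub { return if $read >= @ids; return $ids[$read++] };

my $ok    = eval { single_match($iter, sub { $_[0] == 999 }); 1 };
my $error = $@;
print "elements read before giving up: $read of ", scalar(@ids), "\n";
print "error: $error";
# elements read before giving up: 4 of 6
# error: Sequence contains more than one element
```

Note that when exactly one element matches, the whole sequence still has to be read to prove there is no second match.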

SingleOrDefault([$predicate])

Get the only element, or undef if zero or multiple elements.

Returns: Single element or undef (if 0 or 2+ elements)

.NET LINQ Compatibility: .NET's SingleOrDefault throws InvalidOperationException when the sequence contains more than one element. LTSV::LINQ returns undef in that case instead, which is more convenient for Perl code that checks return values. If you require the strict .NET behaviour (exception on multiple elements), use Single() wrapped in eval.

Performance: Uses lazy evaluation. Memory-efficient.

Examples:

LTSV::LINQ->From([5])->SingleOrDefault()  # 5
LTSV::LINQ->From([])->SingleOrDefault()   # undef (empty)
LTSV::LINQ->From([1,2])->SingleOrDefault()  # undef (multiple)

ElementAt($index)

Get element at specified index. Dies if out of range.

Parameters:

  • $index - Zero-based index

Returns: Element at index

Exceptions: Dies if index is negative or out of range

Performance: Uses lazy evaluation (iterator-based). Does NOT load entire sequence into memory. Stops iterating once target index is reached.

Examples:

->ElementAt(0)  # First element
->ElementAt(2)  # Third element

# Memory-efficient for large files
LTSV::LINQ->FromLTSV("huge.log")->ElementAt(10)  # Reads only 11 lines

ElementAtOrDefault($index)

Get element at index, or undef if out of range.

Returns: Element or undef

Performance: Uses lazy evaluation (iterator-based). Memory-efficient.

Examples:

->ElementAtOrDefault(0)   # First element
->ElementAtOrDefault(99)  # undef if out of range

Aggregation Methods

All aggregation methods are terminal operations - they consume the entire sequence and return a scalar value.

Count([$predicate])

Count the number of elements.

Parameters:

  • $predicate - (Optional) Code reference to filter elements

Returns: Integer count

Examples:

# Count all
->Count()  # 1000

# Count with condition
->Count(sub { $_[0]{status} >= 400 })  # 42

# Equivalent to
->Where(sub { $_[0]{status} >= 400 })->Count()

Performance: O(n) - must iterate entire sequence

Sum([$selector])

Calculate sum of numeric values.

Parameters:

  • $selector - (Optional) Code reference to extract value. Default: identity function

Returns: Numeric sum

Examples:

# Sum of values
LTSV::LINQ->From([1, 2, 3, 4, 5])->Sum()  # 15

# Sum of field
->Sum(sub { $_[0]{bytes} })

# Sum with transformation
->Sum(sub { $_[0]{price} * $_[0]{quantity} })

Note: Values are added numerically, so non-numeric values may produce warnings. Make sure the selector returns a number.

Empty sequence: Returns 0.

Min([$selector])

Find minimum value.

Parameters:

  • $selector - (Optional) Code reference to extract value

Returns: Minimum value, or undef if sequence is empty.

Examples:

# Minimum of values
->Min()

# Minimum of field
->Min(sub { $_[0]{response_time} })

# Oldest timestamp
->Min(sub { $_[0]{timestamp} })

Max([$selector])

Find maximum value.

Parameters:

  • $selector - (Optional) Code reference to extract value

Returns: Maximum value, or undef if sequence is empty.

Examples:

# Maximum of values
->Max()

# Maximum of field
->Max(sub { $_[0]{bytes} })

# Latest timestamp
->Max(sub { $_[0]{timestamp} })

Average([$selector])

Calculate arithmetic mean.

Parameters:

  • $selector - (Optional) Code reference to extract value

Returns: Numeric average (floating point)

Examples:

# Average of values
LTSV::LINQ->From([1, 2, 3, 4, 5])->Average()  # 3

# Average of field
->Average(sub { $_[0]{bytes} })

# Average response time
->Average(sub { $_[0]{response_time} })

Empty sequence: Dies with "Sequence contains no elements". Unlike Sum (returns 0) and Min/Max (return undef), Average throws on an empty sequence. Use AverageOrDefault to avoid the exception.

Note: The result is a floating-point number. Use int() to truncate it to an integer.

AverageOrDefault([$selector])

Calculate arithmetic mean, or return undef if sequence is empty.

Parameters:

  • $selector - (Optional) Code reference to extract value

Returns: Numeric average (floating point), or undef if empty

Examples:

# Safe average - returns undef for empty sequence
my @empty = ();
my $avg = LTSV::LINQ->From(\@empty)->AverageOrDefault();  # undef

# With data
LTSV::LINQ->From([1, 2, 3])->AverageOrDefault();  # 2

# With selector
->AverageOrDefault(sub { $_[0]{value} })

Note: Unlike Average(), this method never throws an exception.

Aggregate([$seed,] $func [, $result_selector])

Apply an accumulator function over a sequence.

Signatures:

  • Aggregate($func) - Use first element as seed

  • Aggregate($seed, $func) - Explicit seed value

  • Aggregate($seed, $func, $result_selector) - Transform result

Parameters:

  • $seed - Initial accumulator value (optional for first signature)

  • $func - Code reference: ($accumulator, $element) -> $new_accumulator

  • $result_selector - (Optional) Transform final result

Returns: Accumulated value

Examples:

# Sum (without seed)
LTSV::LINQ->From([1,2,3,4])->Aggregate(sub { $_[0] + $_[1] })  # 10

# Product (with seed)
LTSV::LINQ->From([2,3,4])->Aggregate(1, sub { $_[0] * $_[1] })  # 24

# Concatenate strings
LTSV::LINQ->From(['a','b','c'])
    ->Aggregate('', sub { length($_[0]) ? "$_[0],$_[1]" : $_[1] })  # 'a,b,c'

# With result selector
LTSV::LINQ->From([1,2,3])
    ->Aggregate(0,
        sub { $_[0] + $_[1] },      # accumulate
        sub { "Sum: $_[0]" })       # transform result
# "Sum: 6"

# Build complex structure
->Aggregate([], sub {
    my($list, $item) = @_;
    push @$list, uc($item);
    return $list;
})

.NET LINQ Compatibility: Supports all three .NET signatures.
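How the three forms relate can be sketched as a plain left fold. fold is a hypothetical helper that takes the leading arguments as an array reference; dying on an empty list in the seedless form is an assumption borrowed from .NET, not something stated about this module.

```perl
use strict;
use warnings;

# Hypothetical helper: left fold over a list, dispatching on the number
# of leading arguments.
sub fold {
    my ($args, @items) = @_;
    my ($seed, $func, $result_selector);
    if (@$args == 1) {
        ($func) = @$args;
        # Assumption: with no seed and no elements there is nothing to return.
        die "Sequence contains no elements\n" unless @items;
        $seed = shift @items;                  # first element becomes the seed
    }
    else {
        ($seed, $func, $result_selector) = @$args;
    }
    my $acc = $seed;
    $acc = $func->($acc, $_) for @items;
    return $result_selector ? $result_selector->($acc) : $acc;
}

my $sum     = fold([ sub { $_[0] + $_[1] } ], 1, 2, 3, 4);
my $product = fold([ 1, sub { $_[0] * $_[1] } ], 2, 3, 4);
my $label   = fold([ 0, sub { $_[0] + $_[1] }, sub { "Sum: $_[0]" } ], 1, 2, 3);

print "$sum $product $label\n";   # 10 24 Sum: 6
```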

Conversion Methods

ToArray()

Convert to array.

my @array = $query->ToArray();

ToList()

Convert to array reference.

my $arrayref = $query->ToList();

ToDictionary($key_selector [, $value_selector])

Convert sequence to hash reference with unique keys.

Parameters:

  • $key_selector - Function to extract key from element

  • $value_selector - (Optional) Function to extract value, defaults to element itself

Returns: Hash reference

Examples:

# ID to name mapping
my $users = LTSV::LINQ->From([
    {id => 1, name => 'Alice'},
    {id => 2, name => 'Bob'}
]);

my $dict = $users->ToDictionary(
    sub { $_[0]{id} },
    sub { $_[0]{name} }
);
# {1 => 'Alice', 2 => 'Bob'}

# Without value selector (stores entire element)
my $by_id = $users->ToDictionary(sub { $_[0]{id} });
# {1 => {id => 1, name => 'Alice'}, 2 => {id => 2, name => 'Bob'}}

# Quick lookup table
my $status_codes = LTSV::LINQ->FromLTSV('access.log')
    ->Select(sub { $_[0]{status} })
    ->Distinct()
    ->ToDictionary(sub { $_[0] }, sub { 1 });

Note: If duplicate keys exist, later values overwrite earlier ones.

.NET LINQ Compatibility: .NET's ToDictionary throws ArgumentException on duplicate keys. This module silently overwrites with the later value, following Perl hash semantics. Use ToLookup if you need to preserve all values for each key.
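The difference between overwriting and preserving values follows directly from ordinary Perl hash handling, as this small standalone sketch shows (it uses plain hashes rather than the module).

```perl
use strict;
use warnings;

my @orders = (
    { user_id => 1, product => 'Book' },
    { user_id => 1, product => 'Pen' },
    { user_id => 2, product => 'Notebook' },
);

my (%dictionary, %lookup);
for my $order (@orders) {
    my $key = $order->{user_id};
    $dictionary{$key} = $order->{product};        # later value replaces the earlier one
    push @{ $lookup{$key} }, $order->{product};   # every value is kept
}

print "dictionary: 1 => $dictionary{1}\n";        # dictionary: 1 => Pen
print "lookup:     1 => @{ $lookup{1} }\n";       # lookup:     1 => Book Pen
```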

ToLookup($key_selector [, $value_selector])

Convert sequence to hash reference with grouped values (multi-value dictionary).

Parameters:

  • $key_selector - Function to extract key from element

  • $value_selector - (Optional) Function to extract value, defaults to element itself

Returns: Hash reference where values are array references

Examples:

# Group orders by user ID
my $orders = LTSV::LINQ->From([
    {user_id => 1, product => 'Book'},
    {user_id => 1, product => 'Pen'},
    {user_id => 2, product => 'Notebook'}
]);

my $lookup = $orders->ToLookup(
    sub { $_[0]{user_id} },
    sub { $_[0]{product} }
);
# {
#   1 => ['Book', 'Pen'],
#   2 => ['Notebook']
# }

# Group LTSV by status code
my $by_status = LTSV::LINQ->FromLTSV('access.log')
    ->ToLookup(sub { $_[0]{status} });
# {
#   '200' => [{...}, {...}, ...],
#   '404' => [{...}, ...],
#   '500' => [{...}]
# }

Note: Unlike ToDictionary, this preserves all values for each key.

DefaultIfEmpty([$default_value])

Return a sequence containing only the default value if the source sequence is empty; otherwise return the source sequence unchanged.

Parameters:

  • $default_value - (Optional) Default value, defaults to undef

Returns: New query with default value if empty (lazy)

Examples:

# Return 0 if empty
->DefaultIfEmpty(0)->ToArray()  # (0) if empty, or original data

# With undef default
->DefaultIfEmpty()->First()  # undef if empty

# Useful for left joins
->Where(condition)->DefaultIfEmpty({id => 0, name => 'None'})

Note: This is useful for ensuring a sequence always has at least one element.

ToLTSV($filename)

Write to LTSV file.

$query->ToLTSV("output.ltsv");
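For orientation, one record written in LTSV form is a single line of tab-separated label:value pairs. The sketch below formats such a line by hand; to_ltsv_line is a hypothetical helper, and writing the labels in sorted order is an assumption, since the order used by ToLTSV is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record (hash reference) as an LTSV line.
# Assumption: labels are written in sorted order.
sub to_ltsv_line {
    my ($record) = @_;
    return join("\t", map { join(':', $_, $record->{$_}) } sort keys %$record);
}

my $line = to_ltsv_line({ status => 200, url => '/index.html', bytes => 1024 });
print "$line\n";   # bytes:1024<TAB>status:200<TAB>url:/index.html
```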

Utility Methods

ForEach($action)

Execute action for each element.

$query->ForEach(sub { print $_[0]{url}, "\n" });

EXAMPLES

Basic Filtering

use LTSV::LINQ;

# DSL syntax
my @successful = LTSV::LINQ->FromLTSV("access.log")
    ->Where(status => '200')
    ->ToArray();

# Code reference
my @errors = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{status} >= 400 })
    ->ToArray();

Aggregation

# Count errors
my $error_count = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{status} >= 400 })
    ->Count();

# Average bytes for successful requests
my $avg_bytes = LTSV::LINQ->FromLTSV("access.log")
    ->Where(status => '200')
    ->Average(sub { $_[0]{bytes} });

print "Average bytes: $avg_bytes\n";

Grouping and Ordering

# Top 10 URLs by request count
my @top_urls = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{status} eq '200' })
    ->GroupBy(sub { $_[0]{url} })
    ->Select(sub {
        my $g = shift;
        return {
            URL => $g->{Key},
            Count => scalar(@{$g->{Elements}}),
            TotalBytes => LTSV::LINQ->From($g->{Elements})
                ->Sum(sub { $_[0]{bytes} })
        };
    })
    ->OrderByDescending(sub { $_[0]{Count} })
    ->Take(10)
    ->ToArray();

for my $stat (@top_urls) {
    printf "%5d requests - %s (%d bytes)\n",
        $stat->{Count}, $stat->{URL}, $stat->{TotalBytes};
}

Complex Query Chain

# Multi-step analysis
my @result = LTSV::LINQ->FromLTSV("access.log")
    ->Where(status => '200')              # Filter successful
    ->Select(sub { $_[0]{bytes} })         # Extract bytes
    ->Where(sub { $_[0] > 1000 })          # Large responses only
    ->OrderByDescending(sub { $_[0] })     # Sort descending
    ->Take(100)                             # Top 100
    ->ToArray();

print "Largest 100 successful responses:\n";
print "  ", join(", ", @result), "\n";

Lazy Processing of Large Files

# Process huge file with constant memory
LTSV::LINQ->FromLTSV("huge.log")
    ->Where(sub { $_[0]{level} eq 'ERROR' })
    ->ForEach(sub {
        my $rec = shift;
        print "ERROR at $rec->{time}: $rec->{message}\n";
    });

Quantifiers

# Check if all requests are successful
my $all_ok = LTSV::LINQ->FromLTSV("access.log")
    ->All(sub { $_[0]{status} < 400 });

print $all_ok ? "All OK\n" : "Some errors\n";

# Check if any errors exist
my $has_errors = LTSV::LINQ->FromLTSV("access.log")
    ->Any(sub { $_[0]{status} >= 500 });

print "Server errors detected\n" if $has_errors;

Data Transformation

# Read LTSV, transform, write back
LTSV::LINQ->FromLTSV("input.ltsv")
    ->Select(sub {
        my $rec = shift;
        return {
            %$rec,
            processed => 1,
            timestamp => time(),
        };
    })
    ->ToLTSV("output.ltsv");

Working with Arrays

# Query in-memory data
my @data = (
    {name => 'Alice', age => 30, city => 'Tokyo'},
    {name => 'Bob',   age => 25, city => 'Osaka'},
    {name => 'Carol', age => 35, city => 'Tokyo'},
);

my @tokyo_residents = LTSV::LINQ->From(\@data)
    ->Where(city => 'Tokyo')
    ->OrderBy(sub { $_[0]{age} })
    ->ToArray();

FEATURES

Lazy Evaluation

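Most query operations use lazy evaluation via iterators: data is processed on demand, not all at once. Operations that must see the whole input (such as OrderBy, GroupBy and Reverse) are eager; see Method Categories.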

# Only reads 10 records from file
my @top10 = LTSV::LINQ->FromLTSV("huge.log")
    ->Take(10)
    ->ToArray();

Method Chaining

All methods (except terminal operations like ToArray) return a new query object, enabling fluent method chaining.

->Where(...)->Select(...)->OrderBy(...)->Take(10)

DSL Syntax

Simple key-value filtering without code references.

# Readable and concise
->Where(status => '200', method => 'GET')

# Instead of
->Where(sub { $_[0]{status} eq '200' && $_[0]{method} eq 'GET' })

ARCHITECTURE

Iterator-Based Design

LTSV::LINQ uses an iterator-based architecture for lazy evaluation.

Core Concept:

Each query operation returns a new query object wrapping an iterator (a code reference that produces one element per call).

my $iter = sub {
    # Read next element
    # Apply transformation
    # Return element or undef
};

my $query = LTSV::LINQ->new($iter);

Benefits:

  • Memory Efficiency - O(1) memory for most operations

  • Lazy Evaluation - Elements computed on-demand

  • Composability - Iterators chain naturally

  • Early Termination - Stop processing when done
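A minimal plain-Perl sketch of this design follows. The iter_* functions are hypothetical building blocks, not the module's internals, and the sketch marks the end of a sequence with an empty list rather than undef so that undef could still be an element.

```perl
use strict;
use warnings;

# Each function returns an iterator: a code reference that yields one
# element per call, or an empty list when the sequence has ended.
sub iter_from {
    my ($items, $reads_ref) = @_;
    my $i = 0;
    return sub {
        return if $i >= @$items;
        $$reads_ref++;                     # count reads from the source
        return $items->[$i++];
    };
}

sub iter_where {
    my ($iter, $predicate) = @_;
    return sub {
        while (my ($item) = $iter->()) {
            return $item if $predicate->($item);
        }
        return;
    };
}

sub iter_select {
    my ($iter, $selector) = @_;
    return sub {
        my ($item) = $iter->() or return;
        return $selector->($item);
    };
}

sub iter_take {
    my ($iter, $count) = @_;
    return sub {
        return if $count-- <= 0;           # stop without touching upstream
        return $iter->();
    };
}

my $reads = 0;
my $chain = iter_take(
    iter_select(
        iter_where(iter_from([ 1 .. 1000 ], \$reads), sub { $_[0] % 2 == 0 }),
        sub { $_[0] * 10 },
    ),
    3,
);

my @out;
while (my ($item) = $chain->()) { push @out, $item }
print "results: @out\n";                   # results: 20 40 60
print "source elements read: $reads\n";    # source elements read: 6
```

Only 6 of the 1000 source elements are read, which is the early-termination benefit listed above.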

Method Categories

The table below shows, for every method, whether it is lazy or eager, and what it returns. Knowing this prevents surprises about memory use and iterator consumption.

Method                Category        Evaluation         Returns
------                --------        ----------         -------
From                  Source          Lazy (factory)     Query
FromLTSV              Source          Lazy (factory)     Query
Range                 Source          Lazy               Query
Empty                 Source          Lazy               Query
Repeat                Source          Lazy               Query
Where                 Filter          Lazy               Query
Select                Projection      Lazy               Query
SelectMany            Projection      Lazy               Query
Concat                Concatenation   Lazy               Query
Zip                   Concatenation   Lazy               Query
Take                  Partitioning    Lazy               Query
Skip                  Partitioning    Lazy               Query
TakeWhile             Partitioning    Lazy               Query
SkipWhile             Partitioning    Lazy               Query
Distinct              Set Operation   Lazy               Query
DefaultIfEmpty        Conversion      Lazy               Query
OrderBy               Ordering        Eager (full)       Query
OrderByDescending     Ordering        Eager (full)       Query
OrderByStr            Ordering        Eager (full)       Query
OrderByStrDescending  Ordering        Eager (full)       Query
OrderByNum            Ordering        Eager (full)       Query
OrderByNumDescending  Ordering        Eager (full)       Query
Reverse               Ordering        Eager (full)       Query
GroupBy               Grouping        Eager (full)       Query
Union                 Set Operation   Eager (2nd seq)    Query
Intersect             Set Operation   Eager (2nd seq)    Query
Except                Set Operation   Eager (2nd seq)    Query
Join                  Join            Eager (inner seq)  Query
GroupJoin             Join            Eager (inner seq)  Query
All                   Quantifier      Lazy (early exit)  Boolean
Any                   Quantifier      Lazy (early exit)  Boolean
Contains              Quantifier      Lazy (early exit)  Boolean
SequenceEqual         Comparison      Lazy (early exit)  Boolean
First                 Element Access  Lazy (early exit)  Element
FirstOrDefault        Element Access  Lazy (early exit)  Element
Last                  Element Access  Eager (full)       Element
LastOrDefault         Element Access  Eager (full)       Element
Single                Element Access  Lazy (stops at 2)  Element
SingleOrDefault       Element Access  Lazy (stops at 2)  Element
ElementAt             Element Access  Lazy (early exit)  Element
ElementAtOrDefault    Element Access  Lazy (early exit)  Element
Count                 Aggregation     Eager (full)       Integer
Sum                   Aggregation     Eager (full)       Number
Min                   Aggregation     Eager (full)       Number
Max                   Aggregation     Eager (full)       Number
Average               Aggregation     Eager (full)       Number
AverageOrDefault      Aggregation     Eager (full)       Number or undef
Aggregate             Aggregation     Eager (full)       Scalar
ToArray               Conversion      Eager (full)       Array
ToList                Conversion      Eager (full)       ArrayRef
ToDictionary          Conversion      Eager (full)       HashRef
ToLookup              Conversion      Eager (full)       HashRef
ToLTSV                Conversion      Eager (full)       (file written)
ForEach               Utility         Eager (full)       (void)

Legend:

  • Lazy - returns a new Query immediately; no data is read yet.

  • Lazy (early exit) - reads only as many elements as needed, then stops.

  • Lazy (stops at 2) - reads until it finds a second match, then stops.

  • Eager (full) - must read the entire input sequence before returning.

  • Eager (2nd seq) / Eager (inner seq) - the indicated sequence is read in full up front; the other sequence remains lazy.

Practical guidance:

  • Chain lazy operations freely - no cost until a terminal is called.

  • Each terminal operation exhausts the iterator; to reuse data, call ToArray() first and rebuild with From(\@array).

  • For very large files, avoid eager operations (OrderBy, GroupBy, Join, etc.) unless the data fits in memory, or pre-filter with Where to reduce the working set first.
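The second point can be seen with any single-pass iterator. The sketch below uses hypothetical stand-ins (make_iterator for a query, drain for a terminal operation) rather than the module itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a query: a single-pass iterator.
sub make_iterator {
    my @items = @_;
    return sub { return @items ? shift @items : () };
}

# Hypothetical stand-in for a terminal operation such as ToArray.
sub drain {
    my ($iter) = @_;
    my @out;
    while (my ($item) = $iter->()) { push @out, $item }
    return @out;
}

my $iter   = make_iterator(qw(a b c));
my @first  = drain($iter);                 # consumes everything
my @second = drain($iter);                 # nothing left to read

# Materialise once, then build a fresh iterator for each further pass.
my @again = drain(make_iterator(@first));

printf "first=%d second=%d again=%d\n",
    scalar @first, scalar @second, scalar @again;
# first=3 second=0 again=3
```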

Query Execution Flow

# Build query (lazy - no execution yet)
my $query = LTSV::LINQ->FromLTSV("access.log")
    ->Where(status => '200')      # Lazy
    ->Select(sub { $_[0]{url} })  # Lazy
    ->Distinct();                  # Lazy

# Execute query (terminal operation)
my @results = $query->ToArray();  # Now executes entire chain

Execution Order:

1. FromLTSV opens file and creates iterator
2. Where wraps iterator with filter
3. Select wraps with transformation
4. Distinct wraps with deduplication
5. ToArray pulls elements through chain

Each element flows through the entire chain before the next element is read.

Memory Characteristics

O(1) / Streaming Operations:

These hold at most one element in memory at a time:

  • Where, Select, SelectMany, Concat, Zip

  • Take, Skip, TakeWhile, SkipWhile

  • DefaultIfEmpty

  • ForEach, Count, Sum, Min, Max, Average, AverageOrDefault

  • First, FirstOrDefault, Any, All, Contains

  • Single, SingleOrDefault, ElementAt, ElementAtOrDefault

O(unique) Operations:

  • Distinct - hash grows with the number of distinct keys seen
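A plain-Perl sketch of lazy deduplication shows where this memory goes. distinct_iter is a hypothetical helper; the %seen hash is passed in from outside only so that its size can be inspected afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: lazily drop elements whose key was seen before.
sub distinct_iter {
    my ($iter, $key_selector, $seen) = @_;
    return sub {
        while (my ($item) = $iter->()) {
            my $key = $key_selector ? $key_selector->($item) : $item;
            return $item unless $seen->{$key}++;   # first time this key appears
        }
        return;                                    # empty list: end of sequence
    };
}

my @hosts = qw(a.example B.example A.example b.example c.example);
my %seen;
my $next = distinct_iter(
    sub { @hosts ? shift @hosts : () },
    sub { lc $_[0] },                              # case-insensitive key
    \%seen,
);

my @unique;
while (my ($host) = $next->()) { push @unique, $host }
print "kept: @unique\n";                                   # kept: a.example B.example c.example
print "keys held in memory: ", scalar(keys %seen), "\n";   # keys held in memory: 3
```

Five elements pass through, but only three keys are retained, one per distinct value.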

O(second/inner sequence) Operations:

The following are partially eager: one sequence is buffered in full, the other is streamed:

  • Union, Intersect, Except - second sequence is fully loaded

  • Join, GroupJoin - inner sequence is fully loaded

O(n) / Full-materialisation Operations:

  • ToArray, ToList, ToDictionary, ToLookup, ToLTSV (O(n))

  • OrderBy, OrderByDescending and Str/Num variants, Reverse (O(n))

  • GroupBy (O(n))

  • Last, LastOrDefault (O(n))

  • Aggregate (O(n), O(1) intermediate accumulator)

PERFORMANCE

Memory Efficiency

Lazy evaluation means memory usage is O(1) for most operations, regardless of input size.

# Processes 1GB file with constant memory
LTSV::LINQ->FromLTSV("1gb.log")
    ->Where(status => '500')
    ->ForEach(sub { print $_[0]{url}, "\n" });

Terminal Operations

These operations must read the entire input sequence before they can produce a result:

  • ToArray, ToList

  • OrderBy, OrderByDescending, Reverse

  • GroupBy

  • Last

For large datasets, reduce the input with Where or Take before applying these operations.

Optimization Tips

  • Filter early: Place Where clauses first

    # Good: Filter before expensive operations
    ->Where(status => '200')->OrderBy(...)->Take(10)
    
    # Bad: Order all data, then filter
    ->OrderBy(...)->Where(status => '200')->Take(10)

  • Limit early: Use Take to reduce processing

    # Process only what you need
    ->Take(1000)->GroupBy(...)

  • Avoid repeated ToArray: Reuse results

    # Bad: Calls ToArray twice (the second call finds the iterator exhausted)
    my $count = scalar($query->ToArray());
    my @items = $query->ToArray();
    
    # Good: Call once, reuse
    my @items = $query->ToArray();
    my $count = scalar(@items);

COMPATIBILITY

Perl Version Support

This module is compatible with Perl 5.005_03 and later.

Tested on:

  • Perl 5.005_03 (released 1999)

  • Perl 5.6.x

  • Perl 5.8.x

  • Perl 5.10.x - 5.42.x

Compatibility Policy

Ancient Perl Support:

This module maintains compatibility with Perl 5.005_03 through careful coding practices:

  • No use of features introduced after 5.005

  • use warnings compatibility shim for pre-5.6

  • our keyword avoided (5.6+ feature)

  • Three-argument open used on Perl 5.6 and later (two-argument form retained for 5.005_03)

  • No Unicode features required

  • No module dependencies beyond core

Why the Perl 5.005_03 Specification?

This module adheres to the Perl 5.005_03 specification, which was the final version of JPerl (Japanese Perl). This is not about using the old interpreter, but about maintaining the simple, original programming model that made Perl enjoyable.

The Strength of Modern Times:

Some people think the strength of modern times is the ability to use modern technology. That thinking is insufficient. The strength of modern times is the ability to use all technology up to the present day.

By adhering to the Perl 5.005_03 specification, we gain access to the entire history of Perl--from 5.005_03 to 5.42 and beyond--rather than limiting ourselves to only the latest versions.

Key reasons:

  • Simplicity - The original Perl approach keeps programming fun and easy

    Perl 5.6 and later introduced character encoding complexity that made programming harder. The confusion around character handling contributed to Perl's decline. By staying with the 5.005_03 specification, we maintain the simplicity that made Perl "rakuda" (camel) -> "raku" (easy/fun).

  • JPerl Compatibility - Preserves the last JPerl version

    Perl 5.005_03 was the final version of JPerl, which handled Japanese text naturally. Later versions abandoned this approach for Unicode, adding unnecessary complexity for many use cases.

  • Universal Compatibility - Runs on ANY Perl version

    Code written to the 5.005_03 specification runs on all Perl versions from 5.005_03 through 5.42 and beyond. This maximizes compatibility across more than two decades of Perl releases.

  • Production Systems - Real-world enterprise needs

    Many production systems, embedded environments, and enterprise deployments still run Perl 5.005, 5.6, or 5.8. This module provides modern query capabilities without requiring upgrades.

  • Philosophy - Programming should be enjoyable

    As readers of the "Camel Book" (Programming Perl) know, Perl was designed to make programming enjoyable. The 5.005_03 specification preserves this original vision.

The ina CPAN Philosophy:

All modules under the ina CPAN account (including mb, Jacode, UTF8-R2, mb-JSON, and this module) follow this principle: Write to the Perl 5.005_03 specification, test on all versions, maintain programming joy.

This is not nostalgia--it's a commitment to:

  • Simple, maintainable code

  • Maximum compatibility

  • The original Perl philosophy

  • Making programming "raku" (easy and fun)

Build System:

This module uses pmake.bat instead of traditional make, since Perl 5.005_03 on Microsoft Windows lacks make. All tests pass on Perl 5.005_03 through modern versions.

.NET LINQ Compatibility

This section documents where LTSV::LINQ's behaviour matches .NET LINQ exactly, where it intentionally differs, and where it must differ because of Perl's type system.

Exact matches with .NET LINQ:

  • Single - throws when sequence is empty or has more than one element

  • First, Last - throw when sequence is empty or no element matches

  • Aggregate(seed, func) and Aggregate(seed, func, result_selector) - matching 2- and 3-argument forms

  • GroupBy - groups are returned in insertion order (first-seen key order)

  • GroupJoin - every outer element appears even with zero inner matches

  • Join - inner join semantics; unmatched outer elements are dropped

  • Union / Intersect / Except - partially eager (second/inner sequence buffered up front), matching .NET's hash-set approach

  • Take, Skip, TakeWhile, SkipWhile - identical semantics

  • All / Any with early exit
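As an illustration of the first-seen key order noted for GroupBy, here is a plain-Perl sketch. The group_by function is hypothetical and written only for this example; it is not LTSV::LINQ's implementation. The Key and Elements field names follow the group structure shown in the SYNOPSIS.

```perl
use strict;

# Hypothetical helper: group items by key, keeping first-seen key order.
sub group_by {
    my ($items, $key_selector) = @_;
    my (%index, @groups);
    for my $item (@$items) {
        my $key = $key_selector->($item);
        unless (exists $index{$key}) {
            $index{$key} = scalar(@groups);   # position of the new group
            push @groups, { Key => $key, Elements => [] };
        }
        push @{ $groups[ $index{$key} ]{Elements} }, $item;
    }
    return @groups;
}

# Made-up sample records.
my @records = (
    { status => '404', url => '/missing' },
    { status => '200', url => '/index.html' },
    { status => '404', url => '/gone' },
);

my @groups  = group_by(\@records, sub { $_[0]{status} });
my $summary = join ' ', map { $_->{Key} . '=' . scalar(@{ $_->{Elements} }) } @groups;
print "$summary\n";   # prints "404=2 200=1"
```

The 404 group comes first because its key was seen first, even though '200' sorts lower.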

Intentional differences from .NET LINQ:

  • SingleOrDefault

    .NET throws InvalidOperationException when the sequence contains more than one element. LTSV::LINQ returns undef instead. This makes it more natural in Perl code that checks return values with defined.

    If you require strict .NET behaviour (exception on multiple elements), use Single() inside an eval:

    my $val = eval { $query->Single() };
    # $val is undef and $@ is set if empty or multiple

  • DefaultIfEmpty(undef)

    .NET's DefaultIfEmpty can return a sequence containing null (the reference-type default). LTSV::LINQ cannot: the iterator protocol uses undef to signal end-of-sequence, so a default value of undef is indistinguishable from EOF and is silently lost.

    # .NET: seq.DefaultIfEmpty() produces one null element
    # Perl:
    LTSV::LINQ->From([])->DefaultIfEmpty(undef)->ToArray()  # () - empty!
    LTSV::LINQ->From([])->DefaultIfEmpty(0)->ToArray()      # (0) - works

    Use a sentinel value (0, '', {}) and handle it explicitly.

  • OrderBy smart comparison

    .NET's OrderBy is strongly typed: the key type determines the comparison. In Perl there is no static type, so LTSV::LINQ's OrderBy uses a heuristic: if both keys look like numbers, <=> is used; otherwise cmp. For explicit control, use OrderByStr (always cmp) or OrderByNum (always <=>).

  • EqualityComparer / IComparer

    .NET LINQ accepts IEqualityComparer and IComparer interface objects for custom equality and ordering. LTSV::LINQ uses code references (sub) that extract a key from each element. This is equivalent in power but different in calling convention: the sub receives one element and returns a key, rather than receiving two elements and returning a comparison result.

  • Concat on typed sequences

    .NET's Concat is type-checked. LTSV::LINQ accepts any two sequences regardless of element type.

  • No query expression syntax

    .NET's from x in ... where ... select ... syntax compiles to LINQ method calls. Perl has no equivalent; use method chaining directly.
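The OrderBy heuristic described in the list above can be sketched in plain Perl. This is an illustration only: the helper names looks_numeric and smart_compare, and the number-matching regular expression, are assumptions made for this example and may not match what LTSV::LINQ does internally.

```perl
use strict;

# Hypothetical helper: true if the value looks like a decimal number.
# The exact rule used by LTSV::LINQ may differ.
sub looks_numeric {
    my ($v) = @_;
    return defined($v)
        && $v =~ /^\s*[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?\s*$/;
}

# Hypothetical comparator: numeric if both keys look numeric, else string.
sub smart_compare {
    my ($x, $y) = @_;
    return (looks_numeric($x) && looks_numeric($y)) ? ($x <=> $y) : ($x cmp $y);
}

my @bytes = sort { smart_compare($a, $b) } ('1024', '9', '200');
my @hosts = sort { smart_compare($a, $b) } ('web10', 'web9', 'db1');

print join(',', @bytes), "\n";   # prints "9,200,1024"
print join(',', @hosts), "\n";   # prints "db1,web10,web9"
```

The second result shows why explicit control matters: keys that merely contain digits are still compared as strings.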

Pure Perl Implementation

No XS Dependencies:

This module is implemented in Pure Perl with no XS (C extensions). Benefits:

  • Works on any Perl installation

  • No C compiler required

  • Easy installation in restricted environments

  • Consistent behavior across platforms

  • Simpler debugging and maintenance

Core Module Dependencies

None. This module uses only Perl core features available since 5.005.

No CPAN dependencies required.

DIAGNOSTICS

Error Messages

This module may throw the following exceptions:

From() requires ARRAY reference

Thrown by From() when the argument is not an array reference.

Example:

LTSV::LINQ->From("string");  # Dies
LTSV::LINQ->From([1, 2, 3]); # OK

SelectMany: selector must return an ARRAY reference

Thrown by SelectMany() when the selector function returns anything other than an ARRAY reference. Wrap the return value in [...]:

# Wrong - hashref causes die
->SelectMany(sub { {key => 'val'} })

# Correct - arrayref
->SelectMany(sub { [{key => 'val'}] })

# Correct - empty array for "no results" case
->SelectMany(sub { [] })

Sequence contains no elements

Thrown by First(), Last(), or Average() when called on an empty sequence.

Methods that throw this error:

  • First()

  • Last()

  • Average()

To avoid this error, use the OrDefault variants:

  • FirstOrDefault() - returns undef instead of dying

  • LastOrDefault() - returns undef instead of dying

  • AverageOrDefault() - returns undef instead of dying

Example:

my @empty = ();
LTSV::LINQ->From(\@empty)->First();          # Dies
LTSV::LINQ->From(\@empty)->FirstOrDefault(); # Returns undef

No element satisfies the condition

Thrown by First() or Last() with a predicate when no element matches.

Example:

my @data = (1, 2, 3);
LTSV::LINQ->From(\@data)->First(sub { $_[0] > 10 });          # Dies
LTSV::LINQ->From(\@data)->FirstOrDefault(sub { $_[0] > 10 }); # Returns undef

Cannot open 'filename': ...

File I/O error when FromLTSV() cannot open the specified file.

Common causes:

  • File does not exist

  • Insufficient permissions

  • Invalid path

Example:

LTSV::LINQ->FromLTSV("/nonexistent/file.ltsv"); # Dies with this error

Methods That May Throw Exceptions

From($array_ref)

Dies if argument is not an array reference.

FromLTSV($filename)

Dies if file cannot be opened.

Note: The file handle is held open until the iterator is fully consumed. Partially consumed queries keep their file handles open. See FromLTSV in "Data Source Methods" for details.

First([$predicate])

Dies if sequence is empty or no element matches predicate.

Safe alternative: FirstOrDefault()

Last([$predicate])

Dies if sequence is empty or no element matches predicate.

Safe alternative: LastOrDefault()

Average([$selector])

Dies if sequence is empty.

Safe alternative: AverageOrDefault()

Safe Alternatives

For methods that may throw exceptions, use the OrDefault variants:

First()   -> FirstOrDefault()   (returns undef)
Last()    -> LastOrDefault()    (returns undef)
Average() -> AverageOrDefault() (returns undef)

Example:

# Unsafe - may die
my $first = LTSV::LINQ->From(\@data)->First();

# Safe - returns undef if empty
my $first = LTSV::LINQ->From(\@data)->FirstOrDefault();
if (defined $first) {
    # Process $first
}

Exception Format and Stack Traces

All exceptions thrown by this module are plain strings produced by die "message". Because no trailing newline is appended, Perl automatically appends the source location:

Sequence contains no elements at lib/LTSV/LINQ.pm line 764.

This is intentional: the location helps when diagnosing unexpected failures during development.

When catching exceptions with eval, the full string including the location suffix is available in $@. Use a prefix match if you want to test only the message text:

eval { LTSV::LINQ->From([])->First() };
if ($@ =~ /^Sequence contains no elements/) {
    # handle empty sequence
}

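If you prefer exceptions without the location suffix, catch the exception with eval, strip the suffix, and re-die with a trailing newline so that Perl does not append a new location:

eval { $result = $query->First() };
(my $msg = $@) =~ s/ at .+? line \d+.*\n\z//;   # strip " at ... line N."
die "$msg\n" if $msg;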
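A self-contained sketch of this exception handling, using a stand-in function rather than LTSV::LINQ itself. The sub first_or_die and the variable names are hypothetical; only the die and eval behaviour shown is standard Perl.

```perl
use strict;

# Hypothetical stand-in for a method that dies without a trailing newline,
# so Perl appends " at FILE line N." to the message.
sub first_or_die {
    my @items = @_;
    die "Sequence contains no elements" unless @items;
    return $items[0];
}

my $result = eval { first_or_die() };
my $error  = $@;

# Prefix match: test the message text and ignore the location suffix.
my $is_empty = ($error =~ /^Sequence contains no elements/) ? 1 : 0;

# Remove the location suffix to recover the bare message.
(my $message = $error) =~ s/ at .+? line \d+.*\n\z//;

print "empty sequence: $is_empty\n";   # prints "empty sequence: 1"
print "message: $message\n";           # prints "message: Sequence contains no elements"
```

Simply appending a newline to $@ is not enough: the caught string already carries the location, so it has to be removed before re-dying.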

FAQ

General Questions

Q: Why LINQ-style instead of SQL-style?

A: LINQ provides:

  • Method chaining (more Perl-like)

  • Predicates are ordinary Perl code, syntax-checked at compile time

  • No string parsing required

  • Composable queries

Q: Can I reuse a query object?

A: No. Query objects use iterators that can only be consumed once.

# Wrong - iterator consumed by first ToArray
my $query = LTSV::LINQ->FromLTSV("file.ltsv");
my @first = $query->ToArray();   # OK
my @second = $query->ToArray();  # Empty! Iterator exhausted

# Right - create new query for each use
my $query1 = LTSV::LINQ->FromLTSV("file.ltsv");
my @first = $query1->ToArray();

my $query2 = LTSV::LINQ->FromLTSV("file.ltsv");
my @second = $query2->ToArray();

Q: How do I do OR conditions in Where?

A: Use code reference form with ||:

# OR condition requires code reference
->Where(sub {
    $_[0]{status} == 200 || $_[0]{status} == 304
})

# DSL only supports AND
->Where(status => '200')  # Single condition only

Q: Why does my query seem to run multiple times?

A: Each terminal operation is a separate pass over the source. A query object cannot be rewound, so two terminal operations need two query objects, and the file is read twice:

# This reads the file TWICE
my $avg = LTSV::LINQ->FromLTSV("file.ltsv")->Average(...);   # Pass 1: Calculate
my @all = LTSV::LINQ->FromLTSV("file.ltsv")->ToArray();      # Pass 2: Collect

# Save result instead
my @all = $query->ToArray();
my $avg = LTSV::LINQ->From(\@all)->Average(...);

Performance Questions

Q: How can I process a huge file efficiently?

A: Use lazy operations and avoid materializing:

# Good - constant memory
LTSV::LINQ->FromLTSV("huge.log")
    ->Where(status => '500')
    ->ForEach(sub { print $_[0]{message}, "\n" });

# Bad - loads everything into memory
my @all = LTSV::LINQ->FromLTSV("huge.log")->ToArray();

Q: Why is OrderBy slow on large files?

A: OrderBy must load all elements into memory to sort them.

# Slow on 1GB file - loads everything
->OrderBy(sub { $_[0]{timestamp} })->Take(10)

# Faster - limit before sorting (if possible)
->Where(status => '500')->OrderBy(...)->Take(10)

Q: How do I process files larger than memory?

A: Use ForEach or streaming terminal operations:

# Process a 100GB file in near-constant memory (one record at a time)
my $error_count = 0;
LTSV::LINQ->FromLTSV("100gb.log")
    ->Where(sub { $_[0]{level} eq 'ERROR' })
    ->ForEach(sub { $error_count++ });

print "Errors: $error_count\n";

DSL Questions

Q: Can DSL do numeric comparisons?

A: No. DSL uses string equality (eq). Use code reference for numeric:

# DSL - string comparison
->Where(status => '200')  # $_[0]{status} eq '200'

# Code ref - numeric comparison
->Where(sub { $_[0]{status} == 200 })
->Where(sub { $_[0]{bytes} > 1000 })

Q: How do I do case-insensitive matching in DSL?

A: DSL doesn't support it. Use code reference:

# Case-insensitive requires code reference
->Where(sub { lc($_[0]{method}) eq 'get' })

Q: Can I use regular expressions in DSL?

A: No. Use code reference:

# Regex requires code reference
->Where(sub { $_[0]{url} =~ m{^/api/} })

Compatibility Questions

Q: Does this work on Perl 5.6?

A: Yes. Tested on Perl 5.005_03 through 5.42.x.

Q: Do I need to install any CPAN modules?

A: No. Pure Perl with no dependencies beyond core.

Q: Can I use this on Windows?

A: Yes. Pure Perl works on all platforms.

Q: Why support such old Perl versions?

A: Many production systems cannot upgrade. This module provides modern query capabilities without requiring upgrades.

COOKBOOK

Common Patterns

Find top N by value

->OrderByDescending(sub { $_[0]{score} })
  ->Take(10)
  ->ToArray()

Group and count

->GroupBy(sub { $_[0]{category} })
  ->Select(sub {
      {
          Category => $_[0]{Key},
          Count => scalar(@{$_[0]{Elements}})
      }
  })
  ->ToArray()

Running total

my $total = 0;
$query->Select(sub {
    $total += $_[0]{amount};
    return { %{$_[0]}, running_total => $total };   # "return" forces a hash ref, not a block
})

Pagination

# Page 3, size 20
->Skip(40)->Take(20)->ToArray()

Unique values

->Select(sub { $_[0]{category} })
  ->Distinct()
  ->ToArray()

Conditional aggregation

Note: A query object can only be consumed once. To compute multiple aggregations over the same source, materialise it first with ToArray().

my @all = LTSV::LINQ->FromLTSV("access.log")->ToArray();

my $success_avg = LTSV::LINQ->From(\@all)
    ->Where(status => '200')
    ->Average(sub { $_[0]{response_time} });

my $error_avg = LTSV::LINQ->From(\@all)
    ->Where(sub { $_[0]{status} >= 400 })
    ->Average(sub { $_[0]{response_time} });

Iterator consumption: when to snapshot with ToArray()

A query object wraps a single-pass iterator. Once consumed, it is exhausted and subsequent terminal operations return empty results or die.

# WRONG - $q is exhausted after the first Count()
my $q = LTSV::LINQ->FromLTSV("access.log")->Where(status => '200');
my $n     = $q->Count();          # OK
my $first = $q->First();          # WRONG: iterator already at EOF

# RIGHT - snapshot into array, then query as many times as needed
my @rows  = LTSV::LINQ->FromLTSV("access.log")->Where(status => '200')->ToArray();
my $n     = LTSV::LINQ->From(\@rows)->Count();
my $first = LTSV::LINQ->From(\@rows)->First();

The snapshot approach is also the correct pattern for any multi-pass computation such as computing both average and standard deviation, comparing the same sequence against two different filters, or iterating once to validate and once to transform.
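As a sketch of the multi-pass case, the following computes a mean and a population standard deviation from a snapshot array in plain Perl. The rows are made-up sample data standing in for the result of ToArray(), and response_time is an assumed field name.

```perl
use strict;

# Made-up rows standing in for a snapshot taken with ToArray().
my @rows = (
    { status => '200', response_time => 100 },
    { status => '200', response_time => 200 },
    { status => '200', response_time => 300 },
);

# Pass 1: mean.
my $sum = 0;
$sum += $_->{response_time} for @rows;
my $mean = $sum / scalar(@rows);

# Pass 2: population standard deviation, which needs the mean from pass 1.
my $squares = 0;
$squares += ($_->{response_time} - $mean) ** 2 for @rows;
my $stddev = sqrt($squares / scalar(@rows));

printf "mean=%.1f stddev=%.2f\n", $mean, $stddev;   # prints "mean=200.0 stddev=81.65"
```

The second pass depends on the result of the first, which is why a single-pass iterator alone is not enough here.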

Efficient large-file pattern

For files too large to fit in memory, keep the chain fully lazy by ensuring only one terminal operation is performed per pass:

# One pass - filter first so that OrderByNum buffers only the matching rows
my @slow = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{response_time} > 1000 })
    ->OrderByNum(sub { $_[0]{response_time} })
    ->Take(20)
    ->ToArray();

# Never do two passes on the same FromLTSV object -
# open the file again for a second pass:
my $count = LTSV::LINQ->FromLTSV("access.log")->Count();
my $sum   = LTSV::LINQ->FromLTSV("access.log")
                ->Sum(sub { $_[0]{bytes} });
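When two aggregates are needed and a second read of the file is undesirable, another option is to accumulate both in a single pass. The sketch below shows the idea in plain Perl over in-memory sample lines; parse_ltsv_line is a hypothetical helper written for this example and is not part of LTSV::LINQ.

```perl
use strict;

# Hypothetical helper: split one LTSV line into a hash reference.
sub parse_ltsv_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $field (split /\t/, $line) {
        # Split on the first colon only; values may contain colons.
        my ($label, $value) = split /:/, $field, 2;
        $record{$label} = $value;
    }
    return \%record;
}

# Sample lines standing in for a file read one line at a time.
my @lines = (
    "host:192.0.2.1\tstatus:200\tbytes:1024\n",
    "host:192.0.2.2\tstatus:404\tbytes:512\n",
    "host:192.0.2.3\tstatus:200\tbytes:2048\n",
);

# One pass: both aggregates are updated per record, nothing is retained.
my ($count, $sum) = (0, 0);
for my $line (@lines) {
    my $record = parse_ltsv_line($line);
    $count++;
    $sum += $record->{bytes};
}

print "count=$count sum=$sum\n";   # prints "count=3 sum=3584"
```

With LTSV::LINQ itself, the same effect can be had with one ForEach call whose callback updates both counters, in the style of the error-counting example in the FAQ.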

DESIGN PHILOSOPHY

Historical Compatibility: Perl 5.005_03

This module maintains compatibility with Perl 5.005_03 (released 1999-03-28), following the Universal Consensus 1998 for primetools.

Why maintain such old compatibility?

  • Long-term stability

    Code written in 1998-era Perl should still run in 2026 and beyond. This demonstrates Perl's commitment to backwards compatibility.

  • Embedded systems and traditional environments

    Some production systems, embedded devices, and enterprise environments cannot easily upgrade Perl. Maintaining compatibility ensures this module remains useful in those contexts.

  • Minimal dependencies

    By avoiding modern Perl features, this module has zero non-core dependencies. It works with only the Perl core that has existed since 1999.

Technical implications:

  • No our keyword - uses package variables

  • No warnings pragma - uses local $^W=1

  • No use strict 'subs' improvements from 5.6+

  • All features implemented with Perl 5.005-era constructs

The code comment # use 5.008001; # Lancaster Consensus 2013 for toolchains marks where modern code would typically start. We intentionally stay below this line.

US-ASCII Only Policy

All source code is strictly US-ASCII (bytes 0x00-0x7F). No UTF-8, no extended characters.

Rationale:

  • Universal portability

    US-ASCII works everywhere - ancient terminals, modern IDEs, web browsers, email systems. No encoding issues, ever.

  • No locale dependencies

    The code behaves identically regardless of system locale settings.

  • Clear separation of concerns

    Source code (ASCII) vs. data (any encoding). The module processes LTSV data in any encoding, but its own code remains pure ASCII.

This policy is verified by t/010_ascii_only.t.

The $VERSION = $VERSION Idiom

You may notice:

$VERSION = '1.05';
$VERSION = $VERSION;

This is intentional, not a typo. With warnings enabled (-w), a package variable that appears only once triggers a "used only once: possible typo" warning. The self-assignment ensures $VERSION appears twice, silencing the warning without requiring our (which doesn't exist in Perl 5.005).

This is a well-known idiom from the pre-our era.

Design Principles

  • Lazy evaluation by default

    Operations return query objects, not arrays. Data is processed on-demand when terminal operations (ToArray, Count, etc.) are called.

  • Method chaining

    All query operations return new query objects, enabling fluent syntax:

    $query->Where(...)->Select(...)->OrderBy(...)->ToArray()

  • No side effects

    Query operations never modify the source data. They create new lazy iterators.

  • Perl idioms, LINQ semantics

    We follow LINQ's method names and semantics, but use Perl idioms for implementation (closures for iterators, hash refs for records).

  • Zero dependencies

    This module has zero non-core dependencies. It works with only the Perl core that has existed since 1999. Even warnings.pm is optional (stubbed for Perl < 5.6). This ensures installation succeeds on minimal Perl installations, avoids dependency chain vulnerabilities, and provides permanence - the code will work decades into the future.
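The closure-based lazy evaluation named in these principles can be illustrated with a small sketch. The function names from_array, where_iter and select_iter are hypothetical and exist only in this example; LTSV::LINQ's internals may be organised differently. Each iterator here is a closure that returns the next element, or undef at end-of-sequence.

```perl
use strict;

# Hypothetical source: iterate over an array reference.
sub from_array {
    my ($items) = @_;
    my $i = 0;
    return sub { return $i < @$items ? $items->[$i++] : undef };
}

# Hypothetical lazy filter: pulls from the upstream iterator on demand.
sub where_iter {
    my ($upstream, $predicate) = @_;
    return sub {
        while (defined(my $item = $upstream->())) {
            return $item if $predicate->($item);
        }
        return undef;
    };
}

# Hypothetical lazy projection.
sub select_iter {
    my ($upstream, $selector) = @_;
    return sub {
        my $item = $upstream->();
        return defined($item) ? $selector->($item) : undef;
    };
}

# Made-up sample records.
my @records = (
    { status => '200', url => '/index.html' },
    { status => '404', url => '/missing' },
    { status => '200', url => '/about.html' },
);

my $pulled  = 0;   # counts how many source records have been read
my $source  = from_array(\@records);
my $counted = sub { my $r = $source->(); $pulled++ if defined $r; return $r };

my $query = select_iter(
    where_iter($counted, sub { $_[0]{status} eq '200' }),
    sub { $_[0]{url} },
);

my $first = $query->();   # nothing is read until this call
print "first=$first pulled=$pulled\n";   # prints "first=/index.html pulled=1"
```

Building the chain reads no records; the first call reads exactly one, which is the behaviour that keeps memory use flat on large inputs.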

LIMITATIONS AND KNOWN ISSUES

Current Limitations

  • Iterator Consumption

    Query objects can only be consumed once. The iterator is exhausted after terminal operations.

    Workaround: Create new query object or save ToArray() result.

  • Undef Values in Sequences

    Due to iterator-based design, undef cannot be distinguished from end-of-sequence. Sequences containing undef values may not work correctly with all operations.

    This is not a practical limitation for LTSV data (which uses hash references), but affects operations on plain arrays containing undef.

    # Works fine (LTSV data - hash references)
    LTSV::LINQ->FromLTSV("file.ltsv")->Contains({status => '200'})
    
    # Limitation (plain array with undef)
    LTSV::LINQ->From([1, undef, 3])->Contains(undef)  # May not work

  • No Parallel Execution

    All operations execute sequentially in a single thread.

  • No Index Support

    All filtering requires full scan. No index optimization.

  • Distinct Uses String Keys

    Distinct with custom comparer uses stringified keys. May not work correctly for complex objects.

  • DefaultIfEmpty(undef) Cannot Be Distinguished from End-of-Sequence

    Because the iterator protocol uses undef to signal end-of-sequence, DefaultIfEmpty(undef) cannot reliably deliver its undef default to downstream operations.

    # Works correctly (non-undef default)
    LTSV::LINQ->From([])->DefaultIfEmpty(0)->ToArray()    # (0)
    LTSV::LINQ->From([])->DefaultIfEmpty({})->ToArray()   # ({})
    
    # Does NOT work (undef default is indistinguishable from EOF)
    LTSV::LINQ->From([])->DefaultIfEmpty(undef)->ToArray() # () - empty!

    Workaround: Use a sentinel value such as 0, '', or {} instead of undef, and treat it as "no element" after the fact.
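The two undef-related limitations above follow from the iterator protocol. A minimal sketch, assuming an iterator is a closure that returns undef at end-of-sequence; the helper names make_iter and drain are hypothetical and not part of LTSV::LINQ's API.

```perl
use strict;

# Hypothetical iterator: returns each element in turn, then undef.
sub make_iter {
    my @items = @_;
    my $i = 0;
    return sub { return $i < @items ? $items[$i++] : undef };
}

# Hypothetical consumer: stops at the first undef it sees.
sub drain {
    my ($iter) = @_;
    my @out;
    while (defined(my $item = $iter->())) {
        push @out, $item;
    }
    return @out;
}

my @plain    = drain(make_iter(1, 2, 3));
my @with_gap = drain(make_iter(1, undef, 3));   # undef looks like EOF
my @default  = drain(make_iter(undef));         # like DefaultIfEmpty(undef)

print scalar(@plain), " ", scalar(@with_gap), " ", scalar(@default), "\n";   # prints "3 1 0"
```

The consumer cannot tell an undef element from the end of the sequence, so everything after the first undef is silently dropped.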

Not Implemented

The following LINQ methods from the .NET standard library are intentionally not implemented in LTSV::LINQ. This section explains the design rationale for each omission.

Parallel LINQ (PLINQ) Methods

The following methods belong to Parallel LINQ (PLINQ), the .NET parallel-execution extension to LINQ introduced in .NET 4.0. They exist to distribute query execution across multiple CPU cores using the .NET Thread Pool and Task Parallel Library.

Perl does not have native shared-memory multithreading that maps onto this execution model. Perl threads (threads.pm) copy the interpreter state and communicate through shared variables, making them unsuitable for the fine-grained, automatic work-stealing parallelism that PLINQ provides. LTSV::LINQ's iterator-based design assumes a single sequential execution context; introducing PLINQ semantics would require a completely different architecture and would add heavy dependencies.

Furthermore, the primary use case for LTSV::LINQ -- parsing and querying LTSV log files -- is typically I/O-bound rather than CPU-bound. Parallelizing I/O over a single file provides little benefit and considerable complexity.

For these reasons, the entire PLINQ surface is omitted:

  • AsParallel

    Entry point for PLINQ. Converts an IEnumerable<T> into a ParallelQuery<T> that the .NET runtime executes in parallel using multiple threads. Not applicable: Perl lacks the runtime infrastructure.

  • AsSequential

    Converts a ParallelQuery<T> back to a sequential IEnumerable<T>, forcing subsequent operators to run on a single thread. Since AsParallel is not implemented, AsSequential has no counterpart to convert from.

  • AsOrdered

    Instructs PLINQ to preserve the source order in the output even during parallel execution. This is a hint to the PLINQ scheduler; it does not exist outside of PLINQ. Not applicable.

  • AsUnordered

    Instructs PLINQ that output order does not need to match source order, potentially allowing more efficient parallel execution. Not applicable.

  • ForAll

    PLINQ terminal operator that applies an action to each element in parallel, without collecting results. It is the parallel equivalent of ForEach. LTSV::LINQ provides ForEach for sequential iteration. A parallel ForAll is not applicable.

  • WithCancellation

    Attaches a .NET CancellationToken to a ParallelQuery<T>, allowing cooperative cancellation of a running parallel query. Cancellation tokens are a .NET threading primitive. Not applicable.

  • WithDegreeOfParallelism

    Sets the maximum number of concurrent tasks that PLINQ may use. A tuning knob for the PLINQ scheduler. Not applicable.

  • WithExecutionMode

    Controls whether PLINQ may choose sequential execution for efficiency (Default) or is forced to parallelize (ForceParallelism). Not applicable.

  • WithMergeOptions

    Controls how PLINQ merges results from parallel partitions back into the output stream (buffered, auto-buffered, or not-buffered). Not applicable.

.NET Type System Methods

The following methods are specific to .NET's static type system. They exist to work with .NET generics and interface hierarchies, which have no Perl equivalent.

  • Cast

    Casts each element of a non-generic IEnumerable to a specified type T, returning IEnumerable<T>. In .NET, Cast<T> is needed when working with legacy APIs that return IEnumerable (without a type parameter) and you need to treat the elements as a specific type.

    Perl is dynamically typed. Every Perl value already holds type information at runtime (scalar, reference, blessed object), and Perl does not have a concept of a "non-generic enumerable" that needs to be explicitly cast before it can be queried. There is no meaningful operation to implement.

  • OfType

    Filters elements of a non-generic IEnumerable, returning only those that can be successfully cast to a specified type T. Like Cast, it exists to bridge generic and non-generic .NET APIs.

    In LTSV::LINQ, all records from FromLTSV are hash references. Records from From are whatever the caller puts in the array. Perl's ref(), UNIVERSAL::isa(), or a Where predicate can perform any type-based filtering the caller needs. A dedicated OfType adds no expressiveness.

    # Perl equivalent of OfType for blessed objects of class "Foo":
    $query->Where(sub { ref($_[0]) && $_[0]->isa('Foo') })

64-bit and Large-Count Methods

  • LongCount

    Returns the number of elements as a 64-bit integer (Int64 in .NET). Count returns a 32-bit int on every .NET platform, and a sequence can theoretically contain more than 2**31 - 1 (~2 billion) elements, which would overflow int; hence the need for LongCount.

    In Perl, integers are represented as native signed integers or floating-point doubles (NV). On 64-bit Perl (which is universal in practice today), the native integer type is 64 bits, so Count already handles any realistic sequence length. On 32-bit Perl, the floating-point NV provides 53 bits of integer precision (~9 quadrillion), far exceeding any in-memory sequence. There is no semantic gap between Count and LongCount in Perl.

IEnumerable Conversion Method

  • AsEnumerable

    In .NET, AsEnumerable<T> is used to force evaluation of a query as IEnumerable<T> rather than, for example, IQueryable<T> (which might be translated to SQL). It is a type-cast at the interface level, not a data transformation.

    LTSV::LINQ has only one query type: LTSV::LINQ. There is no IQueryable counterpart that would benefit from being downgraded to IEnumerable. The method has no meaningful semantics to implement.

BUGS

Please report any bugs or feature requests to:

  • Email: ina@cpan.org

SUPPORT

Documentation

Full documentation is available via:

perldoc LTSV::LINQ

CPAN

https://metacpan.org/pod/LTSV::LINQ

SEE ALSO

  • LTSV specification

    http://ltsv.org/

  • Microsoft LINQ documentation

    https://learn.microsoft.com/en-us/dotnet/csharp/linq/

AUTHOR

INABA Hitoshi <ina@cpan.org>

Contributors

Contributions are welcome! See file: CONTRIBUTING.

ACKNOWLEDGEMENTS

LINQ Technology

This module is inspired by LINQ (Language Integrated Query), which was developed by Microsoft Corporation for the .NET Framework.

LINQ(R) is a registered trademark of Microsoft Corporation.

We are grateful to Microsoft for pioneering the LINQ technology and making it a widely recognized programming pattern. The elegance and power of LINQ have influenced query interfaces across many programming languages, and this module brings that same capability to LTSV data processing in Perl.

This module is not affiliated with, endorsed by, or sponsored by Microsoft Corporation.

References

This module was inspired by:

COPYRIGHT AND LICENSE

Copyright (c) 2026 INABA Hitoshi

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

License Details

This module is released under the same license as Perl itself:

  • Artistic License

  • GNU General Public License (GPL), version 1 or later

You may choose either license.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.