NAME
LTSV::LINQ - LINQ-style query interface for LTSV files
VERSION
Version 1.05
SYNOPSIS
    use LTSV::LINQ;
    # Read LTSV file and query
    my @results = LTSV::LINQ->FromLTSV("access.log")
        ->Where(sub { $_[0]{status} eq '200' })
        ->Select(sub { $_[0]{url} })
        ->Distinct()
        ->ToArray();
    # DSL syntax for simple filtering
    my @errors = LTSV::LINQ->FromLTSV("access.log")
        ->Where(status => '404')
        ->ToArray();
    # Grouping and aggregation
    my @stats = LTSV::LINQ->FromLTSV("access.log")
        ->GroupBy(sub { $_[0]{status} })
        ->Select(sub {
            my $g = shift;
            return {
                Status => $g->{Key},
                Count  => scalar(@{$g->{Elements}})
            };
        })
        ->OrderByDescending(sub { $_[0]{Count} })
        ->ToArray();
TABLE OF CONTENTS
"METHODS" - Complete method reference (60 methods)
"EXAMPLES" - 8 practical examples
"FEATURES" - Lazy evaluation, method chaining, DSL
"ARCHITECTURE" - Iterator design, execution flow
"PERFORMANCE" - Memory usage, optimization tips
"COMPATIBILITY" - Perl 5.005+ support, pure Perl
"DIAGNOSTICS" - Error messages
"FAQ" - Common questions and answers
"COOKBOOK" - Common patterns
"SEE ALSO" - Related resources
DESCRIPTION
LTSV::LINQ provides a LINQ-style query interface for LTSV (Labeled Tab-Separated Values) files. It offers a fluent, chainable API for filtering, transforming, and aggregating LTSV data.
Key features:
Lazy evaluation - O(1) memory usage for most operations
Method chaining - Fluent, readable query composition
DSL syntax - Simple key-value filtering
60 LINQ methods - Comprehensive query capabilities
Pure Perl - No XS dependencies
Perl 5.005_03+ - Works on ancient and modern Perl
What is LTSV?
LTSV (Labeled Tab-Separated Values) is a text format for structured logs and data records. Each line consists of tab-separated fields, where each field is a label:value pair. A single LTSV record occupies exactly one line.
Format example:
time:2026-02-13T10:00:00 host:192.0.2.1 status:200 url:/index.html bytes:1024
LTSV Characteristics
One record per line
A complete record is always a single newline-terminated line. This makes streaming processing trivial: read a line, parse it, process it, discard it. There is no multi-line quoting problem, no block parser required.
Tab as field delimiter
Fields are separated by a single horizontal tab character (0x09). The tab is a C0 control character in the ASCII range (0x00-0x7F), which has an important consequence for multibyte character encodings.
Colon as label-value separator
Within each field, the label and value are separated by a single colon (0x3A, US-ASCII ':'). This is also a plain ASCII character with the same multibyte-safety guarantees as the tab.
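The three characteristics above can be sketched in a few lines of plain Perl. This is only an illustration: parse_ltsv_line is a hypothetical helper, not part of LTSV::LINQ, and the sketch assumes that only the first colon in a field separates the label from the value.

```perl
use strict;
use warnings;

# Parse one LTSV line into a hash reference.
# parse_ltsv_line is a hypothetical helper, not part of LTSV::LINQ.
sub parse_ltsv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # drop the line terminator
    my %record;
    for my $field (split /\t/, $line) {    # tab separates fields
        # Limit 2: only the first colon separates label from value,
        # so a value such as 10:00:00 keeps its own colons.
        my ($label, $value) = split /:/, $field, 2;
        next unless defined $label && length $label;
        $record{$label} = defined $value ? $value : '';
    }
    return \%record;
}

my $rec = parse_ltsv_line(
    "time:2026-02-13T10:00:00\thost:192.0.2.1\tstatus:200\turl:/index.html\tbytes:1024\n"
);
print "$rec->{time} $rec->{status} $rec->{url}\n";
# prints: 2026-02-13T10:00:00 200 /index.html
```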
LTSV Advantages
Multibyte-safe delimiters (Tab and Colon)
This is perhaps the most important technical advantage of LTSV over formats such as CSV (comma-delimited) or TSV without labels.
In many multibyte character encodings used across Asia and beyond, a single logical character is represented by a sequence of two or more bytes. The danger in older encodings is that a byte within a multibyte sequence can coincidentally equal the byte value of an ASCII delimiter, causing a naive byte-level parser to split the field in the wrong place.
The following table shows well-known encodings and their byte ranges:
    Encoding    First byte range      Following byte range
    ----------  --------------------  -------------------------------
    Big5        0x81-0xFE             0x40-0x7E, 0xA1-0xFE
    Big5-HKSCS  0x81-0xFE             0x40-0x7E, 0xA1-0xFE
    CP932X      0x81-0x9F, 0xE0-0xFC  0x40-0x7E, 0x80-0xFC
    EUC-JP      0x8E-0x8F, 0xA1-0xFE  0xA1-0xFE
    GB 18030    0x81-0xFE             0x30-0x39, 0x40-0x7E, 0x80-0xFE
    GBK         0x81-0xFE             0x40-0x7E, 0x80-0xFE
    Shift_JIS   0x81-0x9F, 0xE0-0xFC  0x40-0x7E, 0x80-0xFC
    RFC 2279    0xC2-0xF4             0x80-0xBF
    UHC         0x81-0xFE             0x41-0x5A, 0x61-0x7A, 0x81-0xFE
    UTF-8       0xC2-0xF4             0x80-0xBF
    WTF-8       0xC2-0xF4             0x80-0xBF
The tab character is 0x09. The colon is 0x3A. Neither value falls inside any following-byte range listed above: the lowest following byte is 0x30 (the 0x30-0x39 range of GB 18030) and every other range starts at 0x40 or higher, so 0x09 lies below all of them and 0x3A sits in the gap between 0x39 and 0x40. Neither 0x09 nor 0x3A appears anywhere as a first byte either. Therefore:
TAB (0x09) never appears as a byte within any multibyte character in Big5, Big5-HKSCS, CP932X, EUC-JP, GB 18030, GBK, Shift_JIS, RFC 2279, UHC, UTF-8, or WTF-8.
':' (0x3A) never appears as a byte within any multibyte character in the same set of encodings.
This means that LTSV files containing values in any of those encodings can be parsed correctly by a simple byte-level split on tab and colon, with no knowledge of the encoding whatsoever. There is no need to decode the text before parsing, and no risk of a misidentified delimiter.
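As a small check of the argument above, the following sketch splits a line whose value contains raw Shift_JIS bytes without ever decoding them. The byte pair 0x83 0x5C is used as an example of a two-byte Shift_JIS character whose second byte equals the ASCII backslash; the field names are made up for illustration.

```perl
use strict;
use warnings;

# Bytes 0x83 0x5C form one two-byte Shift_JIS character whose second
# byte equals the ASCII backslash. The value stays as raw bytes and
# is never decoded.
my $sjis_value = "\x83\x5C";
my $line = "status:200\tname:" . $sjis_value . "\tbytes:1024";

# Byte-level split on tab (0x09), then on the first colon (0x3A).
my %record = map {
    my ($label, $value) = split /:/, $_, 2;
    ($label => $value);
} split /\t/, $line;

# Count delimiter bytes hiding inside the multibyte character.
my $collisions = () = $sjis_value =~ /[\x09\x3A]/g;

printf "fields=%d collisions=%d intact=%s\n",
    scalar(keys %record), $collisions,
    ($record{name} eq $sjis_value ? 'yes' : 'no');
# prints: fields=3 collisions=0 intact=yes
```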
By contrast, CSV has encoding problems of a different kind. The comma (0x2C) and the double-quote (0x22) do not appear as following bytes in Shift_JIS or Big5, so they are not directly confused with multibyte character content. However, the backslash (0x5C) does appear as a valid following byte in both Shift_JIS (following byte range 0x40-0x7E includes 0x5C) and Big5 (same range). Many CSV parsers and the C runtime on Windows use backslash or backslash-like sequences for escaping, so a naive byte-level search for the escape character can be misled by a multibyte character whose second byte is 0x5C. Beyond this, CSV's quoting rules are underspecified (RFC 4180 vs. Excel vs. custom dialects differ), which makes writing a correct, encoding-aware CSV parser considerably harder than parsing LTSV. LTSV sidesteps all of these issues by choosing delimiters (tab and colon) that lie outside every following-byte range of the traditional multibyte encodings listed above.
UTF-8 is safe for all ASCII delimiters because continuation bytes are always in the range 0x80-0xBF, never overlapping ASCII. But LTSV's choice of tab and colon also makes it safe for the traditional multibyte encodings that predate Unicode, which is critical for systems that still operate on data in those encodings.
Self-describing fields
Every field carries its own label. A record is human-readable without a separate schema or header line. Fields can appear in any order, and optional fields can simply be omitted. Adding a new field to some records does not break parsers that do not know about it.
Streaming-friendly
Because each record is one line, LTSV files can be processed with line-by-line streaming. Memory usage is proportional to the longest single record, not the total file size. This is why FromLTSV in this module uses a lazy iterator rather than loading the whole file.
Grep- and awk-friendly
Standard Unix text tools (grep, awk, sed, sort, cut) work naturally on LTSV files. A field can be located with a pattern like status:5[0-9][0-9] without any special parser. This makes ad-hoc analysis and shell scripting straightforward.
No quoting rules
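CSV requires quoting fields that contain commas or newlines, and the quoting rules differ between implementations (RFC 4180 vs. Microsoft Excel vs. others). LTSV has no quoting. Neither the tab nor the colon can occur as a byte inside a multibyte character in any of the encodings above (by the multibyte-safety argument), and a value may itself contain colons, as the time field in the format example does, because only the first colon in a field separates the label from the value. As long as values contain no tab or newline, no escaping mechanism is needed.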
Wide adoption in server logging
LTSV originated in the Japanese web industry as a structured log format for HTTP access logs. Many web servers (Apache, Nginx) and log aggregation tools support LTSV output or parsing. The format is particularly popular for application and infrastructure logging where grep-ability and streaming analysis matter.
For the formal LTSV specification, see http://ltsv.org/.
What is LINQ?
LINQ (Language Integrated Query) is a set of query capabilities introduced in the .NET Framework 3.5 (C# 3.0, 2007) by Microsoft. It defines a unified model for querying and transforming data from diverse sources -- in-memory collections, relational databases (LINQ to SQL), XML documents (LINQ to XML), and more -- using a single, consistent API.
This module brings LINQ-style querying to Perl, applied specifically to LTSV data sources.
LINQ Characteristics
Unified query model
LINQ provides a single set of operators that works uniformly across data sources. Whether the source is an array, a file, or a database, the same Where, Select, OrderBy, GroupBy methods apply. LTSV::LINQ follows this principle: the same methods work on in-memory arrays (From) and LTSV files (FromLTSV) alike.
Declarative style
LINQ queries express what to retrieve, not how to retrieve it. A query like ->Where(sub { $_[0]{status} >= 400 })->Select(...) describes the intent clearly, without explicit loop management. This reduces cognitive overhead and makes queries easier to read and verify.
Composability
Each LINQ operator takes a sequence and returns a new sequence (or a scalar result for terminal operators). Because operators are ordinary method calls that return objects, they compose naturally:
    $query->Where(...)->Select(...)->OrderBy(...)->GroupBy(...)->ToArray()
Any intermediate result is itself a valid query object, ready for further transformation or immediate consumption.
Lazy evaluation (deferred execution)
Intermediate operators (Where, Select, Take, etc.) do not execute immediately. They construct a chain of iterator closures. Evaluation is deferred until a terminal operator (ToArray, Count, First, Sum, ForEach, etc.) pulls items through the chain. This means:
- Memory usage is bounded by the window of data in flight, not by the total data size. A Where->Select->Take(10) over a million-line file reads only as far as needed to produce 10 matching records.
- Short-circuiting is free. First stops at the first match. Any stops as soon as one match is found.
- Pipelines can be built without executing them, and executed multiple times by wrapping in a factory (see _from_snapshot).
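The deferred-execution model described above can be sketched with plain closures. This is not the module's implementation: from_range, where_iter, select_iter and take_iter are hypothetical helpers, and the $reads counter exists only to show how much of the source is consumed.

```perl
use strict;
use warnings;

# Each stage is a closure returning the next item, or an empty list
# when the sequence is exhausted. All helper names are hypothetical.
my $reads = 0;

sub from_range {
    my ($start, $end) = @_;
    my $n = $start;
    return sub { return if $n > $end; $reads++; return $n++ };
}

sub where_iter {
    my ($src, $pred) = @_;
    return sub {
        while (my ($item) = $src->()) {
            return $item if $pred->($item);
        }
        return;
    };
}

sub select_iter {
    my ($src, $f) = @_;
    return sub { my ($item) = $src->() or return; return $f->($item) };
}

sub take_iter {
    my ($src, $count) = @_;
    my $taken = 0;
    return sub {
        return if $taken >= $count;      # stop without touching the source
        my ($item) = $src->() or return;
        $taken++;
        return $item;
    };
}

# Building the pipeline reads nothing.
my $pipeline = take_iter(
    select_iter(
        where_iter(from_range(1, 1_000_000), sub { $_[0] % 100 == 0 }),
        sub { $_[0] * 2 },
    ),
    3,
);
my $reads_before = $reads;

# A terminal step pulls items through the chain.
my @out;
while (my ($item) = $pipeline->()) { push @out, $item }

print "before=$reads_before after=$reads out=@out\n";
# prints: before=0 after=300 out=200 400 600
```

Only 300 of the 1,000,000 source items are read, because the take stage stops asking its source once three results have been produced.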
Method chaining (fluent interface)
LINQ's design makes chaining natural. In C# this is supported by extension methods; in Perl it is supported by returning $self-class objects from every intermediate operator. The result is readable, left-to-right query expressions.
Separation of query definition from execution
A LINQ query object is a description of a computation, not its result. You can pass query objects around, inspect them, extend them, and decide later when to execute them. This separation is valuable in library and framework code.
LINQ Advantages for LTSV Processing
Readable log analysis
LTSV log analysis often involves the same logical steps: filter records by a condition, extract a field, aggregate. LINQ methods map directly onto these steps, making the code read like a description of the analysis.
Memory-efficient processing of large log files
Web server access logs can be gigabytes in size. LTSV::LINQ's lazy FromLTSV iterator reads one line at a time. Combined with Where and Take, only the needed records are ever in memory simultaneously.
No new language syntax required
Unlike C# LINQ (which has query comprehension syntax from x in xs where ... select ...), LTSV::LINQ works with ordinary Perl method calls and anonymous subroutines. There is no source filter, no parser extension, and no dependency on modern Perl features. The same code runs on Perl 5.005_03 and Perl 5.40.
Composable, reusable query fragments
A Where clause stored in a variable can be applied to multiple data sources. Query logic can be parameterized and reused across scripts.
For the original LINQ documentation, see https://learn.microsoft.com/en-us/dotnet/csharp/linq/.
METHODS
Complete Method Reference
This module implements 60 LINQ-style methods organized into 15 categories:
Data Sources (5): From, FromLTSV, Range, Empty, Repeat
Filtering (1): Where (with DSL)
Projection (2): Select, SelectMany
Concatenation (2): Concat, Zip
Partitioning (4): Take, Skip, TakeWhile, SkipWhile
Ordering (13): OrderBy, OrderByDescending, OrderByStr, OrderByStrDescending, OrderByNum, OrderByNumDescending, Reverse, ThenBy, ThenByDescending, ThenByStr, ThenByStrDescending, ThenByNum, ThenByNumDescending
Grouping (1): GroupBy
Set Operations (4): Distinct, Union, Intersect, Except
Join (2): Join, GroupJoin
Quantifiers (3): All, Any, Contains
Comparison (1): SequenceEqual
Element Access (8): First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault
Aggregation (7): Count, Sum, Min, Max, Average, AverageOrDefault, Aggregate
Conversion (6): ToArray, ToList, ToDictionary, ToLookup, ToLTSV, DefaultIfEmpty
Utility (1): ForEach
Method Summary Table:
    Method                 Category        Lazy?  Returns
    =====================  ==============  =====  ================
    From                   Data Source     Yes    Query
    FromLTSV               Data Source     Yes    Query
    Range                  Data Source     Yes    Query
    Empty                  Data Source     Yes    Query
    Repeat                 Data Source     Yes    Query
    Where                  Filtering       Yes    Query
    Select                 Projection      Yes    Query
    SelectMany             Projection      Yes    Query
    Concat                 Concatenation   Yes    Query
    Zip                    Concatenation   Yes    Query
    Take                   Partitioning    Yes    Query
    Skip                   Partitioning    Yes    Query
    TakeWhile              Partitioning    Yes    Query
    SkipWhile              Partitioning    Yes    Query
    OrderBy                Ordering        No*    OrderedQuery
    OrderByDescending      Ordering        No*    OrderedQuery
    OrderByStr             Ordering        No*    OrderedQuery
    OrderByStrDescending   Ordering        No*    OrderedQuery
    OrderByNum             Ordering        No*    OrderedQuery
    OrderByNumDescending   Ordering        No*    OrderedQuery
    Reverse                Ordering        No*    Query
    ThenBy                 Ordering        No*    OrderedQuery
    ThenByDescending       Ordering        No*    OrderedQuery
    ThenByStr              Ordering        No*    OrderedQuery
    ThenByStrDescending    Ordering        No*    OrderedQuery
    ThenByNum              Ordering        No*    OrderedQuery
    ThenByNumDescending    Ordering        No*    OrderedQuery
    GroupBy                Grouping        No*    Query
    Distinct               Set Operation   Yes    Query
    Union                  Set Operation   No*    Query
    Intersect              Set Operation   No*    Query
    Except                 Set Operation   No*    Query
    Join                   Join            No*    Query
    GroupJoin              Join            No*    Query
    All                    Quantifier      No     Boolean
    Any                    Quantifier      No     Boolean
    Contains               Quantifier      No     Boolean
    SequenceEqual          Comparison      No     Boolean
    First                  Element Access  No     Element
    FirstOrDefault         Element Access  No     Element
    Last                   Element Access  No*    Element
    LastOrDefault          Element Access  No*    Element or undef
    Single                 Element Access  No*    Element
    SingleOrDefault        Element Access  No*    Element or undef
    ElementAt              Element Access  No*    Element
    ElementAtOrDefault     Element Access  No*    Element or undef
    Count                  Aggregation     No     Integer
    Sum                    Aggregation     No     Number
    Min                    Aggregation     No     Number
    Max                    Aggregation     No     Number
    Average                Aggregation     No     Number
    AverageOrDefault       Aggregation     No     Number or undef
    Aggregate              Aggregation     No     Any
    DefaultIfEmpty         Conversion      Yes    Query
    ToArray                Conversion      No     Array
    ToList                 Conversion      No     ArrayRef
    ToDictionary           Conversion      No     HashRef
    ToLookup               Conversion      No     HashRef
    ToLTSV                 Conversion      No     Boolean
    ForEach                Utility         No     Void

    * Materializing operation (loads all data into memory)
    OrderedQuery = LTSV::LINQ::Ordered (subclass of LTSV::LINQ;
                   all LTSV::LINQ methods available plus ThenBy* methods)
Data Source Methods
- From(\@array)
Create a query from an array.
    my $query = LTSV::LINQ->From([{name => 'Alice'}, {name => 'Bob'}]);
- FromLTSV($filename)
Create a query from an LTSV file.
    my $query = LTSV::LINQ->FromLTSV("access.log");
File handle management: FromLTSV opens the file immediately and holds the file handle open until the iterator reaches end-of-file. If the query is not fully consumed (e.g. you call First or Take and stop early), the file handle remains open until the query object is garbage collected.
This is harmless for a small number of files, but if you open many LTSV files concurrently without consuming them fully, you may exhaust the OS file descriptor limit. In such cases, consume the query fully or use ToArray() to materialise the data and close the file immediately:
    # File closed as soon as all records are loaded
    my @records = LTSV::LINQ->FromLTSV("access.log")->ToArray();
- Range($start, $count)
Generate a sequence of integers.
    my $query = LTSV::LINQ->Range(1, 10);  # 1, 2, ..., 10
- Empty()
Create an empty sequence.
Returns: Empty LTSV::LINQ query
Examples:
    my $empty = LTSV::LINQ->Empty();
    $empty->Count();  # 0
    # Conditional empty sequence
    my $result = $condition ? $query : LTSV::LINQ->Empty();
Note: Equivalent to From([]) but more explicit.
- Repeat($element, $count)
Repeat the same element a specified number of times.
Parameters:
$element - Element to repeat
$count - Number of times to repeat
Returns: LTSV::LINQ query with repeated elements
Examples:
    # Repeat scalar
    LTSV::LINQ->Repeat('x', 5)->ToArray();  # ('x', 'x', 'x', 'x', 'x')
    # Repeat reference (same reference repeated)
    my $item = {id => 1};
    LTSV::LINQ->Repeat($item, 3)->ToArray();  # ($item, $item, $item)
    # Generate default values
    LTSV::LINQ->Repeat(0, 10)->ToArray();  # (0, 0, 0, ..., 0)
Note: The element reference is repeated, not cloned.
Filtering Methods
- Where($predicate)
- Where(key => value, ...)
Filter elements. Accepts either a code reference or DSL form.
Code Reference Form:
    ->Where(sub { $_[0]{status} == 200 })
    ->Where(sub { $_[0]{status} >= 400 && $_[0]{bytes} > 1000 })
The code reference receives each element as $_[0] and should return true to include the element, false to exclude it.
DSL Form:
The DSL (Domain Specific Language) form provides a concise syntax for simple equality comparisons. All conditions are combined with AND logic.
    # Single condition
    ->Where(status => '200')
    # Multiple conditions (AND)
    ->Where(status => '200', method => 'GET')
    # Equivalent to:
    ->Where(sub { $_[0]{status} eq '200' && $_[0]{method} eq 'GET' })
DSL Specification:
Arguments must be an even number of key => value pairs
The DSL form interprets its arguments as a flat list of key-value pairs. Passing an odd number of arguments produces a Perl warning (Odd number of elements in hash assignment) and the unpaired key receives undef as its value, which will never match. Always use complete pairs:
    ->Where(status => '200')                    # correct: 1 pair
    ->Where(status => '200', method => 'GET')   # correct: 2 pairs
    ->Where(status => '200', 'method')          # wrong: 3 args, Perl warning
All comparisons are string equality (eq)
All conditions are combined with AND
Undefined values are treated as failures
For numeric or OR logic, use code reference form
Examples:
    # DSL: Simple and readable
    ->Where(status => '200')
    ->Where(user => 'alice', role => 'admin')
    # Code ref: Complex logic
    ->Where(sub { $_[0]{status} >= 400 && $_[0]{status} < 500 })
    ->Where(sub { $_[0]{user} eq 'alice' || $_[0]{user} eq 'bob' })
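One way to picture the DSL rules above is as a function that turns key => value pairs into an ordinary predicate. The sketch below is an assumption about how such a translation could look, not the module's code: dsl_to_predicate is a hypothetical name, and unlike the documented behaviour (a Perl warning) it dies on an odd argument count.

```perl
use strict;
use warnings;

# Hypothetical helper: turn key => value pairs into an AND-combined,
# string-equality predicate. Dies on an odd argument count (the
# documented Where emits a Perl warning instead).
sub dsl_to_predicate {
    my @pairs = @_;
    die "dsl_to_predicate: expected key => value pairs\n" if @pairs % 2;
    return sub {
        my ($rec) = @_;
        for (my $i = 0; $i < @pairs; $i += 2) {
            my ($key, $want) = @pairs[$i, $i + 1];
            return 0 unless defined $want;           # undef never matches
            return 0 unless defined $rec->{$key};    # missing field fails
            return 0 unless $rec->{$key} eq $want;   # always string eq
        }
        return 1;
    };
}

my @records = (
    { status => '200', method => 'GET'  },
    { status => '200', method => 'POST' },
    { status => '404', method => 'GET'  },
    { method => 'GET' },                             # no status field
);

my $ok_get  = dsl_to_predicate(status => '200', method => 'GET');
my @matched = grep { $ok_get->($_) } @records;
print scalar(@matched), " record(s) matched\n";
# prints: 1 record(s) matched
```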
Projection Methods
- Select($selector)
Transform each element using the provided selector function.
The selector receives each element as $_[0] and should return the transformed value.
Parameters:
$selector - Code reference that transforms each element
Returns: New query with transformed elements (lazy)
Examples:
    # Extract single field
    ->Select(sub { $_[0]{url} })
    # Transform to new structure
    ->Select(sub { { path => $_[0]{url}, code => $_[0]{status} } })
    # Calculate derived values
    ->Select(sub { $_[0]{bytes} * 8 })  # bytes to bits
Note: Select preserves one-to-one mapping. For one-to-many, use SelectMany.
- SelectMany($selector)
Flatten nested sequences into a single sequence.
The selector should return an array reference. All arrays are flattened into a single sequence.
Parameters:
$selector - Code reference returning array reference
Returns: New query with flattened elements (lazy)
Examples:
    # Flatten array of arrays
    my @nested = ([1, 2], [3, 4], [5]);
    LTSV::LINQ->From(\@nested)
        ->SelectMany(sub { $_[0] })
        ->ToArray();  # (1, 2, 3, 4, 5)
    # Expand related records
    ->SelectMany(sub {
        my $user = shift;
        return [ map { { user => $user->{name}, role => $_ } } @{$user->{roles}} ];
    })
Use Cases:
Flattening nested arrays
Expanding one-to-many relationships
Generating multiple outputs per input
Important: The selector must return an ARRAY reference. If it returns any other value (e.g. a hashref or scalar), this method throws an exception:
    die "SelectMany: selector must return an ARRAY reference"
This matches the behaviour of .NET LINQ's SelectMany, which requires the selector to return an IEnumerable. If the value is not already an array reference, wrap it in [...]:
    ->SelectMany(sub { [ $_[0]{items} ] })  # correct: value wrapped in an arrayref
    ->SelectMany(sub { $_[0]{items} })      # wrong if items is not an arrayref: dies at runtime
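The flattening rule and the ARRAY-reference check can be illustrated over a plain list. This is an eager sketch under stated assumptions: select_many is a hypothetical helper, whereas the module's SelectMany is documented as lazy.

```perl
use strict;
use warnings;

# Hypothetical eager flatten over a plain list; the module's
# SelectMany is documented as lazy, this sketch is not.
sub select_many {
    my ($items, $selector) = @_;
    my @flat;
    for my $item (@$items) {
        my $result = $selector->($item);
        die "SelectMany: selector must return an ARRAY reference\n"
            unless ref($result) eq 'ARRAY';
        push @flat, @$result;            # splice the inner list in place
    }
    return @flat;
}

my @users = (
    { name => 'alice', roles => ['admin', 'dev'] },
    { name => 'bob',   roles => ['dev'] },
);

my @pairs = select_many(\@users, sub {
    my $user = shift;
    return [ map { "$user->{name}:$_" } @{ $user->{roles} } ];
});
print "@pairs\n";
# prints: alice:admin alice:dev bob:dev

# A selector returning a plain scalar is rejected.
my $ok = eval { select_many(\@users, sub { $_[0]{name} }); 1 };
print "scalar selector rejected\n" unless $ok;
```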
Concatenation Methods
- Concat($second)
Concatenate two sequences into one.
Parameters:
$second - Second sequence (LTSV::LINQ object)
Returns: New query with both sequences concatenated (lazy)
Examples:
    # Combine two data sources
    my $q1 = LTSV::LINQ->From([1, 2, 3]);
    my $q2 = LTSV::LINQ->From([4, 5, 6]);
    $q1->Concat($q2)->ToArray();  # (1, 2, 3, 4, 5, 6)
    # Merge LTSV files
    LTSV::LINQ->FromLTSV("jan.log")
        ->Concat(LTSV::LINQ->FromLTSV("feb.log"))
        ->Where(status => '500')
Note: This operation is lazy - sequences are read on-demand.
- Zip($second, $result_selector)
Combine two sequences element-wise using a result selector function.
Parameters:
$second - Second sequence (LTSV::LINQ object)
$result_selector - Function to combine elements: ($first, $second) -> $result
Returns: New query with combined elements (lazy)
Examples:
    # Combine numbers
    my $numbers = LTSV::LINQ->From([1, 2, 3]);
    my $letters = LTSV::LINQ->From(['a', 'b', 'c']);
    $numbers->Zip($letters, sub {
        my($num, $letter) = @_;
        return "$num-$letter";
    })->ToArray();  # ('1-a', '2-b', '3-c')
    # Create key-value pairs
    my $keys = LTSV::LINQ->From(['name', 'age', 'city']);
    my $values = LTSV::LINQ->From(['Alice', 30, 'NYC']);
    $keys->Zip($values, sub {
        return {$_[0] => $_[1]};
    })->ToArray();
    # Stops at shorter sequence
    LTSV::LINQ->From([1, 2, 3, 4])
        ->Zip(LTSV::LINQ->From(['a', 'b']), sub { [$_[0], $_[1]] })
        ->ToArray();  # ([1, 'a'], [2, 'b'])
Note: Iteration stops when either sequence ends.
Partitioning Methods
- Take($count)
Take the first N elements from the sequence.
Parameters:
$count - Number of elements to take (integer >= 0)
Returns: New query limited to first N elements (lazy)
Examples:
    # Top 10 results
    ->OrderByDescending(sub { $_[0]{score} })
    ->Take(10)
    # First record only
    ->Take(1)->ToArray()
    # Limit large file processing
    LTSV::LINQ->FromLTSV("huge.log")->Take(1000)
Note: Take(0) returns an empty sequence. Negative values are treated as 0.
- Skip($count)
Skip the first N elements, return the rest.
Parameters:
$count - Number of elements to skip (integer >= 0)
Returns: New query skipping first N elements (lazy)
Examples:
    # Skip header row
    ->Skip(1)
    # Pagination: page 3, size 20
    ->Skip(40)->Take(20)
    # Skip first batch
    ->Skip(1000)->ForEach(sub { ... })
Use Cases:
Pagination
Skipping header rows
Processing in batches
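The pagination example above (page 3, size 20 gives Skip(40)->Take(20)) follows from a simple formula. The sketch below spells out the arithmetic; page_window is a hypothetical helper and pages are assumed to be numbered from 1.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based page number and page size -> (skip, take).
sub page_window {
    my ($page, $size) = @_;
    die "page must be >= 1\n" if $page < 1;
    return (($page - 1) * $size, $size);
}

my ($skip, $take) = page_window(3, 20);
print "Skip($skip)->Take($take)\n";
# prints: Skip(40)->Take(20)

# The same window applied to a plain list with an array slice.
my @rows = (1 .. 95);
my $last = $skip + $take - 1;
$last = $#rows if $last > $#rows;            # clamp the final page
my @page = $skip <= $#rows ? @rows[$skip .. $last] : ();
print "first=$page[0] last=$page[-1] n=", scalar(@page), "\n";
# prints: first=41 last=60 n=20
```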
- TakeWhile($predicate)
Take elements while the predicate is true. Stops at first false.
Parameters:
$predicate - Code reference returning boolean
Returns: New query taking elements while predicate holds (lazy)
Examples:
    # Take while value is small
    ->TakeWhile(sub { $_[0]{count} < 100 })
    # Take while timestamp is in range
    ->TakeWhile(sub { $_[0]{time} lt '2026-02-01' })
    # Process until error
    ->TakeWhile(sub { $_[0]{status} < 400 })
Important: TakeWhile stops immediately when predicate returns false. It does NOT filter - it terminates the sequence.
    # Different from Where:
    ->TakeWhile(sub { $_[0] < 5 })  # 1,2,3,4 then STOP
    ->Where(sub { $_[0] < 5 })      # 1,2,3,4 (checks all)
- SkipWhile($predicate)
Skip elements while the predicate is true. Returns rest after first false.
Parameters:
$predicate - Code reference returning boolean
Returns: New query skipping initial elements (lazy)
Examples:
    # Skip header lines
    ->SkipWhile(sub { $_[0]{line} =~ /^#/ })
    # Skip while value is small
    ->SkipWhile(sub { $_[0]{count} < 100 })
    # Process after certain timestamp
    ->SkipWhile(sub { $_[0]{time} lt '2026-02-01' })
Important: SkipWhile only skips initial elements. Once predicate is false, all remaining elements are included.
    LTSV::LINQ->From([1,2,3,4,5,2,1])->SkipWhile(sub { $_[0] < 4 })->ToArray();  # (4,5,2,1)
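The difference between terminating, skipping and filtering is easiest to see side by side on the same input. The following plain-Perl sketch uses hypothetical take_while and skip_while helpers over ordinary lists; it illustrates the semantics described above, not the module's lazy implementation.

```perl
use strict;
use warnings;

# Hypothetical list-based helpers illustrating the semantics only.
sub take_while {
    my ($pred, @items) = @_;
    my @out;
    for my $item (@items) {
        last unless $pred->($item);      # first false ends the sequence
        push @out, $item;
    }
    return @out;
}

sub skip_while {
    my ($pred, @items) = @_;
    my $skipping = 1;
    my @out;
    for my $item (@items) {
        # Once the predicate fails, it is never consulted again.
        $skipping = 0 if $skipping && !$pred->($item);
        push @out, $item unless $skipping;
    }
    return @out;
}

my @data  = (1, 2, 3, 4, 5, 2, 1);
my $small = sub { $_[0] < 4 };

my @taken    = take_while($small, @data);
my @skipped  = skip_while($small, @data);
my @filtered = grep { $small->($_) } @data;

print "take_while: @taken\n";      # prints: take_while: 1 2 3
print "skip_while: @skipped\n";    # prints: skip_while: 4 5 2 1
print "grep:       @filtered\n";   # prints: grep:       1 2 3 2 1
```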
Ordering Methods
Sort stability: OrderBy* and ThenBy* use a Schwartzian-Transform decorated-array technique that appends the original element index as a final tie-breaker. This guarantees completely stable multi-key sorting on every Perl version including 5.005_03, where built-in sort stability is not guaranteed.
Comparison type: LTSV::LINQ provides three families:
OrderBy / OrderByDescending / ThenBy / ThenByDescending
Smart comparison: numeric (<=>) when both keys look numeric, string (cmp) otherwise. Convenient for LTSV data where field values are always strings but commonly hold numbers.
OrderByStr / OrderByStrDescending / ThenByStr / ThenByStrDescending
Unconditional string comparison (cmp). Use when keys must sort lexicographically regardless of content (e.g. version strings, codes).
OrderByNum / OrderByNumDescending / ThenByNum / ThenByNumDescending
Unconditional numeric comparison (<=>). Use when keys are always numeric. Undefined or empty values are treated as 0.
IOrderedEnumerable: OrderBy* methods return a LTSV::LINQ::Ordered object (a subclass of LTSV::LINQ). This mirrors the way .NET LINQ's OrderBy returns IOrderedEnumerable<T>, which exposes ThenBy and ThenByDescending. All LTSV::LINQ methods (Where, Select, Take, etc.) are available on the returned object through inheritance. ThenBy* methods are only available on LTSV::LINQ::Ordered objects, not on plain LTSV::LINQ objects.
Non-destructive: ThenBy* always returns a new LTSV::LINQ::Ordered object; the original is unchanged. Branching sort chains work correctly:
    my $by_dept = LTSV::LINQ->From(\@data)->OrderBy(sub { $_[0]{dept} });
    my $asc  = $by_dept->ThenBy(sub { $_[0]{name} });
    my $desc = $by_dept->ThenByNum(sub { $_[0]{salary} });
    # $asc and $desc are completely independent queries
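The stable, decorated sort and the smart comparison described in this section can be sketched as follows. This is an illustration only: looks_numeric, smart_cmp and stable_order_by are hypothetical names, and the numeric test is an assumed regular expression that may differ in detail from the module's own rule.

```perl
use strict;
use warnings;

# Hypothetical numeric test (integers, decimals, signs, exponents).
sub looks_numeric {
    my ($v) = @_;
    return defined $v
        && $v =~ /\A[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/;
}

# Numeric comparison when both keys look numeric, string otherwise.
sub smart_cmp {
    my ($x, $y) = @_;
    return looks_numeric($x) && looks_numeric($y) ? $x <=> $y : $x cmp $y;
}

# Decorate each element with [key, original index, element], sort,
# then strip the decoration. The index is the final tie-breaker, so
# equal keys keep their input order on any Perl version.
sub stable_order_by {
    my ($key_of, @items) = @_;
    my $i = 0;
    my @decorated = map { [ $key_of->($_), $i++, $_ ] } @items;
    my @sorted = sort {
        smart_cmp($a->[0], $b->[0]) || $a->[1] <=> $b->[1]
    } @decorated;
    return map { $_->[2] } @sorted;
}

my @rows = (
    { name => 'a', bytes => '1024' },
    { name => 'b', bytes => '256'  },
    { name => 'c', bytes => '1024' },
    { name => 'd', bytes => '9'    },
);
my @by_bytes = stable_order_by(sub { $_[0]{bytes} }, @rows);
my $summary  = join ' ', map { "$_->{name}=$_->{bytes}" } @by_bytes;
print "$summary\n";
# prints: d=9 b=256 a=1024 c=1024
```

The two records with bytes 1024 come out in their input order (a before c), which is the stability property the index tie-breaker is meant to provide.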
- OrderBy($key_selector)
Sort in ascending order using smart comparison: if both keys look like numbers (integers, decimals, negative, or exponential notation), numeric comparison (<=>) is used; otherwise string comparison (cmp) is used. Returns a LTSV::LINQ::Ordered object.
    ->OrderBy(sub { $_[0]{timestamp} })  # string keys: lexicographic
    ->OrderBy(sub { $_[0]{bytes} })      # "1024", "256" -> numeric (256, 1024)
Note: When you need explicit control over the comparison type, use OrderByStr (always cmp) or OrderByNum (always <=>).
- OrderByDescending($key_selector)
Sort in descending order using the same smart comparison as OrderBy. Returns a LTSV::LINQ::Ordered object.
    ->OrderByDescending(sub { $_[0]{count} })
- OrderByStr($key_selector)
Sort in ascending order using string comparison (cmp) unconditionally. Returns a LTSV::LINQ::Ordered object.
    ->OrderByStr(sub { $_[0]{code} })  # "10" lt "9" (lexicographic)
- OrderByStrDescending($key_selector)
Sort in descending order using string comparison (cmp) unconditionally. Returns a LTSV::LINQ::Ordered object.
    ->OrderByStrDescending(sub { $_[0]{name} })
- OrderByNum($key_selector)
Sort in ascending order using numeric comparison (<=>) unconditionally. Returns a LTSV::LINQ::Ordered object.
    ->OrderByNum(sub { $_[0]{bytes} })  # 9 < 10 (numeric)
Note: Undefined or empty values are treated as 0.
- OrderByNumDescending($key_selector)
Sort in descending order using numeric comparison (<=>) unconditionally. Returns a LTSV::LINQ::Ordered object.
    ->OrderByNumDescending(sub { $_[0]{response_time} })
- Reverse()
Reverse the order.
    ->Reverse()
- ThenBy($key_selector)
Add an ascending secondary sort key using smart comparison. Must be called on a LTSV::LINQ::Ordered object (i.e., after OrderBy*). Returns a new LTSV::LINQ::Ordered object; the original is unchanged.
    ->OrderBy(sub { $_[0]{dept} })->ThenBy(sub { $_[0]{name} })
- ThenByDescending($key_selector)
Add a descending secondary sort key using smart comparison.
    ->OrderBy(sub { $_[0]{dept} })->ThenByDescending(sub { $_[0]{salary} })
- ThenByStr($key_selector)
Add an ascending secondary sort key using string comparison (cmp).
    ->OrderByStr(sub { $_[0]{dept} })->ThenByStr(sub { $_[0]{code} })
- ThenByStrDescending($key_selector)
Add a descending secondary sort key using string comparison (cmp).
    ->OrderByStr(sub { $_[0]{dept} })->ThenByStrDescending(sub { $_[0]{name} })
- ThenByNum($key_selector)
Add an ascending secondary sort key using numeric comparison (<=>).
    ->OrderByStr(sub { $_[0]{dept} })->ThenByNum(sub { $_[0]{salary} })
- ThenByNumDescending($key_selector)
Add a descending secondary sort key using numeric comparison (<=>). Undefined or empty values are treated as 0.
    ->OrderByStr(sub { $_[0]{host} })->ThenByNumDescending(sub { $_[0]{bytes} })
Grouping Methods
- GroupBy($key_selector [, $element_selector])
Group elements by key.
Returns: New query where each element is a hashref with two fields:
Key - The group key (string)
Elements - Array reference of elements in the group
Note: This operation is eager - the entire sequence is loaded into memory immediately. Groups are returned in the order their keys first appear in the source sequence, matching the behaviour of .NET LINQ's GroupBy.
Examples:
    # Group access log by status code
    my @groups = LTSV::LINQ->FromLTSV('access.log')
        ->GroupBy(sub { $_[0]{status} })
        ->ToArray();
    for my $g (@groups) {
        printf "status=%s count=%d\n", $g->{Key}, scalar @{$g->{Elements}};
    }
    # With element selector
    ->GroupBy(sub { $_[0]{status} }, sub { $_[0]{path} })
Note: Elements is a plain array reference, not a LTSV::LINQ object. To apply further LINQ operations on a group, wrap it with From:
    for my $g (@groups) {
        my $total = LTSV::LINQ->From($g->{Elements})
            ->Sum(sub { $_[0]{bytes} });
        printf "status=%s total_bytes=%d\n", $g->{Key}, $total;
    }
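The first-seen ordering of groups can be achieved with a hash that maps each key to its position in an output array. The sketch below shows that idea over a plain list; group_by is a hypothetical helper and the sample records are made up, so treat it as one possible approach rather than the module's code.

```perl
use strict;
use warnings;

# Hypothetical eager grouping over a plain list. Groups are emitted
# in the order their keys first appear, with the Key/Elements shape
# described above.
sub group_by {
    my ($key_of, @items) = @_;
    my (%index_of, @groups);
    for my $item (@items) {
        my $key = $key_of->($item);
        unless (exists $index_of{$key}) {
            $index_of{$key} = scalar @groups;    # first-seen position
            push @groups, { Key => $key, Elements => [] };
        }
        push @{ $groups[ $index_of{$key} ]{Elements} }, $item;
    }
    return @groups;
}

my @log = (
    { status => '200', bytes => 100 },
    { status => '404', bytes => 10  },
    { status => '200', bytes => 300 },
);
my @groups  = group_by(sub { $_[0]{status} }, @log);
my @summary = map {
    sprintf '%s=%d', $_->{Key}, scalar @{ $_->{Elements} }
} @groups;
print "@summary\n";
# prints: 200=2 404=1
```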
Set Operations
Evaluation model:
Distinct is fully lazy: elements are tested one by one as the output sequence is consumed.
Union, Intersect, Except are partially eager: when the method is called, the second sequence is consumed in full and stored in an in-memory hash for O(1) lookup. The first sequence is then iterated lazily. This matches the behaviour of .NET LINQ, which also buffers the second (hash-side) sequence up front.
- Distinct([$key_selector])
Remove duplicate elements.
Parameters:
$key_selector - (Optional) Code ref: ($element) -> $key. Extracts a comparison key from each element. This is a single-argument function (unlike Perl's sort comparator), and is not a two-argument comparison function.
    ->Distinct()
    ->Distinct(sub { lc($_[0]) })  # case-insensitive strings
    ->Distinct(sub { $_[0]{id} })  # hashref: dedupe by field
- Union($second [, $key_selector])
Produce set union of two sequences (no duplicates).
Parameters:
$second - Second sequence (LTSV::LINQ object)
$key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).
Returns: New query with elements from both sequences (distinct)
Evaluation: Partially eager. The first sequence is iterated lazily; the second is fully consumed at call time and stored in memory.
Examples:
    # Simple union
    my $q1 = LTSV::LINQ->From([1, 2, 3]);
    my $q2 = LTSV::LINQ->From([3, 4, 5]);
    $q1->Union($q2)->ToArray();  # (1, 2, 3, 4, 5)
    # Case-insensitive union
    ->Union($other, sub { lc($_[0]) })
Note: Equivalent to Concat()->Distinct(). Automatically removes duplicates.
- Intersect($second [, $key_selector])
Produce set intersection of two sequences.
Parameters:
$second - Second sequence (LTSV::LINQ object)
$key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).
Returns: New query with common elements only (distinct)
Evaluation: Partially eager. The second sequence is fully consumed at call time and stored in a hash; the first is iterated lazily.
Examples:
    # Common elements
    LTSV::LINQ->From([1, 2, 3])
        ->Intersect(LTSV::LINQ->From([2, 3, 4]))
        ->ToArray();  # (2, 3)
    # Find users in both lists
    $users1->Intersect($users2, sub { $_[0]{id} })
Note: Only includes elements present in both sequences.
- Except($second [, $key_selector])
Produce set difference (elements in first but not in second).
Parameters:
$second - Second sequence (LTSV::LINQ object)
$key_selector - (Optional) Code ref: ($element) -> $key. Single-argument key extraction function (not a two-argument sort comparator).
Returns: New query with elements only in first sequence (distinct)
Evaluation: Partially eager. The second sequence is fully consumed at call time and stored in a hash; the first is iterated lazily.
Examples:
    # Set difference
    LTSV::LINQ->From([1, 2, 3])
        ->Except(LTSV::LINQ->From([2, 3, 4]))
        ->ToArray();  # (1)
    # Find users in first list but not second
    $all_users->Except($inactive_users, sub { $_[0]{id} })
Note: Returns elements from first sequence not present in second.
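The hash-based evaluation used by Intersect and Except can be sketched over plain lists: the second list is turned into a lookup hash up front, then the first list is walked once. The helper names seen_keys, intersect and except are hypothetical, and the sketch is eager, whereas the module iterates the first sequence lazily.

```perl
use strict;
use warnings;

# Hypothetical sketches over plain array references. $key_of is a
# single-argument key extractor, defaulting to the element itself.
sub seen_keys {
    my ($key_of, $second) = @_;
    my %seen = map { ($key_of->($_) => 1) } @$second;
    return \%seen;
}

sub intersect {
    my ($first, $second, $key_of) = @_;
    $key_of ||= sub { $_[0] };
    my $in_second = seen_keys($key_of, $second);   # buffer second side
    my %emitted;
    return grep {
        my $k = $key_of->($_);
        $in_second->{$k} && !$emitted{$k}++;       # common and not yet emitted
    } @$first;
}

sub except {
    my ($first, $second, $key_of) = @_;
    $key_of ||= sub { $_[0] };
    my $in_second = seen_keys($key_of, $second);
    my %emitted;
    return grep {
        my $k = $key_of->($_);
        !$in_second->{$k} && !$emitted{$k}++;      # absent and not yet emitted
    } @$first;
}

my @common = intersect([1, 2, 3, 2], [2, 3, 4]);
my @only   = except([1, 2, 3, 1], [2, 3, 4]);
print "intersect: @common\n";   # prints: intersect: 2 3
print "except:    @only\n";     # prints: except:    1
```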
Join Operations
Evaluation model: Both Join and GroupJoin are partially eager: when the method is called, the inner sequence is consumed in full and stored in an in-memory lookup table (hash of arrays, keyed by inner key). The outer sequence is then iterated lazily, producing results on demand.
This matches the behaviour of .NET LINQ's hash-join implementation. The memory cost is O(inner size); for very large inner sequences, consider reversing the join or pre-filtering the inner sequence before passing it.
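The evaluation model above (inner sequence indexed up front, outer sequence streamed) can be sketched in plain Perl. The function hash_join below is a hypothetical illustration rather than the module's implementation, and for brevity it takes both inputs as array references; only the outer side is read on demand.

```perl
use strict;

# Hypothetical sketch of a hash join: index the inner records by key
# immediately, then walk the outer records one at a time on demand.
sub hash_join {
    my ($outer, $inner, $outer_key, $inner_key, $result) = @_;

    my %lookup;                          # key => [inner records]
    for my $rec (@$inner) {
        push @{ $lookup{ $inner_key->($rec) } }, $rec;
    }

    my $i = 0;                           # position in the outer sequence
    my @pending;                         # results for the current outer record
    return sub {
        while (!@pending) {
            return undef if $i >= @$outer;
            my $o = $outer->[$i++];
            my $matches = $lookup{ $outer_key->($o) } || [];
            @pending = map { $result->($o, $_) } @$matches;
        }
        return shift @pending;
    };
}

my @users  = ({ id => 1, name => 'Alice' },
              { id => 2, name => 'Bob' },
              { id => 3, name => 'Carol' });
my @orders = ({ user_id => 1, product => 'Book' },
              { user_id => 1, product => 'Pen' },
              { user_id => 2, product => 'Notebook' });

my $next = hash_join(\@users, \@orders,
    sub { $_[0]{id} }, sub { $_[0]{user_id} },
    sub { "$_[0]{name}:$_[1]{product}" });

my @rows;
while (defined(my $row = $next->())) { push @rows, $row }
print join(", ", @rows), "\n";   # Alice:Book, Alice:Pen, Bob:Notebook
```

Carol has no orders and therefore produces no row, which is the inner-join behaviour. Memory use in this sketch is driven by %lookup, which is why the smaller sequence is the better choice for the inner side.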
- Join($inner, $outer_key_selector, $inner_key_selector, $result_selector)
-
Correlate elements of two sequences based on matching keys (inner join).
Parameters:
$inner - Inner sequence (LTSV::LINQ object)
$outer_key_selector - Function to extract key from outer element
$inner_key_selector - Function to extract key from inner element
$result_selector - Function to create result: ($outer_item, $inner_item) -> $result
Returns: Query with joined results
Examples:
# Join users with their orders
my $users = LTSV::LINQ->From([
{id => 1, name => 'Alice'},
{id => 2, name => 'Bob'}
]);
my $orders = LTSV::LINQ->From([
{user_id => 1, product => 'Book'},
{user_id => 1, product => 'Pen'},
{user_id => 2, product => 'Notebook'}
]);
$users->Join(
$orders,
sub { $_[0]{id} }, # outer key
sub { $_[0]{user_id} }, # inner key
sub {
my($user, $order) = @_;
return {
name => $user->{name},
product => $order->{product}
};
}
)->ToArray();
# [{name => 'Alice', product => 'Book'},
# {name => 'Alice', product => 'Pen'},
# {name => 'Bob', product => 'Notebook'}]
# Join LTSV files by request ID
LTSV::LINQ->FromLTSV('access.log')->Join(
LTSV::LINQ->FromLTSV('error.log'),
sub { $_[0]{request_id} },
sub { $_[0]{request_id} },
sub {
my($access, $error) = @_;
return {
url => $access->{url},
error => $error->{message}
};
}
)
Note: This is an inner join - only matching elements are returned. The inner sequence is fully loaded into memory.
- GroupJoin($inner, $outer_key_selector, $inner_key_selector, $result_selector)
-
Correlate elements of two sequences using a group join (similar to a LEFT OUTER JOIN). Each outer element is matched with a group of inner elements (possibly empty).
Parameters:
$inner - Inner sequence (LTSV::LINQ object)
$outer_key_selector - Function to extract key from outer element
$inner_key_selector - Function to extract key from inner element
$result_selector - Function: ($outer_item, $inner_group) -> $result. The $inner_group is a LTSV::LINQ object containing matched inner elements (empty sequence if no matches).
Returns: New query with one result per outer element (lazy)
Examples:
# Order count per user (including users with no orders)
my $users = LTSV::LINQ->From([
{id => 1, name => 'Alice'},
{id => 2, name => 'Bob'},
{id => 3, name => 'Carol'}
]);
my $orders = LTSV::LINQ->From([
{user_id => 1, product => 'Book', amount => 10},
{user_id => 1, product => 'Pen', amount => 5},
{user_id => 2, product => 'Notebook', amount => 15}
]);
$users->GroupJoin(
$orders,
sub { $_[0]{id} },
sub { $_[0]{user_id} },
sub {
my($user, $orders) = @_;
return {
name => $user->{name},
count => $orders->Count(),
total => $orders->Sum(sub { $_[0]{amount} })
};
}
)->ToArray();
# [
# {name => 'Alice', count => 2, total => 15},
# {name => 'Bob', count => 1, total => 15},
# {name => 'Carol', count => 0, total => 0}, # no orders
# ]
# Flat list with no-match rows included (LEFT OUTER JOIN, cf. Join for inner join)
$users->GroupJoin(
$orders,
sub { $_[0]{id} },
sub { $_[0]{user_id} },
sub {
my($user, $user_orders) = @_;
my @rows = $user_orders->ToArray();
return @rows
? [ map { {name => $user->{name}, product => $_->{product}} } @rows ]
: [ {name => $user->{name}, product => 'none'} ];
}
)->SelectMany(sub { $_[0] }) # Flatten the array references
->ToArray();
Note: Unlike Join, every outer element appears in the result even when there are no matching inner elements (LEFT OUTER JOIN semantics). The inner sequence is fully loaded into memory.
Important: The $inner_group LTSV::LINQ object can be iterated multiple times within the result selector (e.g., calling Count() followed by Sum()), because it generates a fresh iterator for every terminal operation.
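One way to obtain a sequence that survives several terminal operations is to store an iterator factory instead of a single iterator. The sketch below shows that idea in isolation. Sketch::Seq is a hypothetical class invented for this illustration, and nothing here is taken from the module's source.

```perl
use strict;

package Sketch::Seq;    # hypothetical class, not part of LTSV::LINQ

# Keep a factory that builds a new iterator on request, rather than
# one iterator that could only be walked once.
sub new {
    my ($class, $items) = @_;
    my $factory = sub {
        my $i = 0;
        return sub { return $i < @$items ? $items->[$i++] : undef };
    };
    return bless { factory => $factory }, $class;
}

sub Count {
    my $self = shift;
    my $iter = $self->{factory}->();     # fresh iterator
    my $n = 0;
    $n++ while defined $iter->();
    return $n;
}

sub Sum {
    my ($self, $selector) = @_;
    my $iter = $self->{factory}->();     # another fresh iterator
    my $total = 0;
    while (defined(my $item = $iter->())) {
        $total += $selector->($item);
    }
    return $total;
}

package main;

my $group = Sketch::Seq->new([ { amount => 10 }, { amount => 5 } ]);
my $count = $group->Count();
my $total = $group->Sum(sub { $_[0]{amount} });
print "count=$count total=$total\n";   # count=2 total=15
```

Because Count and Sum each request their own iterator, the order in which they are called does not matter and neither call starves the other.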
Quantifier Methods
- All($predicate)
-
Test if all elements satisfy condition.
->All(sub { $_[0]{status} == 200 })
- Any([$predicate])
-
Test if any element satisfies condition.
->Any(sub { $_[0]{status} >= 400 })
->Any() # Test if sequence is non-empty
- Contains($value [, $comparer])
-
Check if sequence contains specified element.
Parameters:
$value - Value to search for
$comparer - (Optional) Custom comparison function
Returns: Boolean (1 or 0)
Examples:
# Simple search
->Contains(5) # 1 if found, 0 otherwise
# Case-insensitive search
->Contains('foo', sub { lc($_[0]) eq lc($_[1]) })
# Check for undef
->Contains(undef)
- SequenceEqual($second [, $comparer])
-
Determine if two sequences are equal (same elements in same order).
Parameters:
$second - Second sequence (LTSV::LINQ object)
$comparer - (Optional) Comparison function ($a, $b) -> boolean
Returns: Boolean (1 if equal, 0 otherwise)
Examples:
# Same sequences
LTSV::LINQ->From([1, 2, 3])
->SequenceEqual(LTSV::LINQ->From([1, 2, 3])) # 1 (true)
# Different elements
LTSV::LINQ->From([1, 2, 3])
->SequenceEqual(LTSV::LINQ->From([1, 2, 4])) # 0 (false)
# Different lengths
LTSV::LINQ->From([1, 2])
->SequenceEqual(LTSV::LINQ->From([1, 2, 3])) # 0 (false)
# Case-insensitive comparison
$seq1->SequenceEqual($seq2, sub { lc($_[0]) eq lc($_[1]) })
Note: Order matters. Both content AND order must match.
Element Access Methods
- First([$predicate])
-
Get first element. Dies if empty.
->First()
->First(sub { $_[0]{status} == 404 })
- FirstOrDefault([$predicate,] $default)
-
Get first element or default value.
->FirstOrDefault(undef, {})
- Last([$predicate])
-
Get last element. Dies if empty.
->Last()
- LastOrDefault([$predicate,] $default)
-
Get last element or default value. Never throws exceptions.
Parameters:
$predicate - (Optional) Condition
$default - (Optional) Value to return when no element is found. Defaults to undef when omitted.
Returns: Last element or $default
Examples:
# Get last element (undef if empty)
->LastOrDefault()
# Specify a default value
LTSV::LINQ->From([])->LastOrDefault(undef, 0) # 0
# With predicate and default
->LastOrDefault(sub { $_[0] % 2 == 0 }, -1) # Last even, or -1
- Single([$predicate])
-
Get the only element. Dies if sequence has zero or more than one element.
Parameters:
$predicate- (Optional) Condition
Returns: Single element
Exceptions:
- Dies with "Sequence contains no elements" if empty
- Dies with "Sequence contains more than one element" if multiple elements
.NET LINQ Compatibility: Exception messages match .NET LINQ behavior exactly.
Performance: Uses lazy evaluation. Stops iterating immediately when second element is found (does not load entire sequence).
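The early stop can be made visible with a small stand-alone sketch. The function single and the counting source below are hypothetical and only mirror the documented behaviour (iterator returns undef at end-of-sequence, die messages as listed under Exceptions). They are not the module's code.

```perl
use strict;

my @data  = (3, 12, 7, 15, 20, 1);
my $pos   = 0;
my $reads = 0;

# Hypothetical source iterator that counts how many elements were read.
my $source = sub {
    return undef if $pos >= @data;
    $reads++;
    return $data[$pos++];
};

# Hypothetical sketch of Single with a predicate: remember the first
# match and die as soon as a second match shows up.
sub single {
    my ($iter, $predicate) = @_;
    my ($found, $value) = (0, undef);
    while (defined(my $item = $iter->())) {
        next unless $predicate->($item);
        die "Sequence contains more than one element" if $found;
        ($found, $value) = (1, $item);
    }
    die "Sequence contains no elements" unless $found;
    return $value;
}

my $result = eval { single($source, sub { $_[0] > 10 }) };
my $error  = $@;
print "reads=$reads\n";   # reads=4
print $error;
```

The second match (15) is the fourth element, so the remaining two elements are never read. When only one element matches, the whole sequence still has to be scanned to prove uniqueness.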
Examples:
# Exactly one element
LTSV::LINQ->From([5])->Single() # 5
# With predicate
->Single(sub { $_[0] > 10 })
# Memory-efficient: stops as soon as a 2nd match is found
LTSV::LINQ->FromLTSV("huge.log")->Single(sub { $_[0]{id} eq '999' })
- SingleOrDefault([$predicate])
-
Get the only element, or undef if zero or multiple elements.
Returns: Single element or undef (if 0 or 2+ elements)
.NET LINQ Compatibility: .NET's SingleOrDefault throws InvalidOperationException when the sequence contains more than one element. LTSV::LINQ returns undef in that case instead of throwing, which makes it more convenient for Perl code that checks return values. If you require the strict .NET behaviour (exception on multiple elements), use Single() wrapped in eval.
Performance: Uses lazy evaluation. Memory-efficient.
Examples:
LTSV::LINQ->From([5])->SingleOrDefault() # 5
LTSV::LINQ->From([])->SingleOrDefault() # undef (empty)
LTSV::LINQ->From([1,2])->SingleOrDefault() # undef (multiple)
- ElementAt($index)
-
Get element at specified index. Dies if out of range.
Parameters:
$index- Zero-based index
Returns: Element at index
Exceptions: Dies if index is negative or out of range
Performance: Uses lazy evaluation (iterator-based). Does NOT load entire sequence into memory. Stops iterating once target index is reached.
Examples:
->ElementAt(0) # First element
->ElementAt(2) # Third element
# Memory-efficient for large files
LTSV::LINQ->FromLTSV("huge.log")->ElementAt(10) # Reads only 11 lines
- ElementAtOrDefault($index)
-
Get element at index, or undef if out of range.
Returns: Element or undef
Performance: Uses lazy evaluation (iterator-based). Memory-efficient.
Examples:
->ElementAtOrDefault(0) # First element
->ElementAtOrDefault(99) # undef if out of range
Aggregation Methods
All aggregation methods are terminal operations - they consume the entire sequence and return a scalar value.
- Count([$predicate])
-
Count the number of elements.
Parameters:
$predicate- (Optional) Code reference to filter elements
Returns: Integer count
Examples:
# Count all
->Count() # 1000
# Count with condition
->Count(sub { $_[0]{status} >= 400 }) # 42
# Equivalent to
->Where(sub { $_[0]{status} >= 400 })->Count()
Performance: O(n) - must iterate entire sequence
- Sum([$selector])
-
Calculate sum of numeric values.
Parameters:
$selector- (Optional) Code reference to extract value. Default: identity function
Returns: Numeric sum
Examples:
# Sum of values
LTSV::LINQ->From([1, 2, 3, 4, 5])->Sum() # 15
# Sum of field
->Sum(sub { $_[0]{bytes} })
# Sum with transformation
->Sum(sub { $_[0]{price} * $_[0]{quantity} })
Note: Non-numeric values may produce warnings. Use numeric context.
Empty sequence: Returns 0.
- Min([$selector])
-
Find minimum value.
Parameters:
$selector- (Optional) Code reference to extract value
Returns: Minimum value, or undef if sequence is empty.
Examples:
# Minimum of values
->Min()
# Minimum of field
->Min(sub { $_[0]{response_time} })
# Oldest timestamp
->Min(sub { $_[0]{timestamp} })
- Max([$selector])
-
Find maximum value.
Parameters:
$selector- (Optional) Code reference to extract value
Returns: Maximum value, or undef if sequence is empty.
Examples:
# Maximum of values
->Max()
# Maximum of field
->Max(sub { $_[0]{bytes} })
# Latest timestamp
->Max(sub { $_[0]{timestamp} })
- Average([$selector])
-
Calculate arithmetic mean.
Parameters:
$selector- (Optional) Code reference to extract value
Returns: Numeric average (floating point)
Examples:
# Average of values
LTSV::LINQ->From([1, 2, 3, 4, 5])->Average() # 3
# Average of field
->Average(sub { $_[0]{bytes} })
# Average response time
->Average(sub { $_[0]{response_time} })
Empty sequence: Dies with "Sequence contains no elements". Unlike Sum (returns 0) and Min/Max (return undef), Average throws on an empty sequence. Use AverageOrDefault to avoid the exception.
Note: Returns floating point. Use int() for integer result.
- AverageOrDefault([$selector])
-
Calculate arithmetic mean, or return undef if sequence is empty.
Parameters:
$selector- (Optional) Code reference to extract value
Returns: Numeric average (floating point), or undef if empty
Examples:
# Safe average - returns undef for empty sequence
my @empty = ();
my $avg = LTSV::LINQ->From(\@empty)->AverageOrDefault(); # undef
# With data
LTSV::LINQ->From([1, 2, 3])->AverageOrDefault(); # 2
# With selector
->AverageOrDefault(sub { $_[0]{value} })
Note: Unlike Average(), this method never throws an exception.
- Aggregate([$seed,] $func [, $result_selector])
-
Apply an accumulator function over a sequence.
Signatures:
Aggregate($func) - Use first element as seed
Aggregate($seed, $func) - Explicit seed value
Aggregate($seed, $func, $result_selector) - Transform result
Parameters:
$seed - Initial accumulator value (optional for first signature)
$func - Code reference: ($accumulator, $element) -> $new_accumulator
$result_selector - (Optional) Transform final result
Returns: Accumulated value
Examples:
# Sum (without seed)
LTSV::LINQ->From([1,2,3,4])->Aggregate(sub { $_[0] + $_[1] }) # 10
# Product (with seed)
LTSV::LINQ->From([2,3,4])->Aggregate(1, sub { $_[0] * $_[1] }) # 24
# Concatenate strings
LTSV::LINQ->From(['a','b','c'])
->Aggregate('', sub { $_[0] ? "$_[0],$_[1]" : $_[1] }) # 'a,b,c'
# With result selector
LTSV::LINQ->From([1,2,3])
->Aggregate(0,
sub { $_[0] + $_[1] }, # accumulate
sub { "Sum: $_[0]" }) # transform result
# "Sum: 6"
# Build complex structure
->Aggregate([], sub {
my($list, $item) = @_;
push @$list, uc($item);
return $list;
})
.NET LINQ Compatibility: Supports all three .NET signatures.
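How one entry point can serve all three call shapes is easiest to see in a plain-Perl fold. The function aggregate below is a hypothetical sketch over an array reference. It dispatches on argument count, which is an assumption about how such a method could be written, and dying on an empty unseeded input is likewise an assumption modelled on .NET rather than documented behaviour of the module.

```perl
use strict;

# Hypothetical fold covering the three call shapes:
#   ($func), ($seed, $func), ($seed, $func, $result_selector)
sub aggregate {
    my ($items, @args) = @_;
    my @rest = @$items;
    my ($acc, $func, $result_selector);

    if (@args == 1) {
        ($func) = @args;
        # Assumption: an unseeded fold over nothing has no defined result.
        die "Sequence contains no elements" unless @rest;
        $acc = shift @rest;              # first element acts as the seed
    }
    else {
        ($acc, $func, $result_selector) = @args;
    }

    $acc = $func->($acc, $_) for @rest;
    return $result_selector ? $result_selector->($acc) : $acc;
}

my $sum     = aggregate([1, 2, 3, 4], sub { $_[0] + $_[1] });
my $product = aggregate([2, 3, 4], 1, sub { $_[0] * $_[1] });
my $label   = aggregate([1, 2, 3], 0, sub { $_[0] + $_[1] },
                        sub { "Sum: $_[0]" });
print "$sum $product $label\n";   # 10 24 Sum: 6
```

Dispatching on argument count means a code reference cannot be used as a seed in the one-argument form. That ambiguity is inherent to this calling convention, not specific to the sketch.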
Conversion Methods
- ToArray()
-
Convert to array.
my @array = $query->ToArray();
- ToList()
-
Convert to array reference.
my $arrayref = $query->ToList();
- ToDictionary($key_selector [, $value_selector])
-
Convert sequence to hash reference with unique keys.
Parameters:
$key_selector - Function to extract key from element
$value_selector - (Optional) Function to extract value, defaults to element itself
Returns: Hash reference
Examples:
# ID to name mapping
my $users = LTSV::LINQ->From([
{id => 1, name => 'Alice'},
{id => 2, name => 'Bob'}
]);
my $dict = $users->ToDictionary(
sub { $_[0]{id} },
sub { $_[0]{name} }
);
# {1 => 'Alice', 2 => 'Bob'}
# Without value selector (stores entire element)
my $dict = $users->ToDictionary(sub { $_[0]{id} });
# {1 => {id => 1, name => 'Alice'}, 2 => {id => 2, name => 'Bob'}}
# Quick lookup table
my $status_codes = LTSV::LINQ->FromLTSV('access.log')
->Select(sub { $_[0]{status} })
->Distinct()
->ToDictionary(sub { $_[0] }, sub { 1 });
Note: If duplicate keys exist, later values overwrite earlier ones.
.NET LINQ Compatibility: .NET's ToDictionary throws ArgumentException on duplicate keys. This module silently overwrites with the later value, following Perl hash semantics. Use ToLookup if you need to preserve all values for each key.
- ToLookup($key_selector [, $value_selector])
-
Convert sequence to hash reference with grouped values (multi-value dictionary).
Parameters:
$key_selector - Function to extract key from element
$value_selector - (Optional) Function to extract value, defaults to element itself
Returns: Hash reference where values are array references
Examples:
# Group orders by user ID
my $orders = LTSV::LINQ->From([
{user_id => 1, product => 'Book'},
{user_id => 1, product => 'Pen'},
{user_id => 2, product => 'Notebook'}
]);
my $lookup = $orders->ToLookup(
sub { $_[0]{user_id} },
sub { $_[0]{product} }
);
# {
# 1 => ['Book', 'Pen'],
# 2 => ['Notebook']
# }
# Group LTSV by status code
my $by_status = LTSV::LINQ->FromLTSV('access.log')
->ToLookup(sub { $_[0]{status} });
# {
# '200' => [{...}, {...}, ...],
# '404' => [{...}, ...],
# '500' => [{...}]
# }
Note: Unlike ToDictionary, this preserves all values for each key.
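The difference between the two conversions comes down to plain hash assignment versus pushing onto a per-key array. The stand-alone sketch below builds both shapes by hand from the same records; it illustrates the semantics described above and does not call the module.

```perl
use strict;

my @orders = (
    { user_id => 1, product => 'Book' },
    { user_id => 1, product => 'Pen' },
    { user_id => 2, product => 'Notebook' },
);

my (%dictionary, %lookup);
for my $order (@orders) {
    my $key = $order->{user_id};
    $dictionary{$key} = $order->{product};        # later value overwrites
    push @{ $lookup{$key} }, $order->{product};   # every value is kept
}

print "dictionary 1: $dictionary{1}\n";           # dictionary 1: Pen
print "lookup 1: @{ $lookup{1} }\n";              # lookup 1: Book Pen
```

When keys are expected to be unique and a duplicate would indicate bad data, checking exists $dictionary{$key} before assigning is a simple way to detect it.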
- DefaultIfEmpty([$default_value])
-
Return default value if sequence is empty, otherwise return the sequence.
Parameters:
$default_value- (Optional) Default value, defaults to undef
Returns: New query with default value if empty (lazy)
Examples:
# Return 0 if empty
->DefaultIfEmpty(0)->ToArray() # (0) if empty, or original data
# With undef default: an undef element is indistinguishable from
# end-of-sequence, so prefer an explicit sentinel (see ".NET LINQ Compatibility")
->DefaultIfEmpty()
# Useful for left joins
->Where(condition)->DefaultIfEmpty({id => 0, name => 'None'})
Note: This is useful for ensuring a sequence always has at least one element.
- ToLTSV($filename)
-
Write to LTSV file.
$query->ToLTSV("output.ltsv");
Utility Methods
EXAMPLES
Basic Filtering
use LTSV::LINQ;
# DSL syntax
my @successful = LTSV::LINQ->FromLTSV("access.log")
->Where(status => '200')
->ToArray();
# Code reference
my @errors = LTSV::LINQ->FromLTSV("access.log")
->Where(sub { $_[0]{status} >= 400 })
->ToArray();
Aggregation
# Count errors
my $error_count = LTSV::LINQ->FromLTSV("access.log")
->Where(sub { $_[0]{status} >= 400 })
->Count();
# Average bytes for successful requests
my $avg_bytes = LTSV::LINQ->FromLTSV("access.log")
->Where(status => '200')
->Average(sub { $_[0]{bytes} });
print "Average bytes: $avg_bytes\n";
Grouping and Ordering
# Top 10 URLs by request count
my @top_urls = LTSV::LINQ->FromLTSV("access.log")
->Where(sub { $_[0]{status} eq '200' })
->GroupBy(sub { $_[0]{url} })
->Select(sub {
my $g = shift;
return {
URL => $g->{Key},
Count => scalar(@{$g->{Elements}}),
TotalBytes => LTSV::LINQ->From($g->{Elements})
->Sum(sub { $_[0]{bytes} })
};
})
->OrderByDescending(sub { $_[0]{Count} })
->Take(10)
->ToArray();
for my $stat (@top_urls) {
printf "%5d requests - %s (%d bytes)\n",
$stat->{Count}, $stat->{URL}, $stat->{TotalBytes};
}
Complex Query Chain
# Multi-step analysis
my @result = LTSV::LINQ->FromLTSV("access.log")
->Where(status => '200') # Filter successful
->Select(sub { $_[0]{bytes} }) # Extract bytes
->Where(sub { $_[0] > 1000 }) # Large responses only
->OrderByDescending(sub { $_[0] }) # Sort descending
->Take(100) # Top 100
->ToArray();
print "Largest 100 successful responses:\n";
print " ", join(", ", @result), "\n";
Lazy Processing of Large Files
# Process huge file with constant memory
LTSV::LINQ->FromLTSV("huge.log")
->Where(sub { $_[0]{level} eq 'ERROR' })
->ForEach(sub {
my $rec = shift;
print "ERROR at $rec->{time}: $rec->{message}\n";
});
Quantifiers
# Check if all requests are successful
my $all_ok = LTSV::LINQ->FromLTSV("access.log")
->All(sub { $_[0]{status} < 400 });
print $all_ok ? "All OK\n" : "Some errors\n";
# Check if any errors exist
my $has_errors = LTSV::LINQ->FromLTSV("access.log")
->Any(sub { $_[0]{status} >= 500 });
print "Server errors detected\n" if $has_errors;
Data Transformation
# Read LTSV, transform, write back
LTSV::LINQ->FromLTSV("input.ltsv")
->Select(sub {
my $rec = shift;
return {
%$rec,
processed => 1,
timestamp => time(),
};
})
->ToLTSV("output.ltsv");
Working with Arrays
# Query in-memory data
my @data = (
{name => 'Alice', age => 30, city => 'Tokyo'},
{name => 'Bob', age => 25, city => 'Osaka'},
{name => 'Carol', age => 35, city => 'Tokyo'},
);
my @tokyo_residents = LTSV::LINQ->From(\@data)
->Where(city => 'Tokyo')
->OrderBy(sub { $_[0]{age} })
->ToArray();
FEATURES
Lazy Evaluation
Most query operations use lazy evaluation via iterators: data is processed on demand, not all at once. The eager exceptions (such as OrderBy and GroupBy) are listed under "Method Categories".
# Only reads 10 records from file
my @top10 = LTSV::LINQ->FromLTSV("huge.log")
->Take(10)
->ToArray();
Method Chaining
All methods (except terminal operations like ToArray) return a new query object, enabling fluent method chaining.
->Where(...)->Select(...)->OrderBy(...)->Take(10)
DSL Syntax
Simple key-value filtering without code references.
# Readable and concise
->Where(status => '200', method => 'GET')
# Instead of
->Where(sub { $_[0]{status} eq '200' && $_[0]{method} eq 'GET' })
ARCHITECTURE
Iterator-Based Design
LTSV::LINQ uses an iterator-based architecture for lazy evaluation.
Core Concept:
Each query operation returns a new query object wrapping an iterator (a code reference that produces one element per call).
my $iter = sub {
# Read next element
# Apply transformation
# Return element or undef
};
my $query = LTSV::LINQ->new($iter);
Benefits:
Memory Efficiency - O(1) memory for most operations
Lazy Evaluation - Elements computed on-demand
Composability - Iterators chain naturally
Early Termination - Stop processing when done
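The benefits listed above follow directly from composing closures. The sketch below chains four hypothetical iterator wrappers (range_iter, where_iter, select_iter, take_iter) and counts how many elements the source actually produces. These helpers are written for this illustration and are not the module's internals.

```perl
use strict;

my $reads = 0;   # number of elements pulled from the source

# Source: yields $from .. $to, one value per call, undef when done.
sub range_iter {
    my ($from, $to) = @_;
    my $n = $from;
    return sub { return undef if $n > $to; $reads++; return $n++; };
}

# Filter: skip elements until the predicate accepts one.
sub where_iter {
    my ($iter, $pred) = @_;
    return sub {
        while (defined(my $item = $iter->())) {
            return $item if $pred->($item);
        }
        return undef;
    };
}

# Projection: transform each element as it passes through.
sub select_iter {
    my ($iter, $fn) = @_;
    return sub {
        my $item = $iter->();
        return defined $item ? $fn->($item) : undef;
    };
}

# Partitioning: stop asking upstream after $count elements.
sub take_iter {
    my ($iter, $count) = @_;
    return sub {
        return undef if $count-- <= 0;
        return $iter->();
    };
}

my $chain = take_iter(
    select_iter(
        where_iter(range_iter(1, 1000), sub { $_[0] % 2 == 0 }),
        sub { $_[0] * 10 }),
    3);

my @out;
while (defined(my $item = $chain->())) { push @out, $item }
print "@out (source reads: $reads)\n";   # 20 40 60 (source reads: 6)
```

Only six of the thousand source values are ever generated, and no stage holds more than one element at a time. That is the combination of early termination and constant memory the list describes.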
Method Categories
The table below shows, for every method, whether it is lazy or eager, and what it returns. Knowing this prevents surprises about memory use and iterator consumption.
Method                Category        Evaluation         Returns
------                --------        ----------         -------
From                  Source          Lazy (factory)     Query
FromLTSV              Source          Lazy (factory)     Query
Range                 Source          Lazy               Query
Empty                 Source          Lazy               Query
Repeat                Source          Lazy               Query
Where                 Filter          Lazy               Query
Select                Projection      Lazy               Query
SelectMany            Projection      Lazy               Query
Concat                Concatenation   Lazy               Query
Zip                   Concatenation   Lazy               Query
Take                  Partitioning    Lazy               Query
Skip                  Partitioning    Lazy               Query
TakeWhile             Partitioning    Lazy               Query
SkipWhile             Partitioning    Lazy               Query
Distinct              Set Operation   Lazy (1st seq)     Query
DefaultIfEmpty        Conversion      Lazy               Query
OrderBy               Ordering        Eager (full)       Query
OrderByDescending     Ordering        Eager (full)       Query
OrderByStr            Ordering        Eager (full)       Query
OrderByStrDescending  Ordering        Eager (full)       Query
OrderByNum            Ordering        Eager (full)       Query
OrderByNumDescending  Ordering        Eager (full)       Query
Reverse               Ordering        Eager (full)       Query
GroupBy               Grouping        Eager (full)       Query
Union                 Set Operation   Eager (2nd seq)    Query
Intersect             Set Operation   Eager (2nd seq)    Query
Except                Set Operation   Eager (2nd seq)    Query
Join                  Join            Eager (inner seq)  Query
GroupJoin             Join            Eager (inner seq)  Query
All                   Quantifier      Lazy (early exit)  Boolean
Any                   Quantifier      Lazy (early exit)  Boolean
Contains              Quantifier      Lazy (early exit)  Boolean
SequenceEqual         Comparison      Lazy (early exit)  Boolean
First                 Element Access  Lazy (early exit)  Element
FirstOrDefault        Element Access  Lazy (early exit)  Element
Last                  Element Access  Eager (full)       Element
LastOrDefault         Element Access  Eager (full)       Element
Single                Element Access  Lazy (stops at 2)  Element
SingleOrDefault       Element Access  Lazy (stops at 2)  Element
ElementAt             Element Access  Lazy (early exit)  Element
ElementAtOrDefault    Element Access  Lazy (early exit)  Element
Count                 Aggregation     Eager (full)       Integer
Sum                   Aggregation     Eager (full)       Number
Min                   Aggregation     Eager (full)       Number
Max                   Aggregation     Eager (full)       Number
Average               Aggregation     Eager (full)       Number
AverageOrDefault      Aggregation     Eager (full)       Number or undef
Aggregate             Aggregation     Eager (full)       Scalar
ToArray               Conversion      Eager (full)       Array
ToList                Conversion      Eager (full)       ArrayRef
ToDictionary          Conversion      Eager (full)       HashRef
ToLookup              Conversion      Eager (full)       HashRef
ToLTSV                Conversion      Eager (full)       (file written)
ForEach               Utility         Eager (full)       (void)
Legend:
Lazy - returns a new Query immediately; no data is read yet.
Lazy (early exit) - reads only as many elements as needed, then stops.
Lazy (stops at 2) - reads until it finds a second match, then stops.
Eager (full) - must read the entire input sequence before returning.
Eager (2nd seq) / Eager (inner seq) - the indicated sequence is read in full up front; the other sequence remains lazy.
Practical guidance:
Chain lazy operations freely - no cost until a terminal is called.
Each terminal operation exhausts the iterator; to reuse data, call ToArray() first and rebuild with From(\@array).
For very large files, avoid eager operations (OrderBy, GroupBy, Join, etc.) unless the data fits in memory, or pre-filter with Where to reduce the working set first.
Query Execution Flow
# Build query (lazy - no execution yet)
my $query = LTSV::LINQ->FromLTSV("access.log")
->Where(status => '200') # Lazy
->Select(sub { $_[0]{url} }) # Lazy
->Distinct(); # Lazy
# Execute query (terminal operation)
my @results = $query->ToArray(); # Now executes entire chain
Execution Order:
1. FromLTSV opens file and creates iterator
2. Where wraps iterator with filter
3. Select wraps with transformation
4. Distinct wraps with deduplication
5. ToArray pulls elements through chain
Each element flows through the entire chain before the next element is read.
Memory Characteristics
O(1) / Streaming Operations:
These hold at most one element in memory at a time:
Where, Select, SelectMany, Concat, Zip
Take, Skip, TakeWhile, SkipWhile
DefaultIfEmpty
ForEach, Count, Sum, Min, Max, Average, AverageOrDefault
First, FirstOrDefault, Any, All, Contains
Single, SingleOrDefault, ElementAt, ElementAtOrDefault
O(unique) Operations:
Distinct - hash grows with the number of distinct keys seen
O(second/inner sequence) Operations:
The following are partially eager: one sequence is buffered in full, the other is streamed:
Union, Intersect, Except - second sequence is fully loaded
Join, GroupJoin - inner sequence is fully loaded
O(n) / Full-materialisation Operations:
ToArray, ToList, ToDictionary, ToLookup, ToLTSV (O(n))
OrderBy, OrderByDescending and Str/Num variants, Reverse (O(n))
GroupBy (O(n))
Last, LastOrDefault (O(n))
Aggregate (O(n), O(1) intermediate accumulator)
PERFORMANCE
Memory Efficiency
Lazy evaluation means memory usage is O(1) for most operations, regardless of input size.
# Processes 1GB file with constant memory
LTSV::LINQ->FromLTSV("1gb.log")
->Where(status => '500')
->ForEach(sub { print $_[0]{url}, "\n" });
Terminal Operations
These operations materialize the entire result set:
ToArray, ToList
OrderBy, OrderByDescending, Reverse
GroupBy
Last
For large datasets, use these operations carefully.
Optimization Tips
Filter early: Place Where clauses first
# Good: Filter before expensive operations
->Where(status => '200')->OrderBy(...)->Take(10)
# Bad: Order all data, then filter
->OrderBy(...)->Where(status => '200')->Take(10)
Limit early: Use Take to reduce processing
# Process only what you need
->Take(1000)->GroupBy(...)
Avoid repeated ToArray: Reuse results
# Bad: Calls ToArray twice
my $count = scalar($query->ToArray());
my @items = $query->ToArray();
# Good: Call once, reuse
my @items = $query->ToArray();
my $count = scalar(@items);
COMPATIBILITY
Perl Version Support
This module is compatible with Perl 5.005_03 and later.
Tested on:
Perl 5.005_03 (released 1999)
Perl 5.6.x
Perl 5.8.x
Perl 5.10.x - 5.42.x
Compatibility Policy
Ancient Perl Support:
This module maintains compatibility with Perl 5.005_03 through careful coding practices:
No use of features introduced after 5.005
use warnings compatibility shim for pre-5.6
our keyword avoided (5.6+ feature)
Three-argument open used on Perl 5.6 and later (two-argument form retained for 5.005_03)
No Unicode features required
No module dependencies beyond core
Why the Perl 5.005_03 Specification?
This module adheres to the Perl 5.005_03 specification, which was the final version of JPerl (Japanese Perl). This is not about using the old interpreter, but about maintaining the simple, original programming model that made Perl enjoyable.
The Strength of Modern Times:
Some people think the strength of modern times is the ability to use modern technology. That thinking is insufficient. The strength of modern times is the ability to use all technology up to the present day.
By adhering to the Perl 5.005_03 specification, we gain access to the entire history of Perl--from 5.005_03 to 5.42 and beyond--rather than limiting ourselves to only the latest versions.
Key reasons:
Simplicity - The original Perl approach keeps programming fun and easy
Perl 5.6 and later introduced character encoding complexity that made programming harder. The confusion around character handling contributed to Perl's decline. By staying with the 5.005_03 specification, we maintain the simplicity that made Perl "rakuda" (camel) -> "raku" (easy/fun).
JPerl Compatibility - Preserves the last JPerl version
Perl 5.005_03 was the final version of JPerl, which handled Japanese text naturally. Later versions abandoned this approach for Unicode, adding unnecessary complexity for many use cases.
Universal Compatibility - Runs on ANY Perl version
Code written to the 5.005_03 specification runs on all Perl versions from 5.005_03 through 5.42 and beyond. This maximizes compatibility across two decades of Perl releases.
Production Systems - Real-world enterprise needs
Many production systems, embedded environments, and enterprise deployments still run Perl 5.005, 5.6, or 5.8. This module provides modern query capabilities without requiring upgrades.
Philosophy - Programming should be enjoyable
As readers of the "Camel Book" (Programming Perl) know, Perl was designed to make programming enjoyable. The 5.005_03 specification preserves this original vision.
The ina CPAN Philosophy:
All modules under the ina CPAN account (including mb, Jacode, UTF8-R2, mb-JSON, and this module) follow this principle: Write to the Perl 5.005_03 specification, test on all versions, maintain programming joy.
This is not nostalgia--it's a commitment to:
Simple, maintainable code
Maximum compatibility
The original Perl philosophy
Making programming "raku" (easy and fun)
Build System:
This module uses pmake.bat instead of traditional make, since Perl 5.005_03 on Microsoft Windows lacks make. All tests pass on Perl 5.005_03 through modern versions.
.NET LINQ Compatibility
This section documents where LTSV::LINQ's behaviour matches .NET LINQ exactly, where it intentionally differs, and where it necessarily differs because of Perl's type system.
Exact matches with .NET LINQ:
Single - throws when sequence is empty or has more than one element
First, Last - throw when sequence is empty or no element matches
Aggregate(seed, func) and Aggregate(seed, func, result_selector) - matching 2- and 3-argument forms
GroupBy - groups are returned in insertion order (first-seen key order)
GroupJoin - every outer element appears even with zero inner matches
Join - inner join semantics; unmatched outer elements are dropped
Union / Intersect / Except - partially eager (second/inner sequence buffered up front), matching .NET's hash-join approach
Take, Skip, TakeWhile, SkipWhile - identical semantics
All / Any with early exit
Intentional differences from .NET LINQ:
SingleOrDefault
.NET throws InvalidOperationException when the sequence contains more than one element. LTSV::LINQ returns undef instead. This makes it more natural in Perl code that checks return values with defined.
If you require strict .NET behaviour (exception on multiple elements), use Single() inside an eval:
my $val = eval { $query->Single() };
# $val is undef and $@ is set if empty or multiple
DefaultIfEmpty(undef)
.NET's DefaultIfEmpty can return a sequence containing null (the reference-type default). LTSV::LINQ cannot: the iterator protocol uses undef to signal end-of-sequence, so a default value of undef is indistinguishable from EOF and is silently lost.
# .NET: seq.DefaultIfEmpty() produces one null element
# Perl:
LTSV::LINQ->From([])->DefaultIfEmpty(undef)->ToArray() # () - empty!
LTSV::LINQ->From([])->DefaultIfEmpty(0)->ToArray() # (0) - works
Use a sentinel value (0, '', {}) and handle it explicitly.
OrderBy smart comparison
.NET's OrderBy is strongly typed: the key type determines the comparison. In Perl there is no static type, so LTSV::LINQ's OrderBy uses a heuristic: if both keys look like numbers, <=> is used; otherwise cmp. For explicit control, use OrderByStr (always cmp) or OrderByNum (always <=>).
EqualityComparer / IComparer
.NET LINQ accepts IEqualityComparer and IComparer interface objects for custom equality and ordering. For Union, Intersect, and Except, LTSV::LINQ instead takes a code reference (sub) that extracts a key from each element. This is equivalent in power but different in calling convention: the sub receives one element and returns a key, rather than receiving two elements and returning a comparison result. (Contains and SequenceEqual take a two-argument comparer sub, as documented in their entries.)
Concat on typed sequences
.NET's Concat is type-checked. LTSV::LINQ accepts any two sequences regardless of element type.
No query expression syntax
.NET's from x in ... where ... select ... syntax compiles to LINQ method calls. Perl has no equivalent; use method chaining directly.
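The DefaultIfEmpty(undef) limitation listed among the differences is a consequence of any iterator protocol that uses undef as its end marker. The sketch below reproduces the effect with hypothetical helpers (make_iter, default_if_empty, to_array) that follow that protocol. It is a model of the behaviour, not the module's code.

```perl
use strict;

# Hypothetical helper: iterator over a list, undef marks the end.
sub make_iter {
    my @items = @_;
    return sub { return @items ? shift @items : undef };
}

# Hypothetical DefaultIfEmpty: emit $default once if the source
# yields nothing, otherwise pass the source through unchanged.
sub default_if_empty {
    my ($iter, $default) = @_;
    my ($started, $done) = (0, 0);
    return sub {
        return undef if $done;
        my $item = $iter->();
        if (defined $item) { $started = 1; return $item; }
        $done = 1;
        return $started ? undef : $default;
    };
}

# Consumer: stops at the first undef, as the protocol requires.
sub to_array {
    my ($iter) = @_;
    my @out;
    while (defined(my $item = $iter->())) { push @out, $item }
    return @out;
}

my @with_undef = to_array(default_if_empty(make_iter(), undef));
my @with_zero  = to_array(default_if_empty(make_iter(), 0));
print scalar(@with_undef), " ", scalar(@with_zero), "\n";   # 0 1
```

The consumer cannot tell an undef default from the end marker, so the first call yields an empty list. A defined sentinel such as 0 passes through because defined(0) is true.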
Pure Perl Implementation
No XS Dependencies:
This module is implemented in Pure Perl with no XS (C extensions). Benefits:
Works on any Perl installation
No C compiler required
Easy installation in restricted environments
Consistent behavior across platforms
Simpler debugging and maintenance
Core Module Dependencies
None. This module uses only Perl core features available since 5.005.
No CPAN dependencies required.
DIAGNOSTICS
Error Messages
This module may throw the following exceptions:
From() requires ARRAY reference
Thrown by From() when the argument is not an array reference.
Example:
LTSV::LINQ->From("string"); # Dies
LTSV::LINQ->From([1, 2, 3]); # OK
SelectMany: selector must return an ARRAY reference
Thrown by SelectMany() when the selector function returns anything other than an ARRAY reference. Wrap the return value in [...]:
# Wrong - hashref causes die
->SelectMany(sub { {key => 'val'} })
# Correct - arrayref
->SelectMany(sub { [{key => 'val'}] })
# Correct - empty array for "no results" case
->SelectMany(sub { [] })
Sequence contains no elements
Thrown by First(), Last(), or Average() when called on an empty sequence.
Methods that throw this error:
First()
Last()
Average()
To avoid this error, use the OrDefault variants:
FirstOrDefault() - returns undef instead of dying
LastOrDefault() - returns undef instead of dying
AverageOrDefault() - returns undef instead of dying
Example:
my @empty = ();
LTSV::LINQ->From(\@empty)->First();            # Dies
LTSV::LINQ->From(\@empty)->FirstOrDefault();   # Returns undef
No element satisfies the condition
Thrown by First() or Last() with a predicate when no element matches.
Example:
my @data = (1, 2, 3);
LTSV::LINQ->From(\@data)->First(sub { $_[0] > 10 });            # Dies
LTSV::LINQ->From(\@data)->FirstOrDefault(sub { $_[0] > 10 });   # Returns undef
Cannot open 'filename': ...
File I/O error when FromLTSV() cannot open the specified file.
Common causes:
File does not exist
Insufficient permissions
Invalid path
Example:
LTSV::LINQ->FromLTSV("/nonexistent/file.ltsv"); # Dies with this error
Methods That May Throw Exceptions
- From($array_ref)
Dies if argument is not an array reference.
- FromLTSV($filename)
Dies if file cannot be opened.
Note: The file handle is held open until the iterator is fully consumed. Partially consumed queries keep their file handles open. See FromLTSV in "Data Source Methods" for details.
- First([$predicate])
Dies if sequence is empty or no element matches predicate.
Safe alternative: FirstOrDefault()
- Last([$predicate])
Dies if sequence is empty or no element matches predicate.
Safe alternative: LastOrDefault()
- Average([$selector])
Dies if sequence is empty.
Safe alternative: AverageOrDefault()
Safe Alternatives
For methods that may throw exceptions, use the OrDefault variants:
First() -> FirstOrDefault() (returns undef)
Last() -> LastOrDefault() (returns undef)
Average() -> AverageOrDefault() (returns undef)
Example:
# Unsafe - may die
my $first = LTSV::LINQ->From(\@data)->First();
# Safe - returns undef if empty
my $first = LTSV::LINQ->From(\@data)->FirstOrDefault();
if (defined $first) {
# Process $first
}
Exception Format and Stack Traces
All exceptions thrown by this module are plain strings produced by die "message". Because no trailing newline is appended, Perl automatically appends the source location:
Sequence contains no elements at lib/LTSV/LINQ.pm line 764.
This is intentional: the location helps when diagnosing unexpected failures during development.
When catching exceptions with eval, the full string including the location suffix is available in $@. Use a prefix match if you want to test only the message text:
eval { LTSV::LINQ->From([])->First() };
if ($@ =~ /^Sequence contains no elements/) {
# handle empty sequence
}
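If you prefer exceptions without the location suffix, catch the exception with eval, remove the suffix, and re-die with a trailing newline. Appending a newline alone is not enough, because $@ already contains the location; the newline only stops Perl from adding a second one:
eval { $result = $query->First() };
(my $msg = $@) =~ s/ at .+? line \d+\.\n\z//;   # strip " at ... line N."
die "$msg\n" if $@;                              # newline: no new location is appended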
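The behaviour described above can be tried without the module. The following is a minimal sketch in which first_or_die is a hypothetical stand-in for any method that calls die with a message lacking a trailing newline; the substitution assumes the message text itself does not contain " at ".

```perl
use strict;

# Hypothetical stand-in for a method that dies without a trailing newline.
sub first_or_die {
    my @items = @_;
    die "Sequence contains no elements" unless @items;
    return $items[0];
}

eval { first_or_die() };
my $raw = $@;    # message followed by " at <file> line <N>.\n"

# Prefix match: works regardless of the location suffix.
my $is_empty_error = ($raw =~ /^Sequence contains no elements/) ? 1 : 0;

# Strip the suffix (assumes the message itself contains no " at ").
(my $clean = $raw) =~ s/ at .+? line \d+\.\n\z//;

print "raw:   $raw";
print "clean: $clean\n";
```

Matching on the prefix is usually the simpler choice; stripping the suffix is only needed when the message is shown to end users.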
FAQ
General Questions
- Q: Why LINQ-style instead of SQL-style?
A: LINQ provides:
Method chaining (more Perl-like)
Type safety through code
No string parsing required
Composable queries
- Q: Can I reuse a query object?
A: No. Query objects use iterators that can only be consumed once.
# Wrong - iterator consumed by first ToArray
my $query = LTSV::LINQ->FromLTSV("file.ltsv");
my @first = $query->ToArray();    # OK
my @second = $query->ToArray();   # Empty! Iterator exhausted
# Right - create new query for each use
my $query1 = LTSV::LINQ->FromLTSV("file.ltsv");
my @first = $query1->ToArray();
my $query2 = LTSV::LINQ->FromLTSV("file.ltsv");
my @second = $query2->ToArray();
- Q: How do I do OR conditions in Where?
A: Use code reference form with ||:
# OR condition requires code reference
->Where(sub { $_[0]{status} == 200 || $_[0]{status} == 304 })
# DSL only supports AND
->Where(status => '200')    # Single condition only
- Q: Why does my query seem to run multiple times?
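A: Each terminal operation needs its own pass over the data, and a query object supports only a single pass (see "Can I reuse a query object?"). Computing two results from the same file therefore reads it twice unless you save the first result:
# Reads the file TWICE - one query object per pass
my $avg = LTSV::LINQ->FromLTSV("file.ltsv")->Average(...);   # Pass 1: Calculate
my @all = LTSV::LINQ->FromLTSV("file.ltsv")->ToArray();      # Pass 2: Collect
# Save result instead - reads the file once
my @all = $query->ToArray();
my $avg = LTSV::LINQ->From(\@all)->Average(...);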
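Why a query object cannot be reused follows from how closure-based iterators behave. The sketch below is not the module's code; make_iterator and drain are hypothetical helpers that only illustrate the single-pass property described in the "Can I reuse a query object?" answer.

```perl
use strict;

# Hypothetical helper: wrap a list in a single-pass closure iterator.
# Each call returns the next element, or undef once the list is exhausted.
sub make_iterator {
    my @items = @_;
    my $pos = 0;
    return sub { $pos < @items ? $items[$pos++] : undef };
}

# Hypothetical helper: drain an iterator into a list.
sub drain {
    my ($iter) = @_;
    my @out;
    while (defined(my $item = $iter->())) {
        push @out, $item;
    }
    return @out;
}

my $iter   = make_iterator(10, 20, 30);
my @first  = drain($iter);    # consumes the iterator
my @second = drain($iter);    # position is already at the end

print scalar(@first), " then ", scalar(@second), "\n";
```

The position lives inside the closure, so the only way to start over is to build a new iterator, which is what creating a new query object does.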
Performance Questions
- Q: How can I process a huge file efficiently?
A: Use lazy operations and avoid materializing:
# Good - constant memory
LTSV::LINQ->FromLTSV("huge.log")
    ->Where(status => '500')
    ->ForEach(sub { print $_[0]{message}, "\n" });
# Bad - loads everything into memory
my @all = LTSV::LINQ->FromLTSV("huge.log")->ToArray();
- Q: Why is OrderBy slow on large files?
A: OrderBy must load all elements into memory to sort them.
# Slow on 1GB file - loads everything
->OrderBy(sub { $_[0]{timestamp} })->Take(10)
# Faster - filter before sorting (if possible)
->Where(status => '500')->OrderBy(...)->Take(10)
- Q: How do I process files larger than memory?
A: Use ForEach or streaming terminal operations:
# Process a 100GB file in constant memory (one record at a time)
my $error_count = 0;
LTSV::LINQ->FromLTSV("100gb.log")
    ->Where(sub { $_[0]{level} eq 'ERROR' })
    ->ForEach(sub { $error_count++ });
print "Errors: $error_count\n";
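Constant-memory processing works because each LTSV record is one self-contained line. The sketch below parses lines by hand to show the idea; parse_ltsv_line is a hypothetical helper, not the module's parser, and the sample lines stand in for a real file that would normally be read with a while loop over a file handle.

```perl
use strict;

# Parse one LTSV line into a hash ref: fields are tab-separated,
# and each field is split at its FIRST colon into label and value.
sub parse_ltsv_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $field (split /\t/, $line) {
        my ($label, $value) = split /:/, $field, 2;
        $record{$label} = $value;
    }
    return \%record;
}

# Hypothetical sample lines standing in for a large log file.
my @lines = (
    "time:2026-02-13T10:00:00\tlevel:INFO\tmessage:started\n",
    "time:2026-02-13T10:00:01\tlevel:ERROR\tmessage:disk full\n",
    "time:2026-02-13T10:00:02\tlevel:ERROR\tmessage:retry failed\n",
);

# Streaming loop: each parsed record is discarded before the next is built.
my $error_count = 0;
for my $line (@lines) {
    my $record = parse_ltsv_line($line);
    $error_count++ if $record->{level} eq 'ERROR';
}
print "Errors: $error_count\n";
```

Splitting with a limit of 2 keeps colons inside values, such as those in the timestamp, intact.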
DSL Questions
- Q: Can DSL do numeric comparisons?
A: No. DSL uses string equality (eq). Use code reference for numeric:
# DSL - string comparison
->Where(status => '200')    # $_[0]{status} eq '200'
# Code ref - numeric comparison
->Where(sub { $_[0]{status} == 200 })
->Where(sub { $_[0]{bytes} > 1000 })
- Q: How do I do case-insensitive matching in DSL?
A: DSL doesn't support it. Use code reference:
# Case-insensitive requires code reference
->Where(sub { lc($_[0]{method}) eq 'get' })
- Q: Can I use regular expressions in DSL?
A: No. Use code reference:
# Regex requires code reference
->Where(sub { $_[0]{url} =~ m{^/api/} })
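The practical effect of string equality in the DSL can be seen with a small stand-alone sketch. make_predicate is a hypothetical helper written for this illustration; it assumes that key/value pairs are combined with AND and compared with eq, as the answers above describe, and is not the module's implementation.

```perl
use strict;

# Hypothetical helper: turn key/value pairs into a predicate that requires
# every pair to match by string equality (eq).
sub make_predicate {
    my %want = @_;
    return sub {
        my ($record) = @_;
        for my $key (keys %want) {
            return 0 unless defined $record->{$key};
            return 0 unless $record->{$key} eq $want{$key};
        }
        return 1;
    };
}

my $is_200 = make_predicate(status => '200');

my $exact   = $is_200->({ status => '200' });     # same string
my $padded  = $is_200->({ status => '200.0' });   # different string
my $numeric = ('200.0' == 200) ? 1 : 0;           # numerically equal

print "$exact $padded $numeric\n";
```

A value such as '200.0' passes a numeric test but fails the string test, which is why numeric filters need the code reference form.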
Compatibility Questions
- Q: Does this work on Perl 5.6?
A: Yes. Tested on Perl 5.005_03 through 5.40+.
- Q: Do I need to install any CPAN modules?
A: No. Pure Perl with no dependencies beyond core.
- Q: Can I use this on Windows?
A: Yes. Pure Perl works on all platforms.
- Q: Why support such old Perl versions?
A: Many production systems cannot upgrade. This module provides modern query capabilities without requiring upgrades.
COOKBOOK
Common Patterns
- Find top N by value
->OrderByDescending(sub { $_[0]{score} })
->Take(10)
->ToArray()
- Group and count
->GroupBy(sub { $_[0]{category} })
->Select(sub { { Category => $_[0]{Key}, Count => scalar(@{$_[0]{Elements}}) } })
->ToArray()
- Running total
my $total = 0;
->Select(sub {
    $total += $_[0]{amount};
    return { %{$_[0]}, running_total => $total };   # "return" forces a hash ref; a bare { %... } here parses as a block
})
- Pagination
# Page 3, size 20
->Skip(40)->Take(20)->ToArray()
- Unique values
->Select(sub { $_[0]{category} })
->Distinct()
->ToArray()
- Conditional aggregation
Note: A query object can only be consumed once. To compute multiple aggregations over the same source, materialise it first with ToArray().
my @all = LTSV::LINQ->FromLTSV("access.log")->ToArray();
my $success_avg = LTSV::LINQ->From(\@all)
    ->Where(status => '200')
    ->Average(sub { $_[0]{response_time} });
my $error_avg = LTSV::LINQ->From(\@all)
    ->Where(sub { $_[0]{status} >= 400 })
    ->Average(sub { $_[0]{response_time} });
- Iterator consumption: when to snapshot with ToArray()
A query object wraps a single-pass iterator. Once consumed, it is exhausted and subsequent terminal operations return empty results or die.
# WRONG - $q is exhausted after the first Count()
my $q = LTSV::LINQ->FromLTSV("access.log")->Where(status => '200');
my $n = $q->Count();       # OK
my $first = $q->First();   # WRONG: iterator already at EOF
# RIGHT - snapshot into array, then query as many times as needed
my @rows = LTSV::LINQ->FromLTSV("access.log")->Where(status => '200')->ToArray();
my $n = LTSV::LINQ->From(\@rows)->Count();
my $first = LTSV::LINQ->From(\@rows)->First();
The snapshot approach is also the correct pattern for any multi-pass computation such as computing both average and standard deviation, comparing the same sequence against two different filters, or iterating once to validate and once to transform.
- Efficient large-file pattern
For files too large to fit in memory, keep the chain fully lazy by ensuring only one terminal operation is performed per pass:
# One pass - pick only what you need
my @slow = LTSV::LINQ->FromLTSV("access.log")
    ->Where(sub { $_[0]{response_time} > 1000 })
    ->OrderByNum(sub { $_[0]{response_time} })
    ->Take(20)
    ->ToArray();
# Never do two passes on the same FromLTSV object -
# open the file again for a second pass:
my $count = LTSV::LINQ->FromLTSV("access.log")->Count();
my $sum = LTSV::LINQ->FromLTSV("access.log")
    ->Sum(sub { $_[0]{bytes} });
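When even the filtered rows are too many to sort in memory, a top-N result can be kept in a bounded buffer while streaming. The sketch below is plain Perl and not an LTSV::LINQ feature; offer is a hypothetical helper, and the generated records are made-up sample data.

```perl
use strict;

# Hypothetical helper: keep the $limit largest records seen so far.
# @$top stays sorted in descending key order and never exceeds $limit entries.
sub offer {
    my ($top, $limit, $record, $key) = @_;
    push @$top, $record;
    @$top = sort { $key->($b) <=> $key->($a) } @$top;
    pop @$top if @$top > $limit;
}

# Hypothetical stream of parsed records with distinct response times.
my @stream = map { +{ url => "/page$_", response_time => ($_ * 37) % 101 } } 1 .. 50;

my @top;
my $by_time = sub { $_[0]{response_time} };
for my $record (@stream) {
    offer(\@top, 3, $record, $by_time);   # memory use is bounded by the limit
}

my @top_times = map { $_->{response_time} } @top;
print "@top_times\n";
```

Re-sorting a buffer of a few entries per record is cheap; for a large N a heap would be the more usual choice.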
DESIGN PHILOSOPHY
Historical Compatibility: Perl 5.005_03
This module maintains compatibility with Perl 5.005_03 (released 1999-03-28), following the Universal Consensus 1998 for primetools.
Why maintain such old compatibility?
Long-term stability
Code written in 1998-era Perl should still run in 2026 and beyond. This demonstrates Perl's commitment to backwards compatibility.
Embedded systems and traditional environments
Some production systems, embedded devices, and enterprise environments cannot easily upgrade Perl. Maintaining compatibility ensures this module remains useful in those contexts.
Minimal dependencies
By avoiding modern Perl features, this module has zero non-core dependencies. It works with only the Perl core that has existed since 1999.
Technical implications:
No our keyword - uses package variables
No warnings pragma - uses local $^W=1
No use strict 'subs' improvements from 5.6+
All features implemented with Perl 5.005-era constructs
The code comment # use 5.008001; # Lancaster Consensus 2013 for toolchains marks where modern code would typically start. We intentionally stay below this line.
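A file header written under these constraints might look like the sketch below. It is an assumption about how such a header can be written, not a copy of the module's source: the package name My::Demo is hypothetical, and the use of "use vars" to declare the package variable is one possible choice.

```perl
package My::Demo;    # hypothetical package name

use strict;
BEGIN { $INC{'warnings.pm'} = '' if $] < 5.006 }   # pretend warnings.pm is loaded on Perl < 5.6
use warnings;        # a no-op on Perl < 5.6 thanks to the stub above
local $^W = 1;       # pre-5.6 way to switch warnings on

use vars qw($VERSION);   # declares a package variable without "our"
$VERSION = '1.05';
$VERSION = $VERSION;

print "version $My::Demo::VERSION on Perl $]\n";
```

On Perl 5.6 and later the BEGIN block does nothing and the real warnings pragma is loaded, so the same file runs unchanged on both old and new interpreters.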
US-ASCII Only Policy
All source code is strictly US-ASCII (bytes 0x00-0x7F). No UTF-8, no extended characters.
Rationale:
Universal portability
US-ASCII works everywhere - ancient terminals, modern IDEs, web browsers, email systems. No encoding issues, ever.
No locale dependencies
The code behaves identically regardless of system locale settings.
Clear separation of concerns
Source code (ASCII) vs. data (any encoding). The module processes LTSV data in any encoding, but its own code remains pure ASCII.
This policy is verified by t/010_ascii_only.t.
The $VERSION = $VERSION Idiom
You may notice:
$VERSION = '1.05';
$VERSION = $VERSION;
This is intentional, not a typo. With warnings enabled (-w or $^W), a package variable that appears only once triggers a "used only once: possible typo" warning. The self-assignment makes $VERSION appear twice, which silences the warning without requiring our (which doesn't exist in Perl 5.005).
This is a well-known idiom from the pre-our era.
Design Principles
Lazy evaluation by default
Operations return query objects, not arrays. Data is processed on-demand when terminal operations (ToArray, Count, etc.) are called.
Method chaining
All query operations return new query objects, enabling fluent syntax:
$query->Where(...)->Select(...)->OrderBy(...)->ToArray()
No side effects
Query operations never modify the source data. They create new lazy iterators.
Perl idioms, LINQ semantics
We follow LINQ's method names and semantics, but use Perl idioms for implementation (closures for iterators, hash refs for records).
Zero dependencies
This module has zero non-core dependencies. It works with only the Perl core that has existed since 1999. Even warnings.pm is optional (stubbed for Perl < 5.6). This ensures installation succeeds on minimal Perl installations, avoids dependency chain vulnerabilities, and provides permanence - the code will work decades into the future.
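Lazy evaluation with closures can be sketched in a few lines. The code below is an illustration of the principle, not the module's implementation; source, where_lazy, select_lazy and take_to_array are hypothetical names, and the counter exists only to show how many source elements are actually read.

```perl
use strict;

my $pulled = 0;    # counts how many source elements were actually read

# Source: closure iterator over 1..1000.
sub source {
    my $n = 0;
    return sub { return undef if $n >= 1000; $pulled++; return ++$n };
}

# Lazy filter: returns a new iterator and does no work until it is called.
sub where_lazy {
    my ($iter, $pred) = @_;
    return sub {
        while (defined(my $x = $iter->())) {
            return $x if $pred->($x);
        }
        return undef;
    };
}

# Lazy projection: transforms one element per call.
sub select_lazy {
    my ($iter, $f) = @_;
    return sub {
        my $x = $iter->();
        return defined($x) ? $f->($x) : undef;
    };
}

# Terminal operation: pull at most $count elements.
sub take_to_array {
    my ($iter, $count) = @_;
    my @out;
    while (@out < $count) {
        my $x = $iter->();
        last unless defined $x;
        push @out, $x;
    }
    return @out;
}

my $chain = select_lazy(where_lazy(source(), sub { $_[0] % 2 == 0 }), sub { $_[0] * 10 });
my $pulled_before = $pulled;             # nothing has been read yet
my @result = take_to_array($chain, 3);   # reads only as far as needed
print "@result (pulled $pulled of 1000)\n";
```

Building the chain reads nothing; the terminal operation drives the whole pipeline and stops as soon as it has enough elements.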
LIMITATIONS AND KNOWN ISSUES
Current Limitations
Iterator Consumption
Query objects can only be consumed once. The iterator is exhausted after terminal operations.
Workaround: Create new query object or save ToArray() result.
Undef Values in Sequences
Due to iterator-based design, undef cannot be distinguished from end-of-sequence. Sequences containing undef values may not work correctly with all operations.
This is not a practical limitation for LTSV data (which uses hash references), but affects operations on plain arrays containing undef.
# Works fine (LTSV data - hash references)
LTSV::LINQ->FromLTSV("file.ltsv")->Contains({status => '200'})
# Limitation (plain array with undef)
LTSV::LINQ->From([1, undef, 3])->Contains(undef)   # May not work
No Parallel Execution
All operations execute sequentially in a single thread.
No Index Support
All filtering requires full scan. No index optimization.
Distinct Uses String Keys
Distinct with custom comparer uses stringified keys. May not work correctly for complex objects.
DefaultIfEmpty(undef) Cannot Be Distinguished from End-of-Sequence
Because the iterator protocol uses undef to signal end-of-sequence, DefaultIfEmpty(undef) cannot reliably deliver its undef default to downstream operations.
# Works correctly (non-undef default)
LTSV::LINQ->From([])->DefaultIfEmpty(0)->ToArray()       # (0)
LTSV::LINQ->From([])->DefaultIfEmpty({})->ToArray()      # ({})
# Does NOT work (undef default is indistinguishable from EOF)
LTSV::LINQ->From([])->DefaultIfEmpty(undef)->ToArray()   # () - empty!
Workaround: Use a sentinel value such as 0, '', or {} instead of undef, and treat it as "no element" after the fact.
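The undef-related limitations above share one cause, which a stand-alone sketch makes visible. The helpers make_iterator and count_items are hypothetical and only mimic an undef-terminated protocol of the kind described; they are not taken from the module.

```perl
use strict;

# Hypothetical undef-terminated iterator: undef means "no more elements".
sub make_iterator {
    my @items = @_;
    my $pos = 0;
    return sub { $pos < @items ? $items[$pos++] : undef };
}

# Count elements by pulling until the end-of-sequence marker appears.
sub count_items {
    my ($iter) = @_;
    my $n = 0;
    $n++ while defined($iter->());
    return $n;
}

my $with_undef    = count_items(make_iterator(1, undef, 3));   # stops at the undef element
my $with_sentinel = count_items(make_iterator(1, '', 3));      # '' is a defined value

print "$with_undef $with_sentinel\n";
```

The consumer cannot tell an undef element from the end marker, so the sequence appears to end early; a defined sentinel such as '' passes through intact.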
Not Implemented
The following LINQ methods from the .NET standard library are intentionally not implemented in LTSV::LINQ. This section explains the design rationale for each omission.
Parallel LINQ (PLINQ) Methods
The following methods belong to Parallel LINQ (PLINQ), the .NET parallel-execution extension to LINQ introduced in .NET 4.0. They exist to distribute query execution across multiple CPU cores using the .NET Thread Pool and Task Parallel Library.
Perl does not have native shared-memory multithreading that maps onto this execution model. Perl threads (threads.pm) copy the interpreter state and communicate through shared variables, making them unsuitable for the fine-grained, automatic work-stealing parallelism that PLINQ provides. LTSV::LINQ's iterator-based design assumes a single sequential execution context; introducing PLINQ semantics would require a completely different architecture and would add heavy dependencies.
Furthermore, the primary use case for LTSV::LINQ -- parsing and querying LTSV log files -- is typically I/O-bound rather than CPU-bound. Parallelizing I/O over a single file provides little benefit and considerable complexity.
For these reasons, the entire PLINQ surface is omitted:
AsParallel
Entry point for PLINQ. Converts an IEnumerable<T> into a ParallelQuery<T> that the .NET runtime executes in parallel using multiple threads. Not applicable: Perl lacks the runtime infrastructure.
AsSequential
Converts a ParallelQuery<T> back to a sequential IEnumerable<T>, forcing subsequent operators to run on a single thread. Since AsParallel is not implemented, AsSequential has no counterpart to convert from.
AsOrdered
Instructs PLINQ to preserve the source order in the output even during parallel execution. This is a hint to the PLINQ scheduler; it does not exist outside of PLINQ. Not applicable.
AsUnordered
Instructs PLINQ that output order does not need to match source order, potentially allowing more efficient parallel execution. Not applicable.
ForAll
PLINQ terminal operator that applies an action to each element in parallel, without collecting results. It is the parallel equivalent of ForEach. LTSV::LINQ provides ForEach for sequential iteration. A parallel ForAll is not applicable.
WithCancellation
Attaches a .NET CancellationToken to a ParallelQuery<T>, allowing cooperative cancellation of a running parallel query. Cancellation tokens are a .NET threading primitive. Not applicable.
WithDegreeOfParallelism
Sets the maximum number of concurrent tasks that PLINQ may use. A tuning knob for the PLINQ scheduler. Not applicable.
WithExecutionMode
Controls whether PLINQ may choose sequential execution for efficiency (Default) or is forced to parallelize (ForceParallelism). Not applicable.
WithMergeOptions
Controls how PLINQ merges results from parallel partitions back into the output stream (buffered, auto-buffered, or not-buffered). Not applicable.
.NET Type System Methods
The following methods are specific to .NET's static type system. They exist to work with .NET generics and interface hierarchies, which have no Perl equivalent.
Cast
Casts each element of a non-generic IEnumerable to a specified type T, returning IEnumerable<T>. In .NET, Cast<T> is needed when working with legacy APIs that return IEnumerable (without a type parameter) and you need to treat the elements as a specific type.
Perl is dynamically typed. Every Perl value already holds type information at runtime (scalar, reference, blessed object), and Perl does not have a concept of a "non-generic enumerable" that needs to be explicitly cast before it can be queried. There is no meaningful operation to implement.
OfType
Filters elements of a non-generic IEnumerable, returning only those that can be successfully cast to a specified type T. Like Cast, it exists to bridge generic and non-generic .NET APIs.
In LTSV::LINQ, all records from FromLTSV are hash references. Records from From are whatever the caller puts in the array. Perl's ref(), UNIVERSAL::isa(), or a Where predicate can perform any type-based filtering the caller needs. A dedicated OfType adds no expressiveness.
# Perl equivalent of OfType for blessed objects of class "Foo".
# UNIVERSAL::isa() is called as a function because ->isa() dies on unblessed references:
$query->Where(sub { ref($_[0]) && UNIVERSAL::isa($_[0], 'Foo') })
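The type-based filtering described for OfType can be tried on a mixed list with the built-in grep. This is a minimal sketch; the class Foo and the sample values are hypothetical.

```perl
use strict;

package Foo;                          # hypothetical class
sub new { return bless {}, shift }

package main;

# A mixed list: two Foo objects, a plain hash ref, a string, an array ref.
my @mixed = (Foo->new, { plain => 'hash' }, 'string', [1, 2], Foo->new);

# ref() rejects non-references; UNIVERSAL::isa() called as a function
# returns false for unblessed references instead of dying.
my @foos = grep { ref($_) && UNIVERSAL::isa($_, 'Foo') } @mixed;

print scalar(@foos), " of ", scalar(@mixed), " elements are Foo objects\n";
```

The same predicate body works unchanged inside a Where code reference, with $_[0] in place of $_.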
64-bit and Large-Count Methods
LongCount
Returns the number of elements as a 64-bit integer (Int64 in .NET). Because Count returns a 32-bit int on every .NET platform, a sequence with more than 2**31 - 1 (~2 billion) elements would overflow it; hence the need for LongCount.
In Perl, integers are represented as native signed integers or floating-point doubles (NV). On 64-bit Perl (which is universal in practice today), the native integer type is 64 bits, so Count already handles any realistic sequence length. On 32-bit Perl, the floating-point NV provides 53 bits of integer precision (~9 quadrillion), far exceeding any in-memory sequence. There is no semantic gap between Count and LongCount in Perl.
IEnumerable Conversion Method
AsEnumerable
In .NET, AsEnumerable<T> is used to force evaluation of a query as IEnumerable<T> rather than, for example, IQueryable<T> (which might be translated to SQL). It is a type-cast at the interface level, not a data transformation.
LTSV::LINQ has only one query type: LTSV::LINQ. There is no IQueryable counterpart that would benefit from being downgraded to IEnumerable. The method has no meaningful semantics to implement.
BUGS
Please report any bugs or feature requests to:
Email:
ina@cpan.org
SUPPORT
Documentation
Full documentation is available via:
perldoc LTSV::LINQ
CPAN
https://metacpan.org/pod/LTSV::LINQ
SEE ALSO
LTSV specification
http://ltsv.org/
Microsoft LINQ documentation
https://learn.microsoft.com/en-us/dotnet/csharp/linq/
AUTHOR
INABA Hitoshi <ina@cpan.org>
Contributors
Contributions are welcome! See file: CONTRIBUTING.
ACKNOWLEDGEMENTS
LINQ Technology
This module is inspired by LINQ (Language Integrated Query), which was developed by Microsoft Corporation for the .NET Framework.
LINQ(R) is a registered trademark of Microsoft Corporation.
We are grateful to Microsoft for pioneering the LINQ technology and making it a widely recognized programming pattern. The elegance and power of LINQ has influenced query interfaces across many programming languages, and this module brings that same capability to LTSV data processing in Perl.
This module is not affiliated with, endorsed by, or sponsored by Microsoft Corporation.
References
This module was inspired by:
Microsoft LINQ (Language Integrated Query)
LTSV specification
COPYRIGHT AND LICENSE
Copyright (c) 2026 INABA Hitoshi
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
License Details
This module is released under the same license as Perl itself:
Artistic License 1.0
GNU General Public License version 1 or later
You may choose either license.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.