NAME
CSV::LINQ - LINQ-style query interface for CSV files
VERSION
Version 1.00
SYNOPSIS
use CSV::LINQ;
# Read CSV file and query
my @results = CSV::LINQ->FromCSV("sales.csv")
->Where(sub { $_[0]{amount} > 1000 })
->Select(sub { $_[0]{name} })
->Distinct()
->ToArray();
# DSL syntax for simple filtering
my @tokyo = CSV::LINQ->FromCSV("users.csv")
->Where(city => 'Tokyo')
->ToArray();
# Grouping and aggregation
my @stats = CSV::LINQ->FromCSV("sales.csv")
->GroupBy(sub { $_[0]{category} })
->Select(sub {
my $g = shift;
return {
Category => $g->{Key},
Count => scalar(@{$g->{Elements}}),
Total => CSV::LINQ->From($g->{Elements})
->Sum(sub { $_[0]{amount} }),
};
})
->OrderByNumDescending(sub { $_[0]{Total} })
->ToArray();
TABLE OF CONTENTS
"METHODS" - Complete method reference (60 methods)
"EXAMPLES" - Practical examples
"FEATURES" - Lazy evaluation, method chaining, DSL
"ARCHITECTURE" - Iterator design, execution flow
"PERFORMANCE" - Memory usage, optimization tips
"COMPATIBILITY" - Perl 5.005+ support, pure Perl
"DIAGNOSTICS" - Error messages
"COOKBOOK" - Common patterns
"LIMITATIONS AND KNOWN ISSUES" - Iterator consumption, undef values
"BUGS" - Bug reports
DESCRIPTION
CSV::LINQ provides a LINQ-style query interface for CSV (Comma-Separated Values) files. It offers a fluent, chainable API for filtering, transforming, and aggregating CSV data.
Key features:
Lazy evaluation - O(1) memory usage for most operations
Method chaining - Fluent, readable query composition
DSL syntax - Simple key-value filtering
RFC 4180 compliant - Proper CSV parsing including quoted fields
60 LINQ methods - Comprehensive query capabilities
Pure Perl - No XS dependencies
Perl 5.005_03+ - Works on ancient and modern Perl
What is CSV?
CSV (Comma-Separated Values) is a widely used plain-text format for exchanging tabular data. By default, CSV::LINQ treats the first row as a header row containing column names; each subsequent row contains values for those columns.
Example:
name,age,city
Alice,30,Tokyo
Bob,25,Osaka
Carol,35,Tokyo
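As a rough illustration of how such a file maps onto one hash reference per data row, the following sketch pairs the header names with each row's values. It is not the module's parser: it uses a naive split on ',' and ignores quoting, and the variable names are chosen for the example only.

```perl
use strict;

# Illustrative only: a naive split on ',' that ignores quoted fields.
my @lines = (
    'name,age,city',
    'Alice,30,Tokyo',
    'Bob,25,Osaka',
    'Carol,35,Tokyo',
);

my @headers = split /,/, shift @lines;   # first row supplies the column names
my @records;
for my $line (@lines) {
    my @values = split /,/, $line, -1;   # -1 keeps trailing empty fields
    my %row;
    @row{@headers} = @values;            # hash slice pairs names with values
    push @records, \%row;
}

print "$records[0]{name} lives in $records[0]{city}\n";   # Alice lives in Tokyo
```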
What is LINQ?
LINQ (Language Integrated Query) is a query syntax in C# and .NET. This module brings LINQ-style querying to Perl for CSV data.
For more information: https://learn.microsoft.com/en-us/dotnet/csharp/linq/
METHODS
Complete Method Reference
This module implements 60 LINQ-style methods organized into 15 categories:
Data Sources (5): From, FromCSV, Range, Empty, Repeat
Filtering (1): Where (with DSL)
Projection (2): Select, SelectMany
Concatenation (2): Concat, Zip
Partitioning (4): Take, Skip, TakeWhile, SkipWhile
Ordering (7): OrderBy, OrderByDescending, OrderByStr, OrderByStrDescending, OrderByNum, OrderByNumDescending, Reverse
Secondary Ordering (6): ThenBy, ThenByDescending, ThenByStr, ThenByStrDescending, ThenByNum, ThenByNumDescending (via CSV::LINQ::Ordered)
Grouping (1): GroupBy
Set Operations (4): Distinct, Union, Intersect, Except
Join Operations (2): Join, GroupJoin
Quantifiers (4): All, Any, Contains, SequenceEqual
Element Access (8): First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault
Aggregation (7): Count, Sum, Min, Max, Average, AverageOrDefault, Aggregate
Conversion (6): ToArray, ToList, ToCSV, DefaultIfEmpty, ToDictionary, ToLookup
Utility (1): ForEach
Data Source Methods
- From(\@array)
Create a query from an array reference.
my $query = CSV::LINQ->From([{name => 'Alice'}, {name => 'Bob'}]);
- FromCSV($file [, %opts])
Create a query from a CSV file. The first line is used as column names (header row), and each data row is returned as a hash reference.
Options:
sep - Field separator (default: ','). Use "\t" for TSV.
headers - Array reference of column names. If given, the first line of the file is treated as data (no header in file). Combine with skip_header to skip an existing header line.
skip_header - If true, skip the first line even when headers is given.
# Standard CSV
my $q = CSV::LINQ->FromCSV("data.csv");
# Tab-separated (TSV)
my $q = CSV::LINQ->FromCSV("data.tsv", sep => "\t");
# Explicit headers (headerless CSV)
my $q = CSV::LINQ->FromCSV("noheader.csv", headers => [qw(name age city)]);
- Range($start, $count)
Generate a sequence of $count consecutive integers beginning at $start.
my $q = CSV::LINQ->Range(1, 10); # 1, 2, ..., 10
- Empty()
Return an empty sequence.
my $q = CSV::LINQ->Empty();
- Repeat($element, $count)
Return a sequence that repeats $element $count times.
my $q = CSV::LINQ->Repeat({value => 0}, 5);
Filtering Methods
- Where($predicate)
- Where(key => value, ...)
Filter elements. Accepts either a code reference or DSL form.
Code Reference Form:
->Where(sub { $_[0]{age} >= 20 })
->Where(sub { $_[0]{city} eq 'Tokyo' && $_[0]{age} > 30 })
DSL Form (string equality, AND):
->Where(city => 'Tokyo')
->Where(city => 'Tokyo', role => 'admin')
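One way to picture the DSL form is as shorthand for a predicate that requires every key/value pair to match by string equality. The sketch below builds such a predicate in plain Perl. The helper name make_dsl_predicate is hypothetical, and treating a missing column as a non-match is an assumption; the module's own behaviour in that case is not documented here.

```perl
use strict;

# Hypothetical helper (not part of CSV::LINQ): turn key/value pairs into a
# predicate that requires every pair to match by string equality (AND).
sub make_dsl_predicate {
    my @pairs = @_;
    die "Where() DSL requires even number of arguments\n" if @pairs % 2;
    my %want = @pairs;
    return sub {
        my $row = shift;
        for my $key (keys %want) {
            # Assumption: a missing or undefined column never matches.
            return 0 unless defined $row->{$key};
            return 0 unless $row->{$key} eq $want{$key};
        }
        return 1;
    };
}

my $is_tokyo_admin = make_dsl_predicate(city => 'Tokyo', role => 'admin');
my @rows = (
    { name => 'Alice', city => 'Tokyo', role => 'admin' },
    { name => 'Bob',   city => 'Osaka', role => 'admin' },
    { name => 'Carol', city => 'Tokyo', role => 'user'  },
);
my @hits = grep { $is_tokyo_admin->($_) } @rows;
print join(',', map { $_->{name} } @hits), "\n";   # Alice
```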
Projection Methods
- Select($selector)
Transform each element.
->Select(sub { $_[0]{name} })
->Select(sub { { Name => $_[0]{name}, Age => $_[0]{age} } })
- SelectMany($selector)
Flatten nested sequences. Selector must return an ARRAY reference.
->SelectMany(sub { $_[0]{tags} })
Concatenation Methods
- Concat($second)
Concatenate two sequences.
$q1->Concat($q2)->ToArray()
- Zip($second, $selector)
Combine two sequences element by element.
$q1->Zip($q2, sub { [$_[0], $_[1]] })->ToArray()
Partitioning Methods
- Take($count)
Take the first $count elements.
- Skip($count)
Skip the first $count elements.
- TakeWhile($predicate)
Take elements while the predicate is true (stops at the first false result).
- SkipWhile($predicate)
Skip elements while the predicate is true, then return the rest.
Ordering Methods
Note: All ordering methods are materializing (load all data into memory).
- OrderBy($key_selector)
Sort ascending (smart comparison: numeric keys sort numerically, string keys sort with cmp; numeric values sort before strings when types differ). Use OrderByStr to force pure string comparison.
- OrderByDescending($key_selector)
Sort descending (smart comparison, same rules as OrderBy). Use OrderByStrDescending to force pure string comparison.
- OrderByStr($key_selector)
Sort ascending (pure string comparison with cmp).
- OrderByStrDescending($key_selector)
Sort descending (pure string comparison with cmp).
- OrderByNum($key_selector)
Sort ascending (numeric comparison with <=>).
- OrderByNumDescending($key_selector)
Sort descending (numeric comparison with <=>).
- Reverse()
Reverse the order.
ThenBy methods (available after OrderBy* via CSV::LINQ::Ordered):
ThenBy and ThenByDescending use smart comparison (same rules as OrderBy). ThenByStr and ThenByStrDescending use pure string comparison; ThenByNum and ThenByNumDescending use numeric comparison.
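The smart comparison rules can be sketched in plain Perl as follows. This is an illustration of the documented rules, not the module's code: the function names are hypothetical, and the regular expression used to decide whether a key "looks numeric" is an assumption. The last statement shows the idea behind a secondary (ThenBy-style) key using Perl's built-in sort.

```perl
use strict;

# Assumption: a key counts as numeric if it looks like a decimal number.
sub looks_numeric {
    my $v = shift;
    return defined($v)
        && $v =~ /^\s*[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?\s*$/;
}

# Numbers compare with <=>, strings with cmp, numbers sort before strings.
sub smart_cmp {
    my ($x, $y) = @_;
    my ($nx, $ny) = (looks_numeric($x), looks_numeric($y));
    return $x <=> $y if $nx && $ny;
    return -1        if $nx;
    return  1        if $ny;
    return $x cmp $y;
}

my @sorted = sort { smart_cmp($a, $b) } ('pear', 10, 'apple', 9, 2);
print "@sorted\n";   # 2 9 10 apple pear

# Secondary key: fall through to the next comparison only on a tie.
my @people = (
    { name => 'Carol', city => 'Tokyo', age => 35 },
    { name => 'Bob',   city => 'Osaka', age => 25 },
    { name => 'Alice', city => 'Tokyo', age => 30 },
);
my @by_city_then_age =
    sort { $a->{city} cmp $b->{city} || $a->{age} <=> $b->{age} } @people;
print join(',', map { $_->{name} } @by_city_then_age), "\n";   # Bob,Alice,Carol
```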
Grouping Methods
- GroupBy($key_selector [, $element_selector])
Group elements. Returns a query of hashrefs with Key and Elements fields.
->GroupBy(sub { $_[0]{city} })
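To make the shape of a group concrete, here is a plain-Perl stand-in that produces the same kind of { Key, Elements } hashrefs. The function group_by is hypothetical, and emitting groups in the order each key is first seen is an assumption rather than documented behaviour.

```perl
use strict;

# Hypothetical stand-in for GroupBy: returns a list of
# { Key => ..., Elements => [...] } hashrefs.
# Assumption: groups come out in the order their key is first seen.
sub group_by {
    my ($key_selector, @rows) = @_;
    my (%index, @groups);
    for my $row (@rows) {
        my $key = $key_selector->($row);
        unless (exists $index{$key}) {
            push @groups, { Key => $key, Elements => [] };
            $index{$key} = $#groups;
        }
        push @{ $groups[ $index{$key} ]{Elements} }, $row;
    }
    return @groups;
}

my @groups = group_by(
    sub { $_[0]{city} },
    { name => 'Alice', city => 'Tokyo' },
    { name => 'Bob',   city => 'Osaka' },
    { name => 'Carol', city => 'Tokyo' },
);
for my $g (@groups) {
    print "$g->{Key}: ", scalar(@{ $g->{Elements} }), "\n";
}
# Tokyo: 2
# Osaka: 1
```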
Set Operations
- Distinct([$comparer])
Remove duplicates.
- Union($second [, $comparer])
Set union (no duplicates).
- Intersect($second [, $comparer])
Set intersection.
- Except($second [, $comparer])
Set difference.
Join Operations
- Join($inner, $outer_key, $inner_key, $result_selector)
Inner join. Inner sequence is fully buffered.
$orders->Join(
$customers,
sub { $_[0]{customer_id} },
sub { $_[0]{id} },
sub { { Order => $_[0], Customer => $_[1] } }
)
- GroupJoin($inner, $outer_key, $inner_key, $result_selector)
Left outer join. Inner group passed as re-iterable CSV::LINQ object.
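The difference between the two joins can be sketched with plain hashes. The code below is an assumption-laden illustration, not the module's implementation: it buffers one side in a lookup keyed by the join key, then walks the other side. The inner join drops outer rows without a match, while the group join keeps every outer row and attaches a possibly empty group.

```perl
use strict;

my @orders = (
    { id => 1, customer_id => 10, amount => 500 },
    { id => 2, customer_id => 11, amount => 200 },
    { id => 3, customer_id => 99, amount => 900 },   # no matching customer
);
my @customers = (
    { id => 10, name => 'Alice' },
    { id => 11, name => 'Bob'   },
    { id => 12, name => 'Carol' },                   # no orders
);

# Inner join (orders as outer, customers as inner): buffer the inner side.
my %customer_by_id;
push @{ $customer_by_id{ $_->{id} } }, $_ for @customers;

my @joined;
for my $order (@orders) {
    my $matches = $customer_by_id{ $order->{customer_id} } || [];
    for my $customer (@$matches) {
        push @joined, { Name => $customer->{name}, Amount => $order->{amount} };
    }
}
print scalar(@joined), " joined rows\n";   # 2 joined rows

# Group join (customers as outer): every customer appears exactly once.
my %orders_by_customer;
push @{ $orders_by_customer{ $_->{customer_id} } }, $_ for @orders;

my @grouped;
for my $customer (@customers) {
    my $group = $orders_by_customer{ $customer->{id} } || [];
    push @grouped, { Name => $customer->{name}, Orders => $group };
}
print "$_->{Name}: ", scalar(@{ $_->{Orders} }), "\n" for @grouped;
# Alice: 1
# Bob: 1
# Carol: 0
```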
Quantifier Methods
- All($predicate)
True if all elements satisfy predicate.
- Any([$predicate])
True if any element satisfies predicate (or sequence non-empty).
- Contains($value [, $comparer])
True if sequence contains value.
- SequenceEqual($second [, $comparer])
True if both sequences have same elements in same order.
Element Access Methods
- First([$predicate])
First element. Dies if empty.
- FirstOrDefault([$predicate,] $default)
First element or default.
- Last([$predicate])
Last element. Dies if empty.
- LastOrDefault([$predicate])
Last element or undef.
- Single([$predicate])
The only element. Dies if not exactly one.
- SingleOrDefault([$predicate])
The only element or undef.
- ElementAt($index)
Element at zero-based index. Dies if out of range.
- ElementAtOrDefault($index)
Element at index or undef.
Aggregation Methods
- Count([$predicate])
Count elements.
- Sum([$selector])
Sum of numeric values.
- Min([$selector])
Minimum value.
- Max([$selector])
Maximum value.
- Average([$selector])
Arithmetic mean. Dies if empty.
- AverageOrDefault([$selector])
Arithmetic mean or undef if empty.
- Aggregate([$seed,] $func [, $result_selector])
General fold/reduce operation.
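A fold combines a running accumulator with each element in turn, and the other aggregates can be viewed as special cases of it. The sketch below uses a hypothetical fold helper over a plain list; passing the accumulator first and the element second is an assumption made for this example and is not taken from the module.

```perl
use strict;

# Hypothetical fold helper: $func receives (accumulator, element).
sub fold {
    my ($seed, $func, @items) = @_;
    my $acc = $seed;
    $acc = $func->($acc, $_) for @items;
    return $acc;
}

my @amounts = (1500, 800, 2000);
my $sum = fold(0, sub { $_[0] + $_[1] }, @amounts);
my $max = fold($amounts[0], sub { $_[1] > $_[0] ? $_[1] : $_[0] }, @amounts);

# An average needs a non-empty list; returning undef for an empty one
# mirrors the "or default" flavour described above.
my $avg = @amounts ? $sum / @amounts : undef;

print "sum=$sum max=$max\n";   # sum=4300 max=2000
```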
Conversion Methods
- ToArray()
Convert to list.
my @arr = $query->ToArray();
- ToList()
Convert to array reference.
my $aref = $query->ToList();
- ToCSV($file [, %opts])
Write sequence to CSV file.
Options: sep (default ','), headers (arrayref), label_order (arrayref, alias for headers), no_header (bool).
$query->ToCSV("output.csv");
$query->ToCSV("output.tsv", sep => "\t");
$query->ToCSV("output.csv", headers => [qw(name age city)]);
- DefaultIfEmpty([$default])
Return default if sequence is empty.
- ToDictionary($key_selector [, $value_selector])
Convert to hash reference (key => element or transformed value).
- ToLookup($key_selector [, $value_selector])
Convert to hash reference (key => [elements]).
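The two hash shapes differ in what each key holds: a dictionary maps a key to one value, a lookup maps a key to an array reference of values. The plain-Perl sketch below builds both shapes by hand for comparison. How ToDictionary treats duplicate keys is not stated above, so the example uses a unique key for the dictionary.

```perl
use strict;

my @rows = (
    { name => 'Alice', city => 'Tokyo' },
    { name => 'Bob',   city => 'Osaka' },
    { name => 'Carol', city => 'Tokyo' },
);

# Dictionary shape: one element per key (name is unique in this data).
my %by_name;
$by_name{ $_->{name} } = $_ for @rows;

# Lookup shape: an array reference of elements per key.
my %by_city;
push @{ $by_city{ $_->{city} } }, $_ for @rows;

print $by_name{Bob}{city}, "\n";              # Osaka
print scalar(@{ $by_city{Tokyo} }), "\n";     # 2
```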
Utility Methods
- ForEach($action)
Execute action for each element (void context).
$query->ForEach(sub { print $_[0]{name}, "\n" });
EXAMPLES
Basic CSV Query
use CSV::LINQ;
# sales.csv:
# name,amount,category
# Alice,1500,A
# Bob,800,B
# Carol,2000,A
my @high_sales = CSV::LINQ->FromCSV("sales.csv")
->Where(sub { $_[0]{amount} > 1000 })
->OrderByNumDescending(sub { $_[0]{amount} })
->ToArray();
Grouping and Aggregation
my @by_category = CSV::LINQ->FromCSV("sales.csv")
->GroupBy(sub { $_[0]{category} })
->Select(sub {
my $g = shift;
return {
Category => $g->{Key},
Count => scalar(@{$g->{Elements}}),
Total => CSV::LINQ->From($g->{Elements})
->Sum(sub { $_[0]{amount} }),
};
})
->OrderByNumDescending(sub { $_[0]{Total} })
->ToArray();
Join Two CSV Files
# orders.csv: id,customer_id,amount
# customers.csv: id,name,city
my $orders = CSV::LINQ->FromCSV("orders.csv");
my $customers = CSV::LINQ->FromCSV("customers.csv");
my @joined = $orders->Join(
$customers,
sub { $_[0]{customer_id} },
sub { $_[0]{id} },
sub { { Name => $_[1]{name}, Amount => $_[0]{amount} } }
)->ToArray();
TSV Support
my @data = CSV::LINQ->FromCSV("data.tsv", sep => "\t")
->Where(status => 'active')
->ToArray();
Transform and Write
CSV::LINQ->FromCSV("input.csv")
->Select(sub {
my $r = shift;
return { %{$r}, processed => 1 };
})
->ToCSV("output.csv");
FEATURES
Lazy Evaluation
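Most query operations use lazy evaluation via iterators: data is processed on demand, one element at a time, rather than loaded all at once. Ordering, grouping, and the other operations listed as linear under "Memory Characteristics" must read the whole sequence before producing results.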
# Only reads 10 records from file
my @top10 = CSV::LINQ->FromCSV("huge.csv")
->Take(10)
->ToArray();
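The effect of stopping early can be demonstrated with two small closures. This is a sketch of the general technique, assuming an iterator that returns undef at end-of-sequence; make_source and take are hypothetical names, and the counter exists only to show how many elements were pulled.

```perl
use strict;

my $rows_read = 0;

# Hypothetical source: yields 1, 2, 3, ... up to $limit, counting each pull.
sub make_source {
    my $limit = shift;
    my $n = 0;
    return sub {
        return undef if $n >= $limit;
        $rows_read++;
        return ++$n;
    };
}

# Hypothetical take: stops pulling from upstream after $count elements.
sub take {
    my ($iter, $count) = @_;
    my $taken = 0;
    return sub {
        return undef if $taken >= $count;
        $taken++;
        return $iter->();
    };
}

my $top10 = take(make_source(1_000_000), 10);
my @result;
while (defined(my $item = $top10->())) {
    push @result, $item;
}
print scalar(@result), " items, $rows_read pulled from the source\n";
# 10 items, 10 pulled from the source
```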
RFC 4180 Compliant CSV Parsing
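Correctly handles:
Quoted fields containing commas
Quoted fields containing double-quotes (escaped as "")
Empty fields
Quoted fields containing newlines are part of RFC 4180, but see "Multi-line CSV Fields" under "LIMITATIONS AND KNOWN ISSUES": FromCSV() reads one line at a time.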
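The quoting rules can be illustrated with a small character-by-character parser for a single line. This is a hypothetical sketch, not the module's parser: parse_csv_line is an invented name, the separator is assumed to be one character, and records that span several lines are not handled.

```perl
use strict;

# Hypothetical single-line parser. Inside quotes, "" is a literal quote and
# the separator is ordinary text; outside quotes, the separator ends a field.
sub parse_csv_line {
    my ($line, $sep) = @_;
    $sep = ',' unless defined $sep;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    my @chars     = split //, $line;
    my $i         = 0;
    while ($i < @chars) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';    # doubled quote becomes one quote
                    $i++;
                }
                else {
                    $in_quotes = 0;   # closing quote
                }
            }
            else {
                $field .= $c;
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;           # opening quote
        }
        elsif ($c eq $sep) {
            push @fields, $field;
            $field = '';
        }
        else {
            $field .= $c;
        }
        $i++;
    }
    push @fields, $field;             # last field has no trailing separator
    return @fields;
}

my @f = parse_csv_line('"Smith, John","said ""hi""",,42');
print join('|', @f), "\n";   # Smith, John|said "hi"||42
```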
DSL Syntax
Simple key-value filtering without code references:
->Where(city => 'Tokyo', role => 'admin')
ARCHITECTURE
Iterator-Based Design
Each query operation returns a new query object wrapping an iterator (a code reference that produces one element per call, returning undef to signal end-of-sequence).
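A minimal sketch of this design, assuming nothing about the module's internals beyond the description above: each stage is a closure that pulls from the stage before it and returns undef when there is nothing left. The names from_array, where_iter and select_iter are hypothetical.

```perl
use strict;

# Source stage: hands out one array element per call, then undef.
sub from_array {
    my @items = @{ $_[0] };
    my $i = 0;
    return sub { $i < @items ? $items[$i++] : undef };
}

# Filter stage: keeps pulling until an element passes the predicate.
sub where_iter {
    my ($iter, $pred) = @_;
    return sub {
        while (defined(my $item = $iter->())) {
            return $item if $pred->($item);
        }
        return undef;
    };
}

# Projection stage: transforms each element as it passes through.
sub select_iter {
    my ($iter, $selector) = @_;
    return sub {
        my $item = $iter->();
        return defined($item) ? $selector->($item) : undef;
    };
}

my $pipeline = select_iter(
    where_iter(
        from_array([ { name => 'Alice', age => 30 }, { name => 'Bob', age => 25 } ]),
        sub { $_[0]{age} >= 30 },
    ),
    sub { $_[0]{name} },
);

my @names;
while (defined(my $name = $pipeline->())) {
    push @names, $name;
}
print "@names\n";   # Alice
```

Nothing runs when the stages are composed; work happens only when the outermost closure is called.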
Memory Characteristics
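Constant Memory Operations: Where, Select, SelectMany, Concat, Zip, Take, Skip, TakeWhile, SkipWhile, ForEach, Count, Sum, Min, Max, Average, First, Any, All.
Linear Memory Operations: ToArray, ToList, ToCSV, OrderBy*, GroupBy, Last, Reverse, Join (the inner sequence is fully buffered).
Distinct streams its results but has to remember the values already seen, so its memory use grows with the number of distinct values.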
PERFORMANCE
Filter early with Where before OrderBy or GroupBy.
Use Take to limit processing of large files.
Reuse ToArray() result rather than iterating the query twice.
COMPATIBILITY
This module is compatible with Perl 5.005_03 and later.
Uses only Perl core features. No CPAN dependencies required.
Build system: pmake.bat (Perl 5.005_03 on Windows lacks make).
DIAGNOSTICS
- Where() DSL requires even number of arguments
Where() was called in DSL form with an odd number of arguments. DSL form requires key-value pairs: ->Where(key => value, ...).
- From() requires ARRAY reference
From() was called with a non-array-reference argument.
- Cannot open '<filename>': <reason>
FromCSV() or ToCSV() could not open the file.
- Sequence contains no elements
First(), Last(), Average(), or Single() was called on an empty sequence.
- No element satisfies the condition
First() or Last() with predicate found no matching element.
- Sequence contains more than one element
Single() found more than one element.
- SelectMany: selector must return an ARRAY reference
The selector passed to SelectMany() returned a non-array-reference.
- ElementAt: index out of range
ElementAt() was called with a negative or out-of-range index.
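Because these conditions are raised with die, callers can trap them with eval and inspect $@. The sketch below uses a hypothetical first_or_die function on a plain list to stand in for a dying query method. Matching with a regular expression rather than string equality is a deliberate choice, since the exact text appended to the message (such as a file name and line number) is not specified above.

```perl
use strict;

# Hypothetical stand-in for First() on a plain list; it dies with the
# documented message when the list is empty.
sub first_or_die {
    my @items = @_;
    die "Sequence contains no elements" unless @items;
    return $items[0];
}

my $handled = 0;
my $first = eval { first_or_die() };
if ($@) {
    if ($@ =~ /Sequence contains no elements/) {
        $handled = 1;
        print "empty input, using a fallback\n";
    }
    else {
        die $@;   # re-raise anything unexpected
    }
}
```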
COOKBOOK
Top N by numeric field
->OrderByNumDescending(sub { $_[0]{score} })
->Take(10)
->ToArray()
Group and count
->GroupBy(sub { $_[0]{category} })
->Select(sub {
{
Category => $_[0]{Key},
Count => scalar(@{$_[0]{Elements}}),
}
})
->ToArray()
Pagination
# Page 3, size 20
->Skip(40)->Take(20)->ToArray()
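The Skip value follows from the page number and page size as (page - 1) * size for 1-based pages. A small sketch with a plain array standing in for the query; page_offset is a hypothetical helper introduced for the example.

```perl
use strict;

# Hypothetical helper: number of rows to skip for a 1-based page number.
sub page_offset {
    my ($page, $size) = @_;
    return ($page - 1) * $size;
}

my @rows = (1 .. 95);
my ($page, $size) = (3, 20);
my $skip = page_offset($page, $size);     # 40
my $last = $skip + $size - 1;
$last = $#rows if $last > $#rows;         # the final page may be short
my @page_rows = $skip <= $#rows ? @rows[$skip .. $last] : ();

print "skip=$skip first=$page_rows[0] count=", scalar(@page_rows), "\n";
# skip=40 first=41 count=20
```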
Unique values of a column
->Select(sub { $_[0]{category} })
->Distinct()
->ToArray()
CSV round-trip
CSV::LINQ->FromCSV("input.csv")
->Where(sub { $_[0]{active} eq '1' })
->ToCSV("active.csv");
LIMITATIONS AND KNOWN ISSUES
ToCSV Column Order Without headers
When writing hash-reference sequences with ToCSV() and no headers option, column order is determined by sort keys of the first record. To guarantee a specific column order, always pass the headers option:
$query->ToCSV("out.csv", headers => [qw(name age city)]);
Iterator Consumption
Query objects can only be consumed once. The iterator is exhausted after terminal operations. Create a new query or save ToArray() result to reuse.
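The same behaviour is easy to reproduce with a bare closure iterator, which may help explain why a second pass yields nothing. The helpers make_iter and drain are hypothetical and stand in for a query and a terminal operation.

```perl
use strict;

# Hypothetical one-shot iterator over a list.
sub make_iter {
    my @items = @_;
    return sub { @items ? shift @items : undef };
}

# Hypothetical terminal operation: pull everything that is left.
sub drain {
    my $iter = shift;
    my @out;
    while (defined(my $item = $iter->())) {
        push @out, $item;
    }
    return @out;
}

my $iter   = make_iter('a', 'b', 'c');
my @first  = drain($iter);   # ('a', 'b', 'c')
my @second = drain($iter);   # () because the iterator is exhausted

# Keep the materialized list and reuse it instead of iterating again.
my $count  = scalar @first;
my $joined = join ',', @first;
print "first=", scalar(@first), " second=", scalar(@second), "\n";
# first=3 second=0
```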
Undef Values
Due to the iterator-based design, undef signals end-of-sequence. Sequences containing undef values may not work correctly with all operations. This is not a practical limitation for CSV data (which uses hash references).
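A short sketch of the effect, using a bare closure iterator rather than the module itself: a consumer that stops at the first undef cannot tell an undef element from the end of the sequence.

```perl
use strict;

my @values = (1, undef, 3);
my $i = 0;
my $iter = sub { $i < @values ? $values[$i++] : undef };

my @seen;
while (defined(my $v = $iter->())) {
    push @seen, $v;
}
print scalar(@seen), "\n";   # 1: the undef element looked like end-of-sequence

# A hash reference is always defined, even if one of its fields is empty,
# so rows represented as hash references do not trigger this.
```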
Multi-line CSV Fields
FromCSV() reads files one line at a time. CSV fields that span multiple lines (embedded newlines within double-quoted fields) are not yet supported.
No Parallel Execution
All operations execute sequentially in a single thread.
BUGS
Please report any bugs or feature requests to:
Email: ina@cpan.org
SEE ALSO
LTSV::LINQ - LINQ-style query interface for LTSV files
JSON::LINQ - LINQ-style query interface for JSON/JSONL files
RFC 4180: https://www.ietf.org/rfc/rfc4180.txt
Microsoft LINQ documentation: https://learn.microsoft.com/en-us/dotnet/csharp/linq/
AUTHOR
INABA Hitoshi <ina@cpan.org>
Contributors
Contributions are welcome! See file: CONTRIBUTING.
ACKNOWLEDGEMENTS
LINQ Technology
This module is inspired by LINQ (Language Integrated Query), developed by Microsoft Corporation for the .NET Framework.
LINQ(R) is a registered trademark of Microsoft Corporation.
References
Microsoft LINQ: https://learn.microsoft.com/en-us/dotnet/csharp/linq/
RFC 4180 (CSV): https://www.ietf.org/rfc/rfc4180.txt
LTSV::LINQ (inspiration): https://metacpan.org/pod/LTSV::LINQ
COPYRIGHT AND LICENSE
Copyright (c) 2026 INABA Hitoshi
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
License Details
Artistic License 1.0: http://dev.perl.org/licenses/artistic.html
GNU General Public License version 1 or later: http://www.gnu.org/licenses/gpl-1.0.html
You may choose either license.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.