NAME

CSV::LINQ - LINQ-style query interface for CSV files

VERSION

Version 1.00

SYNOPSIS

use CSV::LINQ;

# Read CSV file and query
my @results = CSV::LINQ->FromCSV("sales.csv")
    ->Where(sub { $_[0]{amount} > 1000 })
    ->Select(sub { $_[0]{name} })
    ->Distinct()
    ->ToArray();

# DSL syntax for simple filtering
my @tokyo = CSV::LINQ->FromCSV("users.csv")
    ->Where(city => 'Tokyo')
    ->ToArray();

# Grouping and aggregation
my @stats = CSV::LINQ->FromCSV("sales.csv")
    ->GroupBy(sub { $_[0]{category} })
    ->Select(sub {
        my $g = shift;
        return {
            Category => $g->{Key},
            Count    => scalar(@{$g->{Elements}}),
            Total    => CSV::LINQ->From($g->{Elements})
                            ->Sum(sub { $_[0]{amount} }),
        };
    })
    ->OrderByNumDescending(sub { $_[0]{Total} })
    ->ToArray();

DESCRIPTION

CSV::LINQ provides a LINQ-style query interface for CSV (Comma-Separated Values) files. It offers a fluent, chainable API for filtering, transforming, and aggregating CSV data.

Key features:

  • Lazy evaluation - O(1) memory usage for most operations

  • Method chaining - Fluent, readable query composition

  • DSL syntax - Simple key-value filtering

  • RFC 4180 compliant - Proper CSV parsing including quoted fields

  • 60 LINQ methods - Comprehensive query capabilities

  • Pure Perl - No XS dependencies

  • Perl 5.005_03+ - Works on ancient and modern Perl

What is CSV?

CSV (Comma-Separated Values) is one of the most widely used formats for tabular data exchange. By default, this module treats the first row as a header row containing column names; each subsequent row contains the values for those columns.

Example:

name,age,city
Alice,30,Tokyo
Bob,25,Osaka
Carol,35,Tokyo
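
To make the header-row convention concrete, the following is a minimal sketch, in plain Perl and independent of this module, of how the sample above can be turned into one hash reference per data row. It assumes no field is quoted, so a simple split is enough; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample data from the example above, one line per array element.
my @lines = (
    'name,age,city',
    'Alice,30,Tokyo',
    'Bob,25,Osaka',
    'Carol,35,Tokyo',
);

# The first line supplies the column names.
my @columns = split /,/, shift @lines;

# Every remaining line becomes a hash reference keyed by column name.
# A plain split is enough here only because no field is quoted.
my @records;
for my $line (@lines) {
    my @values = split /,/, $line, -1;   # -1 keeps trailing empty fields
    my %row;
    @row{@columns} = @values;
    push @records, \%row;
}

print "$_->{name} lives in $_->{city}\n" for @records;
```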

What is LINQ?

LINQ (Language Integrated Query) is a query syntax in C# and .NET. This module brings LINQ-style querying to Perl for CSV data.

For more information: https://learn.microsoft.com/en-us/dotnet/csharp/linq/

METHODS

Complete Method Reference

This module implements 60 LINQ-style methods organized into 15 categories:

  • Data Sources (5): From, FromCSV, Range, Empty, Repeat

  • Filtering (1): Where (with DSL)

  • Projection (2): Select, SelectMany

  • Concatenation (2): Concat, Zip

  • Partitioning (4): Take, Skip, TakeWhile, SkipWhile

  • Ordering (7): OrderBy, OrderByDescending, OrderByStr, OrderByStrDescending, OrderByNum, OrderByNumDescending, Reverse

  • Secondary Ordering (6): ThenBy, ThenByDescending, ThenByStr, ThenByStrDescending, ThenByNum, ThenByNumDescending (via CSV::LINQ::Ordered)

  • Grouping (1): GroupBy

  • Set Operations (4): Distinct, Union, Intersect, Except

  • Join Operations (2): Join, GroupJoin

  • Quantifiers (4): All, Any, Contains, SequenceEqual

  • Element Access (8): First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault

  • Aggregation (7): Count, Sum, Min, Max, Average, AverageOrDefault, Aggregate

  • Conversion (6): ToArray, ToList, ToCSV, DefaultIfEmpty, ToDictionary, ToLookup

  • Utility (1): ForEach

Data Source Methods

From(\@array)

Create a query from an array.

my $query = CSV::LINQ->From([{name => 'Alice'}, {name => 'Bob'}]);

FromCSV($file [, %opts])

Create a query from a CSV file. The first line is used as column names (header row), and each data row is returned as a hash reference.

Options:

  • sep - Field separator (default: ','). Use "\t" for TSV.

  • headers - Array reference of column names. If given, the first line of the file is treated as data (the file is assumed to have no header row). Combine with skip_header to skip an existing header line.

  • skip_header - If true, skip the first line even when headers is given.

# Standard CSV
my $q = CSV::LINQ->FromCSV("data.csv");

# Tab-separated (TSV)
my $q = CSV::LINQ->FromCSV("data.tsv", sep => "\t");

# Explicit headers (headerless CSV)
my $q = CSV::LINQ->FromCSV("noheader.csv",
    headers    => [qw(name age city)]);

Range($start, $count)

Generate a sequence of integers.

my $q = CSV::LINQ->Range(1, 10);  # 1, 2, ..., 10

Empty()

Return an empty sequence.

my $q = CSV::LINQ->Empty();

Repeat($element, $count)

Return a sequence that repeats $element $count times.

my $q = CSV::LINQ->Repeat({value => 0}, 5);

Filtering Methods

Where($predicate)
Where(key => value, ...)

Filter elements. Accepts either a code reference or DSL form.

Code Reference Form:

->Where(sub { $_[0]{age} >= 20 })
->Where(sub { $_[0]{city} eq 'Tokyo' && $_[0]{age} > 30 })

DSL Form (string equality, AND):

->Where(city => 'Tokyo')
->Where(city => 'Tokyo', role => 'admin')
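
The DSL form can be read as shorthand for a predicate that checks every named field with eq and combines the checks with AND. The sketch below shows one way such a predicate could be built in plain Perl. The helper name make_dsl_predicate is hypothetical, and treating a missing field as a non-match is an assumption, not documented behaviour of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn key/value pairs into a predicate that
# requires every named field to be string-equal to the given value.
sub make_dsl_predicate {
    my @pairs = @_;
    die "Where() DSL requires even number of arguments\n" if @pairs % 2;
    return sub {
        my $row = shift;
        for (my $i = 0; $i < @pairs; $i += 2) {
            my ($key, $want) = @pairs[$i, $i + 1];
            # Assumption: a missing or undefined field never matches.
            return 0 unless defined $row->{$key} && $row->{$key} eq $want;
        }
        return 1;
    };
}

my @users = (
    { name => 'Alice', city => 'Tokyo', role => 'admin' },
    { name => 'Bob',   city => 'Osaka', role => 'admin' },
    { name => 'Carol', city => 'Tokyo', role => 'user'  },
);

my $is_tokyo_admin = make_dsl_predicate(city => 'Tokyo', role => 'admin');
my @matches = grep { $is_tokyo_admin->($_) } @users;
print join(', ', map { $_->{name} } @matches), "\n";
```

Because the comparison is eq, a value such as '1.0' does not match '1'; use the code reference form when numeric comparison is needed.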

Projection Methods

Select($selector)

Transform each element.

->Select(sub { $_[0]{name} })
->Select(sub { { Name => $_[0]{name}, Age => $_[0]{age} } })

SelectMany($selector)

Flatten nested sequences. Selector must return an ARRAY reference.

->SelectMany(sub { $_[0]{tags} })

Concatenation Methods

Concat($second)

Concatenate two sequences.

$q1->Concat($q2)->ToArray()

Zip($second, $selector)

Combine two sequences element by element.

$q1->Zip($q2, sub { [$_[0], $_[1]] })->ToArray()

Partitioning Methods

Take($count)

Take first N elements.

Skip($count)

Skip first N elements.

TakeWhile($predicate)

Take while predicate is true (stops at first false).

SkipWhile($predicate)

Skip while predicate is true.

Ordering Methods

Note: All ordering methods materialize the sequence (they load every element into memory before sorting).

OrderBy($key_selector)

Sort ascending (smart comparison: numeric keys sort numerically, string keys sort with cmp; numeric values sort before strings when types differ). Use OrderByStr to force pure string comparison.

OrderByDescending($key_selector)

Sort descending (smart comparison, same rules as OrderBy). Use OrderByStrDescending to force pure string comparison.

OrderByStr($key_selector)

Sort ascending (pure string comparison with cmp).

OrderByStrDescending($key_selector)

Sort descending (pure string comparison with cmp).

OrderByNum($key_selector)

Sort ascending (numeric comparison with <=>).

OrderByNumDescending($key_selector)

Sort descending (numeric comparison).

Reverse()

Reverse the order.

ThenBy methods (available after OrderBy* via CSV::LINQ::Ordered):

ThenBy and ThenByDescending use smart comparison (same rules as OrderBy). ThenByStr, ThenByStrDescending, ThenByNum, ThenByNumDescending use string and numeric comparison respectively.
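
The smart comparison rules described for OrderBy can be sketched as an ordinary sort comparator. The code below is an illustration in plain Perl, not this module's implementation. In particular, the looks_numeric test is an assumed definition of what counts as a numeric key.

```perl
use strict;
use warnings;

# Hypothetical numeric test; the module's own rule may differ.
sub looks_numeric {
    my $v = shift;
    return 0 unless defined $v;
    return $v =~ /^[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?\z/ ? 1 : 0;
}

# Numbers compare with <=>, strings with cmp, and a number sorts
# before a string when the two kinds are mixed.
sub smart_compare {
    my ($x, $y) = @_;
    my ($nx, $ny) = (looks_numeric($x), looks_numeric($y));
    return $x <=> $y if $nx && $ny;
    return $x cmp $y if !$nx && !$ny;
    return $nx ? -1 : 1;
}

my @keys   = ('10', 'pear', '9', 'apple', '2.5');
my @smart  = sort { smart_compare($a, $b) } @keys;
my @string = sort { $a cmp $b } @keys;

print "smart:  @smart\n";
print "string: @string\n";
```

The second line of output shows why the Str variants exist: under pure string comparison '10' sorts before '2.5' and '9'.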

Grouping Methods

GroupBy($key_selector [, $element_selector])

Group elements. Returns query of hashrefs with Key and Elements fields.

->GroupBy(sub { $_[0]{city} })
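
The shape of the GroupBy result (hash references with Key and Elements) is easiest to see in a small stand-alone sketch. The group_by function below is hypothetical and written in plain Perl; keeping groups in first-seen order is an assumption rather than something this document states.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GroupBy: collect rows under the key the
# selector returns, keeping groups in first-seen order (an assumption).
sub group_by {
    my ($rows, $key_selector) = @_;
    my (%index, @groups);
    for my $row (@$rows) {
        my $key = $key_selector->($row);
        unless (exists $index{$key}) {
            $index{$key} = { Key => $key, Elements => [] };
            push @groups, $index{$key};
        }
        push @{ $index{$key}{Elements} }, $row;
    }
    return @groups;
}

my @people = (
    { name => 'Alice', city => 'Tokyo' },
    { name => 'Bob',   city => 'Osaka' },
    { name => 'Carol', city => 'Tokyo' },
);

for my $g (group_by(\@people, sub { $_[0]{city} })) {
    printf "%s: %d\n", $g->{Key}, scalar @{ $g->{Elements} };
}
```

Every element has to be seen before any group is complete, which is why grouping is listed among the operations that hold the whole sequence in memory.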

Set Operations

Distinct([$comparer])

Remove duplicates.

Union($second [, $comparer])

Set union (no duplicates).

Intersect($second [, $comparer])

Set intersection.

Except($second [, $comparer])

Set difference.

Join Operations

Join($inner, $outer_key, $inner_key, $result_selector)

Inner join. Inner sequence is fully buffered.

$orders->Join(
    $customers,
    sub { $_[0]{customer_id} },
    sub { $_[0]{id} },
    sub { { Order => $_[0], Customer => $_[1] } }
)

GroupJoin($inner, $outer_key, $inner_key, $result_selector)

Left outer join. Inner group passed as re-iterable CSV::LINQ object.
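
The note that the inner sequence is fully buffered suggests a hash join: index the inner side by key once, then stream the outer side against that index. The following plain-Perl sketch illustrates that idea under this assumption. The function hash_join and the sample data are hypothetical and are not taken from the module source.

```perl
use strict;
use warnings;

# Hypothetical inner join: buffer the inner side in a hash keyed by its
# join key, then stream the outer side against it.
sub hash_join {
    my ($outer, $inner, $outer_key, $inner_key, $result) = @_;
    my %by_key;
    push @{ $by_key{ $inner_key->($_) } }, $_ for @$inner;

    my @out;
    for my $o (@$outer) {
        my $matches = $by_key{ $outer_key->($o) } or next;   # no match: row dropped
        push @out, $result->($o, $_) for @$matches;
    }
    return @out;
}

my @orders = (
    { id => 1, customer_id => 10, amount => 500 },
    { id => 2, customer_id => 20, amount => 300 },
    { id => 3, customer_id => 99, amount => 700 },   # no such customer
);
my @customers = (
    { id => 10, name => 'Alice' },
    { id => 20, name => 'Bob'   },
);

my @joined = hash_join(
    \@orders, \@customers,
    sub { $_[0]{customer_id} },
    sub { $_[0]{id} },
    sub { return { Name => $_[1]{name}, Amount => $_[0]{amount} } },
);
print "$_->{Name}: $_->{Amount}\n" for @joined;
```

In this sketch the order with customer_id 99 disappears from the result. A group-join style operation would instead keep it and pair it with an empty group of matches.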

Quantifier Methods

All($predicate)

True if all elements satisfy predicate.

Any([$predicate])

True if any element satisfies predicate (or sequence non-empty).

Contains($value [, $comparer])

True if sequence contains value.

SequenceEqual($second [, $comparer])

True if both sequences have same elements in same order.

Element Access Methods

First([$predicate])

First element. Dies if empty.

FirstOrDefault([$predicate,] $default)

First element or default.

Last([$predicate])

Last element. Dies if empty.

LastOrDefault([$predicate])

Last element or undef.

Single([$predicate])

The only element. Dies if not exactly one.

SingleOrDefault([$predicate])

The only element or undef.

ElementAt($index)

Element at zero-based index. Dies if out of range.

ElementAtOrDefault($index)

Element at index or undef.

Aggregation Methods

Count([$predicate])

Count elements.

Sum([$selector])

Sum of numeric values.

Min([$selector])

Minimum value.

Max([$selector])

Maximum value.

Average([$selector])

Arithmetic mean. Dies if empty.

AverageOrDefault([$selector])

Arithmetic mean or undef if empty.

Aggregate([$seed,] $func [, $result_selector])

General fold/reduce operation.
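
For readers unfamiliar with fold/reduce, the sketch below shows the general idea in plain Perl: an accumulator starts at a seed value and is combined with each element in turn. The function fold is hypothetical. It does not model the optional $seed or $result_selector handling of Aggregate.

```perl
use strict;
use warnings;

# Hypothetical fold: start from $seed and combine it with each element.
sub fold {
    my ($seed, $func, @items) = @_;
    my $acc = $seed;
    $acc = $func->($acc, $_) for @items;
    return $acc;
}

my @amounts = (1500, 800, 2000);

# Sum: the accumulator is the running total.
my $total = fold(0, sub { $_[0] + $_[1] }, @amounts);

# Maximum: the accumulator is the largest value seen so far.
my $largest = fold($amounts[0], sub { $_[0] > $_[1] ? $_[0] : $_[1] }, @amounts);

print "total=$total largest=$largest\n";
```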

Conversion Methods

ToArray()

Convert to list.

my @arr = $query->ToArray();

ToList()

Convert to array reference.

my $aref = $query->ToList();

ToCSV($file [, %opts])

Write sequence to CSV file.

Options: sep (default ','), headers (arrayref), label_order (arrayref, alias for headers), no_header (bool).

$query->ToCSV("output.csv");
$query->ToCSV("output.tsv", sep => "\t");
$query->ToCSV("output.csv", headers => [qw(name age city)]);

DefaultIfEmpty([$default])

Return default if sequence is empty.

ToDictionary($key_selector [, $value_selector])

Convert to hash reference (key => element or transformed value).

ToLookup($key_selector [, $value_selector])

Convert to hash reference (key => [elements]).
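
The difference between the two result shapes can be shown with plain hashes. This is an illustrative sketch only; how ToDictionary itself treats duplicate keys is not stated here, so the overwrite behaviour below is simply what a plain hash assignment does.

```perl
use strict;
use warnings;

my @rows = (
    { name => 'Alice', city => 'Tokyo' },
    { name => 'Bob',   city => 'Osaka' },
    { name => 'Carol', city => 'Tokyo' },
);

# Dictionary shape: one value per key. With a plain hash assignment a
# later row silently replaces an earlier one that has the same key.
my %dictionary;
$dictionary{ $_->{name} } = $_ for @rows;

# Lookup shape: every key maps to an array reference of all its rows.
my %lookup;
push @{ $lookup{ $_->{city} } }, $_ for @rows;

print "Bob is in $dictionary{Bob}{city}\n";
print "Tokyo has ", scalar @{ $lookup{Tokyo} }, " rows\n";
```

A dictionary suits unique keys such as an id column, while a lookup suits keys that repeat, such as city.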

Utility Methods

ForEach($action)

Execute action for each element (void context).

$query->ForEach(sub { print $_[0]{name}, "\n" });

EXAMPLES

Basic CSV Query

use CSV::LINQ;

# sales.csv:
#   name,amount,category
#   Alice,1500,A
#   Bob,800,B
#   Carol,2000,A

my @high_sales = CSV::LINQ->FromCSV("sales.csv")
    ->Where(sub { $_[0]{amount} > 1000 })
    ->OrderByNumDescending(sub { $_[0]{amount} })
    ->ToArray();

Grouping and Aggregation

my @by_category = CSV::LINQ->FromCSV("sales.csv")
    ->GroupBy(sub { $_[0]{category} })
    ->Select(sub {
        my $g = shift;
        return {
            Category => $g->{Key},
            Count    => scalar(@{$g->{Elements}}),
            Total    => CSV::LINQ->From($g->{Elements})
                            ->Sum(sub { $_[0]{amount} }),
        };
    })
    ->OrderByNumDescending(sub { $_[0]{Total} })
    ->ToArray();

Join Two CSV Files

# orders.csv: id,customer_id,amount
# customers.csv: id,name,city

my $orders    = CSV::LINQ->FromCSV("orders.csv");
my $customers = CSV::LINQ->FromCSV("customers.csv");

my @joined = $orders->Join(
    $customers,
    sub { $_[0]{customer_id} },
    sub { $_[0]{id} },
    sub { { Name => $_[1]{name}, Amount => $_[0]{amount} } }
)->ToArray();

TSV Support

my @data = CSV::LINQ->FromCSV("data.tsv", sep => "\t")
    ->Where(status => 'active')
    ->ToArray();

Transform and Write

CSV::LINQ->FromCSV("input.csv")
    ->Select(sub {
        my $r = shift;
        return { %{$r}, processed => 1 };
    })
    ->ToCSV("output.csv");

FEATURES

Lazy Evaluation

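Most query operations are evaluated lazily via iterators: data is processed on demand, one element at a time, rather than loaded all at once. Operations that need the whole sequence, such as OrderBy*, GroupBy, and Reverse, materialize it in memory.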

# Only reads 10 records from file
my @top10 = CSV::LINQ->FromCSV("huge.csv")
    ->Take(10)
    ->ToArray();

RFC 4180 Compliant CSV Parsing

Correctly handles:

  • Quoted fields containing commas

  • Quoted fields containing double-quotes (escaped as "")

  • Quoted fields containing newlines (subject to the Multi-line CSV Fields entry under LIMITATIONS AND KNOWN ISSUES)

  • Empty fields
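
The quoting rules listed above can be expressed as a small state machine. The sketch below is a hypothetical plain-Perl parser for a single record that is already held in one string; it is not this module's parser, and the name parse_csv_record is illustrative. It shows commas, doubled quotes, an empty field, and a newline inside quotes being handled.

```perl
use strict;
use warnings;

# Hypothetical parser for one complete CSV record held in a string.
sub parse_csv_record {
    my ($record, $sep) = @_;
    $sep = ',' unless defined $sep;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    my @chars     = split //, $record;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';      # "" inside quotes is a literal quote
                    $i++;
                }
                else {
                    $in_quotes = 0;     # closing quote
                }
            }
            else {
                $field .= $c;           # separators and newlines are data here
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;             # opening quote
        }
        elsif ($c eq $sep) {
            push @fields, $field;       # end of an unquoted or closed field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    push @fields, $field;               # last field, possibly empty
    return @fields;
}

my @f = parse_csv_record(qq{"Smith, John","said ""hi""",,"two\nlines"});
print scalar(@f), " fields\n";
```

The sketch sidesteps the question of how a record is assembled from the file. A reader that works line by line would first need to keep appending lines until the quotes balance.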

DSL Syntax

Simple key-value filtering without code references:

->Where(city => 'Tokyo', role => 'admin')

ARCHITECTURE

Iterator-Based Design

Each query operation returns a new query object wrapping an iterator (a code reference that produces one element per call, returning undef to signal end-of-sequence).
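
The following is a minimal sketch of this design in plain Perl, using closures as iterators. It is not the module's source; the function names (make_counter, where_iter, select_iter, take_iter) are hypothetical. The $pulled counter shows that only as many source elements are read as the pipeline needs.

```perl
use strict;
use warnings;

my $pulled = 0;   # counts how many source elements were actually read

# Source iterator: yields 1, 2, 3, ... up to $max, then undef.
sub make_counter {
    my $max = shift;
    my $n   = 0;
    return sub {
        return undef if $n >= $max;
        $pulled++;
        return ++$n;
    };
}

# Each stage wraps the upstream iterator in a new closure.
sub where_iter {
    my ($it, $pred) = @_;
    return sub {
        while (defined(my $x = $it->())) {
            return $x if $pred->($x);
        }
        return undef;
    };
}

sub select_iter {
    my ($it, $f) = @_;
    return sub {
        my $x = $it->();
        return defined $x ? $f->($x) : undef;
    };
}

sub take_iter {
    my ($it, $count) = @_;
    return sub {
        return undef if $count <= 0;   # stop without touching upstream
        $count--;
        return $it->();
    };
}

# Even numbers, multiplied by ten, first three only.
my $pipeline = take_iter(
    select_iter(
        where_iter(make_counter(1_000_000), sub { $_[0] % 2 == 0 }),
        sub { $_[0] * 10 },
    ),
    3,
);

my @out;
while (defined(my $v = $pipeline->())) {
    push @out, $v;
}
print "@out (pulled $pulled source elements)\n";
```

Although the source could produce a million values, the loop ends after six have been read, because the take stage stops asking once it has delivered three results.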

Memory Characteristics

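Constant Memory Operations: Where, Select, SelectMany, Concat, Zip, Take, Skip, TakeWhile, SkipWhile, ForEach, Count, Sum, Min, Max, Average, First, Any, All. Distinct also streams its input, but it has to remember the values it has already seen, so its memory grows with the number of distinct elements.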

Linear Memory Operations: ToArray, ToList, ToCSV, OrderBy*, GroupBy, Last, Reverse.

PERFORMANCE

  • Filter early with Where before OrderBy or GroupBy.

  • Use Take to limit processing of large files.

  • Reuse ToArray() result rather than iterating the query twice.

COMPATIBILITY

This module is compatible with Perl 5.005_03 and later.

Uses only Perl core features. No CPAN dependencies required.

Build system: pmake.bat (Perl 5.005_03 on Windows lacks make).

DIAGNOSTICS

Where() DSL requires even number of arguments

Where() was called in DSL form with an odd number of arguments. DSL form requires key-value pairs: ->Where(key => value, ...).

From() requires ARRAY reference

From() was called with a non-array-reference argument.

Cannot open '<filename>': <reason>

FromCSV() or ToCSV() could not open the file.

Sequence contains no elements

First(), Last(), Average(), or Single() was called on an empty sequence.

No element satisfies the condition

First() or Last() with predicate found no matching element.

Sequence contains more than one element

Single() found more than one element.

SelectMany: selector must return an ARRAY reference

The selector passed to SelectMany() returned a non-array-reference.

ElementAt: index out of range

ElementAt() was called with a negative or out-of-range index.

COOKBOOK

Top N by numeric field

->OrderByNumDescending(sub { $_[0]{score} })
  ->Take(10)
  ->ToArray()

Group and count

->GroupBy(sub { $_[0]{category} })
  ->Select(sub {
      {
          Category => $_[0]{Key},
          Count    => scalar(@{$_[0]{Elements}}),
      }
  })
  ->ToArray()

Pagination

# Page 3, size 20
->Skip(40)->Take(20)->ToArray()
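
The Skip value follows from the page number: for 1-based pages, skip = (page - 1) * size. The plain-Perl sketch below applies the same arithmetic to an in-memory array; the helper page_of is hypothetical and only illustrates the calculation.

```perl
use strict;
use warnings;

# Hypothetical helper: the slice of @$items that makes up a 1-based page.
sub page_of {
    my ($items, $page, $size) = @_;
    my $skip = ($page - 1) * $size;          # page 3, size 20 => skip 40
    return () if $skip >= @$items;           # page lies past the end
    my $last = $skip + $size - 1;
    $last = $#$items if $last > $#$items;    # final page may be short
    return @{$items}[$skip .. $last];
}

my @items = (1 .. 45);
my @page3 = page_of(\@items, 3, 20);
print "page 3: $page3[0]..$page3[-1] (", scalar(@page3), " items)\n";
```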

Unique values of a column

->Select(sub { $_[0]{category} })
  ->Distinct()
  ->ToArray()

CSV round-trip

CSV::LINQ->FromCSV("input.csv")
    ->Where(sub { $_[0]{active} eq '1' })
    ->ToCSV("active.csv");

LIMITATIONS AND KNOWN ISSUES

  • ToCSV Column Order Without headers

    When writing hash-reference sequences with ToCSV() and no headers option, column order is determined by the sorted keys of the first record. To guarantee a specific column order, always pass the headers option:

    $query->ToCSV("out.csv", headers => [qw(name age city)]);

  • Iterator Consumption

    Query objects can only be consumed once: the underlying iterator is exhausted by a terminal operation. To reuse the data, create a new query or save the result of ToArray().

  • Undef Values

    Due to the iterator-based design, undef signals end-of-sequence. Sequences containing undef values may not work correctly with all operations. This is not a practical limitation for CSV data (which uses hash references).

  • Multi-line CSV Fields

    FromCSV() reads files one line at a time. CSV fields that span multiple lines (embedded newlines within double-quoted fields) are not yet supported.

  • No Parallel Execution

    All operations execute sequentially in a single thread.
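
The Iterator Consumption and Undef Values items above both follow from using undef as the end-of-sequence signal. The sketch below reproduces the two effects with a hypothetical undef-terminated iterator in plain Perl; make_iter and drain are illustrative names, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical array-backed iterator that signals the end with undef.
sub make_iter {
    my @items = @_;
    my $i = 0;
    return sub { return $i < @items ? $items[$i++] : undef };
}

# Drain an iterator the way an undef-terminated consumer would.
sub drain {
    my $it = shift;
    my @out;
    while (defined(my $x = $it->())) {
        push @out, $x;
    }
    return @out;
}

# An undef element is indistinguishable from the end of the sequence.
my @scalars = drain(make_iter(1, 2, undef, 4));

# Hash references are always defined, even when a field inside is undef.
my @rows = drain(make_iter({ v => 1 }, { v => undef }, { v => 3 }));

# A drained iterator has nothing left for a second pass.
my $it     = make_iter(1, 2, 3);
my @first  = drain($it);
my @second = drain($it);

print scalar(@scalars), " scalars, ", scalar(@rows), " rows, ",
      scalar(@first), " then ", scalar(@second), " on reuse\n";
```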

BUGS

Please report any bugs or feature requests to:

Email: ina@cpan.org

SEE ALSO

  • LTSV::LINQ - LINQ-style query interface for LTSV files

  • JSON::LINQ - LINQ-style query interface for JSON/JSONL files

  • RFC 4180: https://www.ietf.org/rfc/rfc4180.txt

  • Microsoft LINQ documentation: https://learn.microsoft.com/en-us/dotnet/csharp/linq/

AUTHOR

INABA Hitoshi <ina@cpan.org>

Contributors

Contributions are welcome! See file: CONTRIBUTING.

ACKNOWLEDGEMENTS

LINQ Technology

This module is inspired by LINQ (Language Integrated Query), developed by Microsoft Corporation for the .NET Framework.

LINQ(R) is a registered trademark of Microsoft Corporation.

References

  • Microsoft LINQ: https://learn.microsoft.com/en-us/dotnet/csharp/linq/

  • RFC 4180 (CSV): https://www.ietf.org/rfc/rfc4180.txt

  • LTSV::LINQ (inspiration): https://metacpan.org/pod/LTSV::LINQ

COPYRIGHT AND LICENSE

Copyright (c) 2026 INABA Hitoshi

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

License Details

  • Artistic License 1.0: http://dev.perl.org/licenses/artistic.html

  • GNU General Public License version 1 or later: http://www.gnu.org/licenses/gpl-1.0.html

You may choose either license.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.