Sort::DJB Performance

Benchmark results comparing Sort::DJB against other Perl sorting implementations.

Test environment:

Key Findings

Summary: n=100,000 int32 elements

With system-installed djbsort (AVX2 SIMD)

| Rank | Implementation | Rate | Speedup vs Perl sort | |------|-------------------------------------|---------|----------------------| | 1 | Sort::DJB (AVX2 bitonic network) | 407/s | 9.0x | | 2 | Sort::Packed (pack + sort + unpack) | 147/s | 3.2x | | 3 | Sort::Key::Radix (O(n) radix sort) | 103/s | 2.3x | | 4 | Perl builtin sort | 45.5/s | 1.0x (baseline) | | 5 | Sort::Key (key-cached mergesort) | 34.3/s | 0.75x | | 6 | Sort::DJB::Pure (Perl bitonic net) | 0.81/s | 0.018x |

With bundled portable C (no SIMD)

| Rank | Implementation | Rate | Speedup vs Perl sort | |------|-------------------------------------|---------|----------------------| | 1 | Sort::Packed (pack + sort + unpack) | 150/s | 3.4x | | 2 | Sort::DJB XS (portable4 bitonic) | 105/s | 2.4x | | 3 | Sort::Key::Radix (O(n) radix sort) | 98.4/s | 2.2x | | 4 | Perl builtin sort | 44.3/s | 1.0x (baseline) | | 5 | Sort::Key (key-cached mergesort) | 34.1/s | 0.77x | | 6 | Sort::DJB::Pure (Perl bitonic net) | 0.65/s | 0.015x |

Detailed Benchmarks (portable C build)

Naming

┌─────────────┬──────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Label │ Module │ What is it? │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ XS_int32 │ Sort::DJB │ djbsort C library via XS, sorting signed 32-bit integers │
├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ XS_float64 │ Sort::DJB │ djbsort C library via XS, sorting 64-bit doubles │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Pure_int32 │ Sort::DJB::Pure │ Pure Perl bitonic sorting network, sorting signed 32-bit integers │
├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ PackedR_i32 │ Sort::Packed │ Raw pre-packed buffer sort -- the data is already pack("l*")ed before the benchmark loop, so it measures only sort_packed('l', $buf) with no pack/unpack overhead. int32. │
├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PackedR_i64 │ Sort::Packed │ Same raw pre-packed sort, but on pack("q*") int64 buffers │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PackedR_f64 │ Sort::Packed │ Same raw pre-packed sort, but on pack("d*") float64 buffers │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ SKey_int │ Sort::Key │ Sort::Key::nsort() on int32 data -- used in the cross-type summary section where the suffix indicates the test data, not the function │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ SKey_float │ Sort::Key │ Sort::Key::nsort() on float64 data -- same function, different input │ ├─────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Radix_float │ Sort::Key::Radix │ Sort::Key::Radix::fsort() -- radix sort on float64 data │ └─────────────┴──────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

int32 sorting (signed 32-bit integers)

  n =      10 elements:
                  Rate    Pure_int32 Packed_int32 SKey_nsort Radix_isort Perl_sort XS_int32
  Pure_int32     58119/s         --         -93%       -95%        -97%      -97%     -98%
  Packed_int32  824560/s      1319%           --       -34%        -57%      -58%     -70%
  SKey_nsort   1254593/s      2059%          52%         --        -35%      -36%     -55%
  Radix_isort  1931575/s      3223%         134%        54%          --       -2%     -30%
  Perl_sort    1974992/s      3298%         140%        57%          2%        --     -29%
  XS_int32     2778978/s      4682%         237%       122%         44%       41%       --

  n =     100 elements:
                 Rate   Pure_int32 SKey_nsort Radix_isort Packed_int32 Perl_sort XS_int32
  Pure_int32     3722/s         --       -97%        -97%         -98%      -98%     -99%
  SKey_nsort   138099/s      3610%         --         -3%         -28%      -38%     -54%
  Radix_isort  142736/s      3735%         3%          --         -25%      -36%     -53%
  Packed_int32 190788/s      5026%        38%         34%           --      -14%     -37%
  Perl_sort    221362/s      5847%        60%         55%          16%        --     -27%
  XS_int32     301306/s      7995%       118%        111%          58%       36%       --

  n =    1000 elements:
                Rate   Pure_int32 SKey_nsort Perl_sort Radix_isort Packed_int32 XS_int32
  Pure_int32     184/s         --       -98%      -98%        -99%         -99%     -99%
  SKey_nsort    8508/s      4535%         --      -21%        -56%         -57%     -63%
  Perl_sort    10755/s      5760%        26%        --        -44%         -45%     -53%
  Radix_isort  19328/s     10431%       127%       80%          --          -2%     -16%
  Packed_int32 19692/s     10629%       131%       83%          2%           --     -14%
  XS_int32     22980/s     12420%       170%      114%         19%          17%       --

  n =   10000 elements:
               Rate   Pure_int32 SKey_nsort Perl_sort Radix_isort XS_int32 Packed_int32
  Pure_int32   11.9/s         --       -98%      -98%        -99%     -99%         -99%
  SKey_nsort    600/s      4924%         --      -19%        -62%     -63%         -67%
  Perl_sort     743/s      6122%        24%        --        -53%     -54%         -59%
  Radix_isort  1591/s     13228%       165%      114%          --      -1%         -13%
  XS_int32     1600/s     13303%       167%      115%          1%       --         -13%
  Packed_int32 1833/s     15255%       206%      147%         15%      15%           --

  n =  100000 elements:
               Rate    SKey_nsort   Perl_sort Radix_isort    XS_int32 Packed_int32
  SKey_nsort   34.1/s          --        -23%        -65%        -67%         -77%
  Perl_sort    44.3/s         30%          --        -55%        -58%         -71%
  Radix_isort  98.4/s        189%        122%          --         -6%         -35%
  XS_int32      105/s        207%        136%          6%          --         -31%
  Packed_int32  150/s        341%        240%         53%         44%           --

int32down sorting (signed 32-bit integers, descending)

  n =     100 elements:
                 Rate   Pure_int32dn Radix_risrt SKey_rnsort Perl_sort_rv XS_int32dn
  Pure_int32dn   2993/s           --        -98%        -98%         -99%       -99%
  Radix_risrt  138606/s        4531%          --         -8%         -33%       -58%
  SKey_rnsort  150651/s        4934%          9%          --         -27%       -54%
  Perl_sort_rv 206259/s        6792%         49%         37%           --       -37%
  XS_int32dn   326249/s       10802%        135%        117%          58%         --

  n =   10000 elements:
               Rate   Pure_int32dn SKey_rnsort Perl_sort_rv XS_int32dn Radix_risrt
  Pure_int32dn 10.9/s           --        -98%         -99%       -99%        -99%
  SKey_rnsort   561/s        5065%          --         -24%       -62%        -63%
  Perl_sort_rv  734/s        6658%         31%           --       -50%        -52%
  XS_int32dn   1469/s       13425%        162%         100%         --         -3%
  Radix_risrt  1517/s       13866%        170%         107%         3%          --

  n =  100000 elements:
               Rate    SKey_rnsort Perl_sort_rv  Radix_risrt   XS_int32dn
  SKey_rnsort  31.1/s           --         -42%         -67%         -71%
  Perl_sort_rv 53.9/s          74%           --         -43%         -49%
  Radix_risrt  94.7/s         205%          76%           --         -11%
  XS_int32dn    106/s         241%          97%          12%           --

uint32 sorting (unsigned 32-bit integers)

  n =     100 elements:
                Rate   Pure_uint32 SKey_nsort Packed_u32 Perl_sort Radix_usort XS_uint32
  Pure_uint32   2853/s          --       -98%       -98%      -98%        -99%      -99%
  SKey_nsort  134825/s       4626%         --       -25%      -26%        -45%      -53%
  Packed_u32  180959/s       6243%        34%         --       -1%        -27%      -37%
  Perl_sort   183338/s       6327%        36%         1%        --        -26%      -36%
  Radix_usort 246272/s       8533%        83%        36%       34%          --      -15%
  XS_uint32   288653/s      10018%       114%        60%       57%         17%        --

  n =   10000 elements:
              Rate   Pure_uint32 SKey_nsort Perl_sort XS_uint32 Packed_u32 Radix_usort
  Pure_uint32 10.6/s          --       -98%      -98%      -99%       -99%        -99%
  SKey_nsort   469/s       4343%         --      -31%      -64%       -71%        -71%
  Perl_sort    676/s       6297%        44%        --      -48%       -58%        -59%
  XS_uint32   1306/s      12263%       178%       93%        --       -19%        -20%
  Packed_u32  1611/s      15155%       243%      138%       23%         --         -1%
  Radix_usort 1631/s      15342%       248%      141%       25%         1%          --

  n =  100000 elements:
              Rate    SKey_nsort   Perl_sort Radix_usort   XS_uint32  Packed_u32
  SKey_nsort  32.4/s          --        -27%        -64%        -67%        -77%
  Perl_sort   44.2/s         37%          --        -50%        -55%        -69%
  Radix_usort 88.8/s        174%        101%          --         -9%        -37%
  XS_uint32   97.9/s        203%        122%         10%          --        -30%
  Packed_u32   141/s        334%        218%         58%         44%          --

int64 sorting (signed 64-bit integers)

  n =     100 elements:
                Rate   Pure_int64 Packed_i64 SKey_nsort Radix_isort Perl_sort XS_int64
  Pure_int64    3109/s         --       -97%       -98%        -98%      -98%     -99%
  Packed_i64   96399/s      3000%         --       -24%        -35%      -48%     -65%
  SKey_nsort  126704/s      3975%        31%         --        -15%      -31%     -54%
  Radix_isort 149419/s      4705%        55%        18%          --      -19%     -45%
  Perl_sort   184004/s      5818%        91%        45%         23%        --     -33%
  XS_int64    273685/s      8702%       184%       116%         83%       49%       --

  n =   10000 elements:
              Rate   Pure_int64 SKey_nsort Perl_sort Packed_i64 Radix_isort XS_int64
  Pure_int64  10.1/s         --       -98%      -99%       -99%        -99%     -99%
  SKey_nsort   521/s      5076%         --      -24%       -51%        -56%     -66%
  Perl_sort    687/s      6730%        32%        --       -36%        -42%     -56%
  Packed_i64  1068/s     10515%       105%       55%         --        -10%     -31%
  Radix_isort 1191/s     11738%       129%       73%        12%          --     -23%
  XS_int64    1548/s     15287%       197%      125%        45%         30%       --

  n =  100000 elements:
              Rate    SKey_nsort   Perl_sort    XS_int64 Radix_isort  Packed_i64
  SKey_nsort  30.4/s          --        -29%        -66%        -68%        -75%
  Perl_sort   42.7/s         40%          --        -52%        -55%        -65%
  XS_int64    88.3/s        190%        107%          --         -8%        -28%
  Radix_isort 95.7/s        215%        124%          8%          --        -21%
  Packed_i64   122/s        301%        186%         38%         27%          --

float64 sorting (64-bit doubles)

  n =     100 elements:
                Rate   Pure_flt64 Perl_sort Packed_f64 SKey_nsort Radix_fsort XS_float64
  Pure_flt64    3349/s         --      -97%       -97%       -98%        -98%       -99%
  Perl_sort   112783/s      3268%        --        -7%       -18%        -24%       -62%
  Packed_f64  121740/s      3536%        8%         --       -11%        -18%       -59%
  SKey_nsort  137452/s      4005%       22%        13%         --         -8%       -54%
  Radix_fsort 148652/s      4339%       32%        22%         8%          --       -50%
  XS_float64  297964/s      8798%      164%       145%       117%        100%         --

  n =   10000 elements:
              Rate   Pure_flt64 Perl_sort SKey_nsort Radix_fsort Packed_f64 XS_float64
  Pure_flt64  9.90/s         --      -98%       -98%        -99%       -99%       -99%
  Perl_sort    467/s      4619%        --       -18%        -58%       -63%       -71%
  SKey_nsort   569/s      5644%       22%         --        -49%       -55%       -64%
  Radix_fsort 1123/s     11234%      140%        97%          --       -11%       -30%
  Packed_f64  1263/s     12656%      170%       122%         13%         --       -21%
  XS_float64  1600/s     16052%      242%       181%         43%        27%         --

  n =  100000 elements:
              Rate     Perl_sort  SKey_nsort Radix_fsort  XS_float64  Packed_f64
  Perl_sort   21.9/s          --        -33%        -58%        -78%        -80%
  SKey_nsort  32.5/s         49%          --        -37%        -67%        -70%
  Radix_fsort 51.6/s        136%         59%          --        -48%        -52%
  XS_float64  99.7/s        356%        207%         93%          --         -7%
  Packed_f64   108/s        392%        231%        109%          8%          --

Cross-type comparison at n=100,000

              Rate   Perl_float SKey_int SKey_float Perl_int Radix_float XS_float64 XS_int64 Radix_int XS_uint32 XS_int32
  Perl_float  20.6/s         --     -30%       -32%     -53%        -60%       -75%     -77%      -77%      -78%     -79%
  SKey_int    29.3/s        42%       --        -3%     -33%        -43%       -64%     -68%      -68%      -69%     -70%
  SKey_float  30.2/s        47%       3%         --     -30%        -41%       -63%     -67%      -67%      -68%     -69%
  Perl_int    43.5/s       111%      49%        44%       --        -16%       -46%     -52%      -52%      -54%     -55%
  Radix_float 51.5/s       150%      76%        70%      18%          --       -36%     -43%      -44%      -46%     -47%
  XS_float64  81.0/s       292%     177%       168%      86%         57%         --     -11%      -11%      -14%     -16%
  XS_int64    91.1/s       342%     211%       201%     110%         77%        13%       --       -0%       -4%      -6%
  Radix_int   91.3/s       343%     212%       202%     110%         77%        13%       0%        --       -3%      -5%
  XS_uint32   94.6/s       359%     223%       213%     118%         84%        17%       4%        4%        --      -2%
  XS_int32    96.5/s       367%     229%       219%     122%         87%        19%       6%        6%        2%       --

Call overhead for small arrays

  n = 5 elements:
                 Rate    Pure_int32 SKey_nsort Radix_isort   Perl_sort    XS_int32
  Pure_int32   122736/s          --       -94%        -95%        -96%        -97%
  SKey_nsort  1894758/s       1444%         --        -25%        -31%        -55%
  Radix_isort 2535828/s       1966%        34%          --         -8%        -40%
  Perl_sort   2749454/s       2140%        45%          8%          --        -35%
  XS_int32    4221744/s       3340%       123%         66%         54%          --

  n = 10 elements:
                 Rate    Pure_int32 SKey_nsort Radix_isort   Perl_sort    XS_int32
  Pure_int32    46795/s          --       -95%        -96%        -97%        -98%
  SKey_nsort  1031797/s       2105%         --        -21%        -43%        -53%
  Radix_isort 1302285/s       2683%        26%          --        -28%        -41%
  Perl_sort   1815337/s       3779%        76%         39%          --        -18%
  XS_int32    2203017/s       4608%       114%         69%         21%          --

Sort::Packed raw buffer performance (no pack/unpack overhead)

  n =    1000 elements:
               Rate   PackedR_f64 PackedR_i64 PackedR_i32
  PackedR_f64 16483/s          --        -33%        -50%
  PackedR_i64 24596/s         49%          --        -25%
  PackedR_i32 32905/s        100%         34%          --

  n =   10000 elements:
              Rate   PackedR_i64 PackedR_f64 PackedR_i32
  PackedR_i64 1473/s          --        -21%        -58%
  PackedR_f64 1866/s         27%          --        -47%
  PackedR_i32 3500/s        138%         88%          --

  n =  100000 elements:
             Rate   PackedR_f64 PackedR_i64 PackedR_i32
  PackedR_f64 164/s          --        -21%        -44%
  PackedR_i64 208/s         26%          --        -29%
  PackedR_i32 293/s         78%         41%          --