NAME
Tie::CArray - Space-efficient, typed, external C Arrays (Alpha)
SYNOPSIS
use Tie::CArray;
$dblarr = new Tie::CDoubleArray(10000);
@values = (0..10000);
$dblarr = new Tie::CIntArray(10000,\@values);
ref $dblarr eq 'Tie::CIntArray' and
$dblarr->set(0,1) and
$dblarr->get(0) == 1;
tie @array, 'Tie::CDoubleArray', 10000, \@values;
print $array[0], join ', ', @dbl[1..20];
DESCRIPTION
Several XS classes and methods to deal with typed, space-efficient C arrays are provided. Range checked and tieable.
There are hand-optimized, fast XS versions for the three basic C-types array of INT, DOUBLE and STRING and some sequential aggregate types int[2][], int[3][], int[4][], double[2][] and double[3][].
This roughly reflects to:
CArray
CIntArray int[]
CInt2Array int[][2]
CInt3Array int[][3]
CInt4Array int[][4]
CDoubleArray double[]
CDouble2Array double[][2]
CDouble3Array double[][3]
CStringArray *char[]
Typed C arrays need about three times less space then untyped perl arrays. Such as various computional geometry modules dealing with 10.000 - 200.000 double[3]. Modification is done in-place and preferably in bulk.
It might also be easier to write XSUBs by converting the data to CArray's before, pass this pointer to the C func, and handle the results in Perl then.
The Fetch/Store operations with tied arrays copy the scalars to perl and back, so it shouldn't be abused for BIG data.
Perl's safemalloc/safefree is used.
EFFICIENT GROW
CArray's are efficiently growable, which is needed for several algorithms, such as placing extra sentinels at the end, adding three points for a super-triangle for Delaunay triangulation, ...
Extra space is always allocated to fit nicely into the page boundary, defined by the system granularity. For now it's 2048, half of the usual 4096, but this can be tweaked (e.g. for many small arrays) in the C function freesize().
CLASS METHODS
- new ( size, [ template, [ values ]] )
-
The new method is provided for all classes, the optional arrayref initarg applies only to the base
Array
classes, not the aggregate.The constructor creates a new
Tie::CArray
object. For theArray
classes the second optional argument is used to initialize it with an array. The second argument may also be used by a seperate init call. If the optionally provided values arrayref is shorter that the allocated size, the rest will stay uninitialized.$D = new Tie::CDoubleArray( 1000, ,[0..999] );
- len ()
-
The len method returns the length of the array, 1+ the index of the last element. To enlarge the array grow() should be used.
$D = new Tie::CDoubleArray(5000); for my $j (0 .. $D->len-1) { $D->set($_, 0.0)); } $D->len; # => 5000
- get ( index )
-
get returns the value at the given index, which will be scalar or a list. Croaks with "index out of range" on wrong index.
$I = new Tie::CIntArray(2,[0,1]); print $I->get(1); # => 1 print $I->get(2); => croak "index out of range" $I2 = new Tie::CInt2Array(2,[[0,1]]); print $I->get(0); # => (0 1)
- set ( index, value )
-
The set method is provided for all classes. It changes the value at the given index. The value should be either a scalar or an arrayref. Croaks with "index out of range" on wrong index. Returns nothing.
$I = new Tie::CIntArray(100); map { $I->set($_,$i[$_]) } (0..99); $I->set(99,-1); $I->set(100); => "index out of range" $I2 = Tie::CInt2Array->new(2); $I2->set(0, [1,0]); $I2->set(1, [0,1]);
- list ()
-
Returns the content of the flat array representation as arrayref.
- init ( ARRAYREF )
-
Initializes the array with the values from the arrayref. Returns nothing.
This is the same as the second new argument. If the provided values arrayref is shorter that the allocated size, the rest will stay uninitialized.
$I = Tie::CIntArray::new(100) ; $I->init( [0..99] );
- grow ( n )
-
Adds room for n elements to the array. These elements must be initialized extra with set. To support faster grow() a certain number of already pre-allocated items at the end of the array will be used. (see free) Returns nothing.
- delete ( index )
-
Deletes the item at the given index. free is incremented and the remaining array items are shifted. Returns nothing.
- get_grouped_by ( size, index )
-
Returns a list of subsequent values. It returns a list of size indices starting at size * index. This is useful to abuse the unstructured array as typed array of the same type, such as *double[3] or *int[2].
But this is normally not used since fast get methods are provided for the sequential classes, and those methods can be used on flat arrays as well. (Internally all sequential arrays are flat).
Tie::CInt3Array::get($I,0) == $I->get_grouped_by(3,0)
$ptr->get_grouped_by(2,4) returns the 4-th pair if the array is seen as list of pairs.
$ptr->get_grouped_by(3,$i) => (ptr[i*3] ptr[i*3+1] ptr[i*3+2] )
- slice ( start, size, [ stride=1 ] )
-
C++ like slice operator on a flat array. - In contrast to get_grouped_by() which semantics are as on a grouped array.
Returns a list of size items, starting at start, with interim offsets of stride which defaults to 1. This is useful to return columns or rows of a flat matrix.
$I = new Tie::CIntArray (9, [0..8]); $I->slice ( 0, 3, 3 ); # 1st column => (0 3 6) $I->slice ( 0, 3, 1 ); # 1st row => (0 1 2) $I->get_grouped_by(3, 0); => (0 1 2)
- isort ( [ cmpfunc ] )
-
"Indirect sort" (numerically ascending only for now)
Returns a fresh sorted index list of integers (0 .. len-1) resp. a CIntArray object in scalar context.
The optional cmpfunc argument is not yet implemented.
- nreverse ()
-
"Reverse in place". (The name comes from lisp, where `n' denotes the destructive version). Destructively swaps all array items. Returns nothing.
To perform a copying reverse define
sub reverse { nreverse($_[0]->copy()) }
SOME SEQUENTIAL CLASSES and CONVERSION
To mix and change parallel and sequential data structures, some sequential types (int[2],int[3],int[4],double[2],double[3]) are derived from their base classes with fast, hand-optimized get and set methods to return and accept lists instead of scalars.
The input argument must be an arrayref, the result will be an array in list context and an arrayref in scalar context.
Conversion
The Arrays for Int2, Int3, Int4, Double2 and Double3 can also be converted from and to parallel base arrays with fast XS methods. Parallel arrays are sometimes preferred over structured arrays, but delete/ insert of structures in parallel arrays is costly.
# three parallel CIntArray's
$X = new Tie::CIntArray(1000);
$Y = new Tie::CIntArray(1000);
$Z = new Tie::CIntArray(1000);
# copy to one sequential *int[3], new memory
$I = $X->ToInt3($Y,$Z);
# or to an existing array
$I = new Tie::CIntArray(3000);
$I = $X->ToInt3($Y,$Z,$I);
# copies back with allocating new memory
($X, $Y, $Z) = $I->ToPar();
# copies back with reusing some existing memory (not checked!)
($X, $Y, $Z) = $I->ToPar($X,$Z); # Note: I3 will be fresh.
- ToPar ( SeqArray, [ Tie::CArray,... ] )
-
This returns a list of Tie::CArray objects, copied from the sequential object to plain parallel CArray objects. This is a fast slice.
*int[2] => (*int, *int) Tie::CInt2Array::ToPar Tie::CInt3Array::ToPar Tie::CInt4Array::ToPar Tie::CDouble2Array::ToPar Tie::CDouble3Array::ToPar
If the optional CArray args are given the memory for the returned objects are not new allocated, the space from the given objects is used instead.
- To$Type$Num ( CArray, ..., [ CArray ] )
-
This returns a sequential CArray object copied from the parallel objects given as arguments to one sequential CArray. This is a fast map.
*int, *int => *int[2] Tie::CIntArray::ToInt2 Tie::CIntArray::ToInt3 Tie::CIntArray::ToInt4 Tie::CDoubleArray::ToDouble2 Tie::CDoubleArray::ToDouble3
If the last optional CArray arg is defined the memory for the returned object is not new allocated, the space from the given object is used instead.
ARBITRARY STRUCTURED ARRAYS, PACK-STYLE TEMPLATES (not yet)
Some special sequential arrays are hand-optimized for speed but can hold only limited data types (int[2] .. double[3]).
To support arbitrary structured arrays a second template argument may be provided which must be a arrayref of a hash, where its keys name the accessor and the values pack-style letters.
This does not work yet!
tie @A, 'Tie::CArray', 200,
[ x => 'd',
y => 'd',
z => 'd',
attr => [ age => 'i',
dirty => 'i',
owner => 's' ],
refcount => 'i' ];
$A->init ...
for (my $i = 0; $i < @A; $i++) {
printf("x,y,z: (%d %d %d),\nattr: (age=%d, dirty=%d, owner=%s)\nrefcount=%d",
$A[$i]->{x}, $A[$i]->{y}, $A[$i]->{z},
$A[$i]->{attr}->{age}, $A[$i]->{attr}->{dirty}, $A[$i]->{attr}->{owner},
$A[$i]->{refcount}
);
}
tie @utmp, 'Tie::CArray', 100,
[ ut_type => 's',
ut_pid => 'i',
ut_line => 'a12',
ut_id => 'a4',
ut_user => 'a32',
ut_host => 'a256',
ut_exit => [ # struct exit_status
e_termination => 's',
e_exit => 's' ],
ut_session => 'l',
ut_tv => [ # struct timeval
tv_sec => 'l'
tv_usec => 'l' ],
ut_addr_v6 => 'l4',
pad => 'a20' ];
The following subset of pack() template letters is supported:
- i
-
signed integer (default)
- I
-
unsigned integer
- c
-
signed character (one byte integer)
- c
-
unsigned character (one byte integer)
- s
-
signed short integer
- S
-
unsigned short integer
- n
-
unsigned short integer in network byte order
- l
-
signed long integer
- L
-
unsigned long integer
- N
-
unsigned long integer in network byte order
- q
-
signed long long integer (long long/int64)
(only if the system has quads and perl was compiled for 64 bit)
- Q
-
unsigned long long integer (unsigned long long/uint64)
(only if the system has quads and perl was compiled for 64 bit)
- L
-
unsigned long integer
- f
-
float
- d
-
double
- aN
-
fixed-length, null-padded ASCII string of length N
- AN
-
fixed-length, space-padded ASCII string of length N
- ZN
-
fixed-length, null-terminated ASCII string of length N
INTERNAL METHODS
- DESTROY ()
-
This used to crash on certain DEBUGGING perl's, but seems to be okay now. Returns nothing.
- Tie::CArray::itemsize ( )
- Tie::CStringArray::itemsize ( [index] )
-
Returns the size in bytes per item stored in the array. This is only used internally to optimize memory allocation and the free list.
A CStringArray object accepts the optional index argument, which returns the string length at the given index. Without argument it returns the size in bytes of a char * pointer (which is 4 on 32 bit systems).
- copy ()
-
Returns a freshly allocated copy of the array with the same contents.
- _freelen ()
-
Internal only. Returns the number of free elements at the end of the array. If grow() needs less or equal than free elements to be added, no new room will be allocated.
This is primarly for performance measures.
TIEARRAY METHODS
Not tested yet!
- tie (var, type, size)
-
After tying a array variable to an
Tie::CArray
class the variable can be used just as any normal perl array.tie @array, 'Tie::CDoubleArray', 200; print $array[200]; => croak "index out of range"
SEE ALSO
perlxs(1), "tie" in perlfunc, Tie::Array(3), Geometry::Points(3), C::Dynalib::Poke(3), Tie::MmapArray(3)
TODO
Not all pack letters are implemented yet.
AUTHOR
Reini Urban <rurban@x-ray.at>
Andrew Ford wrote the arbitrary structure code.
COPYRIGHT
Copyright (c) 1999 Reini Urban.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
WARNING
The author makes NO WARRANTY, implied or otherwise, about the suitability of this software for safety or security purposes.
CArrays are now always ranged checked which cannot be turned off, so it's not that dangerous anymore to read or write to not-owned memory areas.
The author shall not in any case be liable for special, incidental, consequential, indirect or other similar damages arising from the use of this software.
Your mileage will vary. If in any doubt DO NOT USE IT. You've been warned.
BUGS
There are certainly some. Not fully tested yet. Tests for copy, grow, delete, tie are pending. Also some more conversion tests, esp. with double and degenerate (grow, cut) cases.
realloc() in string_set() with DEBUGGING perl fails sometimes.
An implicit DESTROY invocation sometimes asserts a DEBUGGING perl, regardless if PERL_MALLOC or the WinNT msvcrt.dll malloc is used. (5.00502 - 5.00558) Esp. on perl shutdown, when freeing the extra objects at the second GC.
This became much better in 0.08 than in previous versions.
This is alpha, not fully tested yet!
VERSION
$Revision 0.13 $ $Date 2008-01-20 $