NAME

Panda::Lib - Collection of useful functions and classes with Perl and C interface.

DESCRIPTION

Panda::Lib contains a number of very fast, useful functions written in C. You can use them from Perl or directly from XS code. It also contains several C++ classes.

SYNOPSIS

use Panda::Lib qw/ hash_merge merge compare clone lclone fclone crypt_xor string_hash string_hash32 /;
                   
$result = hash_merge($dest, $source, $flags);
$result = merge($dest, $source, $flags);
$is_equal = compare($hash1, $hash2);
$is_equal = compare($array1, $array2);
$cloned = lclone($data);
$cloned = fclone($data);
$crypted = crypt_xor($data, $key);
$val = string_hash($str);
$val = string_hash32($str);

C SYNOPSIS

 #include <xs/lib.h>
 using namespace xs::lib;
 
 HV* result = hash_merge(hvdest, hvsource, flags);
 SV* result = merge(svdest, svsource, flags);
 bool is_equal = hv_compare(hv1, hv2);
 bool is_equal = av_compare(av1, av2);
 SV* cloned = clone(sv, with_cross_checks);
 panda::string str = sv2string(sv, ref_type);
 
 #include <panda/lib.h>
 using namespace panda::lib;
 
 char* crypted = crypt_xor(source, slen, key, klen);
 uint32_t val = string_hash32(str, len);
 uint64_t val = string_hash(str, len);
 
 
 #include <panda/string.h>
 using panda::string;
 
 string abc("lala");
 ... // everything that std::string supports
 
 
 #include <panda/lib/memory.h>
 void* mem = StaticMemoryPool<128>::instance()->alloc(); // extremely fast memory allocations
 void* mem = StaticMemoryPool<128>::threaded_instance()->alloc(); // thread-safe version (still very fast)
 void* mem = ObjectAllocator::instance()->alloc(256); // dynamic-size fast allocator
 
 class MyClass : public AllocatedObject<MyClass> {};
 MyClass* obj = new MyClass(); // get fast and thread-safe allocations with less code

 class SingleThreadedClass : public AllocatedObject<SingleThreadedClass, false> {}; // faster but thread-unsafe
 
 // override behaviour from thread-unsafe to thread-safe
 class MultiThreadedClass : public SingleThreadedClass, public AllocatedObject<MultiThreadedClass> {
 public:
     using AllocatedObject<MultiThreadedClass>::operator new;
     using AllocatedObject<MultiThreadedClass>::operator delete;
 };
 
 MemoryPool mypool(32);
 void* mem = mypool.alloc(); // custom pool with maximum speed, but thread-unsafe

PERL FUNCTIONS

hash_merge (\%dest, \%source, [$flags])

Merges hash $source into $dest. The merge is extremely fast. $source and $dest must be HASHREFs or undefs. New keys from source are added to dest. Values of existing keys are replaced. If a key contains a HASHREF in both source and dest, they are merged recursively. Otherwise it gets replaced by the value from source. Returns the resulting hashref (it may or may not be the same ref as $dest, depending on the $flags provided).

$flags is a bitmask of these flags:

MERGE_ARRAY_CONCAT

By default, if a key contains an ARRAYREF in both source and dest, it gets replaced by the array from source. If you enable this flag, such arrays will be concatenated (like: push @{$dest->{key}}, @{$source->{key}}).

MERGE_ARRAY_MERGE

If a key contains an ARRAYREF in both source and dest, the arrays are merged element by element: $dest->{key}[0] is merged with $source->{key}[0], and so on. Values are merged using the following rules: if both are hashrefs or both are arrayrefs, they are merged recursively, otherwise the value in dest gets replaced.

MERGE_LAZY

If you set this flag, merge process won't override any existing and defined values in dest. Keep in mind that if you also set MERGE_ARRAY_MERGE, then the same is in effect while merging array elements.

my $hash1 = {a => 1, b => undef};
my $hash2 = {a => 2, b => 3, c => undef };
hash_merge($hash1, $hash2, MERGE_LAZY);
# $hash1 is {a => 1, b => 3, c => undef };

MERGE_SKIP_UNDEF

If enabled, values from source that are undefs won't replace anything in dest.

my $hash1 = {a => 1};
my $hash2 = {a => undef, b => undef, c => 2};
hash_merge($hash1, $hash2, MERGE_SKIP_UNDEF);
# $hash1 is {a => 1, c => 2};

MERGE_DELETE_UNDEF

If enabled, values from source that are undefs act as 'deleters', i.e. the corresponding values get deleted from dest.

my $hash1 = {a => 1, b => 2};
my $hash2 = {a => undef};
hash_merge($hash1, $hash2, MERGE_DELETE_UNDEF);
# $hash1 is {b => 2};

MERGE_COPY_DEST

Makes deep copy of $dest, merges it with source and returns this new hashref.

MERGE_COPY_SOURCE

By default, if any value from source replaces value from dest, it doesn't get deep copied. For example:

my $hash1 = {};
my $hash2 = {a => [1,2]};
hash_merge($hash1, $hash2);
shift @{$hash1->{a}};
say scalar @{$hash2->{a}}; # prints 1

Moreover, even primitive values are not copied, instead they get aliased for speed. For example:

my $hash1 = {};
my $hash2 = {a => 'mystring'};
hash_merge($hash1, $hash2);
substr($hash1->{a}, 0, 2, ''); # remove the first two characters in place
say $hash2->{a}; # prints 'string'

If you enable this flag, replacing values from source will be copied (references - deep copied).

MERGE_COPY

Equivalent to MERGE_COPY_DEST + MERGE_COPY_SOURCE.

This is how undefined $source or undefined $dest are handled:

If $source is undef

Nothing is merged, however if MERGE_COPY_DEST is set, deep copy of $dest is still returned. If $dest is also undef, then regardless of MERGE_COPY_DEST flag, empty hashref is returned.

If $dest is undef

Empty hashref is created, merged with $source and returned.
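The scalar and nested-hash rules described above can be modelled in a few lines. The following is only a sketch of the documented behaviour for MERGE_LAZY, MERGE_SKIP_UNDEF and MERGE_DELETE_UNDEF, not the library's implementation: the Value type, the merge_into function and the flag values are hypothetical names invented for this illustration, arrays and the copy flags are left out, and the order in which the checks are applied is an assumption.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical model of a Perl value: undef, plain scalar, or hash reference.
struct Value;
using Hash = std::map<std::string, Value>;

struct Value {
    enum Kind { Undef, Scalar, HashRef };
    Kind kind = Undef;
    std::string scalar;          // meaningful when kind == Scalar
    std::shared_ptr<Hash> hash;  // meaningful when kind == HashRef
};

Value make_undef() { return Value(); }

Value make_scalar(const std::string& s) {
    Value v;
    v.kind = Value::Scalar;
    v.scalar = s;
    return v;
}

Value make_hash(const Hash& h) {
    Value v;
    v.kind = Value::HashRef;
    v.hash = std::make_shared<Hash>(h);
    return v;
}

// Flag values are made up for this sketch; they are not the library's constants.
enum SketchFlags { LAZY = 1, SKIP_UNDEF = 2, DELETE_UNDEF = 4 };

// Merges source into dest in place, following the rules described above.
void merge_into(Hash& dest, const Hash& source, int flags) {
    for (const auto& entry : source) {
        const std::string& key = entry.first;
        const Value& src = entry.second;
        auto it = dest.find(key);

        if (src.kind == Value::Undef) {
            if (flags & DELETE_UNDEF) {      // undef acts as a 'deleter'
                if (it != dest.end()) dest.erase(it);
                continue;
            }
            if (flags & SKIP_UNDEF) continue; // undef never replaces anything
        }

        const bool exists = it != dest.end();
        if (exists && it->second.kind == Value::HashRef && src.kind == Value::HashRef) {
            merge_into(*it->second.hash, *src.hash, flags); // recurse into nested hashes
            continue;
        }

        // Lazy mode keeps any existing, defined value.
        if ((flags & LAZY) && exists && it->second.kind != Value::Undef) continue;

        dest[key] = src; // nested hashes stay shared, like the default (non-copy) mode
    }
}
```

Running merge_into over the three Perl examples above yields the same key sets and values as the comments in those examples.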

merge ($dest, $source, [$flags])

Acts much like 'hash_merge', but receives any scalar as $dest and $source, not only hashrefs. Returns merged value which may or may not be the same scalar (modified or not) as $dest.

This function does the same work as 'hash_merge' does for its elements. I.e. if both $dest and $source are HASHREFs then they are merged via 'hash_merge'. If both are ARRAYREFs, then depending on $flags, $dest is either replaced, concatenated or merged. Otherwise $source replaces $dest following the rules described in the 'hash_merge' function with respect to the flags MERGE_COPY_DEST, MERGE_COPY_SOURCE and MERGE_LAZY.

For example, if $source and $dest are scalars (not refs), and no flags are provided, then $dest becomes equal to $source. If MERGE_LAZY is provided and $dest is not undef, $dest is unchanged. If MERGE_COPY_DEST is provided then $dest is unchanged and the result is returned in a new scalar. And so on.

However there is one difference: if $dest and $source are primitive scalars, instead of creating an alias, the $source variable is copied to $dest (or new result). If MERGE_COPY_SOURCE is disabled, copying is not deep, like $dest = $source.

lclone ($source)

Light clone: makes a deep copy of $source and returns it.

Does not handle cross-references: references to the same data become different references in the copy. If a circular reference is present in $source, it will croak.

Handles CODEREFs and IOREFs, but doesn't clone them: it just copies the pointer to the same CODE or IO into the new reference. All other data types are cloned normally.

If clone encounters a blessed object and it has a HOOK_CLONE method, the return value of this method is used instead of a default behaviour. You can call [lf]clone($self) again from HOOK_CLONE if you need to, for example to prevent cloning some of your properties:

sub HOOK_CLONE {
    my $self = shift;
    my $tmp = delete local $self->{big_obj_backref};
    my $ret = lclone($self);
    $ret->{big_obj_backref} = $tmp;
    return $ret;
}

In this case the second lclone() call won't call HOOK_CLONE again and will clone $self in the standard manner.

fclone ($source)

Full clone: same as lclone() but handles cross-references: references to the same data will be the same references in the copy. If a circular reference is present in $source, it will remain circular in the cloned data.

clone ($source, [$with_cross_checks])

If $with_cross_checks is false or omitted, behaves like lclone(), otherwise like fclone().
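The difference between the light and the full clone comes down to whether the copier remembers what it has already copied. The sketch below illustrates that idea on a hypothetical Node type with shared_ptr links; light_copy, full_copy and SeenMap are names invented here, and nothing in it reflects how the library walks Perl data internally.

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical node type standing in for a Perl data structure.
struct Node {
    int value;
    std::vector<std::shared_ptr<Node>> children;
};

// "Light" deep copy: every reference is followed and copied separately,
// so a child that is referenced twice ends up as two distinct copies.
// This version would recurse forever on a cycle.
std::shared_ptr<Node> light_copy(const std::shared_ptr<Node>& src) {
    if (!src) return nullptr;
    auto copy = std::make_shared<Node>();
    copy->value = src->value;
    for (const auto& child : src->children)
        copy->children.push_back(light_copy(child));
    return copy;
}

// Maps an already visited source node to its copy.
using SeenMap = std::unordered_map<const Node*, std::shared_ptr<Node>>;

// "Full" deep copy: a node reached a second time reuses the first copy,
// so shared references stay shared in the result.
std::shared_ptr<Node> full_copy(const std::shared_ptr<Node>& src, SeenMap& seen) {
    if (!src) return nullptr;
    auto found = seen.find(src.get());
    if (found != seen.end()) return found->second;
    auto copy = std::make_shared<Node>();
    seen[src.get()] = copy; // register before recursing so that back-references resolve
    copy->value = src->value;
    for (const auto& child : src->children)
        copy->children.push_back(full_copy(child, seen));
    return copy;
}
```

The extra map lookup per node is the price of the full variant, which is presumably why the cheaper light variant is offered separately.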

compare ($data1, $data2)

Performs deep comparison and returns true if every element of $data1 is equal to corresponding element of $data2.

The rules of equality for two elements (including the top-level $data1 and $data2 themselves):

If either of the two elements is a reference

    If either element is a blessed object

        If they are not objects of the same class, they are not equal.

        If the class overloads the '==' operator, it is used to check equality. If not, the objects' underlying data structures are compared.

    If both elements are hash refs

        Equal if all of the key/value pairs are equal.

    If both elements are array refs

        Equal if corresponding elements are equal (a[0] equal b[0], etc).

    If both elements are code refs

        Equal if they are references to the same code.

    If both elements are IOs (IO refs)

        Equal if both IOs contain the same fileno.

    If both elements are typeglobs

        Equal if both are references to the same glob.

    If both elements are refs to anything else

        They are dereferenced and checked again from the beginning.

    Otherwise (one is a ref, the other is not)

        They are not equal.

If neither element is a reference

    Equal if perl's 'eq' or '==' (depending on data type) returns true.

crypt_xor ($string, $key)

Performs round-robin XOR $string with $key. Algorithm is symmetric, i.e.:

crypt_xor(crypt_xor($string, $key), $key) eq $string
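As an illustration of why the operation is its own inverse, here is a minimal sketch of a round-robin XOR. The function name xor_round_robin is invented for this example and the handling of an empty key is an assumption; it is not the library's code.

```cpp
#include <cstddef>
#include <string>

// XORs every byte of data with the key byte at the same position,
// wrapping around to the start of the key when it runs out.
std::string xor_round_robin(const std::string& data, const std::string& key) {
    if (key.empty()) return data; // assumption: an empty key leaves data unchanged
    std::string out(data);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    return out;
}
```

Because (b ^ k) ^ k == b for every byte, applying the function twice with the same key restores the input. Note that the result may contain NUL bytes (whenever a data byte equals its key byte), so it must be handled as a length-delimited buffer rather than a C string.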

string_hash ($string)

Calculates a 64-bit hash value for $string. Currently uses the MurmurHash64A algorithm (very fast).

string_hash32 ($string)

Calculates a 32-bit hash value for $string. Currently uses the Jenkins one-at-a-time hash algorithm.
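For reference, the classic Jenkins one-at-a-time hash, as commonly published, looks like the sketch below. The function name one_at_a_time is chosen for this example, and it is an assumption that string_hash32 produces exactly these values (the library might, for instance, start from a different initial value).

```cpp
#include <cstddef>
#include <cstdint>

// Classic Jenkins one-at-a-time hash over len bytes of str.
std::uint32_t one_at_a_time(const char* str, std::size_t len) {
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < len; ++i) {
        hash += static_cast<unsigned char>(str[i]); // mix in one byte at a time
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    // final avalanche
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```

With this definition, one_at_a_time("a", 1) evaluates to 0xca2e9442.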

C FUNCTIONS

HV* xs::lib::hash_merge (HV* dest, HV* source, IV flags)

SV* xs::lib::merge (SV* dest, SV* source, IV flags)

SV* xs::lib::clone (SV* source, bool cross_references)

bool xs::lib::hv_compare (HV*, HV*)

bool xs::lib::av_compare (AV*, AV*)

bool xs::lib::sv_compare (SV*, SV*)

uint64_t panda::lib::string_hash (const char* str, size_t len)

uint64_t panda::lib::string_hash (const char* str)

uint32_t panda::lib::string_hash32 (const char* str, size_t len)

uint32_t panda::lib::string_hash32 (const char* str)

All functions above behave like their Perl equivalents. See the PERL FUNCTIONS docs.

char* panda::lib::crypt_xor (const char* source, size_t slen, const char* key, size_t klen, char* dest = NULL)

Performs XOR crypt. If 'dest' is null, mallocs and returns new buffer. Buffer must be freed by user manually via 'free'. If 'dest' is not null, places result into this buffer. It must have enough space to hold the result.

panda::string xs::lib::sv2string (SV* svstr, panda::string::ref_t ref = panda::string::COPY)

Creates panda::string from SV string. If 'ref' is COPY then content of SV is copied to string. If 'ref' is REF, then returned string is a copy-on-write string holding SV's buffer. In this case you must NOT change or delete your SV until you're done with string.

Panda::Lib installs a typemap for panda::string, so it is okay to receive it in XS function params without copying.

using panda::string;

...

void
myfunc (string str)
PPCODE:
    // don't change ST(0) while working with str
    printf("string is %.*s, len is %zu", (int)str.length(), str.data(), str.length());
    str.retain(); // it is OK to change ST(0) now, as str is detached from the original string
    ...

C++ CLASSES

panda::string

This string is fully compatible with the std::string API, however it supports COW (copy-on-write) and therefore runs much faster in many cases. Some std::string implementations share a buffer between strings copy-on-write (C++11 and later effectively rule this out), but std::string never supports COW with external pointers, which matters when creating a string from a literal: string("mystring"), or myhash["mykey"].

SYNOPSIS

using panda::string;

string str("abcd"); // "abcd" is not copied, COW mode.
str.append("ef"); // str is copied on modification.
cout << str; // prints 'abcdef'

char* mystr = new char[13];
memcpy(mystr, "hello, world", 13); // 12 characters plus the terminating NUL
str.assign(mystr, 12); // COW mode, don't free mystr until you're done with str.
str.retain(); // abort COW, str is detached, buffer is copied.

string str2(mystr, string::COPY); // no-COW, std::string-like behaviour, mystr is copied to str2.
str2.resize(5);
cout << str2; // 'hello'

str = str2; // COW mode, buffer is not copied. Unlike for char* pointers, you can safely destroy str2 at any time
cout << str; // 'hello'
str.append("!"); // detach on modification
cout << str << str2; // 'hello!hello'

panda::string is converted into std::string on demand. It can also be used with the ostream << and istream >> operators.

METHODS

Only new methods or methods with additional params are listed. All other methods have the same syntax and meaning as in std::string.

string (const char* p, ref_t ref = REF)

string (const char* p, size_t len, ref_t ref = REF)

If 'ref' is REF, then the newly created string will use COW mode with buffer 'p'. It's your responsibility to keep the 'p' pointer valid until you're done with the string or until the string has been modified (a modification detaches it from 'p').

If 'ref' is COPY, then 'p' is copied to string and it won't depend on 'p' pointer.

The default is REF, it saves time in such common cases as:

void myfunc (const string str) { ... }
myfunc("hello");

or

std::map<string, int> myhash;
int val = myhash["mykey"];

char* buf ()

Returns string buffer like 'data' or 'c_str' but this buffer is writable. Therefore if a string was in COW mode, it detaches. Common case: parse something directly into string:

string str;
str.reserve(1000);
char* buf = str.buf();
// fill buf
str.resize(actual_length);

string& retain ()

Detaches string if it's in COW mode. Does nothing otherwise. Returns the string itself.

string& assign (const char* p, ref_t ref = REF)

string& assign (const char* p, size_t len, ref_t ref = REF)

'ref' has the same meaning as in constructor.
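The REF mode and retain() described above can be pictured with a much smaller class that borrows an external buffer until the first modification. This is only a sketch of the idea: BorrowedString and its members are hypothetical names, it implements just enough to show the detach step, and it does not share buffers between string objects as panda::string does.

```cpp
#include <cstddef>
#include <string>

// Borrows an external buffer until the first modification, then owns a copy.
class BorrowedString {
public:
    // The caller must keep p valid for as long as the string is borrowed.
    BorrowedString(const char* p, std::size_t len) : ext_(p), len_(len) {}

    const char* data() const { return ext_ ? ext_ : owned_.data(); }
    std::size_t length() const { return ext_ ? len_ : owned_.size(); }
    bool borrowed() const { return ext_ != nullptr; }

    // Detaches from the external buffer by copying it; no-op if already detached.
    BorrowedString& retain() {
        if (ext_) {
            owned_.assign(ext_, len_);
            ext_ = nullptr;
        }
        return *this;
    }

    // Any modification detaches first, so the external buffer is never written to.
    BorrowedString& append(const char* s) {
        retain();
        owned_.append(s);
        return *this;
    }

private:
    const char* ext_;   // non-null while the external buffer is borrowed
    std::size_t len_;   // length of the borrowed buffer
    std::string owned_; // private copy, used after detaching
};
```

Constructing such an object from a literal costs no allocation and no copy, which is the saving the default REF mode is after; the copy happens only if the string is actually modified or retain() is called.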

panda::lib::MemoryPool

Base object for fast allocation of memory blocks of one particular size (commonly used for allocating small objects). This class is thread-unsafe: only one thread at a time may allocate memory through a given object. It is roughly 10 to 40 times faster than new+delete.

METHODS

MemoryPool (size_t blocksize)

Creates an object which allocates blocks of size blocksize. There is no memory overhead, because it doesn't store any additional data before/after a memory block. However, if you pass a blocksize of less than 8 bytes, it will still allocate blocks large enough to hold 8 bytes.

void* alloc ()

Allocates new block. Can throw std::bad_alloc if no memory.

void dealloc (void* ptr)

Returns ptr back to pool. If ptr is a pointer that this object never allocated, the behaviour is undefined.

~MemoryPool

Frees internal storage and returns memory to system. All pointers ever allocated by this object become invalid.
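A common way to get these properties (no per-block header, a minimum block size of one pointer, all memory released at destruction) is an intrusive free list carved out of larger chunks. The sketch below shows that technique under the assumption that MemoryPool works along similar lines; FixedPool, its members and the chunk size of 64 blocks are all invented for this illustration.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size block pool: free blocks are chained through their own storage,
// so no bookkeeping data is kept before or after a block.
class FixedPool {
public:
    // blocks_per_chunk must be greater than zero.
    explicit FixedPool(std::size_t blocksize, std::size_t blocks_per_chunk = 64)
        : blocksize_(round_up(blocksize)), per_chunk_(blocks_per_chunk), free_(nullptr) {}

    ~FixedPool() {
        for (void* chunk : chunks_) ::operator delete(chunk); // every block becomes invalid
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* alloc() {
        if (!free_) grow(); // may throw std::bad_alloc
        FreeBlock* block = free_;
        free_ = block->next;
        return block;
    }

    // p must have come from alloc() of this pool.
    void dealloc(void* p) {
        free_ = ::new (p) FreeBlock{free_}; // reuse the block itself as a list node
    }

    std::size_t blocksize() const { return blocksize_; }

private:
    struct FreeBlock { FreeBlock* next; };

    // A block must be able to hold a FreeBlock, hence the minimum size.
    static std::size_t round_up(std::size_t n) {
        const std::size_t unit = sizeof(FreeBlock);
        if (n < unit) n = unit;
        return (n + unit - 1) / unit * unit;
    }

    // Obtains one more chunk and threads all of its blocks onto the free list.
    void grow() {
        char* chunk = static_cast<char*>(::operator new(blocksize_ * per_chunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < per_chunk_; ++i) dealloc(chunk + i * blocksize_);
    }

    std::size_t blocksize_;
    std::size_t per_chunk_;
    FreeBlock* free_;
    std::vector<void*> chunks_;
};
```

In this design alloc() and dealloc() are a couple of pointer assignments each, and the pointer stored inside a free block explains a lower bound such as 8 bytes on 64-bit platforms.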

template <int BLOCKSIZE> panda::lib::StaticMemoryPool

This class provides access to singleton memory pool objects for particular block size. It is recommended to use memory pools via this interface to reduce memory consumption and fragmentation.

METHODS

static MemoryPool* instance ()

Returns MemoryPool object for BLOCKSIZE which is global to the whole process. This object is thread-unsafe.

static MemoryPool* threaded_instance ()

Returns the MemoryPool object for BLOCKSIZE which is global to the current thread. This object is thread-safe. Thread safety is provided by TLS (thread-local storage), without any mutexes, rwlocks and so on, so performance is still great.
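The two accessors differ only in the storage duration of the singleton. A minimal sketch of that distinction, using the C++11 thread_local keyword as a stand-in for whatever TLS mechanism the library actually uses, could look as follows; PoolStub and StaticPoolSketch are hypothetical names and the stub carries no real pool logic.

```cpp
#include <cstddef>

// Hypothetical stand-in for a pool; only its identity matters here.
struct PoolStub {
    std::size_t blocksize;
};

template <int BLOCKSIZE>
struct StaticPoolSketch {
    // One object per process: callers on different threads share it.
    static PoolStub* instance() {
        static PoolStub pool{static_cast<std::size_t>(BLOCKSIZE)};
        return &pool;
    }

    // One object per thread: each thread sees its own pool, so no locking is needed.
    static PoolStub* threaded_instance() {
        thread_local PoolStub pool{static_cast<std::size_t>(BLOCKSIZE)};
        return &pool;
    }
};
```

Within one thread both accessors keep returning the same pointer on every call; only threaded_instance() hands out a different object to each thread. A consequence of the per-thread design is that a block should go back to the pool of the thread that allocated it.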

panda::lib::ObjectAllocator

Sometimes you don't know the size of a block at compile time and therefore can't use StaticMemoryPool. On the other hand, creating MemoryPool objects for a particular size every time is expensive. This class provides an interface for allocating memory blocks of arbitrary size. It holds a collection of MemoryPool objects of various sizes which are created on demand.

METHODS

ObjectAllocator ()

Creates an allocator object. However, I would recommend using the singleton interface via instance/threaded_instance, see below.

void* alloc (size_t size)

Allocates block size bytes long. If you pass size less than 8 bytes, it will still allocate 8 bytes.

void dealloc (void* ptr, size_t size)

Returns ptr back to pool. size is required because MemoryPool doesn't store block sizes before/after blocks, to avoid memory overhead. If you pass the wrong size, or a pointer that was never allocated via this object, the behaviour is undefined.

~ObjectAllocator ()

Frees internal storage of all pools and returns memory to system. All pointers ever allocated by this object become invalid.

static ObjectAllocator* instance ()

Returns ObjectAllocator object which is global to the whole process. This object is thread-unsafe.

static ObjectAllocator* threaded_instance ()

Returns ObjectAllocator object which is global to the current thread. This object is thread-safe.
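To show why dealloc needs the size, here is a simplified sketch of a size-indexed allocator that creates one free list per requested size on demand. It is an illustration only: SizeClassAllocator is an invented name, each size class is a plain vector of free blocks rather than a MemoryPool, and the minimum of 8 bytes is taken from the description above.

```cpp
#include <cstddef>
#include <map>
#include <new>
#include <vector>

// Keeps one free list per block size; lists are created on first use.
class SizeClassAllocator {
public:
    SizeClassAllocator() = default;
    SizeClassAllocator(const SizeClassAllocator&) = delete;
    SizeClassAllocator& operator=(const SizeClassAllocator&) = delete;

    ~SizeClassAllocator() {
        for (void* p : owned_) ::operator delete(p); // all blocks become invalid
    }

    void* alloc(std::size_t size) {
        size = normalize(size);
        std::vector<void*>& free_list = free_[size]; // creates the size class on demand
        if (!free_list.empty()) {
            void* p = free_list.back();
            free_list.pop_back();
            return p;
        }
        void* p = ::operator new(size);
        owned_.push_back(p);
        return p;
    }

    // The size selects the free list; blocks carry no header that could tell us.
    void dealloc(void* p, std::size_t size) {
        free_[normalize(size)].push_back(p);
    }

private:
    static std::size_t normalize(std::size_t size) { return size < 8 ? 8 : size; }

    std::map<std::size_t, std::vector<void*>> free_; // size -> free blocks of that size
    std::vector<void*> owned_;                       // everything obtained from the system
};
```

Passing a wrong size to dealloc would file the block under the wrong size class, and a later alloc could then hand out a block that is too small, which is one way the undefined behaviour mentioned above can arise.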

template <class TARGET, bool THREAD_SAFE = true> panda::lib::AllocatedObject

This class is a helper base class. If you inherit from it, objects of your class will be allocated via memory pools instead of using default new/delete operators.

Normally, you would need to write this code in order to allocate your objects via MemoryPool:

class MyClass {
public:
    // exact size: use the fixed-size pool; other sizes (derived classes): use the dynamic allocator
    static void* operator new (size_t size) {
        if (size == sizeof(MyClass)) return StaticMemoryPool<sizeof(MyClass)>::threaded_instance()->alloc();
        return ObjectAllocator::threaded_instance()->alloc(size);
    }
    static void operator delete (void* p, size_t size) {
        if (size == sizeof(MyClass)) StaticMemoryPool<sizeof(MyClass)>::threaded_instance()->dealloc(p);
        else ObjectAllocator::threaded_instance()->dealloc(p, size);
    }
    ...
};

The size check (if/else) is needed to support inheritance, because for a derived class size won't match sizeof(MyClass). Programmers mostly fall back to the default ::new/::delete operators when the sizes don't match, however ObjectAllocator can handle dynamic sizes and is much faster than the default operators, so even in this case we save time.

To avoid writing this code every time, just inherit from AllocatedObject, passing your class name as a template parameter. If you don't need thread-safe allocations, you can pass false as the second template parameter to achieve even more performance.

class MyClass : public AllocatedObject<MyClass> { ... };

TIP

class MyChild : public MyClass { ... };

In this case we will still be using a memory pool, but via the dynamic ObjectAllocator, which is slightly slower. To restore the original performance, redefine the new/delete operators again, passing your child class name. We also need to resolve the multiple-inheritance ambiguity with using declarations.

class MyChild : public MyClass, public AllocatedObject<MyChild> {
public:
    using AllocatedObject<MyChild>::operator new;
    using AllocatedObject<MyChild>::operator delete;
    ...
};

This code will allocate MyChild objects via static memory pool.
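The pattern behind such a helper is the curiously recurring template pattern (CRTP): the base class receives the derived class as a template parameter and can therefore compare the requested size with sizeof(TARGET). The following sketch shows the pattern with a deliberately simple per-class free list; PooledObject, Point and Point3 are hypothetical names, the fallback goes to the global operators rather than to ObjectAllocator, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <new>

// CRTP base: gives TARGET class-level operator new/delete backed by a free list.
template <class TARGET>
class PooledObject {
public:
    static void* operator new(std::size_t size) {
        // Only exact-size requests can reuse cached blocks.
        if (size == sizeof(TARGET) && head()) {
            FreeNode* node = head();
            head() = node->next;
            return node;
        }
        return ::operator new(size); // derived classes and cache misses
    }

    static void operator delete(void* p, std::size_t size) {
        if (!p) return;
        if (size == sizeof(TARGET) && size >= sizeof(FreeNode)) {
            FreeNode* node = ::new (p) FreeNode{head()}; // cache the block for reuse
            head() = node;
        } else {
            ::operator delete(p);
        }
    }

private:
    struct FreeNode { FreeNode* next; };

    // One list per TARGET type; cached blocks are kept until process exit.
    static FreeNode*& head() {
        static FreeNode* list = nullptr;
        return list;
    }
};

// Example classes for the sketch.
struct Point : PooledObject<Point> {
    double x, y;
};

struct Point3 : Point { // inherits the operators, but its size no longer matches
    double z;
};
```

Here new Point reuses a previously deleted Point block, while new Point3 takes the fallback path because its size differs from sizeof(Point); this mirrors the situation described in the TIP above.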

REQUIREMENTS

UNIX (any)

For thread-safe instances to work correctly and without memory leaks, you must use the pthreads interface to create and join threads. As far as I know, everybody uses pthreads on UNIX nowadays, so this shouldn't be a problem, I hope :-)

Windows

no requirements

TYPEMAPS

panda::string

std::string

string

Typemap for panda::string, std::string, or anything else visible as 'string' in your local scope. Such a class must have a std::string-compatible API.

AUTHOR

Pronin Oleg <syber@crazypanda.ru>, Crazy Panda, CP Decision LTD

LICENSE

You may distribute this code under the same terms as Perl itself.