NAME

Mail::Pyzor::Digest::Pieces

DESCRIPTION

This module houses backend logic for Mail::Pyzor::Digest.

It reimplements logic found in pyzor’s digest.py module (https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py).

FUNCTIONS

$strings_ar = digest_payloads( $EMAIL_MIME )

This imitates the corresponding object method in digest.py. It returns a reference to an array of strings. Each string can be either a byte string or a character string (e.g., UTF-8 decoded).

NB: RFC 2822 stipulates that message bodies should use CRLF line breaks, not plain LF (nor plain CR). Email::MIME::Encodings will thus convert any plain CRs in a quoted-printable message body into CRLF. Python doesn’t do this, so the output of our implementation of digest_payloads() diverges from that of the Python original. The divergence doesn’t affect the final result, since line-ending whitespace gets trimmed regardless, but it must be factored in when comparing the output of our implementation with the Python output.
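When comparing payloads from the two implementations, one option is to canonicalize line endings on both sides first. The following is a minimal sketch, assuming a hypothetical helper named canonicalize_eol() that is not part of this module; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Pyzor): returns a copy of
# $text with CRLF and lone CR line breaks rewritten as plain LF.
sub canonicalize_eol {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Made-up sample payloads that differ only in their line endings.
my $perl_side   = "line one\r\nline two\r\n";
my $python_side = "line one\rline two\r";

print "payloads match\n"
    if canonicalize_eol($perl_side) eq canonicalize_eol($python_side);
```

Canonicalizing to LF is an arbitrary choice here; any single convention works as long as both sides use it.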

normalize( $STRING )

This imitates the corresponding object method in digest.py. It modifies $STRING in-place.

As with the original implementation, if $STRING is a character string (i.e., it contains decoded Unicode characters), it is processed as characters rather than as raw bytes. So:

$str = "123\xc2\xa0";   # [ c2 a0 ] == \u00a0, non-breaking space

normalize($str);

The above will leave $str alone, but this:

utf8::decode($str);

normalize($str);

… will trim off the trailing non-breaking space, i.e., the character that the last two bytes of the undecoded $str represented.
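The byte/character distinction behind this example can be checked without calling normalize() at all. The sketch below uses only Perl's built-in length(), ord(), and utf8::decode(); the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $str = "123\xc2\xa0";

# Before decoding, the string holds five octets.
my $octet_count = length $str;

# utf8::decode() works in place and merges [ c2 a0 ] into one character.
utf8::decode($str);
my $char_count = length $str;
my $last_char  = ord substr( $str, -1 );

printf "octets: %d, characters: %d, last: U+%04X\n",
    $octet_count, $char_count, $last_char;
# prints "octets: 5, characters: 4, last: U+00A0"
```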

$yn = should_handle_line( $STRING )

This imitates the corresponding object method in digest.py. It returns a boolean.

$sr = assemble_lines( \@LINES )

This assembles a string buffer out of @LINES and returns a reference to it. The buffer holds the octets that will be hashed to produce the message digest.

Each member of @LINES is expected to be an octet string, not a character string.
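If the lines at hand are character strings, they need to be encoded to octets first. The following is a minimal sketch of that preparation step using Perl's built-in utf8::encode(). The sample data is made up, and the assemble_lines() call appears only as a comment so that the snippet runs without this module installed.

```perl
use strict;
use warnings;

# Made-up sample lines held as character strings.
my @lines = ( "caf\x{e9}", "snowman \x{2603}" );

# utf8::encode() works in place, replacing each character string
# with its UTF-8 octets.
utf8::encode($_) for @lines;

printf "%d octets\n", length($_) for @lines;

# The octet strings could now be handed over, e.g.:
# my $sr = Mail::Pyzor::Digest::Pieces::assemble_lines( \@lines );
```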

($main, $sub, $encoding, $checkval) = parse_content_type( $CONTENT_TYPE )

@lines = splitlines( $TEXT )

Imitates Python’s str.splitlines() (cf. pydoc str).

In list context, returns a plain list. In scalar context, returns the number of items that would be returned in list context.
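The context-dependent return value can be illustrated with a simplified stand-in. The sketch below assumes a hypothetical splitlines_sketch() that splits only on CRLF, CR, and LF. Python's str.splitlines() recognizes additional line boundaries, and this is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for splitlines(); handles CRLF, CR, and LF only.
sub splitlines_sketch {
    my ($text) = @_;
    my @lines = split /\r\n|\r|\n/, $text;

    # wantarray is true in list context and false in scalar context.
    return wantarray ? @lines : scalar @lines;
}

my @lines = splitlines_sketch("one\ntwo\r\nthree");
my $count = splitlines_sketch("one\ntwo\r\nthree");

print "list context:   @lines\n";    # prints "list context:   one two three"
print "scalar context: $count\n";    # prints "scalar context: 3"
```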