NAME

PHP::Strings - Implement some of PHP's string functions.

SYNOPSIS

use PHP::Strings;

my $slashed = addcslashes( $not_escaped, $charlist );
my $wordcount = str_word_count( $string );
my @words     = str_word_count( $string, 1 );
my %positions = str_word_count( $string, 2 );
my $clean = strip_tags( $html, '<a><b><i><u>' );
my $unslashed = stripcslashes( '\a\b\f\n\r\xae' );

DESCRIPTION

PHP has many functions. This is one of the main problems with PHP.

People do, however, get used to said functions and when they come to a better designed language they get lost because they have to implement some of these somewhat vapid functions themselves.

So I wrote PHP::Strings. It implements most of PHP's string functions. For those it doesn't implement, it describes how to do the same thing in native Perl.

Any function that would be silly to implement has not been implemented and is marked as such in this documentation. Such functions are still exportable, but if you attempt to use one you will get an error telling you to read these docs.

RELATED READING

ERROR HANDLING

All arguments are checked using Params::Validate. Bad arguments will cause an error to be thrown. If you wish to catch it, use eval.
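
A minimal sketch of catching such an error with eval. The sub picky_length is a hypothetical stand-in, not part of PHP::Strings; it simply dies on a bad argument, the way a validated function would.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function that validates its arguments.
sub picky_length {
    my ($string) = @_;
    die "string argument is required\n" unless defined $string;
    return length $string;
}

# eval returns undef and sets $@ if the block dies.
my $length = eval { picky_length(undef) };
my $error  = $@;
print "Caught: $error" if $error;
```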

Attempts to use functions I've decided to not implement (as distinct from functions that aren't implemented because I've not gotten around to either writing or deciding whether to write) will cause an error displaying the documentation for said function.

EXPORTS

By default, nothing is exported.

Each function and constant can be exported by explicit name.

use PHP::Strings qw( str_pad addcslashes );

To get a function and its associated constants as well, prefix them with a colon:

use PHP::Strings qw( :str_pad );
# This grabs str_pad, STR_PAD_LEFT, STR_PAD_BOTH, STR_PAD_RIGHT.

To export everything:

use PHP::Strings qw( :all );

For more information on what you can add there, consult "Specialised Import Lists" in Exporter.

FUNCTIONS

addcslashes

http://www.php.net/addcslashes

my $slashed = addcslashes( $not_escaped, $charlist );

Returns a string with backslashes before characters that are listed in $charlist.

addslashes

http://www.php.net/addslashes

PHP::Strings::addslashes WILL NOT BE IMPLEMENTED.

Returns a string with backslashes before characters that need to be quoted in SQL queries. You should never need this function. I mean, never.

DBI, the standard method of accessing databases with Perl, does all this for you. It provides a quote method to escape anything, and it provides placeholders and bind values so you don't even have to worry about escaping. In PHP, PEAR DB also provides this facility.

DBI is also aware that some databases don't escape in this method, such as mssql which uses doubled characters to escape (like some versions of BASIC). This function doesn't.

The less said about PHP's magic_quotes "feature", the better.

bin2hex

http://www.php.net/bin2hex

PHP::Strings::bin2hex WILL NOT BE IMPLEMENTED.

This is trivially implemented using unpack.

my $hex = unpack "H*", $data;
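
As a small illustration, here is the round trip in both directions. The sample data is made up for the example.

```perl
use strict;
use warnings;

my $data = "PHP\x00\xff";

# Bytes to hexadecimal digits (what PHP's bin2hex does).
my $hex = unpack "H*", $data;    # "50485000ff"

# And back again with pack.
my $back = pack "H*", $hex;
```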

chop

http://www.php.net/chop

PHP::Strings::chop WILL NOT BE IMPLEMENTED.

PHP's chop function is an alias to its "rtrim" function.

Perl has a builtin named chop. Thus we do not support the use of chop as an alias to "rtrim".

chr

http://www.php.net/chr

PHP::Strings::chr WILL NOT BE IMPLEMENTED.

PHP's and Perl's chr functions operate sufficiently identically.

Note that PHP's chr claims to take an ASCII value as input, whereas Perl's assumes Unicode. Consult the documentation of each for a precise definition.

Note that it returns one character, which in some string encodings may not necessarily be one byte.

chunk_split

http://www.php.net/chunk_split

Returns the given string, split into smaller chunks.

my $split = chunk_split( $body [, $chunklen [, $end ] ] );

Where $body is the data to split, $chunklen is the optional length of data between each split (default 76), and $end is what to insert both between each split (default "\r\n") and on the end.

Also trivially implemented as a regular expression:

$body =~ s/(.{1,$chunklen})/$1$end/sg;
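
A sketch of the regular-expression approach wrapped in a sub with the documented defaults. The name chunk_split_sketch is made up for this example and is not the module's own code. It uses .{1,$chunklen} so that a short final chunk still gets $end, and $end is appended exactly once.

```perl
use strict;
use warnings;

sub chunk_split_sketch {
    my ( $body, $chunklen, $end ) = @_;
    $chunklen = 76     unless defined $chunklen;
    $end      = "\r\n" unless defined $end;

    # Each run of up to $chunklen characters is followed by $end.
    $body =~ s/(.{1,$chunklen})/$1$end/sg;
    return $body;
}

my $split = chunk_split_sketch( "abcdefgh", 3, "\n" );    # "abc\ndef\ngh\n"
my $even  = chunk_split_sketch( "abcd",     2, "\n" );    # "ab\ncd\n"
```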

convert_cyr_string

http://www.php.net/convert_cyr_string

PHP::Strings::convert_cyr_string WILL NOT BE IMPLEMENTED.

Perl has the Encode module to convert between character encodings.

count_chars

http://www.php.net/count_chars

A somewhat daft function that returns counts of characters in a string.

It's daft because it assumes characters have values in the range 0-255. This is patently false in today's world of Unicode. In fact, the PHP documentation for this function happily talks about characters in one part and bytes in another, not realising the distinction.

So, I've implemented this function as if it were called count_bytes. It will count raw bytes, not characters.

Takes two arguments: the byte sequence to analyse and a 'mode' flag that indicates what sort of return value to return. The default mode is 0.

Mode  Return value
----  ------------
 0    Return hash of byte values and frequencies.
 1    As for 0, but hash does not contain bytes with frequency of 0.
 2    As for 0, but hash only contains bytes with frequency of 0.
 3    Return string composed of used byte-values.
 4    Return string composed of unused byte-values.

 my %freq = count_chars( $string, 1 );
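
For comparison, a rough plain-Perl sketch of what mode 1 amounts to. It assumes the hash keys are numeric byte values, as the table above suggests, and that the input is a plain byte string; it is not the module's implementation.

```perl
use strict;
use warnings;

my $bytes = "abracadabra";

# Count how often each byte value occurs; unused values never appear.
my %freq;
$freq{$_}++ for unpack "C*", $bytes;

# 97 is "a", 98 is "b", 114 is "r".
printf "%d => %d\n", $_, $freq{$_} for sort { $a <=> $b } keys %freq;
```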

crc32

http://www.php.net/crc32

PHP::Strings::crc32 WILL NOT BE IMPLEMENTED.

See the String::CRC32 module.

crypt

http://www.php.net/crypt

PHP::Strings::crypt WILL NOT BE IMPLEMENTED.

PHP's crypt is the same as Perl's. Thus there's no need for PHP::Strings to provide an implementation.

The CRYPT_* constants are not provided.

echo

http://www.php.net/echo

PHP::Strings::echo WILL NOT BE IMPLEMENTED.

See "print" in perlfunc.

explode

http://www.php.net/explode

PHP::Strings::explode WILL NOT BE IMPLEMENTED.

Use the \Q regex metachar and split.

my @pieces = split /\Q$separator/, $string, $limit;

See "split" in perlfunc for more details.

Note that split // will split between every character, rather than returning false. Note also that split "..." is the same as split /.../ which means to split everywhere three characters are matched. The first argument to split is always a regex.
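
A short example of why the \Q matters, using made-up sample data. Without it, a separator such as "." is treated as a regex metacharacter.

```perl
use strict;
use warnings;

my $separator = '.';
my $string    = 'www.example.com';

# Quoted separator: splits on literal dots.
my @all = split /\Q$separator/, $string;       # ('www', 'example', 'com')

# A positive limit caps the number of pieces, as in PHP.
my @two = split /\Q$separator/, $string, 2;    # ('www', 'example.com')

# Unquoted: "." matches every character, so every field is empty
# and split discards them all.
my @oops = split /$separator/, $string;        # ()
```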

fprintf

http://www.php.net/fprintf

PHP::Strings::fprintf WILL NOT BE IMPLEMENTED.

Perl's printf can be told to which file handle to print.

printf FILEHANDLE $format, @args;

See "printf" in perlfunc and "print" in perlfunc for details.

get_html_translation_table

http://www.php.net/get_html_translation_table

PHP::Strings::get_html_translation_table WILL NOT BE IMPLEMENTED.

Use the HTML::Entities module to escape and unescape characters.

hebrev

http://www.php.net/hebrev

PHP::Strings::hebrev WILL NOT BE IMPLEMENTED.

Use the Encode module to convert between character encodings.

hebrevc

http://www.php.net/hebrevc

PHP::Strings::hebrevc WILL NOT BE IMPLEMENTED.

Use the Encode module to convert between character encodings.

html_entity_decode

http://www.php.net/html_entity_decode

PHP::Strings::html_entity_decode WILL NOT BE IMPLEMENTED.

Use the HTML::Entities module to decode character entities.

htmlentities

http://www.php.net/htmlentities

PHP::Strings::htmlentities WILL NOT BE IMPLEMENTED.

Use the HTML::Entities module to encode character entities.

htmlspecialchars

http://www.php.net/htmlspecialchars

PHP::Strings::htmlspecialchars WILL NOT BE IMPLEMENTED.

Use the HTML::Entities module to encode character entities.

implode

http://www.php.net/implode

PHP::Strings::implode WILL NOT BE IMPLEMENTED.

See "join" in perlfunc. Note that join cannot accept its arguments in either order because that's just not how Perl arrays and lists work. Note also that the joining sequence is not optional.

join

http://www.php.net/join

PHP::Strings::join WILL NOT BE IMPLEMENTED.

PHP's join is an alias for implode. See "implode".

levenshtein

http://www.php.net/levenshtein

PHP::Strings::levenshtein WILL NOT BE IMPLEMENTED.

I have no idea why PHP has this function.

See Text::Levenshtein, Text::LevenshteinXS, String::Approx, Text::PhraseDistance and probably any number of other modules on CPAN.

ltrim

http://www.php.net/ltrim

PHP::Strings::ltrim WILL NOT BE IMPLEMENTED.

As per perlfaq:

$string =~ s/^\s+//;

A basic glance through perlretut or perlreref should give you an idea on how to change what characters get trimmed.
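
For instance, to mimic ltrim's optional character list, swap \s for a character class. This is only a sketch: $chars is an illustrative variable, assumed to be non-empty, and PHP's ".." range syntax is not handled.

```perl
use strict;
use warnings;

my $string = "xxyhello";
my $chars  = 'xy';

# \Q...\E keeps characters such as "]" or "-" literal inside the class.
( my $trimmed = $string ) =~ s/^[\Q$chars\E]+//;    # "hello"
```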

md5

http://www.php.net/md5

PHP::Strings::md5 WILL NOT BE IMPLEMENTED.

See Digest::MD5 which provides a number of functions for computing MD5 hashes from various sources and to various formats.
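
A minimal example: PHP's md5 returns a hexadecimal string by default, which corresponds to md5_hex.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

my $digest = md5_hex("hello");    # 5d41402abc4b2a76b9719d911017c592
```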

Note: the user notes for this function at http://www.php.net/md5 are among the most unintentionally funny and misinformed I've read.

md5_file

http://www.php.net/md5_file

PHP::Strings::md5_file WILL NOT BE IMPLEMENTED.

The Digest::MD5 module provides sufficient support.

use Digest::MD5;

sub md5_file
{
    my $filename = shift;
    my $ctx = Digest::MD5->new;
    open my $fh, '<', $filename or die $!;
    binmode( $fh );
    $ctx->addfile( $fh )->digest; # or hexdigest, or b64digest
}

Despite providing that possible implementation just above, I've chosen to not include it as an export due to the amount of flexibility of Digest::MD5 and the number of ways you may want to get your file handle. After all, you may want to use Digest::SHA1, or Digest::MD4 or some other digest mechanism.

Again, I wonder why PHP has the function as they so arbitrarily hobble it.

metaphone

http://www.php.net/metaphone

PHP::Strings::metaphone WILL NOT BE IMPLEMENTED.

Text::Metaphone and Text::DoubleMetaphone and Text::TransMetaphone all provide metaphonic calculations.

money_format

http://www.php.net/money_format

sprintf for money.

nl2br

http://www.php.net/nl2br

PHP::Strings::nl2br WILL NOT BE IMPLEMENTED.

This is trivially implemented as:

s,$,<br />,mg;

nl_langinfo

http://www.php.net/nl_langinfo

PHP::Strings::nl_langinfo WILL NOT BE IMPLEMENTED.

I18N::Langinfo has a langinfo function that corresponds to PHP's nl_langinfo function.

number_format

http://www.php.net/number_format

TBD

ord

http://www.php.net/ord

PHP::Strings::ord WILL NOT BE IMPLEMENTED.

See "ord" in perlfunc. Note that Perl returns the Unicode code point, not an ASCII value.

parse_str

http://www.php.net/parse_str

PHP::Strings::parse_str WILL NOT BE IMPLEMENTED.

See instead the CGI and URI modules, which handle that sort of thing.

print

http://www.php.net/print

PHP::Strings::print WILL NOT BE IMPLEMENTED.

See "print" in perlfunc.

printf

http://www.php.net/printf

PHP::Strings::printf WILL NOT BE IMPLEMENTED.

See "printf" in perlfunc.

quoted_printable_decode

http://www.php.net/quoted_printable_decode

PHP::Strings::quoted_printable_decode WILL NOT BE IMPLEMENTED.

MIME::QuotedPrint provides functions for encoding and decoding quoted-printable strings.

quotemeta

http://www.php.net/quotemeta

PHP::Strings::quotemeta WILL NOT BE IMPLEMENTED.

See "quotemeta" in perlfunc.

rtrim

http://www.php.net/rtrim

PHP::Strings::rtrim WILL NOT BE IMPLEMENTED.

Another trivial regular expression:

$string =~ s/\s+$//;

See the notes on "ltrim".

setlocale

http://www.php.net/setlocale

PHP::Strings::setlocale WILL NOT BE IMPLEMENTED.

setlocale is provided by the POSIX module.

sha1

http://www.php.net/sha1

PHP::Strings::sha1 WILL NOT BE IMPLEMENTED.

See "md5", mentally substituting Digest::SHA1 for Digest::MD5, although the user notes are not as funny.

sha1_file

http://www.php.net/sha1_file

PHP::Strings::sha1_file WILL NOT BE IMPLEMENTED.

See "md5_file".

similar_text

http://www.php.net/similar_text

TBD

soundex

http://www.php.net/soundex

PHP::Strings::soundex WILL NOT BE IMPLEMENTED.

See Text::Soundex, which also happens to be a core module.

sprintf

http://www.php.net/sprintf

PHP::Strings::sprintf WILL NOT BE IMPLEMENTED.

See "sprintf" in perlfunc.

sscanf

http://www.php.net/sscanf

PHP::Strings::sscanf WILL NOT BE IMPLEMENTED.

This is a godawful function. You should be using regular expressions instead. See perlretut and perlre.

str_ireplace

http://www.php.net/str_ireplace

PHP::Strings::str_ireplace WILL NOT BE IMPLEMENTED.

Use the s/// operator instead. See perlop and perlre for details.

str_pad

http://www.php.net/str_pad

TBD

str_repeat

http://www.php.net/str_repeat

PHP::Strings::str_repeat WILL NOT BE IMPLEMENTED.

Instead, use the x operator. See perlop for details.

my $by_ten = "-=" x 10;

str_replace

http://www.php.net/str_replace

PHP::Strings::str_replace WILL NOT BE IMPLEMENTED.

See the s/// operator. perlop and perlre have details.

str_rot13

http://www.php.net/str_rot13

PHP::Strings::str_rot13 WILL NOT BE IMPLEMENTED.

This is rather trivially implemented as:

$message =~ tr/A-Za-z/N-ZA-Mn-za-m/;

(As per "Programming Perl", 3rd edition, section 5.2.4.)

str_shuffle

http://www.php.net/str_shuffle

Implemented, against my better judgement. It's trivial, like so many of the others.

str_split

http://www.php.net/str_split

PHP::Strings::str_split WILL NOT BE IMPLEMENTED.

Use a global match in list context rather than split. See perlretut for details.

my @bits = $string =~ /(.{1,$len})/gs;

str_word_count

http://www.php.net/str_word_count

my $wordcount = str_word_count( $string );
my @words     = str_word_count( $string, 1 );
my %positions = str_word_count( $string, 2 );

With a single argument, returns the number of words in that string. Equivalent to:

my $wordcount = () = $string =~ m/(\S+)/g;

With 2 arguments, where the second is the value 0, returns the same as with no second argument.

With 2 arguments, where the second is the value 1, returns each of those words. Equivalent to:

my @words = $string =~ m/(\S+)/g;

With 2 arguments, where the second is the value 2, returns a hash where the values are the words, and the keys are their position in the string (offsets are 0 based).

If words are duplicated, then they are duplicated. The definition of a word is anything that isn't a space. When I say equivalent above, I mean that's the exact code this function uses.

This function should really be three different functions, but as PHP already has over 3000, I can only assume they wanted to restrain themselves. Implementation wise, it is three different functions. I just keep them in an array and dispatch appropriately.
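
The position form has no equivalent shown above, so here is a rough sketch of how it can be done in plain Perl with the @- array. This is an illustration under the same "anything that isn't a space" definition, not necessarily the module's own code.

```perl
use strict;
use warnings;

my $string = "the quick  brown fox";

# $-[1] is the offset at which the first capture group started.
my %positions;
while ( $string =~ m/(\S+)/g ) {
    $positions{ $-[1] } = $1;
}
# ( 0 => 'the', 4 => 'quick', 11 => 'brown', 17 => 'fox' )
```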

strcasecmp

http://www.php.net/strcasecmp

PHP::Strings::strcasecmp WILL NOT BE IMPLEMENTED.

Equivalent to:

lc($a) cmp lc($b)

strchr

http://www.php.net/strchr

PHP::Strings::strchr WILL NOT BE IMPLEMENTED.

See "strstr".

strcmp

http://www.php.net/strcmp

PHP::Strings::strcmp WILL NOT BE IMPLEMENTED.

Equivalent to:

$a cmp $b

strcoll

http://www.php.net/strcoll

PHP::Strings::strcoll WILL NOT BE IMPLEMENTED.

Equivalent to:

use locale;

$a cmp $b

strcspn

http://www.php.net/strcspn

PHP::Strings::strcspn WILL NOT BE IMPLEMENTED.

Trivially equivalent to:

my $cspn = $string =~ m/[chars]/ ? $-[0] : length $string;
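
A sketch of the same idea as a sub. The name strcspn_sketch is made up for this example, and $chars is assumed to be non-empty. The offset of the first match, $-[0], is the length of the leading segment that contains none of the characters; if nothing matches, that segment is the whole string.

```perl
use strict;
use warnings;

sub strcspn_sketch {
    my ( $string, $chars ) = @_;
    return $string =~ /[\Q$chars\E]/ ? $-[0] : length $string;
}

my $found   = strcspn_sketch( "hello world", "ow" );    # 4
my $missing = strcspn_sketch( "hello",       "xyz" );   # 5
```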

strip_tags

http://www.php.net/strip_tags

my $clean = strip_tags( $html, '<a><b><i><u>' );

You really want HTML::Scrubber.

This function tries to return a string with all HTML tags stripped from a given string. It errs on the side of caution in case of incomplete or bogus tags.

You can use the optional second parameter to specify tags which should not be stripped.

For more control, use HTML::Scrubber.

stripcslashes

http://www.php.net/stripcslashes

my $unslashed = stripcslashes( '\a\b\f\n\r\xae' );

Returns a string with backslashes stripped off. Recognizes C-like \n, \r ..., octal and hexadecimal representation.

stripos

http://www.php.net/stripos

PHP::Strings::stripos WILL NOT BE IMPLEMENTED.

Trivially implemented as:

my $pos    = index( lc $haystack, lc $needle );
my $second = index( lc $haystack, lc $needle, $pos + 1 );

Note that unlike stripos, index returns -1 if $needle is not found. This makes testing much simpler.

If you want the additional behaviour of non-strings being converted to integers and from there to characters of that value, then you're silly. If you want to find a character of particular value, explicitly use the chr function:

my $charpos = index( lc $haystack, lc chr $char );
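
Because index returns -1 on failure, finding every case-insensitive occurrence is a simple loop. A sketch with made-up sample data:

```perl
use strict;
use warnings;

my $haystack = "Banana BANDANA";
my $needle   = "an";

# Lowercase once, outside the loop.
my ( $lc_hay, $lc_needle ) = ( lc $haystack, lc $needle );

my @positions;
my $pos = index( $lc_hay, $lc_needle );
while ( $pos != -1 ) {
    push @positions, $pos;
    $pos = index( $lc_hay, $lc_needle, $pos + 1 );    # resume after this hit
}
# @positions is ( 1, 3, 8, 11 )
```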

stripslashes

http://www.php.net/stripslashes

PHP::Strings::stripslashes WILL NOT BE IMPLEMENTED.

If you can think of a good reason for this function, you have more imagination than I do.

stristr

http://www.php.net/stristr

PHP::Strings::stristr WILL NOT BE IMPLEMENTED.

Use substr() and index() instead.

my $strstr = substr( $haystack, index( lc $haystack, lc $needle ) );

Or a regex:

my ( $strstr ) = $haystack =~ /(\Q$needle\E.*$)/si;

strlen

http://www.php.net/strlen

PHP::Strings::strlen WILL NOT BE IMPLEMENTED.

See "length" in perlfunc.

strnatcasecmp

http://www.php.net/strnatcasecmp

PHP::Strings::strnatcasecmp WILL NOT BE IMPLEMENTED.

See Sort::Naturally.

strnatcmp

http://www.php.net/strnatcmp

PHP::Strings::strnatcmp WILL NOT BE IMPLEMENTED.

See Sort::Naturally.

strncasecmp

http://www.php.net/strncasecmp

PHP::Strings::strncasecmp WILL NOT BE IMPLEMENTED.

Unnecessary. Perl is smart enough. Use substr.

strncmp

http://www.php.net/strncmp

PHP::Strings::strncmp WILL NOT BE IMPLEMENTED.

Unnecessary. Perl is smart enough. Use substr.
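
A short sketch of what "use substr" means here, for both strncmp and strncasecmp. The sample strings and $n are made up for the example.

```perl
use strict;
use warnings;

my $n = 3;

# strncmp: compare only the first $n characters.
my $cmp = substr( "apple", 0, $n ) cmp substr( "application", 0, $n );    # 0

# strncasecmp: the same, with both sides lowercased first.
my $icmp = lc( substr( "APPle", 0, $n ) ) cmp lc( substr( "apricot", 0, $n ) );    # -1
```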

strpos

http://www.php.net/strpos

PHP::Strings::strpos WILL NOT BE IMPLEMENTED.

This function is Perl's index function, however index has a sensible return value.

strrchr

http://www.php.net/strrchr

PHP::Strings::strrchr WILL NOT BE IMPLEMENTED.

See "rindex" in perlfunc. Note that all characters in the $needle are used: if you just want to find the first character, then extract it.
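
For example, to get the part of a string from the last occurrence of a separator onwards, combine rindex with substr. A sketch with a made-up path:

```perl
use strict;
use warnings;

my $path = "/usr/local/lib";

# rindex returns -1 when the needle is absent, so check before slicing.
my $pos  = rindex( $path, "/" );
my $tail = $pos == -1 ? undef : substr( $path, $pos );    # "/lib"
```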

strrev

http://www.php.net/strrev

PHP::Strings::strrev WILL NOT BE IMPLEMENTED.

See "reverse" in perlfunc. Note the note about scalar context.

my $derf = reverse "fred";
print scalar reverse "fred";

strripos

http://www.php.net/strripos

PHP::Strings::strripos WILL NOT BE IMPLEMENTED.

This is just getting silly.

See rindex and lc.

strrpos

http://www.php.net/strrpos

PHP::Strings::strrpos WILL NOT BE IMPLEMENTED.

See rindex.

strstr

http://www.php.net/strstr

PHP::Strings::strstr WILL NOT BE IMPLEMENTED.

Use substr() and index() instead.

my $strstr = substr( $haystack, index( $haystack, $needle ) );

Or a regex:

my ( $strstr ) = $haystack =~ /(\Q$needle\E.*$)/s;
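
One caveat with the substr and index form: when $needle is absent, index returns -1 and substr then returns the last character. A sketch that guards against this (strstr_sketch is a made-up name, not part of this module):

```perl
use strict;
use warnings;

sub strstr_sketch {
    my ( $haystack, $needle ) = @_;
    my $pos = index( $haystack, $needle );
    return $pos == -1 ? undef : substr( $haystack, $pos );
}

my $domain = strstr_sketch( 'user@example.com', '@' );    # '@example.com'
my $none   = strstr_sketch( 'user@example.com', '#' );    # undef
```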

strtolower

http://www.php.net/strtolower

PHP::Strings::strtolower WILL NOT BE IMPLEMENTED.

See "lc" in perlfunc.

strtoupper

http://www.php.net/strtoupper

PHP::Strings::strtoupper WILL NOT BE IMPLEMENTED.

See "uc" in perlfunc.

strtr

http://www.php.net/strtr

This function, like many in PHP, is really two functions.

The first is the same as the tr operator. And you really should use tr instead of this function.

The second is more complicated.
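
A rough sketch of that second form in plain Perl, assuming PHP's documented behaviour of trying the longest keys first and never rescanning replaced text. The hash and sample string are made up, and this is not presented as the module's implementation.

```perl
use strict;
use warnings;

my %map = ( "hi" => "hello", "hello" => "hi" );

# Longest keys first, each quoted so it matches literally.
my $pattern = join "|",
    map  { quotemeta $_ }
    sort { length($b) <=> length($a) } keys %map;

# A single pass: text that has been substituted is not looked at again.
( my $out = "hi all, I said hello" ) =~ s/($pattern)/$map{$1}/g;
# $out is "hello all, I said hi"
```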

substr

http://www.php.net/substr

PHP::Strings::substr WILL NOT BE IMPLEMENTED.

See "substr" in perlfunc.

substr_compare

http://www.php.net/substr_compare

PHP::Strings::substr_compare WILL NOT BE IMPLEMENTED.

Use substr and the cmp operator.

substr_count

http://www.php.net/substr_count

PHP::Strings::substr_count WILL NOT BE IMPLEMENTED.

This is even in the FAQ.

http://faq.perl.org/perlfaq4.html#How_can_I_count_the_

my $count = () = $string =~ /regex/g;
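
To count a literal substring rather than a pattern, quote it with \Q. A small sketch with made-up data:

```perl
use strict;
use warnings;

my $string = "a.b.c.d";
my $needle = ".";

# Without \Q the "." would match every character.
my $count = () = $string =~ /\Q$needle\E/g;    # 3
```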

substr_replace

http://www.php.net/substr_replace

PHP::Strings::substr_replace WILL NOT BE IMPLEMENTED.

See "substr" in perlfunc.

trim

http://www.php.net/trim

PHP::Strings::trim WILL NOT BE IMPLEMENTED.

Also in the FAQ.

http://faq.perl.org/perlfaq4.html#How_do_I_strip_blank

See also "rtrim" and "ltrim".
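
A one-line sketch that trims both ends in a single substitution:

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from a copy of the string.
( my $trimmed = "  padded text \n" ) =~ s/^\s+|\s+$//g;    # "padded text"
```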

ucfirst

http://www.php.net/ucfirst

PHP::Strings::ucfirst WILL NOT BE IMPLEMENTED.

See "ucfirst" in perlfunc.

ucwords

http://www.php.net/ucwords

PHP::Strings::ucwords WILL NOT BE IMPLEMENTED.

Another Perl FAQ.

http://faq.perl.org/perlfaq4.html#How_do_I_capitalize_
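
One possible sketch, treating a word as anything that follows whitespace or the start of the string:

```perl
use strict;
use warnings;

# \u uppercases the next character in the replacement.
( my $title = "hello wide world" ) =~ s/(^|\s)(\S)/$1\u$2/g;    # "Hello Wide World"
```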

vprintf

http://www.php.net/vprintf

PHP::Strings::vprintf WILL NOT BE IMPLEMENTED.

Unlike PHP, Perl isn't stupid. See printf.

vsprintf

http://www.php.net/vsprintf

PHP::Strings::vsprintf WILL NOT BE IMPLEMENTED.

Unlike PHP, Perl isn't stupid. See sprintf.

wordwrap

http://www.php.net/wordwrap

PHP::Strings::wordwrap WILL NOT BE IMPLEMENTED.

See Text::Wrap, a core module.
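
A minimal sketch of Text::Wrap in use. The sample text and the column width are made up. Text::Wrap keeps each line shorter than $Text::Wrap::columns.

```perl
use strict;
use warnings;
use Text::Wrap qw( wrap );

# Lines come out at most $columns - 1 characters long.
local $Text::Wrap::columns = 20;

# The first two arguments are the indents for the first and later lines.
my $wrapped = wrap( '', '', "The quick brown fox sat over the lazy dog" );
print $wrapped, "\n";
```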

FUNCTIONS ACTUALLY IMPLEMENTED

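Just in case you missed which functions were actually implemented in that huge mass of unimplemented functions, here's the condensed list of functions not marked as unimplemented above:

addcslashes, chunk_split, count_chars, money_format, number_format, similar_text, str_pad, str_shuffle, str_word_count, strip_tags, stripcslashes, strtr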

BAD EGGS

All functions that I think are worthless are still exportable, with the exception of any that would clash with a Perl builtin function.

If you try to actually use said function, a big fat error will result.

FOR THOSE WHO HAVE READ THIS FAR

Yes, this module is mostly a joke. I wrote a lot of it after being asked for the hundredth time: What's the equivalent to PHP's X in Perl?

That said, although it's a joke, I'm happy to receive amendments, additions and such. It's incomplete at present, and I would like to see it complete at some point.

In particular, the test suite needs a lot of work. (If you feel like it. Hint Hint.)

If you want to implement some of the functions that I've said will not be implemented, then I'll be happy to include them. After all, what I think is worthless is my opinion.

BUGS, REQUESTS, COMMENTS

Log them via the CPAN RT system via the web or email:

http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PHP-Strings
( shorter URL: http://xrl.us/4at )

bug-php-strings@rt.cpan.org

This makes it much easier for me to track things and thus means your problem is less likely to be neglected.

THANKS

Andy Lester (PETDANCE) for taking care of Iain's modules.

Juerd Waalboer (JUERD) for suggesting a link, and the assorted regex functions.

Matthew Persico (PERSICOM) for the idea of having the functions give their documentation as their error.

LICENCE AND COPYRIGHT

PHP::Strings modifications from version 0.27 are copyright © Petras Kudaras. All rights reserved.

PHP::Strings is copyright © Iain Truskett, 2003. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.000 or, at your option, any later version of Perl 5 you may have available.

The full text of the licences can be found in the Artistic and COPYING files included with this module, or in perlartistic and perlgpl as supplied with Perl 5.8.1 and later.

AUTHORS

Iain Truskett <spoon@cpan.org>

Petras Kudaras <kudarasp@cpan.org>

SEE ALSO

perl, php.
