Security Advisories (8)
CVE-2020-12723 (2020-06-05)

regcomp.c in Perl before 5.30.3 allows a buffer overflow via a crafted regular expression because of recursive S_study_chunk calls.

CVE-2020-10878 (2020-06-05)

Perl before 5.30.3 has an integer overflow related to mishandling of a "PL_regkind[OP(n)] == NOTHING" situation. A crafted regular expression could lead to malformed bytecode with a possibility of instruction injection.

CVE-2020-10543 (2020-06-05)

Perl before 5.30.3 on 32-bit platforms allows a heap-based buffer overflow because nested regular expression quantifiers have an integer overflow.

CVE-2018-6798 (2018-04-17)

An issue was discovered in Perl 5.22 through 5.26. Matching a crafted locale dependent regular expression can cause a heap-based buffer over-read and potentially information disclosure.

CVE-2023-47100

In Perl before 5.38.2, S_parse_uniprop_string in regcomp.c can write to unallocated space because a property name associated with a \p{...} regular expression construct is mishandled. The earliest affected version is 5.30.0.

CVE-2024-56406 (2025-04-13)

A heap buffer overflow vulnerability was discovered in Perl. When there are non-ASCII bytes in the left-hand-side of the `tr` operator, `S_do_trans_invmap` can overflow the destination pointer `d`.

    $ perl -e '$_ = "\x{FF}" x 1000000; tr/\xFF/\x{100}/;'
    Segmentation fault (core dumped)

It is believed that this vulnerability can enable Denial of Service and possibly Code Execution attacks on platforms that lack sufficient defenses.

CVE-2025-40909 (2025-05-30)

Perl threads have a working directory race condition where file operations may target unintended paths. If a directory handle is open at thread creation, the process-wide current working directory is temporarily changed in order to clone that handle for the new thread, and this change is visible from any third (or further) thread already running. This may lead to unintended operations such as loading code or accessing files from unexpected locations, which a local attacker may be able to exploit. The bug was introduced in commit 11a11ecf4bea72b17d250cfb43c897be1341861e and released in Perl version 5.13.6.

CVE-2023-47039 (2023-10-30)

Perl for Windows relies on the system path environment variable to find the shell (cmd.exe). When running an executable which uses Windows Perl interpreter, Perl attempts to find and execute cmd.exe within the operating system. However, due to path search order issues, Perl initially looks for cmd.exe in the current working directory. An attacker with limited privileges can exploit this behavior by placing cmd.exe in locations with weak permissions, such as C:\ProgramData. By doing so, when an administrator attempts to use this executable from these compromised locations, arbitrary code can be executed.

NAME

CharClass::Matcher -- Generate C macros that match character classes efficiently

SYNOPSIS

perl Porting/regcharclass.pl

DESCRIPTION

Dynamically generates macros for detecting special charclasses in latin-1, utf8, and codepoint forms. Macros can be set to return the length (in bytes) of the matched codepoint, and/or the codepoint itself.

To regenerate regcharclass.h, run this script from perl-root. No arguments are necessary.

Using WHATEVER as an example, the following macros can be produced, depending on the input parameters (how to get each is described by internal comments at the __DATA__ line):

is_WHATEVER(s,is_utf8)
is_WHATEVER_safe(s,e,is_utf8)

Do a lookup as appropriate based on the is_utf8 flag. When possible, comparisons involving octet<128 are done before checking the is_utf8 flag, hopefully saving time.

The version without the _safe suffix should be used only when the input is known to be well-formed.

is_WHATEVER_utf8(s)
is_WHATEVER_utf8_safe(s,e)

Do a lookup assuming the string is encoded in (normalized) UTF8.

The version without the _safe suffix should be used only when the input is known to be well-formed.
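The difference between the plain and _safe UTF-8 forms can be sketched by hand. The following is only an illustration, not actual regcharclass.pl output: the set {U+0020, U+00A0}, the name SPACEISH, the local U8 typedef, and the check function are all assumptions made for the example. Both macros evaluate to the matched length in bytes, or 0 when nothing matches.

```c
#include <assert.h>

typedef unsigned char U8; /* stand-in for perl's U8 type (assumption) */

/* Hypothetical set {U+0020, U+00A0}, written by hand for illustration.
 * Evaluates to the matched length in bytes, or 0 on no match. It may
 * read two bytes with no bounds check, so the caller must already know
 * the input is well-formed. */
#define is_SPACEISH_utf8(s)                                         \
    ( ((const U8 *)(s))[0] == 0x20 ? 1                              \
    : ( ((const U8 *)(s))[0] == 0xC2                                \
        && ((const U8 *)(s))[1] == 0xA0 ) ? 2 : 0 )

/* Same lookup, but never reads at or beyond e. */
#define is_SPACEISH_utf8_safe(s, e)                                 \
    ( ((const U8 *)(e)) - ((const U8 *)(s)) > 1                     \
        ? is_SPACEISH_utf8(s)                                       \
    : ((const U8 *)(e)) - ((const U8 *)(s)) > 0                     \
        ? ( ((const U8 *)(s))[0] == 0x20 ? 1 : 0 )                  \
    : 0 )

/* Self-check: U+00A0 is the two bytes C2 A0 in UTF-8. */
void check_SPACEISH_utf8(void)
{
    const U8 nbsp[] = { 0xC2, 0xA0 };
    const U8 sp[]   = { 0x20 };

    assert(is_SPACEISH_utf8(nbsp) == 2);
    assert(is_SPACEISH_utf8_safe(nbsp, nbsp + 2) == 2);
    assert(is_SPACEISH_utf8_safe(nbsp, nbsp + 1) == 0); /* truncated input */
    assert(is_SPACEISH_utf8_safe(sp, sp + 1) == 1);
    assert(is_SPACEISH_utf8_safe(sp, sp) == 0);         /* empty input */
}
```

The _safe form pays for an extra pointer comparison but can be handed a buffer that ends in the middle of a multi-byte sequence.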

is_WHATEVER_latin1(s)
is_WHATEVER_latin1_safe(s,e)

Do a lookup assuming the string is encoded in latin-1 (aka plain octets).

The version without the _safe suffix should be used only when it is known that s contains at least one character.

is_WHATEVER_cp(cp)

Check to see if the string matches a given codepoint (hypothetically a U32). The condition is constructed so as to "break out" as early as possible if the codepoint is out of range of the condition.

IOW:

(cp==X || (cp>X && (cp==Y || (cp>Y && ...))))

Thus if the character is X+1 only two comparisons will be done. This makes matching lookups slower, but non-matching lookups faster.
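As a concrete instance of the nested condition, here is a hand-written sketch for a hypothetical set {0x0A, 0x0D, 0x85}. The name NEWLINEISH and the check function are assumptions for illustration, and the exact text that regcharclass.pl generates may differ. A code point below the smallest member fails after the first two comparisons, because the && short-circuits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical set {0x0A, 0x0D, 0x85} in the nested "break out early"
 * shape: (cp==X || (cp>X && (cp==Y || (cp>Y && ...)))).
 * Note that cp is evaluated more than once, so it must have no side
 * effects. */
#define is_NEWLINEISH_cp(cp)                                 \
    ( (cp) == 0x0A || ( (cp) > 0x0A &&                       \
    ( (cp) == 0x0D || ( (cp) > 0x0D &&                       \
      (cp) == 0x85 ) ) ) )

/* Self-check covering members, a gap between members, and values
 * outside the range of the set. */
void check_NEWLINEISH_cp(void)
{
    uint32_t below = 0x09, gap = 0x0B, above = 0x100;
    uint32_t lf = 0x0A, cr = 0x0D, nel = 0x85;

    assert(!is_NEWLINEISH_cp(below)); /* fails after two comparisons */
    assert( is_NEWLINEISH_cp(lf));
    assert(!is_NEWLINEISH_cp(gap));
    assert( is_NEWLINEISH_cp(cr));
    assert( is_NEWLINEISH_cp(nel));
    assert(!is_NEWLINEISH_cp(above));
}
```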

what_len_WHATEVER_FOO(arg1, ..., len)

A variant form of each of the macro types described above can be generated, in which the code point is returned by the macro, and an extra parameter (in the final position) is added, which is a pointer for the macro to set the byte length of the returned code point.

These forms all have a what_len prefix instead of the is_, for example what_len_WHATEVER_safe(s,e,is_utf8,len) and what_len_WHATEVER_utf8(s,len).

These forms should not be used except on small sets of mostly widely separated code points; otherwise the code generated is inefficient. For these cases, it is best to use the is_ forms, and then find the code point with utf8_to_uvchr_buf(). This program can fail with a "deep recursion" message on the worst of the inappropriate sets. Examine the generated macro to see if it is acceptable.
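A what_len form might look roughly like the following hand-written sketch. It assumes a hypothetical set {U+0020, U+00A0} named SPACEISH, a local U8 typedef, and a size_t length variable; none of these come from the script itself, and real generated macros may be shaped differently. The macro evaluates to the code point and stores the byte length through the final pointer argument.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char U8; /* stand-in for perl's U8 type (assumption) */

/* Hypothetical set {U+0020, U+00A0}. Evaluates to the matched code
 * point and sets *len to its UTF-8 length in bytes. On no match it
 * evaluates to 0 and sets *len to 0. No bounds checking is done. */
#define what_len_SPACEISH_utf8(s, len)                               \
    ( ((const U8 *)(s))[0] == 0x20 ? ( *(len) = 1, 0x20 )            \
    : ( ((const U8 *)(s))[0] == 0xC2                                 \
        && ((const U8 *)(s))[1] == 0xA0 ) ? ( *(len) = 2, 0xA0 )     \
    : ( *(len) = 0, 0 ) )

/* Self-check: one match of each length, and one non-match. */
void check_what_len_SPACEISH_utf8(void)
{
    const U8 nbsp[]  = { 0xC2, 0xA0 };
    const U8 sp[]    = { 0x20, 0x00 };
    const U8 other[] = { 0x41, 0x00 };
    size_t len = 99;

    assert(what_len_SPACEISH_utf8(nbsp, &len) == 0xA0 && len == 2);
    assert(what_len_SPACEISH_utf8(sp, &len) == 0x20 && len == 1);
    assert(what_len_SPACEISH_utf8(other, &len) == 0 && len == 0);
}
```

With only two members the expansion stays small. Each additional member adds another branch, which is why large or dense sets are better served by the is_ forms.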

what_WHATEVER_FOO(arg1, ...)

A variant form of each of the is_ macro types described above can be generated, in which the code point and not the length is returned by the macro. These have the same caveat as "what_len_WHATEVER_FOO(arg1, ..., len)", plus they should not be used where the set contains a NUL, as 0 is returned for two different cases: a) the set doesn't include the input code point; b) the set does include it, and it is a NUL.
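The ambiguity of a 0 return value can be demonstrated with a small hand-written sketch. The set {0x00, 0x0A}, the name NULNL, the local U8 typedef, and the check function are assumptions for illustration only, not generated output.

```c
#include <assert.h>

typedef unsigned char U8; /* stand-in for perl's U8 type (assumption) */

/* Hypothetical set {0x00, 0x0A} in a code-point-returning form.
 * Evaluates to the matched code point, or 0 on no match. */
#define what_NULNL_latin1(s)                                  \
    ( ((const U8 *)(s))[0] == 0x00 ? 0x00                     \
    : ((const U8 *)(s))[0] == 0x0A ? 0x0A : 0 )

/* Self-check: a member that is 0 and a non-member both yield 0,
 * so the caller cannot tell the two cases apart. */
void check_what_NULNL_latin1(void)
{
    const U8 nul[] = { 0x00 };
    const U8 nl[]  = { 0x0A };
    const U8 a[]   = { 0x41 };

    assert(what_NULNL_latin1(nl)  == 0x0A); /* unambiguous match */
    assert(what_NULNL_latin1(nul) == 0);    /* in the set */
    assert(what_NULNL_latin1(a)   == 0);    /* not in the set */
}
```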

The above isn't quite complete, as for specialized purposes one can get a macro like is_WHATEVER_utf8_no_length_checks(s), which assumes that it is already known that there is enough space to hold the character starting at s, but otherwise checks that it is well-formed. In other words, this is intermediary in checking between is_WHATEVER_utf8(s) and is_WHATEVER_utf8_safe(s,e).

CODE FORMAT

perltidy -st -bt=1 -bbt=0 -pt=0 -sbt=1 -ce -nwls== "%f"

AUTHOR

Author: Yves Orton (demerphq) 2007. Maintained by Perl5 Porters.

BUGS

No tests directly here (although the regex engine will fail tests if this code is broken). Insufficient documentation and no Getopts handler for using the module as a script.

LICENSE

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the README file.