Unicode has some property-value pairs that currently don't match anything. This happens generally either because they are obsolete, or for symmetry with other forms, but no language has yet been encoded that uses them. In this version of Unicode, the following match zero code points:

    $zero_matches

END }

# Generate list of properties that we don't accept, grouped by the reasons
# why.  This is so only put out the 'why' once, and then list all the
# properties that have that reason under it.

my %why_list;   # The keys are the reasons; the values are lists of
                # properties that have the key as their reason

# For each property, add it to the list that are suppressed for its reason
# The sort will cause the alphabetically first properties to be added to
# each list first, so each list will be sorted.
foreach my $property (sort keys %why_suppressed) {
    push @{$why_list{$why_suppressed{$property}}}, $property;
}

# For each reason (sorted by the first property that has that reason)...
my @bad_re_properties;
foreach my $why (sort { $why_list{$a}->[0] cmp $why_list{$b}->[0] }
                 keys %why_list)
{
    # Add to the output, all the properties that have that reason.  Start
    # with an empty line.
    push @bad_re_properties, "\n\n";

    my $has_item = 0;   # Flag if actually output anything.
    foreach my $name (@{$why_list{$why}}) {

        # Split compound names into $property and $table components
        my $property = $name;
        my $table;
        if ($property =~ / (.*) = (.*) /x) {
            $property = $1;
            $table = $2;
        }

        # This release of Unicode may not have a property that is
        # suppressed, so don't reference a non-existent one.
        $property = property_ref($property);
        next if ! defined $property;

        # And since this list is only for match tables, don't list the
        # ones that don't have match tables.
        next if ! $property->to_create_match_tables;

        # Find any abbreviation, and turn it into a compound name if this
        # is a property=value pair.
        my $short_name = $property->name;
        $short_name .= '=' . $property->table($table)->name if $table;

        # And add the property as an item for the reason.
        push @bad_re_properties, "\n=item I<$name> ($short_name)\n";
        $has_item = 1;
    }

    # And add the reason under the list of properties, if such a list
    # actually got generated.  Note that the header got added
    # unconditionally before.  But pod ignores extra blank lines, so no
    # harm.
    push @bad_re_properties, "\n$why\n" if $has_item;

} # End of looping through each reason.

# Generate a list of the properties whose map table we output, from the
# global @map_properties.
my @map_tables_actually_output;
my $info_indent = 20;       # Left column is narrower than \p{} table.
foreach my $property (@map_properties) {

    # Get the path to the file; don't output any not in the standard
    # directory.
    my @path = $property->file_path;
    next if $path[0] ne $map_directory;
    shift @path;    # Remove the standard name

    my $file = join '/', @path; # In case is in sub directory
    my $info = $property->full_name;
    my $short_name = $property->name;
    if ($info ne $short_name) {
        $info .= " ($short_name)";
    }
    foreach my $more_info ($property->description,
                           $property->note,
                           $property->status_info)
    {
        next unless $more_info;
        $info =~ s/\.\Z//;
        $info .= ".  $more_info";
    }
    push @map_tables_actually_output, format_pod_line($info_indent,
                                                      $file,
                                                      $info,
                                                      $property->status);
}

# Sort alphabetically, and fold for output
@map_tables_actually_output = sort
                        pod_alphanumeric_sort @map_tables_actually_output;
@map_tables_actually_output
                    = simple_fold(\@map_tables_actually_output,
                                    ' ',
                                    $info_indent,
                                    $automatic_pod_indent);

# Generate a list of the formats that can appear in the map tables.
my @map_table_formats;
foreach my $format (sort keys %map_table_formats) {
    push @map_table_formats, " $format    $map_table_formats{$format}\n";
}

# Everything is ready to assemble.
my @OUT = << "END";
=begin comment

$HEADER

To change this file, edit $0 instead.

NAME

$pod_file - Index of Unicode Version $string_version properties in Perl

DESCRIPTION

There are many properties in Unicode, and Perl provides access to almost all of them, as well as some additional extensions and short-cut synonyms.

And just about all of the few that aren't accessible through the Perl core are accessible through the modules: Unicode::Normalize and Unicode::UCD, and for Unihan properties, via the CPAN module Unicode::Unihan.

This document merely lists all available properties and does not attempt to explain what each property really means. There is a brief description of each Perl extension. There is some detail about Blocks, Scripts, General_Category, and Bidi_Class in perlunicode, but to find out about the intricacies of the Unicode properties, refer to the Unicode standard. A good starting place is $unicode_reference_url. More information on the Perl extensions is in perlrecharclass.

Note that you can define your own properties; see "User-Defined Character Properties" in perlunicode.

Properties accessible through \\p{} and \\P{}

The Perl regular expression \\p{} and \\P{} constructs give access to most of the Unicode character properties. The table below shows all these constructs, both single and compound forms.

Compound forms consist of two components, separated by an equals sign or a colon. The first component is the property name, and the second component is the particular value of the property to match against, for example, '\\p{Script: Greek}' or '\\p{Script=Greek}' both mean to match characters whose Script property is Greek.

Single forms, like '\\p{Greek}', are mostly Perl-defined shortcuts for their equivalent compound forms. The table shows these equivalences. (In our example, '\\p{Greek}' is a just a shortcut for '\\p{Script=Greek}'.) There are also a few Perl-defined single forms that are not shortcuts for a compound form. One such is \\p{Word}. These are also listed in the table.

In parsing these constructs, Perl always ignores Upper/lower case differences everywhere within the {braces}. Thus '\\p{Greek}' means the same thing as '\\p{greek}'. But note that changing the case of the 'p' or 'P' before the left brace completely changes the meaning of the construct, from "match" (for '\\p{}') to "doesn't match" (for '\\P{}'). Casing in this document is for improved legibility.

Also, white space, hyphens, and underscores are also normally ignored everywhere between the {braces}, and hence can be freely added or removed even if the /x modifier hasn't been specified on the regular expression. But $a_bold_stricter at the beginning of an entry in the table below means that tighter (stricter) rules are used for that entry:

Single form (\\p{name}) tighter rules:

White space, hyphens, and underscores ARE significant except for:

  • white space adjacent to a non-word character

  • underscores separating digits in numbers

That means, for example, that you can freely add or remove white space adjacent to (but within) the braces without affecting the meaning.

Compound form (\\p{name=value} or \\p{name:value}) tighter rules:

The tighter rules given above for the single form apply to everything to the right of the colon or equals; the looser rules still apply to everything to the left.

That means, for example, that you can freely add or remove white space adjacent to (but within) the braces and the colon or equal sign.

Some properties are considered obsolete, but still available. There are several varieties of obsolesence:

Obsolete

Properties marked with $a_bold_obsolete in the table are considered obsolete. At the time of this writing (Unicode version 5.2) there is no information in the Unicode standard about the implications of a property being obsolete.

Stabilized

Obsolete properties may be stabilized. This means that they are not actively maintained by Unicode, and will not be extended as new characters are added to the standard. Such properties are marked with $a_bold_stabilized in the table. At the time of this writing (Unicode version 5.2) there is no further information in the Unicode standard about the implications of a property being stabilized.

Deprecated

Obsolete properties may be deprecated. This means that their use is strongly discouraged, so much so that a warning will be issued if used, unless the regular expression is in the scope of a no warnings 'deprecated' statement. $A_bold_deprecated flags each such entry in the table, and the entry there for the longest, most descriptive version of the property will give the reason it is deprecated, and perhaps advice. Perl may issue such a warning, even for properties that aren't officially deprecated by Unicode, when there used to be characters or code points that were matched by them, but no longer. This is to warn you that your program may not work like it did on earlier Unicode releases.

A deprecated property may be made unavailable in a future Perl version, so it is best to move away from them.

Some Perl extensions are present for backwards compatibility and are discouraged from being used, but not obsolete. $A_bold_discouraged flags each such entry in the table.

@block_warning

The table below has two columns. The left column contains the \\p{} constructs to look up, possibly preceeded by the flags mentioned above; and the right column contains information about them, like a description, or synonyms. It shows both the single and compound forms for each property that has them. If the left column is a short name for a property, the right column will give its longer, more descriptive name; and if the left column is the longest name, the right column will show any equivalent shortest name, in both single and compound forms if applicable.

The right column will also caution you if a property means something different than what might normally be expected.

Numbers in (parentheses) indicate the total number of code points matched by the property. For emphasis, those properties that match no code points at all are listed as well in a separate section following the table.

There is no description given for most non-Perl defined properties (See $unicode_reference_url for that).

For compactness, '*' is used as a wildcard instead of showing all possible combinations. For example, entries like:

\\p{Gc: *}                                  \\p{General_Category: *}

mean that 'Gc' is a synonym for 'General_Category', and anything that is valid for the latter is also valid for the former. Similarly,

\\p{Is_*}                                   \\p{*}

means that if and only if, for example, \\p{Foo} exists, then \\p{Is_Foo} and \\p{IsFoo} are also valid and all mean the same thing. And similarly, \\p{Foo=Bar} means the same as \\p{Is_Foo=Bar} and \\p{IsFoo=Bar}. '*' here is restricted to something not beginning with an underscore.

Also, in binary properties, 'Yes', 'T', and 'True' are all synonyms for 'Y'. And 'No', 'F', and 'False' are all synonyms for 'N'. The table shows 'Y*' and 'N*' to indicate this, and doesn't have separate entries for the other possibilities. Note that not all properties which have values 'Yes' and 'No' are binary, and they have all their values spelled out without using this wild card, and a NOT clause in their description that highlights their not being binary. These also require the compound form to match them, whereas true binary properties have both single and compound forms available.

Note that all non-essential underscores are removed in the display of the short names below.

Summary legend:

* is a wild-card
(\\d+) in the info column gives the number of code points matched by this property.
$DEPRECATED means this is deprecated.
$OBSOLETE means this is obsolete.
$STABILIZED means this is stabilized.
$STRICTER means tighter (stricter) name matching applies.
$DISCOURAGED means use of this form is discouraged.

$formatted_properties

$zero_matches

Properties not accessible through \\p{} and \\P{}

A few properties are accessible in Perl via various function calls only. These are: Lowercase_Mapping lc() and lcfirst() Titlecase_Mapping ucfirst() Uppercase_Mapping uc()

Case_Folding is accessible through the /i modifier in regular expressions.

The Name property is accessible through the \\N{} interpolation in double-quoted strings and regular expressions, but both usages require a use charnames; to be specified, which also contains related functions viacode() and vianame().

Unicode regular expression properties that are NOT accepted by Perl

Perl will generate an error for a few character properties in Unicode when used in a regular expression. The non-Unihan ones are listed below, with the reasons they are not accepted, perhaps with work-arounds. The short names for the properties are listed enclosed in (parentheses).

    @bad_re_properties

An installation can choose to allow any of these to be matched by changing the controlling lists contained in the program \$Config{privlib}/unicore/$0 and then re-running $0. (\%Config is available from the Config module).

Files in the To directory (for serious hackers only)

All Unicode properties are really mappings (in the mathematical sense) from code points to their respective values. As part of its build process, Perl constructs tables containing these mappings for all properties that it deals with. But only a few of these are written out into files. Those written out are in the directory \$Config{privlib}/unicore/To/ (%Config is available from the Config module).

Those ones written are ones needed by Perl internally during execution, or for which there is some demand, and those for which there is no access through the Perl core. Generally, properties that can be used in regular expression matching do not have their map tables written, like Script. Nor are the simplistic properties that have a better, more complete version, such as Simple_Uppercase_Mapping (Uppercase_Mapping is written instead).

None of the properties in the To directory are currently directly accessible through the Perl core, although some may be accessed indirectly. For example, the uc() function implements the Uppercase_Mapping property and uses the Upper.pl file found in this directory.

The available files with their properties (short names in parentheses), and any flags or comments about them, are:

@map_tables_actually_output

An installation can choose to change which files are generated by changing the controlling lists contained in the program \$Config{privlib}/unicore/$0 and then re-running $0.

Each of these files defines two hash entries to help reading programs decipher it. One of them looks like this:

\$utf8::SwashInfo{'ToNAME'}{'format'} = 's';

where 'NAME' is a name to indicate the property. For backwards compatibility, this is not necessarily the property's official Unicode name. (The 'To' is also for backwards compatibility.) The hash entry gives the format of the mapping fields of the table, currently one of the following:

@map_table_formats

This format applies only to the entries in the main body of the table. Entries defined in hashes or ones that are missing from the list can have a different format.

The value that the missing entries have is given by the other SwashInfo hash entry line; it looks like this:

\$utf8::SwashInfo{'ToNAME'}{'missing'} = 'NaN';

This example line says that any Unicode code points not explicitly listed in the file have the value 'NaN' under the property indicated by NAME. If the value is the special string <code point>, it means that the value for any missing code point is the code point itself. This happens, for example, in the file for Uppercase_Mapping (To/Upper.pl), in which code points like the character 'A', are missing because the uppercase of 'A' is itself.

SEE ALSO

$unicode_reference_url

perlrecharclass

perlunicode

END

# And write it.
main::write([ $pod_directory, "$pod_file.pod" ], @OUT);
return;
}

sub make_Heavy () { # Create and write Heavy.pl, which passes info about the tables to # utf8_heavy.pl

my @heavy = <<END;
$HEADER
$INTERNAL_ONLY

# This file is for the use of utf8_heavy.pl

# Maps property names in loose standard form to its standard name \%utf8::loose_property_name_of = ( END

push @heavy, simple_dumper (\%loose_property_name_of, ' ' x 4);
push @heavy, <<END;
);

# Maps property, table to file for those using stricter matching \%utf8::stricter_to_file_of = ( END push @heavy, simple_dumper (\%stricter_to_file_of, ' ' x 4); push @heavy, <<END; );

# Maps property, table to file for those using loose matching \%utf8::loose_to_file_of = ( END push @heavy, simple_dumper (\%loose_to_file_of, ' ' x 4); push @heavy, <<END; );

# Maps floating point to fractional form \%utf8::nv_floating_to_rational = ( END push @heavy, simple_dumper (\%nv_floating_to_rational, ' ' x 4); push @heavy, <<END; );

# If a floating point number doesn't have enough digits in it to get this # close to a fraction, it isn't considered to be that fraction even if all the # digits it does have match. \$utf8::max_floating_slop = $MAX_FLOATING_SLOP;

# Deprecated tables to generate a warning for. The key is the file containing # the table, so as to avoid duplication, as many property names can map to the # file, but we only need one entry for all of them. \%utf8::why_deprecated = ( END

push @heavy, simple_dumper (\%utf8::why_deprecated, ' ' x 4);
push @heavy, <<END;
);

1; END

main::write("Heavy.pl", @heavy);
return;
}

sub write_all_tables() { # Write out all the tables generated by this program to files, as well as # the supporting data structures, pod file, and .t file.

my @writables;              # List of tables that actually get written
my %match_tables_to_write;  # Used to collapse identical match tables
                            # into one file.  Each key is a hash function
                            # result to partition tables into buckets.
                            # Each value is an array of the tables that
                            # fit in the bucket.

# For each property ...
# (sort so that if there is an immutable file name, it has precedence, so
# some other property can't come in and take over its file name.  If b's
# file name is defined, will return 1, meaning to take it first; don't
# care if both defined, as they had better be different anyway)
PROPERTY:
foreach my $property (sort { defined $b->file } property_ref('*')) {
    my $type = $property->type;

    # And for each table for that property, starting with the mapping
    # table for it ...
    TABLE:
    foreach my $table($property,

                    # and all the match tables for it (if any), sorted so
                    # the ones with the shortest associated file name come
                    # first.  The length sorting prevents problems of a
                    # longer file taking a name that might have to be used
                    # by a shorter one.  The alphabetic sorting prevents
                    # differences between releases
                    sort {  my $ext_a = $a->external_name;
                            return 1 if ! defined $ext_a;
                            my $ext_b = $b->external_name;
                            return -1 if ! defined $ext_b;
                            my $cmp = length $ext_a <=> length $ext_b;

                            # Return result if lengths not equal
                            return $cmp if $cmp;

                            # Alphabetic if lengths equal
                            return $ext_a cmp $ext_b
                    } $property->tables
                )
    {

        # Here we have a table associated with a property.  It could be
        # the map table (done first for each property), or one of the
        # other tables.  Determine which type.
        my $is_property = $table->isa('Property');

        my $name = $table->name;
        my $complete_name = $table->complete_name;

        # See if should suppress the table if is empty, but warn if it
        # contains something.
        my $suppress_if_empty_warn_if_not = grep { $complete_name eq $_ }
                                keys %why_suppress_if_empty_warn_if_not;

        # Calculate if this table should have any code points associated
        # with it or not.
        my $expected_empty =

            # $perl should be empty, as well as properties that we just
            # don't do anything with
            ($is_property
                && ($table == $perl
                    || grep { $complete_name eq $_ }
                                                @unimplemented_properties
                )
            )

            # Match tables in properties we skipped populating should be
            # empty
            || (! $is_property && ! $property->to_create_match_tables)

            # Tables and properties that are expected to have no code
            # points should be empty
            || $suppress_if_empty_warn_if_not
        ;

        # Set a boolean if this table is the complement of an empty binary
        # table
        my $is_complement_of_empty_binary =
            $type == $BINARY &&
            (($table == $property->table('Y')
                && $property->table('N')->is_empty)
            || ($table == $property->table('N')
                && $property->table('Y')->is_empty));


        # Some tables should match everything
        my $expected_full =
            ($is_property)
            ? # All these types of map tables will be full because
              # they will have been populated with defaults
              ($type == $ENUM || $type == $BINARY)

            : # A match table should match everything if its method
              # shows it should
              ($table->matches_all

              # The complement of an empty binary table will match
              # everything
              || $is_complement_of_empty_binary
              )
        ;

        if ($table->is_empty) {


            if ($suppress_if_empty_warn_if_not) {
                $table->set_status($SUPPRESSED,
                    $why_suppress_if_empty_warn_if_not{$complete_name});
            }

            # Suppress expected empty tables.
            next TABLE if $expected_empty;

            # And setup to later output a warning for those that aren't
            # known to be allowed to be empty.  Don't do the warning if
            # this table is a child of another one to avoid duplicating
            # the warning that should come from the parent one.
            if (($table == $property || $table->parent == $table)
                && $table->status ne $SUPPRESSED
                && ! grep { $complete_name =~ /^$_$/ }
                                                @tables_that_may_be_empty)
            {
                push @unhandled_properties, "$table";
            }
        }
        elsif ($expected_empty) {
            my $because = "";
            if ($suppress_if_empty_warn_if_not) {
                $because = " because $why_suppress_if_empty_warn_if_not{$complete_name}";
            }

            Carp::my_carp("Not expecting property $table$because.  Generating file for it anyway.");
        }

        my $count = $table->count;
        if ($expected_full) {
            if ($count != $MAX_UNICODE_CODEPOINTS) {
                Carp::my_carp("$table matches only "
                . clarify_number($count)
                . " Unicode code points but should match "
                . clarify_number($MAX_UNICODE_CODEPOINTS)
                . " (off by "
                .  clarify_number(abs($MAX_UNICODE_CODEPOINTS - $count))
                . ").  Proceeding anyway.");
            }

            # Here is expected to be full.  If it is because it is the
            # complement of an (empty) binary table that is to be
            # suppressed, then suppress this one as well.
            if ($is_complement_of_empty_binary) {
                my $opposing_name = ($name eq 'Y') ? 'N' : 'Y';
                my $opposing = $property->table($opposing_name);
                my $opposing_status = $opposing->status;
                if ($opposing_status) {
                    $table->set_status($opposing_status,
                                       $opposing->status_info);
                }
            }
        }
        elsif ($count == $MAX_UNICODE_CODEPOINTS) {
            if ($table == $property || $table->leader == $table) {
                Carp::my_carp("$table unexpectedly matches all Unicode code points.  Proceeding anyway.");
            }
        }

        if ($table->status eq $SUPPRESSED) {
            if (! $is_property) {
                my @children = $table->children;
                foreach my $child (@children) {
                    if ($child->status ne $SUPPRESSED) {
                        Carp::my_carp_bug("'$table' is suppressed and has a child '$child' which isn't");
                    }
                }
            }
            next TABLE;

        }
        if (! $is_property) {

            # Several things need to be done just once for each related
            # group of match tables.  Do them on the parent.
            if ($table->parent == $table) {

                # Add an entry in the pod file for the table; it also does
                # the children.
                make_table_pod_entries($table) if defined $pod_directory;

                # See if the the table matches identical code points with
                # something that has already been output.  In that case,
                # no need to have two files with the same code points in
                # them.  We use the table's hash() method to store these
                # in buckets, so that it is quite likely that if two
                # tables are in the same bucket they will be identical, so
                # don't have to compare tables frequently.  The tables
                # have to have the same status to share a file, so add
                # this to the bucket hash.  (The reason for this latter is
                # that Heavy.pl associates a status with a file.)
                my $hash = $table->hash . ';' . $table->status;

                # Look at each table that is in the same bucket as this
                # one would be.
                foreach my $comparison (@{$match_tables_to_write{$hash}})
                {
                    if ($table->matches_identically_to($comparison)) {
                        $table->set_equivalent_to($comparison,
                                                            Related => 0);
                        next TABLE;
                    }
                }

                # Here, not equivalent, add this table to the bucket.
                push @{$match_tables_to_write{$hash}}, $table;
            }
        }
        else {

            # Here is the property itself.
            # Don't write out or make references to the $perl property
            next if $table == $perl;

            if ($type != $STRING) {

                # There is a mapping stored of the various synonyms to the
                # standardized name of the property for utf8_heavy.pl.
                # Also, the pod file contains entries of the form:
                # \p{alias: *}         \p{full: *}
                # rather than show every possible combination of things.

                my @property_aliases = $property->aliases;

                # The full name of this property is stored by convention
                # first in the alias array
                my $full_property_name =
                            '\p{' . $property_aliases[0]->name . ': *}';
                my $standard_property_name = standardize($table->name);

                # For each synonym ...
                for my $i (0 .. @property_aliases - 1)  {
                    my $alias = $property_aliases[$i];
                    my $alias_name = $alias->name;
                    my $alias_standard = standardize($alias_name);

                    # Set the mapping for utf8_heavy of the alias to the
                    # property
                    if (exists ($loose_property_name_of{$alias_standard}))
                    {
                        Carp::my_carp("There already is a property with the same standard name as $alias_name: $loose_property_name_of{$alias_standard}.  Old name is retained");
                    }
                    else {
                        $loose_property_name_of{$alias_standard}
                                            = $standard_property_name;
                    }

                    # Now for the pod entry for this alias.  Skip if not
                    # outputting a pod; skip the first one, which is the
                    # full name so won't have an entry like: '\p{full: *}
                    # \p{full: *}', and skip if don't want an entry for
                    # this one.
                    next if $i == 0
                            || ! defined $pod_directory
                            || ! $alias->make_pod_entry;

                    push @match_properties,
                        format_pod_line($indent_info_column,
                                    '\p{' . $alias->name . ': *}',
                                    $full_property_name,
                                    $alias->status);
                }
            } # End of non-string-like property code


            # Don't output a mapping file if not desired.
            next if ! $property->to_output_map;
        }

        # Here, we know we want to write out the table, but don't do it
        # yet because there may be other tables that come along and will
        # want to share the file, and the file's comments will change to
        # mention them.  So save for later.
        push @writables, $table;

    } # End of looping through the property and all its tables.
} # End of looping through all properties.

# Now have all the tables that will have files written for them.  Do it.
foreach my $table (@writables) {
    my @directory;
    my $filename;
    my $property = $table->property;
    my $is_property = ($table == $property);
    if (! $is_property) {

        # Match tables for the property go in lib/$subdirectory, which is
        # the property's name.  Don't use the standard file name for this,
        # as may get an unfamiliar alias
        @directory = ($matches_directory, $property->external_name);
    }
    else {

        @directory = $table->directory;
        $filename = $table->file;
    }

    # Use specified filename if avaliable, or default to property's
    # shortest name.  We need an 8.3 safe filename (which means "an 8
    # safe" filename, since after the dot is only 'pl', which is < 3)
    # The 2nd parameter is if the filename shouldn't be changed, and
    # it shouldn't iff there is a hard-coded name for this table.
    $filename = construct_filename(
                            $filename || $table->external_name,
                            ! $filename,    # mutable if no filename
                            \@directory);

    register_file_for_name($table, \@directory, $filename);

    # Only need to write one file when shared by more than one
    # property
    next if ! $is_property && $table->leader != $table;

    # Construct a nice comment to add to the file
    $table->set_final_comment;

    $table->write;
}


# Write out the pod file
make_pod;

# And Heavy.pl
make_Heavy;

make_property_test_script() if $make_test_script;
return;
}

my @white_space_separators = ( # This used only for making the test script. "", ' ', "\t", ' ' );

sub generate_separator($) { # This used only for making the test script. It generates the colon or # equal separator between the property and property value, with random # white space surrounding the separator

my $lhs = shift;

return "" if $lhs eq "";  # No separator if there's only one (the r) side

# Choose space before and after randomly
my $spaces_before =$white_space_separators[rand(@white_space_separators)];
my $spaces_after = $white_space_separators[rand(@white_space_separators)];

# And return the whole complex, half the time using a colon, half the
# equals
return $spaces_before
        . (rand() < 0.5) ? '=' : ':'
        . $spaces_after;
}

sub generate_tests($$$$$$) { # This used only for making the test script. It generates test cases that # are expected to compile successfully in perl. Note that the lhs and # rhs are assumed to already be as randomized as the caller wants.

my $file_handle = shift;   # Where to output the tests
my $lhs = shift;           # The property: what's to the left of the colon
                           #  or equals separator
my $rhs = shift;           # The property value; what's to the right
my $valid_code = shift;    # A code point that's known to be in the
                           # table given by lhs=rhs; undef if table is
                           # empty
my $invalid_code = shift;  # A code point known to not be in the table;
                           # undef if the table is all code points
my $warning = shift;

# Get the colon or equal
my $separator = generate_separator($lhs);

# The whole 'property=value'
my $name = "$lhs$separator$rhs";

# Create a complete set of tests, with complements.
if (defined $valid_code) {
    printf $file_handle
                qq/Expect(1, $valid_code, '\\p{$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(0, $valid_code, '\\p{^$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(0, $valid_code, '\\P{$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(1, $valid_code, '\\P{^$name}', $warning);\n/;
}
if (defined $invalid_code) {
    printf $file_handle
                qq/Expect(0, $invalid_code, '\\p{$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(1, $invalid_code, '\\p{^$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(1, $invalid_code, '\\P{$name}', $warning);\n/;
    printf $file_handle
                qq/Expect(0, $invalid_code, '\\P{^$name}', $warning);\n/;
}
return;
}

sub generate_error($$$$) { # This used only for making the test script. It generates test cases that # are expected to not only not match, but to be syntax or similar errors

my $file_handle = shift;        # Where to output to.
my $lhs = shift;                # The property: what's to the left of the
                                # colon or equals separator
my $rhs = shift;                # The property value; what's to the right
my $already_in_error = shift;   # Boolean; if true it's known that the
                            # unmodified lhs and rhs will cause an error.
                            # This routine should not force another one
# Get the colon or equal
my $separator = generate_separator($lhs);

# Since this is an error only, don't bother to randomly decide whether to
# put the error on the left or right side; and assume that the rhs is
# loosely matched, again for convenience rather than rigor.
$rhs = randomize_loose_name($rhs, 'ERROR') unless $already_in_error;

my $property = $lhs . $separator . $rhs;

print $file_handle qq/Error('\\p{$property}');\n/;
print $file_handle qq/Error('\\P{$property}');\n/;
return;
}

# These are used only for making the test script # XXX Maybe should also have a bad strict seps, which includes underscore.

my @good_loose_seps = ( " ", "-", "\t", "", "_", ); my @bad_loose_seps = ( "/a/", ':=', );

sub randomize_stricter_name { # This used only for making the test script. Take the input name and # return a randomized, but valid version of it under the stricter matching # rules.

my $name = shift;
Carp::carp_extra_args(\@_) if main::DEBUG && @_;

# If the name looks like a number (integer, floating, or rational), do
# some extra work
if ($name =~ qr{ ^ ( -? ) (\d+ ( ( [./] ) \d+ )? ) $ }x) {
    my $sign = $1;
    my $number = $2;
    my $separator = $3;

    # If there isn't a sign, part of the time add a plus
    # Note: Not testing having any denominator having a minus sign
    if (! $sign) {
        $sign = '+' if rand() <= .3;
    }

    # And add 0 or more leading zeros.
    $name = $sign . ('0' x int rand(10)) . $number;

    if (defined $separator) {
        my $extra_zeros = '0' x int rand(10);

        if ($separator eq '.') {

            # Similarly, add 0 or more trailing zeros after a decimal
            # point
            $name .= $extra_zeros;
        }
        else {

            # Or, leading zeros before the denominator
            $name =~ s,/,/$extra_zeros,;
        }
    }
}

# For legibility of the test, only change the case of whole sections at a
# time.  To do this, first split into sections.  The split returns the
# delimiters
my @sections;
for my $section (split / ( [ - + \s _ . ]+ ) /x, $name) {
    trace $section if main::DEBUG && $to_trace;

    if (length $section > 1 && $section !~ /\D/) {

        # If the section is a sequence of digits, about half the time
        # randomly add underscores between some of them.
        if (rand() > .5) {

            # Figure out how many underscores to add.  max is 1 less than
            # the number of digits.  (But add 1 at the end to make sure
            # result isn't 0, and compensate earlier by subtracting 2
            # instead of 1)
            my $num_underscores = int rand(length($section) - 2) + 1;

            # And add them evenly throughout, for convenience, not rigor
            use integer;
            my $spacing = (length($section) - 1)/ $num_underscores;
            my $temp = $section;
            $section = "";
            for my $i (1 .. $num_underscores) {
                $section .= substr($temp, 0, $spacing, "") . '_';
            }
            $section .= $temp;
        }
        push @sections, $section;
    }
    else {

        # Here not a sequence of digits.  Change the case of the section
        # randomly
        my $switch = int rand(4);
        if ($switch == 0) {
            push @sections, uc $section;
        }
        elsif ($switch == 1) {
            push @sections, lc $section;
        }
        elsif ($switch == 2) {
            push @sections, ucfirst $section;
        }
        else {
            push @sections, $section;
        }
    }
}
trace "returning", join "", @sections if main::DEBUG && $to_trace;
return join "", @sections;
}

sub randomize_loose_name($;$) { # This used only for making the test script

my $name = shift;
my $want_error = shift;  # if true, make an error
Carp::carp_extra_args(\@_) if main::DEBUG && @_;

$name = randomize_stricter_name($name);

my @parts;
push @parts, $good_loose_seps[rand(@good_loose_seps)];
for my $part (split /[-\s_]+/, $name) {
    if (@parts) {
        if ($want_error and rand() < 0.3) {
            push @parts, $bad_loose_seps[rand(@bad_loose_seps)];
            $want_error = 0;
        }
        else {
            push @parts, $good_loose_seps[rand(@good_loose_seps)];
        }
    }
    push @parts, $part;
}
my $new = join("", @parts);
trace "$name => $new" if main::DEBUG && $to_trace;

if ($want_error) {
    if (rand() >= 0.5) {
        $new .= $bad_loose_seps[rand(@bad_loose_seps)];
    }
    else {
        $new = $bad_loose_seps[rand(@bad_loose_seps)] . $new;
    }
}
return $new;
}

# Used to make sure don't generate duplicate test cases. my %test_generated;

sub make_property_test_script() { # This used only for making the test script # this written directly -- it's huge.

print "Making test script\n" if $verbosity >= $PROGRESS;

# This uses randomness to test different possibilities without testing all
# possibilities.  To ensure repeatability, set the seed to 0.  But if
# tests are added, it will perturb all later ones in the .t file
srand 0;

$t_path = 'TestProp.pl' unless defined $t_path; # the traditional name

force_unlink ($t_path);
push @files_actually_output, $t_path;
my $OUT;
if (not open $OUT, "> $t_path") {
    Carp::my_carp("Can't open $t_path.  Skipping: $!");
    return;
}

# Keep going down an order of magnitude
# until find that adding this quantity to
# 1 remains 1; but put an upper limit on
# this so in case this algorithm doesn't
# work properly on some platform, that we
# won't loop forever.
my $digits = 0;
my $min_floating_slop = 1;
while (1+ $min_floating_slop != 1
        && $digits++ < 50)
{
    my $next = $min_floating_slop / 10;
    last if $next == 0; # If underflows,
                        # use previous one
    $min_floating_slop = $next;
}
print $OUT $HEADER, <DATA>;

foreach my $property (property_ref('*')) {
    foreach my $table ($property->tables) {

        # Find code points that match, and don't match this table.
        my $valid = $table->get_valid_code_point;
        my $invalid = $table->get_invalid_code_point;
        my $warning = ($table->status eq $DEPRECATED)
                        ? "'deprecated'"
                        : '""';

        # Test each possible combination of the property's aliases with
        # the table's.  If this gets to be too many, could do what is done
        # in the set_final_comment() for Tables
        my @table_aliases = $table->aliases;
        my @property_aliases = $table->property->aliases;
        my $max = max(scalar @table_aliases, scalar @property_aliases);
        for my $j (0 .. $max - 1) {

            # The current alias for property is the next one on the list,
            # or if beyond the end, start over.  Similarly for table
            my $property_name
                        = $property_aliases[$j % @property_aliases]->name;

            $property_name = "" if $table->property == $perl;
            my $table_alias = $table_aliases[$j % @table_aliases];
            my $table_name = $table_alias->name;
            my $loose_match = $table_alias->loose_match;

            # If the table doesn't have a file, any test for it is
            # already guaranteed to be in error
            my $already_error = ! $table->file_path;

            # Generate error cases for this alias.
            generate_error($OUT,
                            $property_name,
                            $table_name,
                            $already_error);

            # If the table is guaranteed to always generate an error,
            # quit now without generating success cases.
            next if $already_error;

            # Now for the success cases.
            my $random;
            if ($loose_match) {

                # For loose matching, create an extra test case for the
                # standard name.
                my $standard = standardize($table_name);

                # $test_name should be a unique combination for each test
                # case; used just to avoid duplicate tests
                my $test_name = "$property_name=$standard";

                # Don't output duplicate test cases.
                if (! exists $test_generated{$test_name}) {
                    $test_generated{$test_name} = 1;
                    generate_tests($OUT,
                                    $property_name,
                                    $standard,
                                    $valid,
                                    $invalid,
                                    $warning,
                                );
                }
                $random = randomize_loose_name($table_name)
            }
            else { # Stricter match
                $random = randomize_stricter_name($table_name);
            }

            # Now for the main test case for this alias.
            my $test_name = "$property_name=$random";
            if (! exists $test_generated{$test_name}) {
                $test_generated{$test_name} = 1;
                generate_tests($OUT,
                                $property_name,
                                $random,
                                $valid,
                                $invalid,
                                $warning,
                            );

                # If the name is a rational number, add tests for the
                # floating point equivalent.
                if ($table_name =~ qr{/}) {

                    # Calculate the float, and find just the fraction.
                    my $float = eval $table_name;
                    my ($whole, $fraction)
                                        = $float =~ / (.*) \. (.*) /x;

                    # Starting with one digit after the decimal point,
                    # create a test for each possible precision (number of
                    # digits past the decimal point) until well beyond the
                    # native number found on this machine.  (If we started
                    # with 0 digits, it would be an integer, which could
                    # well match an unrelated table)
                    PLACE:
                    for my $i (1 .. $min_floating_slop + 3) {
                        my $table_name = sprintf("%.*f", $i, $float);
                        if ($i < $MIN_FRACTION_LENGTH) {

                            # If the test case has fewer digits than the
                            # minimum acceptable precision, it shouldn't
                            # succeed, so we expect an error for it.
                            # E.g., 2/3 = .7 at one decimal point, and we
                            # shouldn't say it matches .7.  We should make
                            # it be .667 at least before agreeing that the
                            # intent was to match 2/3.  But at the
                            # less-than- acceptable level of precision, it
                            # might actually match an unrelated number.
                            # So don't generate a test case if this
                            # conflating is possible.  In our example, we
                            # don't want 2/3 matching 7/10, if there is
                            # a 7/10 code point.
                            for my $existing
                                    (keys %nv_floating_to_rational)
                            {
                                next PLACE
                                    if abs($table_name - $existing)
                                            < $MAX_FLOATING_SLOP;
                            }
                            generate_error($OUT,
                                        $property_name,
                                        $table_name,
                                        1   # 1 => already an error
                            );
                        }
                        else {

                            # Here the number of digits exceeds the
                            # minimum we think is needed.  So generate a
                            # success test case for it.
                            generate_tests($OUT,
                                            $property_name,
                                            $table_name,
                                            $valid,
                                            $invalid,
                                            $warning,
                            );
                        }
                    }
                }
            }
        }
    }
}

foreach my $test (@backslash_X_tests) {
    print $OUT "Test_X('$test');\n";
}

print $OUT "Finished();\n";
close $OUT;
return;
}

# This is a list of the input files and how to handle them. The files are # processed in their order in this list. Some reordering is possible if # desired, but the v0 files should be first, and the extracted before the # others except DAge.txt (as data in an extracted file can be over-ridden by # the non-extracted. Some other files depend on data derived from an earlier # file, like UnicodeData requires data from Jamo, and the case changing and # folding requires data from Unicode. Mostly, it safest to order by first # version releases in (except the Jamo). DAge.txt is read before the # extracted ones because of the rarely used feature $compare_versions. In the # unlikely event that there were ever an extracted file that contained the Age # property information, it would have to go in front of DAge. # # The version strings allow the program to know whether to expect a file or # not, but if a file exists in the directory, it will be processed, even if it # is in a version earlier than expected, so you can copy files from a later # release into an earlier release's directory. my @input_file_objects = ( Input_file->new('PropertyAliases.txt', v0, Handler => \&process_PropertyAliases, ), Input_file->new(undef, v0, # No file associated with this Progress_Message => 'Finishing property setup', Handler => \&finish_property_setup, ), Input_file->new('PropValueAliases.txt', v0, Handler => \&process_PropValueAliases, Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new('DAge.txt', v3.2.0, Has_Missings_Defaults => $NOT_IGNORED, Property => 'Age' ), Input_file->new("${EXTRACTED}DGeneralCategory.txt", v3.1.0, Property => 'General_Category', ), Input_file->new("${EXTRACTED}DCombiningClass.txt", v3.1.0, Property => 'Canonical_Combining_Class', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DNumType.txt", v3.1.0, Property => 'Numeric_Type', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DEastAsianWidth.txt", v3.1.0, Property => 'East_Asian_Width', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DLineBreak.txt", v3.1.0, Property => 'Line_Break', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DBidiClass.txt", v3.1.1, Property => 'Bidi_Class', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DDecompositionType.txt", v3.1.0, Property => 'Decomposition_Type', Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DBinaryProperties.txt", v3.1.0), Input_file->new("${EXTRACTED}DNumValues.txt", v3.1.0, Property => 'Numeric_Value', Each_Line_Handler => \&filter_numeric_value_line, Has_Missings_Defaults => $NOT_IGNORED, ), Input_file->new("${EXTRACTED}DJoinGroup.txt", v3.1.0, Property => 'Joining_Group', Has_Missings_Defaults => $NOT_IGNORED, ),

Input_file->new("${EXTRACTED}DJoinType.txt", v3.1.0,
                Property => 'Joining_Type',
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new('Jamo.txt', v2.0.0,
                Property => 'Jamo_Short_Name',
                Each_Line_Handler => \&filter_jamo_line,
                ),
Input_file->new('UnicodeData.txt', v1.1.5,
                Pre_Handler => \&setup_UnicodeData,

                # We clean up this file for some early versions.
                Each_Line_Handler => [ (($v_version lt v2.0.0 )
                                        ? \&filter_v1_ucd
                                        : ($v_version eq v2.1.5)
                                            ? \&filter_v2_1_5_ucd
                                            : undef),

                                        # And the main filter
                                        \&filter_UnicodeData_line,
                                     ],
                EOF_Handler => \&EOF_UnicodeData,
                ),
Input_file->new('ArabicShaping.txt', v2.0.0,
                Each_Line_Handler =>
                    [ ($v_version lt 4.1.0)
                                ? \&filter_old_style_arabic_shaping
                                : undef,
                    \&filter_arabic_shaping_line,
                    ],
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new('Blocks.txt', v2.0.0,
                Property => 'Block',
                Has_Missings_Defaults => $NOT_IGNORED,
                Each_Line_Handler => \&filter_blocks_lines
                ),
Input_file->new('PropList.txt', v2.0.0,
                Each_Line_Handler => (($v_version lt v3.1.0)
                                        ? \&filter_old_style_proplist
                                        : undef),
                ),
Input_file->new('Unihan.txt', v2.0.0,
                Pre_Handler => \&setup_unihan,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                    ),
Input_file->new('SpecialCasing.txt', v2.1.8,
                Each_Line_Handler => \&filter_special_casing_line,
                Pre_Handler => \&setup_special_casing,
                ),
Input_file->new(
                'LineBreak.txt', v3.0.0,
                Has_Missings_Defaults => $NOT_IGNORED,
                Property => 'Line_Break',
                # Early versions had problematic syntax
                Each_Line_Handler => (($v_version lt v3.1.0)
                                    ? \&filter_early_ea_lb
                                    : undef),
                ),
Input_file->new('EastAsianWidth.txt', v3.0.0,
                Property => 'East_Asian_Width',
                Has_Missings_Defaults => $NOT_IGNORED,
                # Early versions had problematic syntax
                Each_Line_Handler => (($v_version lt v3.1.0)
                                    ? \&filter_early_ea_lb
                                    : undef),
                ),
Input_file->new('CompositionExclusions.txt', v3.0.0,
                Property => 'Composition_Exclusion',
                ),
Input_file->new('BidiMirroring.txt', v3.0.1,
                Property => 'Bidi_Mirroring_Glyph',
                ),
Input_file->new("NormalizationTest.txt", v3.0.1,
                Skip => 1,
                ),
Input_file->new('CaseFolding.txt', v3.0.1,
                Pre_Handler => \&setup_case_folding,
                Each_Line_Handler =>
                    [ ($v_version lt v3.1.0)
                             ? \&filter_old_style_case_folding
                             : undef,
                       \&filter_case_folding_line
                    ],
                Post_Handler => \&post_fold,
                ),
Input_file->new('DCoreProperties.txt', v3.1.0,
                # 5.2 changed this file
                Has_Missings_Defaults => (($v_version ge v5.2.0)
                                        ? $NOT_IGNORED
                                        : $NO_DEFAULTS),
                ),
Input_file->new('Scripts.txt', v3.1.0,
                Property => 'Script',
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new('DNormalizationProps.txt', v3.1.0,
                Has_Missings_Defaults => $NOT_IGNORED,
                Each_Line_Handler => (($v_version lt v4.0.1)
                                  ? \&filter_old_style_normalization_lines
                                  : undef),
                ),
Input_file->new('HangulSyllableType.txt', v4.0.0,
                Has_Missings_Defaults => $NOT_IGNORED,
                Property => 'Hangul_Syllable_Type'),
Input_file->new("$AUXILIARY/WordBreakProperty.txt", v4.1.0,
                Property => 'Word_Break',
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new("$AUXILIARY/GraphemeBreakProperty.txt", v4.1.0,
                Property => 'Grapheme_Cluster_Break',
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new("$AUXILIARY/GCBTest.txt", v4.1.0,
                Handler => \&process_GCB_test,
                ),
Input_file->new("$AUXILIARY/LBTest.txt", v4.1.0,
                Skip => 1,
                ),
Input_file->new("$AUXILIARY/SBTest.txt", v4.1.0,
                Skip => 1,
                ),
Input_file->new("$AUXILIARY/WBTest.txt", v4.1.0,
                Skip => 1,
                ),
Input_file->new("$AUXILIARY/SentenceBreakProperty.txt", v4.1.0,
                Property => 'Sentence_Break',
                Has_Missings_Defaults => $NOT_IGNORED,
                ),
Input_file->new('NamedSequences.txt', v4.1.0,
                Handler => \&process_NamedSequences
                ),
Input_file->new('NameAliases.txt', v5.0.0,
                Property => 'Name_Alias',
                ),
Input_file->new("BidiTest.txt", v5.2.0,
                Skip => 1,
                ),
Input_file->new('UnihanIndicesDictionary.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanDataDictionaryLike.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanIRGSources.txt', v5.2.0,
                Optional => 1,
                Pre_Handler => \&setup_unihan,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanNumericValues.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanOtherMappings.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanRadicalStrokeCounts.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanReadings.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
Input_file->new('UnihanVariants.txt', v5.2.0,
                Optional => 1,
                Each_Line_Handler => \&filter_unihan_line,
                ),
);

# End of all the preliminaries. # Do it...

if ($compare_versions) { Carp::my_carp(<<END Warning. \$compare_versions is set. Output is not suitable for production END ); }

# Put into %potential_files a list of all the files in the directory structure # that could be inputs to this program, excluding those that we should ignore. # Use absolute file names because it makes it easier across machine types. my @ignored_files_full_names = map { File::Spec->rel2abs( internal_file_to_platform($_)) } keys %ignored_files; File::Find::find({ wanted=>sub { return unless /\.txt$/i; # Some platforms change the name's case my $full = lc(File::Spec->rel2abs($_)); $potential_files{$full} = 1 if ! grep { $full eq lc($_) } @ignored_files_full_names; return; } }, File::Spec->curdir());

my @mktables_list_output_files;

if ($write_unchanged_files) { print "Not checking file list '$file_list'.\n" if $verbosity >= $VERBOSE; } else { print "Reading file list '$file_list'\n" if $verbosity >= $VERBOSE; my $file_handle; if (! open $file_handle, "<", $file_list) { Carp::my_carp("Failed to open '$file_list' (this is expected to be missing the first time); turning on -globlist option instead: $!"); $glob_list = 1; } else { my @input;

    # Read and parse mktables.lst, placing the results from the first part
    # into @input, and the second part into @mktables_list_output_files
    for my $list ( \@input, \@mktables_list_output_files ) {
        while (<$file_handle>) {
            s/^ \s+ | \s+ $//xg;
            next if /^ \s* (?: \# .* )? $/x;
            last if /^ =+ $/x;
            my ( $file ) = split /\t/;
            push @$list, $file;
        }
        @$list = uniques(@$list);
        next;
    }

    # Look through all the input files
    foreach my $input (@input) {
        next if $input eq 'version'; # Already have checked this.

        # Ignore if doesn't exist.  The checking about whether we care or
        # not is done via the Input_file object.
        next if ! file_exists($input);

        # The paths are stored with relative names, and with '/' as the
        # delimiter; convert to absolute on this machine
        my $full = lc(File::Spec->rel2abs(internal_file_to_platform($input)));
        $potential_files{$full} = 1
                    if ! grep { lc($full) eq lc($_) } @ignored_files_full_names;
    }
}

close $file_handle;
}

if ($glob_list) {

# Here wants to process all .txt files in the directory structure.
# Convert them to full path names.  They are stored in the platform's
# relative style
my @known_files;
foreach my $object (@input_file_objects) {
    my $file = $object->file;
    next unless defined $file;
    push @known_files, File::Spec->rel2abs($file);
}

my @unknown_input_files;
foreach my $file (keys %potential_files) {
    next if grep { lc($file) eq lc($_) } @known_files;

    # Here, the file is unknown to us.  Get relative path name
    $file = File::Spec->abs2rel($file);
    push @unknown_input_files, $file;

    # What will happen is we create a data structure for it, and add it to
    # the list of input files to process.  First get the subdirectories
    # into an array
    my (undef, $directories, undef) = File::Spec->splitpath($file);
    $directories =~ s;/$;;;     # Can have extraneous trailing '/'
    my @directories = File::Spec->splitdir($directories);

    # If the file isn't extracted (meaning none of the directories is the
    # extracted one), just add it to the end of the list of inputs.
    if (! grep { $EXTRACTED_DIR eq $_ } @directories) {
        push @input_file_objects, Input_file->new($file, v0);
    }
    else {

        # Here, the file is extracted.  It needs to go ahead of most other
        # processing.  Search for the first input file that isn't a
        # special required property (that is, find one whose first_release
        # is non-0), and isn't extracted.  Also, the Age property file is
        # processed before the extracted ones, just in case
        # $compare_versions is set.
        for (my $i = 0; $i < @input_file_objects; $i++) {
            if ($input_file_objects[$i]->first_released ne v0
                && lc($input_file_objects[$i]->file) ne 'dage.txt'
                && $input_file_objects[$i]->file !~ /$EXTRACTED_DIR/i)
            {
                splice @input_file_objects, $i, 0,
                                            Input_file->new($file, v0);
                last;
            }
        }

    }
}
if (@unknown_input_files) {
    print STDERR simple_fold(join_lines(<<END

The following files are unknown as to how to handle. Assuming they are typical property files. You'll know by later error messages if it worked or not: END ) . " " . join(", ", @unknown_input_files) . "\n\n"); } } # End of looking through directory structure for more .txt files.

# Create the list of input files from the objects we have defined, plus # version my @input_files = 'version'; foreach my $object (@input_file_objects) { my $file = $object->file; next if ! defined $file; # Not all objects have files next if $object->optional && ! -e $file; push @input_files, $file; }

if ( $verbosity >= $VERBOSE ) { print "Expecting ".scalar( @input_files )." input files. ", "Checking ".scalar( @mktables_list_output_files )." output files.\n"; }

# We set $youngest to be the most recently changed input file, including this # program itself (done much earlier in this file) foreach my $in (@input_files) { my $age = -M $in; next unless defined $age; # Keep going even if missing a file $youngest = $age if $age < $youngest;

# See that the input files have distinct names, to warn someone if they
# are adding a new one
if ($make_list) {
    my ($volume, $directories, $file ) = File::Spec->splitpath($in);
    $directories =~ s;/$;;;     # Can have extraneous trailing '/'
    my @directories = File::Spec->splitdir($directories);
    my $base = $file =~ s/\.txt$//;
    construct_filename($file, 'mutable', \@directories);
}
}

my $ok = ! $write_unchanged_files && scalar @mktables_list_output_files; # If none known, rebuild

# Now we check to see if any output files are older than youngest, if # they are, we need to continue on, otherwise we can presumably bail. if ($ok) { foreach my $out (@mktables_list_output_files) { if ( ! file_exists($out)) { print "'$out' is missing.\n" if $verbosity >= $VERBOSE; $ok = 0; last; } #local $to_trace = 1 if main::DEBUG; trace $youngest, -M $out if main::DEBUG && $to_trace; if ( -M $out > $youngest ) { #trace "$out: age: ", -M $out, ", youngest: $youngest\n" if main::DEBUG && $to_trace; print "'$out' is too old.\n" if $verbosity >= $VERBOSE; $ok = 0; last; } } } if ($ok) { print "Files seem to be ok, not bothering to rebuild.\n"; exit(0); } print "Must rebuild tables.\n" if $verbosity >= $VERBOSE;

# Ready to do the major processing. First create the perl pseudo-property. $perl = Property->new('perl', Type => $NON_STRING, Perl_Extension => 1);

# Process each input file foreach my $file (@input_file_objects) { $file->run; }

# Finish the table generation.

print "Finishing processing Unicode properties\n" if $verbosity >= $PROGRESS; finish_Unicode();

print "Compiling Perl properties\n" if $verbosity >= $PROGRESS; compile_perl();

print "Creating Perl synonyms\n" if $verbosity >= $PROGRESS; add_perl_synonyms();

print "Writing tables\n" if $verbosity >= $PROGRESS; write_all_tables();

# Write mktables.lst if ( $file_list and $make_list ) {

print "Updating '$file_list'\n" if $verbosity >= $PROGRESS;
foreach my $file (@input_files, @files_actually_output) {
    my (undef, $directories, $file) = File::Spec->splitpath($file);
    my @directories = File::Spec->splitdir($directories);
    $file = join '/', @directories, $file;
}

my $ofh;
if (! open $ofh,">",$file_list) {
    Carp::my_carp("Can't write to '$file_list'.  Skipping: $!");
    return
}
else {
    print $ofh <<"END";
#
# $file_list -- File list for $0.
#
#   Autogenerated on @{[scalar localtime]}
#
# - First section is input files
#   ($0 itself is not listed but is automatically considered an input)
# - Section seperator is /^=+\$/
# - Second section is a list of output files.
# - Lines matching /^\\s*#/ are treated as comments
#   which along with blank lines are ignored.
#

# Input files:

END print $ofh "$_\n" for sort(@input_files); print $ofh "\n=================================\n# Output files:\n\n"; print $ofh "$_\n" for sort @files_actually_output; print $ofh "\n# ",scalar(@input_files)," input files\n", "# ",scalar(@files_actually_output)+1," output files\n\n", "# End list\n"; close $ofh or Carp::my_carp("Failed to close $ofh: $!");

    print "Filelist has ",scalar(@input_files)," input files and ",
        scalar(@files_actually_output)+1," output files\n"
        if $verbosity >= $VERBOSE;
}
}

# Output these warnings unless -q explicitly specified. if ($verbosity >= $NORMAL_VERBOSITY) { if (@unhandled_properties) { print "\nProperties and tables that unexpectedly have no code points\n"; foreach my $property (sort @unhandled_properties) { print $property, "\n"; } }

if (%potential_files) {
    print "\nInput files that are not considered:\n";
    foreach my $file (sort keys %potential_files) {
        print File::Spec->abs2rel($file), "\n";
    }
}
print "\nAll done\n" if $verbosity >= $VERBOSE;
}
exit(0);

# TRAILING CODE IS USED BY make_property_test_script() __DATA__

use strict; use warnings;

# Test qr/\X/ and the \p{} regular expression constructs. This file is # constructed by mktables from the tables it generates, so if mktables is # buggy, this won't necessarily catch those bugs. Tests are generated for all # feasible properties; a few aren't currently feasible; see # is_code_point_usable() in mktables for details.

# Standard test packages are not used because this manipulates SIG_WARN. It # exits 0 if every non-skipped test succeeded; -1 if any failed.

my $Tests = 0; my $Fails = 0;

my $non_ASCII = (ord('A') != 65);

# The 256 8-bit characters in ASCII ordinal order, with the ones that don't # have Perl names replaced by -1 my @ascii_ordered_chars = ( "\0", (-1) x 6, "\a", "\b", "\t", "\n", -1, # No Vt "\f", "\r", (-1) x 18, " ", "!", "\"", "#", '$', "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", (-1) x 129 );

sub ASCII_ord_to_native ($) { # Converts input ordinal number to the native one, if can be done easily. # Returns -1 otherwise.

my $ord = shift;

return $ord if $ord > 255 || ! $non_ASCII;
my $result = $ascii_ordered_chars[$ord];
return $result if $result eq '-1';
return ord($result);
}

sub Expect($$$$) { my $expected = shift; my $ord = shift; my $regex = shift; my $warning_type = shift; # Type of warning message, like 'deprecated' # or empty if none my $line = (caller)[2];

# Convert the non-ASCII code points expressible as characters to their
# ASCII equivalents, and skip the others.
$ord = ASCII_ord_to_native($ord);
if ($ord < 0) {
    $Tests++;
    print "ok $Tests - "
          . sprintf("\"\\x{%04X}\"", $ord)
          . " =~ $regex # Skipped: non-ASCII\n";
    return;
}

# Convert the code point to hex form
my $string = sprintf "\"\\x{%04X}\"", $ord;

my @tests = "";

# The first time through, use all warnings.  If the input should generate
# a warning, add another time through with them turned off
push @tests, "no warnings '$warning_type';" if $warning_type;

foreach my $no_warnings (@tests) {

    # Store any warning messages instead of outputting them
    local $SIG{__WARN__} = $SIG{__WARN__};
    my $warning_message;
    $SIG{__WARN__} = sub { $warning_message = $_[0] };

    $Tests++;

    # A string eval is needed because of the 'no warnings'.
    # Assumes no parens in the regular expression
    my $result = eval "$no_warnings
                        my \$RegObj = qr($regex);
                        $string =~ \$RegObj ? 1 : 0";
    if (not defined $result) {
        print "not ok $Tests - couldn't compile /$regex/; line $line: $@\n";
        $Fails++;
    }
    elsif ($result ^ $expected) {
        print "not ok $Tests - expected $expected but got $result for $string =~ qr/$regex/; line $line\n";
        $Fails++;
    }
    elsif ($warning_message) {
        if (! $warning_type || ($warning_type && $no_warnings)) {
            print "not ok $Tests - for qr/$regex/ did not expect warning message '$warning_message'; line $line\n";
            $Fails++;
        }
        else {
            print "ok $Tests - expected and got a warning message for qr/$regex/; line $line\n";
        }
    }
    elsif ($warning_type && ! $no_warnings) {
        print "not ok $Tests - for qr/$regex/ expected a $warning_type warning message, but got none; line $line\n";
        $Fails++;
    }
    else {
        print "ok $Tests - got $result for $string =~ qr/$regex/; line $line\n";
    }
}
return;
}

sub Error($) { my $regex = shift; $Tests++; if (eval { 'x' =~ qr/$regex/; 1 }) { $Fails++; my $line = (caller)[2]; print "not ok $Tests - re compiled ok, but expected error for qr/$regex/; line $line: $@\n"; } else { my $line = (caller)[2]; print "ok $Tests - got and expected error for qr/$regex/; line $line\n"; } return; }

# GCBTest.txt character that separates grapheme clusters my $breakable_utf8 = my $breakable = chr(0xF7); utf8::upgrade($breakable_utf8);

# GCBTest.txt character that indicates that the adjoining code points are part # of the same grapheme cluster my $nobreak_utf8 = my $nobreak = chr(0xD7); utf8::upgrade($nobreak_utf8);

sub Test_X($) { # Test qr/\X/ matches. The input is a line from auxiliary/GCBTest.txt # Each such line is a sequence of code points given by their hex numbers, # separated by the two characters defined just before this subroutine that # indicate that either there can or cannot be a break between the adjacent # code points. If there isn't a break, that means the sequence forms an # extended grapheme cluster, which means that \X should match the whole # thing. If there is a break, \X should stop there. This is all # converted by this routine into a match: # $string =~ /(\X)/, # Each \X should match the next cluster; and that is what is checked.

my $template = shift;

my $line   = (caller)[2];

# The line contains characters above the ASCII range, but in Latin1.  It
# may or may not be in utf8, and if it is, it may or may not know it.  So,
# convert these characters to 8 bits.  If knows is in utf8, simply
# downgrade.
if (utf8::is_utf8($template)) {
    utf8::downgrade($template);
} else {

    # Otherwise, if it is in utf8, but doesn't know it, the next lines
    # convert the two problematic characters to their 8-bit equivalents.
    # If it isn't in utf8, they don't harm anything.
    use bytes;
    $template =~ s/$nobreak_utf8/$nobreak/g;
    $template =~ s/$breakable_utf8/$breakable/g;
}

# Get rid of the leading and trailing breakables
$template =~ s/^ \s* $breakable \s* //x;
$template =~ s/ \s* $breakable \s* $ //x;

# And no-breaks become just a space.
$template =~ s/ \s* $nobreak \s* / /xg;

# Split the input into segments that are breakable between them.
my @segments = split /\s*$breakable\s*/, $template;

my $string = "";
my $display_string = "";
my @should_match;
my @should_display;

# Convert the code point sequence in each segment into a Perl string of
# characters
foreach my $segment (@segments) {
    my @code_points = split /\s+/, $segment;
    my $this_string = "";
    my $this_display = "";
    foreach my $code_point (@code_points) {
        my $ord = ASCII_ord_to_native(hex $code_point);
        if ($ord < 0) {
            $Tests++;
            print "ok $Tests - String containing $code_point =~ /(\\X)/g # Skipped: non-ASCII\n";
            return;
        }
        $this_string .= chr $ord;
        $this_display .= "\\x{$code_point}";
    }

    # The next cluster should match the string in this segment.
    push @should_match, $this_string;
    push @should_display, $this_display;
    $string .= $this_string;
    $display_string .= $this_display;
}

# If a string can be represented in both non-ut8 and utf8, test both cases
UPGRADE:
for my $to_upgrade (0 .. 1) {
    
    if ($to_upgrade) {

        # If already in utf8, would just be a repeat
        next UPGRADE if utf8::is_utf8($string);

        utf8::upgrade($string);
    }

    # Finally, do the \X match.
    my @matches = $string =~ /(\X)/g;

    # Look through each matched cluster to verify that it matches what we
    # expect.
    my $min = (@matches < @should_match) ? @matches : @should_match;
    for my $i (0 .. $min - 1) {
        $Tests++;
        if ($matches[$i] eq $should_match[$i]) {
            print "ok $Tests - ";
            if ($i == 0) {
                print "In \"$display_string\" =~ /(\\X)/g, \\X #1";
            } else {
                print "And \\X #", $i + 1,
            }
            print " correctly matched $should_display[$i]; line $line\n";
        } else {
            $matches[$i] = join("", map { sprintf "\\x{%04X}", $_ }
                                                unpack("U*", $matches[$i]));
            print "not ok $Tests - In \"$display_string\" =~ /(\\X)/g, \\X #",
                $i + 1,
                " should have matched $should_display[$i]",
                " but instead matched $matches[$i]",
                ".  Abandoning rest of line $line\n";
            next UPGRADE;
        }
    }

    # And the number of matches should equal the number of expected matches.
    $Tests++;
    if (@matches == @should_match) {
        print "ok $Tests - Nothing was left over; line $line\n";
    } else {
        print "not ok $Tests - There were ", scalar @should_match, " \\X matches expected, but got ", scalar @matches, " instead; line $line\n";
    }
}

return;
}

sub Finished() { print "1..$Tests\n"; exit($Fails ? -1 : 0); }

Error('\p{Script=InGreek}'); # Bug #69018 Test_X("1100 $nobreak 1161"); # Bug #70940

1 POD Error

The following errors were encountered while parsing the POD:

Around line 12205:

=end comment without matching =begin. (Stack: [empty])