NAME
Unicode::Towctrans - Generate small casefolding tables
SYNOPSIS
gen_wctrans
gen_wctrans --safec
gen_wctrans --musl
gen_wctrans -v 15
gen_wctrans -v 15 --cf CaseFolding.txt.15 --out towctrans-15.h
DESCRIPTION
gen_wctrans generates a towctrans.h header file containing small and efficient case folding tables. musl and safeclib use this header to build the libc towupper() and towlower() functions and their secure variants towupper_s() and towlower_s().
If the code may run on a system with a Turkish or Azeri locale, compile with -DHAVE_LOCALE_TR to check for that locale at run-time and apply the special Turkish i mappings.
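The run-time check described above can be sketched roughly as follows. This is a minimal illustration, not the code that gen_wctrans emits: the helper names is_turkic_locale_name(), current_locale_is_turkic(), my_towupper_tr() and my_towlower_tr() are hypothetical, and the sketch assumes that locale names begin with the language code "tr" or "az". The four mappings are the well-known dotted and dotless i pairs.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if a locale name such as "tr_TR.UTF-8"
   or "az_AZ" starts with the language code "tr" or "az". */
static int is_turkic_locale_name(const char *name)
{
    if (!name)
        return 0;
    if (strncmp(name, "tr", 2) != 0 && strncmp(name, "az", 2) != 0)
        return 0;
    /* The language code must end here: "tr", "tr_TR", "tr.UTF-8", "tr@..." */
    return name[2] == '\0' || name[2] == '_' || name[2] == '.' || name[2] == '@';
}

/* Hypothetical helper: query the current LC_CTYPE locale without changing it. */
static int current_locale_is_turkic(void)
{
    return is_turkic_locale_name(setlocale(LC_CTYPE, NULL));
}

/* Hypothetical wrapper: upper-case one character, applying the Turkic
   dotted/dotless i rules when turkic is non-zero. */
static wint_t my_towupper_tr(wint_t wc, int turkic)
{
    if (turkic) {
        if (wc == 0x0069) return 0x0130; /* i -> capital I with dot above */
        if (wc == 0x0131) return 0x0049; /* dotless i -> I */
    }
    return towupper(wc);
}

/* Hypothetical wrapper: lower-case counterpart of my_towupper_tr(). */
static wint_t my_towlower_tr(wint_t wc, int turkic)
{
    if (turkic) {
        if (wc == 0x0049) return 0x0131; /* I -> dotless i */
        if (wc == 0x0130) return 0x0069; /* capital I with dot above -> i */
    }
    return towlower(wc);
}
```

A caller would typically evaluate current_locale_is_turkic() once after setlocale() and pass the result along, so that the common non-Turkic path costs only a single branch.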
If you know that your iswalpha() works correctly (only with musl), use --with_iswalpha to get a slightly faster function, e.g. for benchmarking.
Generating the multi-character folding tables for safeclib's wcsfc_s() is also planned. The single-character towupper() and towlower() conversions are meaningless for many Unicode mappings that expand to several characters, namely those with status F (full folding) in CaseFolding.txt. For these, use a proper string case folding function instead.
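To illustrate why a single-character function cannot handle status F mappings, here is a minimal sketch of a string folding routine. It is not safeclib's wcsfc_s(): the name fold_string() and the three-entry table are hypothetical, chosen only to show that the folded string can be longer than the input. Characters without an entry in the table simply fall back to towlower().

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* A few status F entries as found in CaseFolding.txt.
   A real table is much larger; this one is for illustration only. */
struct full_fold { wchar_t from; const wchar_t *to; };
static const struct full_fold full_folds[] = {
    { 0x00DF, L"ss" }, /* LATIN SMALL LETTER SHARP S */
    { 0xFB00, L"ff" }, /* LATIN SMALL LIGATURE FF */
    { 0xFB01, L"fi" }, /* LATIN SMALL LIGATURE FI */
};

/* Hypothetical helper: fold src into dst, which has room for dstlen
   wide characters including the terminating NUL.
   Returns the folded length, or (size_t)-1 if dst is too small. */
static size_t fold_string(wchar_t *dst, size_t dstlen, const wchar_t *src)
{
    size_t n = 0;
    if (dstlen == 0)
        return (size_t)-1;
    for (; *src; src++) {
        const wchar_t *rep = NULL;
        for (size_t i = 0; i < sizeof full_folds / sizeof full_folds[0]; i++) {
            if (full_folds[i].from == *src) {
                rep = full_folds[i].to;
                break;
            }
        }
        if (rep) {
            /* Full folding: one input character becomes several. */
            size_t len = wcslen(rep);
            if (n + len >= dstlen)
                return (size_t)-1;
            wmemcpy(dst + n, rep, len);
            n += len;
        } else {
            /* Fallback: single-character lower-casing. */
            if (n + 1 >= dstlen)
                return (size_t)-1;
            dst[n++] = (wchar_t)towlower((wint_t)*src);
        }
    }
    dst[n] = L'\0';
    return n;
}
```

Folding the six-character input S T R A U+00DF E with this sketch yields the seven-character string "strasse", which is why the caller has to supply an output buffer larger than the input.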
PERFORMANCE
The generated code is currently still somewhat unoptimized, but it is small and fast compared to the other implementations. More importantly, it is correct where glibc is not: glibc ignores characters from other locales.
make -C examples
./bench
my: 160 [us]
musl-new: 352 [us]
musl-old: 286 [us]
glibc: 197 [us]
wc -c towctrans-*.o
5072 towctrans-my.o
7096 towctrans-musl-new.o
3408 towctrans-musl-old.o
97432 towctrans-glibc.o
INSTALLATION
Perl 5.12 or later is required.
This module does not need to be installed; running gen_wctrans is enough. However, for full testing and a global installation, run:
perl Makefile.PL
make
make test
make test-all
sudo make install
DEPENDENCIES
This module requires a CaseFolding.txt file from the Unicode Character Database. If the file is missing, it is downloaded automatically via wget.
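For readers unfamiliar with the input, each data line of CaseFolding.txt has the form "<code>; <status>; <mapping>; # <name>", where status is one of C, F, S or T. The following is a minimal sketch of reading one such line. gen_wctrans itself is a Perl script, so this C function and its name parse_simple_fold() are purely illustrative assumptions and say nothing about how the tool parses the file.

```c
#include <stdio.h>

/* Hypothetical helper: parse one CaseFolding.txt line that maps a code
   point to exactly one code point, e.g.
       0041; C; 0061; # LATIN CAPITAL LETTER A
   Returns 1 and fills code, status and mapping on success.
   Returns 0 for comments, blank lines, and multi-code-point (status F)
   mappings such as "00DF; F; 0073 0073; # ...". */
static int parse_simple_fold(const char *line, unsigned long *code,
                             char *status, unsigned long *mapping)
{
    unsigned long c, m;
    char st, sep;

    /* The final %c captures whatever follows the first mapped code point:
       ';' for a single mapping, a hex digit if more code points follow. */
    if (sscanf(line, "%lx; %c; %lx %c", &c, &st, &m, &sep) != 4)
        return 0;
    if (sep != ';')
        return 0;

    *code = c;
    *status = st;
    *mapping = m;
    return 1;
}
```

A caller that only wants simple per-character tables would then keep the lines with status C and S, and treat status T separately for the Turkic locales.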
AUTHOR
Reini Urban <rurban@cpan.org>
Copyright(C) 2026 Reini Urban. All rights reserved
COPYRIGHT AND LICENSE
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The generated files are MIT licensed. See the headers of the generated files.
SEE ALSO
- https://www.unicode.org/reports/tr44/#Casemapping
- https://git.musl-libc.org/cgit/musl/tree/src/ctype/towctrans.c
- https://git.musl-libc.org/cgit/musl/tree/src/ctype/towctrans.c?id=e8aba58ab19a18f83d7f78e80d5e4f51e7e4e8a9
- https://github.com/rurban/safeclib/blob/master/src/extwchar/towctrans.c
- https://sourceware.org/git/?p=glibc.git;a=tree;f=wctype;;hb=HEAD