NAME
subst - Greple module for text search and substitution
VERSION
Version 2.37
SYNOPSIS
greple -Msubst --dict dictionary [ options ]
Dictionary:
--dict dictionary file
--dictdata dictionary data
--dictpair dictionary entry pair
Check:
--check=[ng,ok,any,outstand,all,none]
--select=N
--linefold
--stat
--with-stat
--stat-style=[default,dict]
--stat-item={match,expect,number,ok,ng,dict}=[0,1]
--subst
--[no-]warn-overlap
--[no-]warn-include
File Update:
--diff
--diffcmd command
--create
--replace
--overwrite
DESCRIPTION
This greple module checks and substitutes text in files based on dictionary data.
The dictionary file is given by the --dict option, and each line contains a pair of a matching pattern and an expected string.
greple -Msubst --dict DICT
If the dictionary file contains the following data:
colou?r color
cent(er|re) center
the above command finds strings that match the first pattern but differ from the second string, that is, "colour" and "centre" in this case.
In practice, the last two elements of a space-separated line are treated as the pattern and the replacement string, respectively.
Dictionary data can also be written with // as the separator, as follows:
colou?r // color
cent(er|re) // center
There must be spaces before and after the //. In this format, the strings before and after it are treated as the pattern and the replacement string, rather than the last two elements. Leading spaces and the spaces before and after // are ignored, but all other whitespace is significant.
You can use the same file with greple's -f option; in that case, the string after // is ignored as a comment.
greple -f DICT ...
Option --dictdata can be used to provide dictionary data on the command line.
greple -Msubst \
--dictdata $'colou?r color\ncent(er|re) center\n'
A dictionary entry starting with a sharp sign (#) is a comment and is ignored.
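The line format described above can be summarized in a short Python sketch. It only illustrates the rules as stated in this document and is not the module's actual code. The function name parse_entry is hypothetical, and the handling of blank lines and of lines with a single element is an assumption.

```python
import re


def parse_entry(line):
    """Return (pattern, replacement) for one dictionary line, or None.

    Illustrative only: parse_entry is a hypothetical helper, not part
    of greple or the subst module.
    """
    line = line.rstrip("\n")
    # Assumption: blank lines are skipped, like "#" comment lines.
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    # "pattern // replacement": there must be whitespace on both sides
    # of "//"; whitespace inside the pattern itself is preserved.
    m = re.match(r"\s*(.*?)\s+//\s+(.*)$", line)
    if m:
        return m.group(1), m.group(2)
    # Otherwise the last two whitespace-separated elements are used.
    fields = line.split()
    if len(fields) >= 2:
        return fields[-2], fields[-1]
    # Assumption: a lone element is a pattern with no replacement.
    return fields[0], None


for text in ["colou?r color", "cent(er|re) // center", "# comment"]:
    print(repr(text), "->", parse_entry(text))
```

With the // form, a pattern may itself contain spaces, which the plain space-separated form cannot express.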
Option --dictpair can be used to provide raw dictionary entries on the command line. In this case, no processing is done regarding whitespace or comments.
greple -Msubst \
--dictpair 'colou?r' color \
--dictpair 'cent(er|re)' center
Overlapped pattern
When a matched string is the same as, or shorter than, a string previously matched by another pattern, it is simply ignored (--no-warn-include is the default). So, if you have to declare conflicting patterns, place the longer pattern earlier.
If a matched string overlaps with a previously matched string, a warning is issued (--warn-overlap is the default) and the match is ignored.
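The following Python sketch shows one way to read these two rules. It is an interpretation of the text above, not the module's implementation. The function name resolve and the (start, end, label) tuple layout are hypothetical, and a later match that fully contains an earlier one is treated here as an overlap, which is an assumption.

```python
def resolve(matches, warn_overlap=True, warn_include=False):
    """Keep matches in dictionary order, dropping conflicting ones.

    matches: list of (start, end, label) with end exclusive, ordered so
    that matches from earlier dictionary entries come first.
    Hypothetical helper for illustration only.
    """
    accepted, warnings = [], []
    for start, end, label in matches:
        # Same span as, or fully inside, an already accepted match.
        included = any(s <= start and end <= e for s, e, _ in accepted)
        # Shares at least one character with an accepted match.
        overlapped = any(start < e and s < end for s, e, _ in accepted)
        if included:
            if warn_include:
                warnings.append(f"include: {label}")
        elif overlapped:
            if warn_overlap:
                warnings.append(f"overlap: {label}")
        else:
            accepted.append((start, end, label))
    return accepted, warnings


# Spans over the text "abcdef": "abcd" comes first, "bc" lies inside
# it, and "def" partially overlaps it.
accepted, warnings = resolve([(0, 4, "abcd"), (1, 3, "bc"), (3, 6, "def")])
print(accepted)
print(warnings)
```

In this example only "abcd" is kept: "bc" is dropped silently and "def" is dropped with a warning, which is why the longer pattern should come first.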
Terminal color
This version uses the Getopt::EX::termcolor module. It sets the --light-screen or --dark-screen option depending on the terminal on which the command runs, or on the TERM_BGCOLOR environment variable.
Some terminals (e.g. "Apple_Terminal" or "iTerm") are detected automatically and no action is required. Otherwise, set the TERM_BGCOLOR environment variable to a value between #000000 (black) and #FFFFFF (white) according to the terminal background color.
OPTIONS
- --dict=file
-
Specify dictionary file.
- --dictdata=data
-
Specify dictionary data by text.
- --dictpair pattern replacement
-
Specify dictionary entry pair. This option takes two parameters. The first is a pattern and the second is a substitution string.
- --check=outstand|ng|ok|any|all|none
-
Option --check takes an argument from ng, ok, any, outstand, all and none.
With the default value outstand, the command shows information about both expected and unexpected words only when an unexpected word is found in the same file.
With value ng, the command shows information about unexpected words. With value ok, you get information about expected words. With value any, you get both.
Values all and none make sense only when used with the --stat option, and display information about patterns that never matched.
- --select=N
-
Select the Nth entry from the dictionary. The argument is interpreted by the Getopt::EX::Numbers module. A range can be given like --select=1:3,7:9. You can get the numbers with the --stat option.
- --linefold
-
If the target data is folded in the middle of text, use the --linefold option. It creates regex patterns that match strings spread across lines. The substituted text does not include newlines, though. Because it somewhat confuses regex behavior, avoid using it if possible.
- --stat
- --with-stat
-
Print statistical information. Works with the --check option.
Option --with-stat prints statistics after the normal output, while --stat prints only statistics.
- --stat-style=default|dict
-
Using the --stat-style=dict option with --stat and --check=any, you can get dictionary-style output for your working document.
- --stat-item item=[0,1]
-
Specify which items are shown in the stat information. Default values are:
match=1 expect=1 number=1 ng=1 ok=1 dict=0
If you don't need to see the pattern field, use it like this:
--stat-item match=0
Multiple parameters can be set at once:
--stat-item match=number=0,ng=1,ok=1
- --subst
-
Substitute unexpected matched strings with the expected string. Newline characters in the matched string are ignored. A pattern without a replacement string is not changed.
- --[no-]warn-overlap
-
Warn about overlapped patterns. On by default.
- --[no-]warn-include
-
Warn about included patterns. Off by default.
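The --check values described above can be pictured with a small Python sketch. It mirrors only the wording of this document and is not how the module is implemented. The names classify and filter_by_check are hypothetical, and the all and none values are left out because they apply only to --stat output.

```python
import re


def classify(text, entries):
    """Label each match "ok" (equals the expected string) or "ng".

    entries: list of (pattern, expected) pairs. Hypothetical helper;
    text stands for the content of a single file.
    """
    results = []
    for pattern, expected in entries:
        for m in re.finditer(pattern, text):
            status = "ok" if m.group(0) == expected else "ng"
            results.append((m.group(0), expected, status))
    return results


def filter_by_check(results, check="outstand"):
    """Select the results of one file according to a --check value."""
    if check in ("ng", "ok"):
        return [r for r in results if r[2] == check]
    if check == "any":
        return list(results)
    if check == "outstand":
        # Show everything, but only if the file has an unexpected word.
        has_ng = any(r[2] == "ng" for r in results)
        return list(results) if has_ng else []
    raise ValueError(f"not modeled in this sketch: {check}")


entries = [("colou?r", "color"), ("cent(er|re)", "center")]
results = classify("colour and color, centre and center", entries)
for check in ("ng", "ok", "outstand"):
    print(check, [r[0] for r in filter_by_check(results, check)])
```

For a file that contains only "color" and "center", the outstand value selects nothing, while ok still reports both words.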
FILE UPDATE OPTIONS
- --diff
- --diffcmd=command
-
Option --diff produces diff output of the original and converted text.
Option --diffcmd specifies the diff command used by the --diff option. Default is "diff -u".
- --create
-
Create a new file and write the result to it. The suffix ".new" is appended to the original filename.
- --replace
-
Replace the target file with the converted result. The original file is renamed to a backup name with the ".bak" suffix.
- --overwrite
-
Overwrite the target file with the converted result, with no backup.
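The difference between the three update options above can be sketched in Python as follows. This is only a model of the file naming described here, not the code used by the module. The function name update_file is hypothetical, and silently replacing an existing ".bak" file is an assumption of this sketch.

```python
import os


def update_file(path, converted, mode):
    """Write converted text the way each update option is described.

    mode is "create", "replace" or "overwrite". Hypothetical helper for
    illustration; returns the path that was written.
    """
    if mode == "create":
        # Original stays untouched; result goes to "<name>.new".
        target = path + ".new"
    elif mode == "replace":
        # Original is renamed to "<name>.bak" before writing.
        # Assumption: an existing backup file is replaced.
        os.replace(path, path + ".bak")
        target = path
    elif mode == "overwrite":
        # No backup is kept.
        target = path
    else:
        raise ValueError(f"unknown mode: {mode}")
    with open(target, "w", encoding="utf-8") as f:
        f.write(converted)
    return target
```

Only "overwrite" loses the original content, so "create" or "replace" is the safer choice while a dictionary is still being tuned.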
DICTIONARY
This module includes example dictionaries. They are installed in the share directory and accessed by the --exdict option.
greple -Msubst --exdict jtca-katakana-guide-3.dict
- --exdict dictionary
-
Use a dictionary file included in the distribution as the dictionary file.
- --exdictdir
-
Show the dictionary directory.
- --exdict jtca-katakana-guide-3.dict
- --jtca-katakana-guide
-
Created from the following guideline document.
外来語(カタカナ)表記ガイドライン 第3版 (Guidelines for Katakana Notation of Loanwords, 3rd edition)
Established August 2015, published September 2015
Japan Technical Communicators Association (一般財団法人テクニカルコミュニケーター協会)
https://jtca.org/tcwp/wp-content/uploads/2023/06/katakana_guide_3_20171222.pdf
- --jtca
-
Customized --jtca-katakana-guide. The original dictionary is automatically generated from published data. This dictionary is customized for practical use.
- --exdict jtf-style-guide-3.dict
- --jtf-style-guide
-
Created from the following guideline document.
JTF日本語標準スタイルガイド(翻訳用) 第3.0版 (JTF Japanese Standard Style Guide for Translation, version 3.0)
August 20, 2019
Japan Translation Federation (JTF), Translation Quality Committee (一般社団法人 日本翻訳連盟 翻訳品質委員会)
https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf
- --jtf
-
Customized --jtf-style-guide. The original dictionary is automatically generated from published data. This dictionary is customized for practical use.
- --exdict sccc2.dict
- --sccc2
-
Dictionary used for "C/C++ セキュアコーディング 第2版" (the Japanese edition of "Secure Coding in C and C++", 2nd edition) published in 2014.
https://www.jpcert.or.jp/securecoding_book_2nd.html
- --exdict ms-style-guide.dict
- --ms-style-guide
-
Dictionary generated from the Microsoft localization style guide.
https://www.microsoft.com/ja-jp/language/styleguides
Data is generated from this article:
https://www.atmarkit.co.jp/news/200807/25/microsoft.html
- --microsoft
-
Customized --ms-style-guide. The original dictionary is automatically generated from published data. This dictionary is customized for practical use.
An amendment dictionary can be found here. Please raise an issue or send a pull request if you have an update request.
JAPANESE
This module was originally made to support Japanese text editing.
KATAKANA
Japanese KATAKANA words have many variant spellings for the same word, so unification is important, but it is quite tiresome work. In the next example,
イ[エー]ハトー?([ヴブボ]ォ?) // イーハトーヴォ
the pattern on the left matches all of the following words.
イエハトブ
イーハトヴ
イーハトーヴ
イーハトーヴォ
イーハトーボ
イーハトーブ
This module helps to detect and correct them.
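The claim that the pattern covers every listed variant can be checked outside greple with a few lines of Python. This is a standalone check using Python's re module, not a use of the subst module itself, and the variable names are chosen only for this example.

```python
import re

# Pattern and expected string taken from the dictionary line above.
pattern = re.compile(r"イ[エー]ハトー?([ヴブボ]ォ?)")
expected = "イーハトーヴォ"

variants = [
    "イエハトブ",
    "イーハトヴ",
    "イーハトーヴ",
    "イーハトーヴォ",
    "イーハトーボ",
    "イーハトーブ",
]

for word in variants:
    matched = pattern.fullmatch(word) is not None
    # A word equal to the expected string is "ok", any other is "ng".
    status = "ok" if word == expected else "ng"
    print(word, "matched" if matched else "no match", status)

# Substituting each match with the expected string unifies the list.
unified = [pattern.sub(expected, word) for word in variants]
print(set(unified))
```

Every variant matches in full, and only イーハトーヴォ already equals the expected string, so the other five are the words a check would flag.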
INSTALL
CPANMINUS
$ cpanm App::Greple::subst
SEE ALSO
https://github.com/kaz-utashiro/greple
https://github.com/kaz-utashiro/greple-subst
https://github.com/kaz-utashiro/greple-update
https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf
https://www.jtf.jp/jp/style_guide/styleguide_top.html, https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf
https://www.microsoft.com/ja-jp/language/styleguides, https://www.atmarkit.co.jp/news/200807/25/microsoft.html
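Agency for Cultural Affairs (文化庁), Japanese language policy and Japanese language education, language policy information, Cabinet notifications and directives: 外来語の表記 (Notation of Loanwords)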
https://qiita.com/kaz-utashiro/items/85add653a71a7e01c415
AUTHOR
Kazumasa Utashiro
LICENSE
Copyright ©︎ 2017-2025 Kazumasa Utashiro.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.