Name
Dita::GB::Standard - The Gearhart-Brenan Dita Topic Naming Standard.
Synopsis
The GB Standard can be usefully applied to documents written in Dita.
The GB Standard creates a readable, deterministic file name which depends solely on the content to be stored in that file. Because the file name includes an md5 sum of the content, such file names differ between files that contain differing content while being identical for files that contain identical content.
The GB Standard name looks like this:
human_readable_part_derived_from_content + _ + md5_sum_of_content + extension
The human readable part is derived from the content of the file by interpreting that content as Unicode, or as Ascii if the binary standard is being used, and then, for files that do not contain a title tag:
- replacing all instances of <text> with single underscores
- replacing all runs of non a-z,0-9 alpha numeric characters with single underscores
- replacing contiguous runs of underscores with a single underscore
- removing any leading or trailing underscores
- truncating the component if it extends beyond 128 characters.
For files that do contain a title tag the content of the title tag is processed as described above to obtain the human readable component of the file name.
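The steps above can be sketched in Perl as follows. This is a minimal illustration rather than the module's own code: the helper name humanReadablePart is hypothetical, and lower-casing the text first is an assumption inferred from names such as license_agreement that appear later in this document.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, not part of Dita::GB::Standard.
sub humanReadablePart
 {my ($content) = @_;

  # Prefer the content of the title tag if there is one
  my $text = $content =~ m{<title>(.*?)</title>}s ? $1 : $content;

  $text = lc $text;                 # Assumption: names are lower case
  $text =~ s{<[^>]*>}{_}g;          # Tags become single underscores
  $text =~ s{[^a-z0-9]+}{_}g;       # Runs of other characters become single underscores
  $text =~ s{_+}{_}g;               # Collapse contiguous underscores
  $text =~ s{\A_+|_+\z}{}g;         # Remove leading and trailing underscores
  substr($text, 0, 128)             # Truncate to at most 128 characters
 }

print humanReadablePart(q(<title>License Agreement</title>)), "\n";  # prints license_agreement
```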
The md5 sum component of the file name is calculated from the content of the file and presented as lowercase hexadecimal.
The file extension component is obtained from: https://en.wikipedia.org/wiki/List_of_filename_extensions
Thus if an xml file has content:
abc 𝝰𝝱𝝲
then the GB Standard name for the file is:
abc_541ddaddd3d82f73a30a666c285b7e92.xml
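The way the three components are joined can be sketched as below. The helper name sketchName is hypothetical, and computing the md5 sum over the UTF-8 encoded bytes of the content is an assumption; the module itself should be consulted for the exact encoding it uses.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical helper: join the readable part, md5 sum and extension.
sub sketchName
 {my ($readable, $content, $extension) = @_;
  my $md5 = md5_hex(encode('UTF-8', $content));   # Lowercase hexadecimal
  join '', $readable, '_', $md5, '.', $extension
 }

print sketchName(q(abc), q(abc), q(xml)), "\n";
```

Identical content always yields the same name, and a change to a single character of the content changes the md5 sum component.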
If the option to present the md5 sum as five letter English words is chosen then the standardized name for this content becomes:
abc_thInk_BUSHy_dRYER_spaCE_KNOwN_lepeR_SeNse_MaJor.xml
Companion Files
Each file produced by the GB Standard can have a companion file of the same name but without an extension. The companion file contains meta-data about the file such as its original location etc. which can be searched by grep or similar.
Benefits
The names generated by the GB Standard can be exploited in numerous ways to simplify the creation, conversion and management of large repositories of documents written to the Dita standard:
Parallel Processing
The name generated by the GB Standard is unique when computed by competing parallel processes, so files that have the same name have the same content and can be safely overwritten by another process without attempting to coordinate names between processes. Likewise, files that have different names have different content and so can be written separately.
Alternative systems relying on coordination between the parallel processes to choose names to avoid collisions and reuse identical content perform ever more badly as the number of files increases because there are that many more files to check for matching content and names. Coordination between parallel processes stops them from being truly parallel.
As a consequence, the GB Standard enables parallel Dita conversions to scale effectively.
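A minimal sketch of uncoordinated parallel writes follows. The helpers writeByContent and writeInParallel, the fixed doc_ prefix and the use of a temporary file followed by rename are all assumptions made for illustration; they are not part of the module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical helper: write $content under a content-derived name in $folder.
sub writeByContent
 {my ($folder, $content) = @_;
  my $file = "$folder/doc_".md5_hex($content).".xml";
  my $temp = "$file.$$";                               # Per-process temporary name
  open(my $out, '>', $temp) or die "Cannot write $temp: $!";
  print {$out} $content;
  close($out) or die "Cannot close $temp: $!";
  rename($temp, $file) or die "Cannot rename $temp: $!"; # Same name, same content: safe to overwrite
  $file
 }

# Hypothetical helper: one child process per content, no coordination of names.
sub writeInParallel
 {my ($folder, @contents) = @_;
  my @pids;
  for my $content (@contents)
   {my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    if ($pid == 0)
     {writeByContent($folder, $content);
      POSIX::_exit(0);                                 # Leave the child without running parent cleanup
     }
    push @pids, $pid;
   }
  waitpid($_, 0) for @pids;
  my @files = sort {$a cmp $b} glob("$folder/*.xml");
  @files
 }

my $folder = tempdir(CLEANUP => 1);
my @files  = writeInParallel($folder, q(aaa), q(aaa), q(bbb));
print scalar(@files), " files\n";
```

Two of the three writes carry identical content, so they land on the same file name without either process needing to know about the other.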
File Flattening
Files are automatically flattened by the GB Standard as files with the same content have the same name and so can safely share one global folder without fear of name collisions or having multiple names for identical content.
Relocating Dita References After Folder Restructuring
In the ideal implementation all files named with the GB Standard occupy one global folder. In circumstances where this is not possible, such files can easily be moved into sub folders without fear of collisions, although any Dita references between such files might have to be updated. This update is easily performed because only the path component has to be updated, and the value of the new path can easily be found by searching for the base component of the topic file name using a utility such as find. For a more efficient method, see Data::Edit::Xml::Xref.
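The search for the new location of a referenced file can be sketched with the core File::Find module. The helper name locateByBaseName is hypothetical; it simply walks a folder tree looking for files whose base name matches that of the old reference.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find qw(find);

# Hypothetical helper: find files under $root with the same base name as $reference.
sub locateByBaseName
 {my ($root, $reference) = @_;
  my $base = basename($reference);                 # Only the path component may have changed
  my @found;
  find(sub
   {push @found, $File::Find::name if -f $_ and $_ eq $base;
   }, $root);
  @found
 }
```

Because the base name is derived from the content, a match found anywhere under the root can be assumed to be the file that the reference originally pointed to.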
Similar Files Tend To Appear Close Together In Directory Listings
Imagine the user has several files in different folders all starting:
<title>License Agreement</title>
The GB Standard computes the human readable component of the name in a consistent way using only the contents of each file. Once the name has been standardized, all these files can be placed in one folder to get a directory listing like:
license_agreement_a6e3...
license_agreement_b532...
license_agreement_c65d...
This grouping signals that these files are potentially similar to each other.
As the user applies the GB Standard to more files, more such matches occur.
Files named using the GB Standard behave like Bosons - they like to enter the same state to obtain a laser like focus.
Copying And Moving Files For Global Interoperability
Users can copy files named using the GB Standard from folder to folder without fear of collisions or duplication, obviating the need for time consuming checks and reportage before performing such actions. The meta data in the companion file can also be copied in a similar fearless manner.
Say two users want to share content: files named using the GB Standard can be incorporated directly into the other user's file system without fear of collisions or duplicating content thus promoting global content sharing and collaboration.
Guidization For Content Management Systems
Self constructed Content Management Systems using BitBucket, GitHub or Gitlab that rely on guidization to differentiate files placed in these repositories benefit immensely: the guid to use can be easily derived from the md5 sum in the GB Standard file name.
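One possible derivation is sketched below. The helper name guidFromName and the 8-4-4-4-12 grouping of the hexadecimal digits are assumptions chosen for illustration; the GB Standard does not prescribe a particular guid layout here.

```perl
use strict;
use warnings;

# Hypothetical helper: format the md5 sum found in a GB Standard name as a guid.
sub guidFromName
 {my ($file) = @_;
  my ($md5) = $file =~ m/_([0-9a-f]{32})(?:\.[^.\/]*)?\z/   # md5 sum before the optional extension
    or return undef;
  join '-', unpack 'A8 A4 A4 A4 A12', $md5
 }

print guidFromName(q(abc_541ddaddd3d82f73a30a666c285b7e92.xml)), "\n";
# prints 541ddadd-d3d8-2f73-a30a-666c285b7e92
```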
Using Dita Tags To Describe Content
The GB Standard encourages Dita users to use meta data tags to describe their documents so that content can be found by searching with grep rather than relying on lengthy file names in which the file meta data is encoded and then using find. Such file names quickly become very long and unmanageable: on the one hand they need spaces in them to make them readable, but on the other hand, the spaces make such files difficult to cut and paste or use from the command line.
Cut And Paste
As there are no spaces in the file names created using the GB Standard, such file names can be selected by a mouse double click and thus easily copied and pasted into other documents.
Conversely, one has to use cut and paste to manipulate such file names making it impossible to misspell such file names in other documents.
Automatic File Versioning
The names of files named to the GB Standard change when their content changes: if the content of a file changes, its name must change as well. Thus an attempt to present an out-of-date version of a file produces a file name that cannot be found.
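A check that a file's name still matches its content might look like the sketch below. The helper name nameMatchesContent is hypothetical, and the sketch assumes that the md5 sum in the name was computed over the raw bytes stored in the file and is presented in hexadecimal rather than as words.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: true if the md5 sum in the file name matches the file content.
sub nameMatchesContent
 {my ($file) = @_;
  my ($md5) = $file =~ m/_([0-9a-f]{32})(?:\.[^.\/]*)?\z/   # md5 sum before the optional extension
    or return 0;
  open(my $in, '<:raw', $file) or return 0;                 # A missing file cannot match
  local $/;                                                 # Read the whole file at once
  my $content = <$in>;
  close($in);
  md5_hex($content) eq $md5 ? 1 : 0
 }
```

A file edited in place without being renamed fails this check, which is the signal that its name is out of date.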
Enhanced Command Line Processing
As file names produced by the GB Standard contain no spaces (including characters such as the zero width space), they work well on the command line and with the many command line tools used to manipulate such files, enhancing the productivity leverage that the command line has over graphical user interface processing.
Locating Files by Their Original Names Or Other Meta-Data
The companion file contains information about a file named using the GB Standard such as its original file name and other meta data.
To find such a file, use grep to find the companion file containing the searched-for content, paste that file name into the command line after entering any command such as ll, and then press the tab key to have the shell expand it to the GB Standard file that corresponds to the located companion file.
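The same lookup can be scripted. The sketch below is an illustration only: the helper name findByMetaData is hypothetical, and it assumes that all files sit in one folder, that companion files are exactly the files without an extension, and that the folder name contains no glob metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: find GB Standard files whose companion file mentions $search.
sub findByMetaData
 {my ($folder, $search) = @_;
  my @found;
  for my $companion (grep {-f $_ and !m/\.[^.\/]*\z/} glob("$folder/*"))  # Files without an extension
   {open(my $in, '<', $companion) or next;
    my $text = do {local $/; <$in>};                                     # Whole companion file
    close($in);
    next unless defined $text and index($text, $search) >= 0;
    push @found, grep {-f $_} glob("$companion.*");                      # Same name plus an extension
   }
  @found
 }
```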
Alternate File Names
Most operating systems allow the use of links to supply alternate names for a file. Consequently, users who wish to impose a different file naming scheme might care to consider using links to implement their own file naming system on top of the GB Standard without disrupting the integrity of the GB Standard.
Implementation
The GB Standard has been implemented as a Perl package at:
http://metacpan.org/pod/Dita::GB::Standard
Binary vs Utf8
Files that are expected to contain data encoded with utf8 (eg .dita, .xml) should use method names that start with:
gbStandard
Files that are expected to contain binary data (eg .png, .jpg) should use method names that start with:
gbBinaryStandard
Description
The Gearhart-Brenan Dita Topic Naming Standard.
Version "20190504".
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Make and manage utf8 files
Make and manage files that conform to the GB Standard and are coded in utf8.
gbStandardFileName($$)
Return the GB Standard file name given the content and extension of a proposed file.
Parameter Description
1 $content Content
2 $extension Extension
Example:
if (1) {
if (useWords)
{ok gbStandardFileName(q(abc 𝝰𝝱𝝲), q(xml)) eq q(abc_lEvEe_FOyER_JOhNs_teNoR_GeEky_sIDle_arMoR_sLING.xml);
}
else
{ok gbStandardFileName(q(abc 𝝰𝝱𝝲), q(xml)) eq q(abc_541ddaddd3d82f73a30a666c285b7e92.xml);
}
}
gbStandardCompanionFileName($)
Return the name of the companion file given a file whose name complies with the GB Standard.
Parameter Description
1 $file GB Standard file name
Example:
ok gbStandardCompanionFileName(q(a/b.c)) eq q(a/b);
gbStandardCreateFile($$$$)
Create a file in the specified $Folder whose name is the GB Standard name for the specified $content and return the file name. A companion file can, optionally, be created with the specified $companionContent.
Parameter Description
1 $Folder Target folder or a file in that folder
2 $content Content of the file
3 $extension File extension
4 $companionContent Contents of the companion file.
Example:
if (1) {
my $s = q(abc 𝝰𝝱𝝲);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbStandardRename($F); # No rename required to standardize file name
gbStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbStandardRename($)
Check whether a file needs to be renamed to match the GB Standard. Return the correct name for the file or undef if the name is already correct.
Parameter Description
1 $file File to check
Example:
if (1) {
my $s = q(abc 𝝰𝝱𝝲);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbStandardRename($F); # No rename required to standardize file name
gbStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbStandardCopyFile($$)
Copy a file to the specified $target folder, renaming it to the GB Standard. If no $target folder is specified then rename the file in its current folder so that it complies with the GB Standard.
Parameter Description
1 $source Source file
2 $target Target folder or a file in the target folder
Example:
if (1) {
my $s = q(abc 𝝰𝝱𝝲);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbStandardRename($F); # No rename required to standardize file name
gbStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbStandardDelete($)
Delete a file and its companion file if there is one.
Parameter Description
1 $file File to delete
Example:
if (1) {
my $s = q(abc 𝝰𝝱𝝲);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbStandardRename($F); # No rename required to standardize file name
gbStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
Make and manage binary files
Make and manage files that conform to the GB Standard and are in plain binary.
gbBinaryStandardFileName($$)
Return the GB Standard file name given the content and extension of a proposed file.
Parameter Description
1 $content Content
2 $extension Extension
Example:
if (1) {
if (useWords)
{ok gbBinaryStandardFileName(qq(\0abc\1), q(xml)) eq q(abc_thInk_BUSHy_dRYER_spaCE_KNOwN_lepeR_SeNse_MaJor.xml);
}
else
{ok gbBinaryStandardFileName(qq(\0abc\1), q(xml)) eq q(abc_2786f1147a331ec6ebf60c1ba636a458.xml);
}
}
gbBinaryStandardCompanionFileName($)
Return the name of the companion file given a file whose name complies with the GB Standard.
Parameter Description
1 $file GB Standard file name
Example:
ok gbBinaryStandardCompanionFileName(q(a/b.c)) eq q(a/b);
gbBinaryStandardCreateFile($$$$)
Create a file in the specified $Folder whose name is the GB Standard name for the specified $content and return the file name. A companion file can, optionally, be created with the specified $companionContent.
Parameter Description
1 $Folder Target folder or a file in that folder
2 $content Content of the file
3 $extension File extension
4 $companionContent Contents of the companion file.
Example:
if (1) {
my $s = qq(\0abc\1);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbBinaryStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbBinaryStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbBinaryStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbBinaryStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbBinaryStandardRename($F); # No rename required to standardize file name
gbBinaryStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbBinaryStandardRename($)
Check whether a file needs to be renamed to match the GB Standard. Return the correct name for the file or undef if the name is already correct.
Parameter Description
1 $file File to check
Example:
if (1) {
my $s = qq(\0abc\1);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbBinaryStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbBinaryStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbBinaryStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbBinaryStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbBinaryStandardRename($F); # No rename required to standardize file name
gbBinaryStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbBinaryStandardCopyFile($$)
Copy a file to the specified $target folder, renaming it to the GB Standard. If no $target folder is specified then rename the file in its current folder so that it complies with the GB Standard.
Parameter Description
1 $source Source file
2 $target Target folder or a file in the target folder
Example:
if (1) {
my $s = qq(\0abc\1);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbBinaryStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbBinaryStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbBinaryStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbBinaryStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbBinaryStandardRename($F); # No rename required to standardize file name
gbBinaryStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
gbBinaryStandardDelete($)
Delete a file and its companion file if there is one.
Parameter Description
1 $file File to delete
Example:
if (1) {
my $s = qq(\0abc\1);
my $S = q(Hello World);
my $d = q(out/);
my $D = q(out2/);
clearFolder($_, 10) for $d, $D;
my $f = gbBinaryStandardCreateFile($d, $s, q(xml), $S); # Create file
ok -e $f;
ok readFile($f) eq $s;
my $c = gbBinaryStandardCompanionFileName($f); # Check companion file
ok -e $c;
ok readFile($c) eq $S;
my $F = gbBinaryStandardCopyFile($f, $D); # Copy file
ok -e $F;
ok readFile($F) eq $s;
my $C = gbBinaryStandardCompanionFileName($F); # Check companion file
ok -e $C;
ok readFile($C) eq $S;
ok !gbBinaryStandardRename($F); # No rename required to standardize file name
gbBinaryStandardDelete($F); # Delete file and its companion file
ok !-e $F;
ok !-e $C;
clearFolder($_, 10) for $d, $D;
}
Index
1 gbBinaryStandardCompanionFileName - Return the name of the companion file given a file whose name complies with the GB Standard.
2 gbBinaryStandardCopyFile - Copy a file to the specified $target folder renaming it to the GB Standard.
3 gbBinaryStandardCreateFile - Create a file in the specified $Folder whose name is the GB Standard name for the specified $content and return the file name. A companion file can, optionally, be created with the specified $companionContent.
4 gbBinaryStandardDelete - Delete a file and its companion file if there is one.
5 gbBinaryStandardFileName - Return the GB Standard file name given the content and extension of a proposed file.
6 gbBinaryStandardRename - Check whether a file needs to be renamed to match the GB Standard.
7 gbStandardCompanionFileName - Return the name of the companion file given a file whose name complies with the GB Standard.
8 gbStandardCopyFile - Copy a file to the specified $target folder renaming it to the GB Standard.
9 gbStandardCreateFile - Create a file in the specified $Folder whose name is the GB Standard name for the specified $content and return the file name. A companion file can, optionally, be created with the specified $companionContent.
10 gbStandardDelete - Delete a file and its companion file if there is one.
11 gbStandardFileName - Return the GB Standard file name given the content and extension of a proposed file.
12 gbStandardRename - Check whether a file needs to be renamed to match the GB Standard.
Installation
This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:
sudo cpan install Dita::GB::Standard
Author
Copyright
Copyright (c) 2016-2019 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.