* introduce setting regime
* parse option --licensecheck-regime
* parse environment variable LICENSECHECK_REGIME
* parse environment variable LICENSECHECK_DEFAULTS
* Maybe implement all this as a git module?
* implement environment variable LICENSECHECK_DEFAULTS
and setting=value pairs in env vars LICENSECHECK LICENSECHECK_DIFF
* implement licensefind
* parse options and arguments like GNU find (or busybox find?)
* fail on any option supported by find but unsupported here
* parse new options --licensecheck-*
and setting=value pairs in env vars LICENSECHECK LICENSECHECK_FIND
* by default output filenames (i.e. all with any license by default)
* implement licensegrep
* parse options and arguments like grep (or busybox/git grep?)
* fail on any option supported by grep but unsupported here
* parse new options --licensecheck-*
and setting=value pairs in env vars LICENSECHECK LICENSECHECK_GREP
* parse settings from env variables LICENSECHECK LICENSECHECK_GREP
(treated as fallback for --licensecheck-* options without prefix)
* by default output verbatim text, with filename prefixed if multiple
* maybe implement licensegreple
* like licensegrep, but with syntax matching greple
<https://metacpan.org/pod/greple>
* implement licensediff
* parse options and arguments like diff (or git diff?)
* fail on any option supported by diff but unsupported here
* parse new options --licensecheck-*
and setting=value pairs in env vars LICENSECHECK LICENSECHECK_DIFF
* by default output unified diff
* if file B omitted, compute from file A
(with minimal changes by default, or optionally optimized)
* implement licensesort
* parse options and arguments like sort
* fail on any option supported by sort but unsupported here
* parse new options --licensecheck-*
and setting=value pairs in env vars LICENSECHECK LICENSECHECK_SORT
+ file1-format - debian spdx (default: guess from passed file1)
+ paths-debian - (with format=debian try debian/copyright:copyright)
+ merge-copyright-years
+ merge-copyright-holders
+ merge-license-expressions
+ merge-license-parts
+ sort-copyright-years
+ sort-copyright-holders
+ sort-copyright-sections
+ sort-license-expressions
+ sort-license-parts
+ sort-license-sections
* if file B omitted, compute from file A + licensegrep
* if file A omitted, try default file for used format, or fail
(i.e. with format=debian try debian/copyright:copyright)
* support omitting copyright holders when permitted by license
* Implement decoding options:
* --decode-html
regular expression for paths to parse as html.
Pass empty regexp to only enable support (see --decode-magic).
* --decode-exif
regular expression for paths to extract EXIF and other metadata from
(see exiftool and Image::ExifTool).
Pass empty regexp to only enable support (see --decode-magic).
* --decode-skip
regular expression for paths to read as-is.
* --decode-magic
Determine needed decoding using libmagic.
If needed decoding method is not enabled, then that file is skipped.
(see File::LibMagic).
* --decode-auto
enable all --decode-* options for common file extensions.
* Optionally (i.e. if available) consult File::Extension on failure
* Move detection code to separate module(s).
* Maybe extend Software::License.
* Fail when passed unknown options
* Implement search options:
* --traversal-type
Algorithm used to walk directories passed as arguments.
* Values: one any
* Default: one
* --match-type
Algorithm used for --include and --exclude options.
* Values: regex glob_deb
* Default: regex
* Implement strictness option:
* --strict implies...
* --machine (or --machine-deb if enabled)
* --include .* (or --include * with --match-type glob_deb)
* --exclude ''
* --traversal-type any
* --fast implies...
* --exclude-common
* --decode-none
* Implement extensibility through YAML/JSON file
Similar to license-reconcile, but adding/overriding DefHash objects:
* http://git.hands.com/?p=freeswitch.git;a=blob;f=debian/license-reconcile.yml;h=0e40cba01eeb67f82d18ca8f11210271848d0549;hb=refs/heads/copyright2
* https://lists.debian.org/87efl0kvzu.fsf@hands.com
* Implement smarter processing:
* Optionally spawn "workers" for a boost on multi-core systems,
e.g. using Parallel::ForkManager
* Gather statistics on files processed and objects detected,
and emit progress during long-running scans,
e.g. using Progress::Any or Time::Progress (see the SEE ALSO section of Time::Progress).
* Detect non-commercial license.
(?i:(?:\w{4}|\W(?:[^oO]\w|\w[^rR]|[^aA]\w\w|\w[^nN]\w|\w\w[^dD])) non[-_ ]commercial)
* Detect bugroff license <http://tunes.org/legalese/bugroff.html>
* Compare against competitors
+ ripper
+ https://salsa.debian.org/stuart/package-license-checker
+ r-base /share/licenses/license.db
+ license-reconcile
+ https://wiki.debian.org/CopyrightReviewTools
+ https://docs.clearlydefined.io/clearly#licensed
+ http://copyfree.org/standard/licenses
+ https://wiki.debian.org/DFSGLicenses
+ http://voag.linkedmodel.org/2.0/doc/2015/REFDATA_voag-licenses-v2.0.html
+ https://github.com/hierynomus/license-gradle-plugin
+ ruby-licensee - http://ben.balter.com/licensee/
+ flict - https://github.com/vinland-technology/flict
* Warn about licensing conflicts
+ See adequate
* Sort Files sections to list common over exotic:
+ prefix of leftmost truncate wildcard (*)
+ suffix of leftmost truncate wildcard (*)
+ filecount when containing character wildcard (?)
+ filecount
+ License-shortnames
+ License-Grant
+ License inlined
+ Copyright
+ Filenames
* Optimize:
+ Support detection reversion, and first scan for grants then licenses - reverting embedded grants
* Test against challenging projects
+ ghostpdl
+ chromium
+ fpc
+ lazarus
+ boost
+ picolibc <https://keithp.com/cgit/picolibc.git/>
* Maybe use libdata-binary-perl
* Maybe use Text::Locus to track (and emit in verbose/debug mode) where patterns are detected?
* Quality flagging
+ ambiguous: license ref pointing to multiple license fulltexts (e.g. "MIT" or "GNU" or "GPL")
+ unlicensed: copyright holder(s) but no licensing
+ ungranted: license fullref requiring explicit grant, but no corresponding license grant
+ incomplete: fractions of license fullref, but no complete fullref
+ alien: license label but no license name
+ unowned: license but no copyright holder
+ uncertain: license ref and more unknown text in same sentence/paragraph/section
+ buried: license or copyright not at top of file
+ unstructured: license/copyright not at ideal place of data structure
(e.g. in comment field of EXIF data, or in content or comment of html)
+ unaligned: license/copyright out of sync between layers of structure
(e.g. ICC data and EXIF data of PNG, or content and metadata of PDF/HTML)
+ imperfect: license ref not following format documented in license fulltext
+ conflict: incompatible licenses (e.g. GPL-3+ and GPL-2-only, or OpenSSL and GPL)
* use nano-style configurable wordchars/punct/brackets/matchbrackets chars and quotestr regex
e.g. to determine sentences
(see "paragraphs" and "justify" in "man nanorc")
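
A rough sketch of the sentence detection idea in the last item, written in Python for brevity rather than in the project's own language. It assumes nano's rule that a sentence ends at closing punctuation, optionally followed by closing brackets, and then whitespace or end of text. The function name split_sentences and its parameters are hypothetical; the default character sets are assumed to mirror nano's "punct" and "brackets" defaults.

```python
import re


def split_sentences(text, punct="!.?", brackets="\"')>]}"):
    """Split text into sentences using configurable character sets.

    Hypothetical helper: a sentence ends at one `punct` character,
    optionally followed by `brackets` characters, when the next
    character is whitespace or the text ends.
    """
    if not punct:
        raise ValueError("punct must contain at least one character")
    # Escape the configured characters so they are safe in a character class.
    punct_class = "[" + re.escape(punct) + "]"
    bracket_part = "[" + re.escape(brackets) + "]*" if brackets else ""
    sentence_end = re.compile(punct_class + bracket_part + r"(?=\s|$)")

    sentences = []
    start = 0
    for match in sentence_end.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Keep any trailing text that lacks closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("Released under the MIT license (see COPYING.) "
              "Copyright 2020 Jane Doe. All rights reserved!")
    for sentence in split_sentences(sample):
        print(sentence)
```

Known limitation of this sketch: abbreviations and initials followed by a space (e.g. "J. Doe") are split as sentence ends, so a real implementation would likely need further heuristics.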