NAME

Git::Repository::Plugin::GitHooks - A Git::Repository plugin with some goodies for hook developers

VERSION

version 3.6.0

SYNOPSIS

# load the plugin
use Git::Repository 'GitHooks';

my $git = Git::Repository->new();

my $config  = $git->get_config();
my $branch  = $git->get_current_branch();
my @commits = $git->get_commits($oldcommit, $newcommit);

my @files_modified_by_commit = $git->filter_files_in_index('AM');
my @files_modified_by_push   = $git->filter_files_in_range('AM', $oldcommit, $newcommit);

DESCRIPTION

This module adds to Git::Repository several methods that are useful for implementing Git hooks.

In particular, it is used by the standard hooks implemented by the Git::Hooks framework.

CONFIGURATION VARIABLES

CONFIG_ENCODING

Git configuration files usually contain just ASCII characters, but values and sub-section names may contain any characters, except newline. If your config files have non-ASCII characters you should ensure that they are properly decoded by specifying their encoding like this:

$Git::Repository::Plugin::GitHooks::CONFIG_ENCODING = 'UTF-8';

The acceptable values for this variable are all the encodings supported by the Encode module.

METHODS FOR THE GIT::HOOKS FRAMEWORK

The following methods are used by the Git::Hooks framework and are not intended to be useful for hook developers. They're described here for completeness.

prepare_hook NAME, ARGS

This is used by Git::Hooks::run_hooks to prepare the environment for specific Git hooks before invoking the associated plugins. It's invoked with the arguments passed by Git to the hook script. NAME is the script name (usually the variable $0) and ARGS is a reference to an array containing the script positional arguments.

load_plugins

This loads every plugin configured in the githooks.plugin option.

invoke_external_hooks ARGS...

This is used by Git::Hooks::run_hooks to invoke external hooks.

post_hooks

Returns the list of post hook functions registered with the post_hook method below.

METHODS FOR HOOK DEVELOPERS

The following methods are intended to be useful for hook developers.

post_hook SUB

Plugin developers may be interested in performing some action depending on the overall result of every check made by every other hook. As an example, Gerrit's patchset-created hook is invoked asynchronously, meaning that the hook's exit code doesn't affect the action that triggered the hook. The proper way to signal the hook result for Gerrit is to invoke its API to make a review. But we want to perform the review once, at the end of the hook execution, based on the overall result of all enabled checks.

To do that, plugin developers can use this routine to register callbacks that are invoked at the end of run_hooks. The callbacks are called with the following arguments:

  • HOOK_NAME

    The basename of the invoked hook.

  • GIT

    The Git::Repository object that was passed to the plugin hooks.

  • ARGS...

    The remaining arguments that were passed to the plugin hooks.

The callbacks may see if there were any errors signaled by the plugin hook by invoking the get_errors method on the GIT object. They may be used to signal the hook result in any way they want, but they should not die or they will prevent other post hooks from running.

cache SECTION

This may be used by plugin developers to cache information in the context of a Git::Repository object. SECTION is any string which becomes associated with a hash-ref. The method simply returns the hash-ref, which can be used by the caller to store any kind of information. Plugin developers are encouraged to use the plugin name as the SECTION string to avoid clashes.

get_config [SECTION [VARIABLE]]

This groks the configuration options for the repository by invoking git config --list. The configuration is cached in the Git::Repository object during the first invocation. So, if the configuration is changed afterwards, the method won't notice it. This is usually ok for hooks, though, which are short-lived.

With no arguments, the options are returned as a hash-ref pointing to a two-level hash. For example, if the config options are these:

section1.a=1
section1.b=2
section1.b=3
section2.x.a=A
section2.x.b=B
section2.x.b=C

Then, it'll return this hash:

{
    'section1' => {
        'a' => [1],
        'b' => [2, 3],
    },
    'section2.x' => {
        'a' => ['A'],
        'b' => ['B', 'C'],
    },
}

The first level keys are the part of the option names before the last dot. The second level keys are everything after the last dot in the option names. You won't get more levels than two. In the example above, you can see that the option "section2.x.a" is split in two: "section2.x" in the first level and "a" in the second.

The values are always array-refs, even if there is only one value for a specific option. For some options, it makes sense to have a list of values attached to them. But even if you expect a single value for an option, you may have it defined in the global scope and redefined in the local scope. In this case, it will appear as a two-element array, the last element being the local value.

So, if you want to treat an option as single-valued, you should fetch it like this:

$h->{section1}{a}[-1]
$h->{'section2.x'}{a}[-1]
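
As a rough illustration of the layout described above, the following self-contained sketch splits lines in the git config --list format into the same two-level hash. The helper name parse_config_lines is hypothetical and is not part of the plugin; the real method reads the output of git config --list by itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the plugin): builds the two-level
# hash described above from "name=value" lines.
sub parse_config_lines {
    my @lines = @_;
    my %config;
    for my $line (@lines) {
        # Split the option name from its value at the first equals sign.
        my ($name, $value) = split /=/, $line, 2;
        # Options without a value are treated as the string 'true'.
        $value = 'true' unless defined $value;
        # The section is everything before the last dot; the variable
        # is everything after it.
        my ($section, $variable) = $name =~ /\A(.+)\.([^.]+)\z/
            or next;
        push @{ $config{$section}{$variable} }, $value;
    }
    return \%config;
}

my $h = parse_config_lines(
    'section1.a=1',
    'section1.b=2',
    'section1.b=3',
    'section2.x.a=A',
    'section2.x.b=B',
    'section2.x.b=C',
);

# Treat the options as single-valued by taking the last value.
print $h->{section1}{b}[-1], "\n";        # 3
print $h->{'section2.x'}{b}[-1], "\n";    # C
```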

If the SECTION argument is passed, the method returns the second-level hash for it. So, following the example above:

$git->get_config('section1');

This call would return this hash:

{
    'a' => [1],
    'b' => [2, 3],
}

If the section doesn't exist an empty hash is returned. Any key/value added to the returned hash will be available in subsequent invocations of get_config.

If the VARIABLE argument is also passed, the method returns the value(s) of the configuration option SECTION.VARIABLE. In list context the method returns the list of all values or the empty list, if the variable isn't defined. In scalar context, the method returns the variable's last value or undef, if it's not defined.

As a special case, options without values (i.e., with no equals sign after their names in the configuration file) are set to the string 'true' to make Perl recognize them as true Booleans.

The string undef may be used to reset the list of values. Only values after the last occurrence of undef are considered, either in list or in scalar context. This is a general way for you to cancel higher level configurations (e.g., system or global) in lower level configurations (e.g., local). And it works for every configuration option.
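
The reset rule and the list/scalar distinction can be sketched as follows. This is only an illustration under the assumption that the values arrive in the order git config reports them; the helper name effective_values is hypothetical and not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the plugin): given all the values
# found for an option, keep only those after the last 'undef' string.
sub effective_values {
    my @values = @_;
    my @kept;
    for my $value (@values) {
        if ($value eq 'undef') {
            @kept = ();    # reset everything seen so far
        } else {
            push @kept, $value;
        }
    }
    # List context: all remaining values. Scalar context: the last one,
    # or undef if nothing remains.
    return wantarray ? @kept : $kept[-1];
}

# 'a' and 'b' come from a higher level (e.g., global); the local level
# cancels them with 'undef' and then sets 'c'.
my @all  = effective_values(qw(a b undef c));
my $last = effective_values(qw(a b undef c));
my $none = effective_values(qw(a undef));

print "@all\n";                                  # c
print "$last\n";                                 # c
print defined $none ? "defined\n" : "undef\n";   # undef
```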

get_config_boolean SECTION VARIABLE

Git configuration variables may be grokked as Booleans. (See git help config.) There are specific values meaning true (viz. yes, on, true, 1, and the absence of a value) and specific values meaning false (viz. no, off, false, 0, and the empty string).

This method checks the variable's value and returns 1 or 0 representing Boolean values in Perl. If the variable's value isn't recognized as a Git Boolean the method croaks. If the variable isn't defined the method returns undef.
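
A minimal sketch of this mapping, assuming that the comparison is case-insensitive (as Git itself treats Boolean literals) and that a missing value has already been turned into the string 'true'. The function name git_boolean is hypothetical; it is not the plugin's implementation.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper (not part of the plugin): maps a Git Boolean
# string to 1 or 0, following the values listed above.
sub git_boolean {
    my ($value) = @_;
    return unless defined $value;    # variable not defined
    return 1 if $value =~ /\A(?:yes|on|true|1)\z/i;
    return 0 if $value =~ /\A(?:no|off|false|0|)\z/i;
    croak "'$value' is not a valid Git Boolean";
}

for my $value (qw(yes Off 1 0)) {
    print "$value => ", scalar(git_boolean($value)), "\n";
}

# Unrecognized values make the helper croak.
my $ok = eval { git_boolean('maybe'); 1 };
print "'maybe' was rejected\n" unless $ok;
```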

In the Git::Hooks documentation, all configuration variables mentioning a BOOL value are grokked with this method.

get_config_integer SECTION VARIABLE

Git configuration variables may be grokked as integers. (See git help config.) They may start with an optional sign (+ or -), followed by one or more decimal digits, and end with an optional scaling factor letter, viz. k (1024), m (1024*1024), or g (1024*1024*1024). The scaling factor may be in lower or upper-case.

This method checks the variable's value format and returns the corresponding Perl integer. If the variable's value isn't recognized as a Git integer the method croaks. If the variable isn't defined the method returns undef.
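
The format described above can be checked with a single regular expression. The following is a sketch only, not the plugin's code, and the function name git_integer is hypothetical.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper (not part of the plugin): converts a Git integer
# string, with optional sign and scaling factor, to a Perl integer.
sub git_integer {
    my ($value) = @_;
    return unless defined $value;    # variable not defined
    my ($sign, $digits, $scale) = $value =~ /\A([+-]?)(\d+)([kmg]?)\z/i
        or croak "'$value' is not a valid Git integer";
    my %factor = ('' => 1, k => 1024, m => 1024**2, g => 1024**3);
    my $number = $digits * $factor{ lc($scale) };
    return $sign eq '-' ? -$number : $number;
}

print git_integer('10'),  "\n";    # 10
print git_integer('2k'),  "\n";    # 2048
print git_integer('-1M'), "\n";    # -1048576
print git_integer('1g'),  "\n";    # 1073741824
```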

In the Git::Hooks documentation, all configuration variables mentioning an INT value are grokked with this method.

check_timeout

If the configuration option githooks.timeout is set to a positive number, this method aborts the hook if more than that amount of time (in seconds) has passed since the start of the run. It's called in many places by Git::Hooks itself and by some of its plugins to try to stop runaway checks.

fault MESSAGE INFO

This method should be used by plugins to record consistent error or warning messages. It gets one or two arguments. MESSAGE is a multi-line string explaining the error. INFO is an optional hash-ref which may contain additional information about the message, which will be used to complement it.

A "complete" fault is formatted like this:

[PREFIX: CONTEXT]

MESSAGE

  DETAILS

PREFIX gives contextual information about the message. It can be set via the prefix INFO hash key. If not, the package name of the function which called fault is used, which usually happens to be the name of the plugin which detected the error.

CONTEXT is additional contextual information, such as a reference name, a commit SHA-1, and a violated configuration option.

MESSAGE is the multi-line error message.

DETAILS is a multi-line string giving more details about the error, usually showing error output from an external command.

Besides the MESSAGE, which is required, and the PREFIX, which has a default value, all other items must be supplied via the INFO hash-ref with the following keys:

  • prefix

    A string giving broad contextual information about the error message. When absent, the prefix used is the package name of the function which called fault, which is usually a Git::Hooks plugin name.

  • commit

    The SHA-1 or a Git::Repository::Log object representing a commit. It is shown in the CONTEXT area like this (as a short SHA-1):

    [PREFIX: commit SHA-1]

  • ref

    The name of a Git reference (usually a branch). It is shown in the CONTEXT area like this:

    [PREFIX: on ref REF]

  • option

    The name of a configuration option related to the error message. It is shown in the CONTEXT area like this:

    [PREFIX: violates option 'OPTION']

  • details

    A string containing details about the error message. If present, it is appended to the MESSAGE, separated by an empty line, and with its lines prefixed by two spaces.

The method simply records the formatted error message and returns. It doesn't die.
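
To make the layout concrete, here is a sketch of a formatter that produces a "complete" fault from a MESSAGE and an INFO hash-ref. It is not the plugin's code: the function name format_fault is hypothetical, the 7-character short SHA-1 and the single space joining several CONTEXT items are assumptions, and colorization and Git::Repository::Log objects are left out.

```perl
use strict;
use warnings;

# Hypothetical formatter (not the plugin's code): lays out a fault as
# "[PREFIX: CONTEXT]", an empty line, MESSAGE, and indented DETAILS.
sub format_fault {
    my ($message, $info) = @_;
    $info ||= {};

    # Default PREFIX: the package name of the caller.
    my $prefix = defined $info->{prefix} ? $info->{prefix} : (caller)[0];

    my @context;
    # Assumption: a short SHA-1 is its first 7 characters.
    push @context, 'commit ' . substr($info->{commit}, 0, 7)
        if defined $info->{commit};
    push @context, "on ref $info->{ref}"
        if defined $info->{ref};
    push @context, "violates option '$info->{option}'"
        if defined $info->{option};

    my $header = "[$prefix";
    # Assumption: several CONTEXT items are joined by a single space.
    $header .= ': ' . join(' ', @context) if @context;
    $header .= ']';

    my $text = "$header\n\n$message\n";
    if (defined $info->{details}) {
        # Prefix every line of the details with two spaces.
        (my $details = $info->{details}) =~ s/^/  /gm;
        $text .= "\n$details\n";
    }
    return $text;
}

print format_fault(
    'The file is too big.',
    {
        prefix  => 'CheckFile',
        ref     => 'refs/heads/master',
        details => "size: 2048 bytes\nlimit: 1024 bytes",
    },
);
```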

The messages can be colorized if they go to a terminal. This can be configured by the configuration options githooks.color and githooks.color.<slot>, which are explained in the section "CONFIGURATION" in Git::Hooks documentation.

get_faults

This method returns a string specially formatted with all error messages recorded with the fault method, a header, and a footer, if requested by configuration.

fail_on_faults [WARN_ONLY]

By default (or if WARN_ONLY is false), if there are any faults registered so far by the fault method, this method logs the fault messages at the ERROR level and aborts by croaking.

If WARN_ONLY is true, it logs the fault messages at the WARN level.

undef_commit

The undefined commit is a special SHA-1 used by Git in the update and pre-receive hooks to signify that a reference either was just created (as the old commit) or has been just deleted (as the new commit). It consists of 40 zeroes.

empty_tree

The empty tree represents an empty directory for Git.

get_commit COMMIT

Returns a Git::Repository::Log object representing COMMIT.

get_commits OLDCOMMIT NEWCOMMIT [OPTIONS [PATHS]]

Returns a list of Git::Repository::Log objects representing every commit reachable from NEWCOMMIT but not from OLDCOMMIT.

There are two special cases, though:

If NEWCOMMIT is the undefined commit, i.e., '0000000000000000000000000000000000000000', this means that a branch, pointing to OLDCOMMIT, has been removed. In this case the method returns an empty list, meaning that no new commit has been created.

If OLDCOMMIT is the undefined commit, this means that a new branch pointing to NEWCOMMIT is being created. In this case we want all commits reachable from NEWCOMMIT but not reachable from any other branch. The syntax for this is "NEWCOMMIT ^B1 ^B2 ... ^Bn", i.e., NEWCOMMIT followed by every other branch name prefixed by a caret. We can get at their names using the technique described in, e.g., this discussion.

The Git::Repository::Log objects are constructed ultimately by invoking the git log command like this:

git log [<options>] <revision range> [-- <paths>]

The revision range is usually just OLDCOMMIT..NEWCOMMIT, but there are some special cases which require some calculating as discussed above.

The OPTIONS optional argument is an array-ref pointing to an array of strings, which will be passed as options to the git-log command. It may be useful to grok some extra information about each commit (e.g., using --name-status).

The PATHS optional argument is an array-ref pointing to an array of strings, which will be passed as pathspecs to the git-log command. It may be useful to filter the list of commits, grokking only those affecting specific paths in the repository.

read_commit_msg_file FILENAME

Returns the relevant contents of the commit message file called FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.

The file is read using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.

Some non-relevant contents are stripped off the file. Specifically:

  • diff data

    Sometimes, the commit message file contains the diff data for the commit. This data begins with a line starting with the fixed string diff --git a/. Everything from such a line on is stripped off the file.

  • comment lines

    Every line beginning with a # character is stripped off the file.

  • trailing spaces

    Any trailing space is stripped off from all lines in the file.

  • trailing empty lines

    Any empty line at the end is stripped off from the file, making sure it ends in a single newline.

All this cleanup is performed to make it easier for different plugins to analyze the commit message using a canonical base.
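
The four cleanup steps can be sketched with a few substitutions. This is an illustration only: the function name clean_commit_msg is hypothetical, it works on a string instead of reading a file, and the order in which the steps are applied is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the plugin): applies the cleanup
# steps listed above to a commit message held in a string.
sub clean_commit_msg {
    my ($msg) = @_;
    # Diff data: drop everything from the first "diff --git a/" line on.
    $msg =~ s{^diff --git a/.*}{}ms;
    # Comment lines: drop every line beginning with '#'.
    $msg =~ s/^#.*\n?//mg;
    # Trailing spaces: strip them from every line.
    $msg =~ s/[ \t]+$//mg;
    # Trailing empty lines: leave a single final newline.
    $msg =~ s/\n*\z/\n/;
    return $msg;
}

my $raw = join "\n",
    'Fix the parser   ',
    '',
    '# Please enter the commit message',
    'Details here.',
    '',
    'diff --git a/file b/file',
    '+new line',
    '';

print clean_commit_msg($raw);
# Prints "Fix the parser", an empty line, and "Details here."
```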

write_commit_msg_file FILENAME, MSG, ...

Writes the list of strings MSG to FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.

The file is written to using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.

An empty line (\n\n) is inserted between every pair of MSG arguments, if there is more than one, of course.

get_affected_refs

Returns the list of names of the references affected by the current push command. It's useful in the update and the pre-receive hooks.

get_affected_ref_range REF

Returns the two-element list of commit ids representing the OLDCOMMIT and the NEWCOMMIT of the affected REF.

get_affected_ref_commits REF [OPTIONS [PATHS]]

Returns the list of commits leading from the affected REF's NEWCOMMIT to OLDCOMMIT. The commits are represented by Git::Repository::Log objects, as returned by the get_commits method.

The optional arguments OPTIONS and PATHS are passed to the get_commits method.

filter_name_status_in_index FILTER

Returns a hash with information about files changed in the index (aka stage area or cache) compared to HEAD. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-index --name-status command. It's useful in the pre-commit hook when you want to know which files are being modified in the upcoming commit.

FILTER specifies which kinds of changes you're interested in. It's passed as the argument to the --diff-filter option of git diff-index, which is documented like this:

--diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]

  Select only files that are Added (A), Copied (C), Deleted (D), Modified
  (M), Renamed (R), have their type (i.e. regular file, symlink,
  submodule, ...) changed (T), are Unmerged (U), are Unknown (X), or have
  had their pairing Broken (B). Any combination of the filter characters
  (including none) can be used. When * (All-or-none) is added to the
  combination, all paths are selected if there is any file that matches
  other criteria in the comparison; if there is no file that matches other
  criteria, nothing is selected.

filter_name_status_in_range FILTER FROM TO [OPTIONS [PATHS]]

Returns a hash with information about files that are changed between commits FROM and TO. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the update and the pre-receive hooks when you want to know which files are being modified in the commits being received by a git push command.

FILTER specifies which kinds of changes you're interested in. Please read about it in the filter_name_status_in_index method above.

FROM and TO are revision parameters (see git help revisions) specifying two commits. They're passed as arguments to the git diff-tree command in order to compare them and grok the files that differ between them.

A special case occurs when FROM is the undefined commit, which happens when we're calculating the commit range in a pre-receive or update hook and a new branch or tag has been pushed. In this case we pass FROM and TO to the get_commits method to find the list of new commits being pushed and calculate the difference between the first commit's parent and TO. When the first commit has no parent (in case it's a root commit) we return an empty list.

The optional arguments OPTIONS and PATHS are passed to the get_commits method.

filter_name_status_in_commit FILTER, COMMIT

Returns a hash with information about files that are changed in COMMIT. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the patchset-created and the draft-published hooks when you want to know which files are being modified in the single commit being received by a git push command.

FILTER specifies which kinds of changes you're interested in. Please read about it in the filter_name_status_in_index method above.

COMMIT is a revision parameter (see git help revisions) specifying the commit. It's passed as an argument to git diff-tree in order to compare it to its parents and grok the files that changed in it.

Merge commits are treated specially. Only files that are changed in COMMIT with respect to all of its parents are returned. The reasoning behind this is that if a file isn't changed with respect to one or more of COMMIT's parents, then it must have been checked already in those commits and we don't need to check it again. In this case, since the files may have been changed differently in each branch (added, modified, deleted, etc.), the hash values are strings of letters, one for each branch.

filter_files_in_index FILTER

Returns the sorted keys of the hash that would be returned by the filter_name_status_in_index method if invoked with the same arguments.

filter_files_in_range FILTER FROM TO [OPTIONS [PATHS]]

Returns the sorted keys of the hash that would be returned by the filter_name_status_in_range method if invoked with the same arguments.

filter_files_in_commit FILTER, COMMIT

Returns the sorted keys of the hash that would be returned by the filter_name_status_in_commit method if invoked with the same arguments.

authenticated_user

Returns the username of the authenticated user performing the Git action. It groks it from the githooks.userenv configuration variable specification, which is described in the Git::Hooks documentation. It's useful for most access control check plugins.

If githooks.userenv isn't configured, it tries to grok the username from environment variables set by Gerrit, Bitbucket Server, and GitLab before trying the USER environment variable as a last resort. If it can't find it, it returns undef.

repository_name

Returns the repository name as a string. Currently it knows how to grok the name from Gerrit, Bitbucket, and GitLab servers. Otherwise it tries to grok it from the GIT_DIR environment variable, which holds the path to the Git repository.

get_current_branch

Returns the repository's current branch name, as indicated by the git symbolic-ref HEAD command.

If the repository is in a detached head state, i.e., if HEAD points to a commit instead of to a branch, the method returns undef.

get_sha1 REV

Returns the SHA1 of the commit represented by REV, using the command

git rev-parse --verify REV

It's useful, for instance, to grok the HEAD's SHA1 so that you can pass it to the get_commit method.

get_head_or_empty_tree

Returns the string "HEAD" if the repository already has commits. Otherwise, if it is a brand new repository, it returns the SHA1 representing the empty tree. It's useful to come up with the correct argument for, e.g., git diff during a pre-commit hook. (See the default pre-commit.sample script which comes with Git to understand how this is used.)

blob REV, FILE, ARGS...

Returns the name of a temporary file into which the contents of the file FILE in revision REV have been copied.

It's useful for hooks that need to read the contents of changed files in order to check anything in them.

These objects are cached so that if more than one hook needs to get at them they're created only once.

By default, all temporary files are removed when the Git::Repository object is destroyed.

Any remaining ARGS are passed as arguments to File::Temp::newdir so that you can have more control over the temporary file creation.

If REV:FILE does not exist or if there is any other error while trying to fetch its contents the method dies.

file_size REV FILE

Returns the size (in bytes) of FILE (a path relative to the repository root) in revision REV.

file_mode REV FILE

Returns the mode (as a number) of FILE (a path relative to the repository root) in revision REV.

is_reference_enabled REF

This method should be invoked by hooks to see if REF is enabled according to the githooks.ref and githooks.noref options. Please, read about these options in Git::Hooks documentation.

REF must be a complete reference name or undef. Local hooks should pass the current branch, and server hooks should pass the references affected by the push command. If REF is undef, the method returns true.

The method decides if a reference is enabled using the following algorithm:

  • If REF matches any REFSPEC in githooks.ref then it is enabled.

  • Else, if REF matches any REFSPEC in githooks.noref then it is disabled.

  • Else, it is enabled.
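
The algorithm above can be sketched as a pair of small functions. This is not the plugin's code: the function names are hypothetical, the two REFSPEC lists are passed in explicitly instead of being read from githooks.ref and githooks.noref, and the matching rule (a REFSPEC starting with a caret is a regular expression, anything else must equal the reference name) is an assumption; the Git::Hooks documentation is the authority on REFSPECs.

```perl
use strict;
use warnings;

# Assumption: a REFSPEC starting with '^' is a regexp; otherwise it
# must be equal to the complete reference name.
sub ref_matches {
    my ($ref, $spec) = @_;
    if ($spec =~ /\A\^/) {
        return $ref =~ /$spec/ ? 1 : 0;
    }
    return $ref eq $spec ? 1 : 0;
}

# Hypothetical helper (not part of the plugin) following the three
# steps listed above.
sub reference_enabled {
    my ($ref, $enabled_specs, $disabled_specs) = @_;
    return 1 unless defined $ref;
    return 1 if grep { ref_matches($ref, $_) } @$enabled_specs;
    return 0 if grep { ref_matches($ref, $_) } @$disabled_specs;
    return 1;
}

my @ref   = ('refs/heads/master');    # stands in for githooks.ref
my @noref = ('^refs/heads/');         # stands in for githooks.noref

for my $name (qw(refs/heads/master refs/heads/topic refs/tags/v1)) {
    my $state = reference_enabled($name, \@ref, \@noref) ? 'enabled' : 'disabled';
    print "$name is $state\n";
}
```

With these lists, only refs/heads/topic is reported as disabled: refs/heads/master is explicitly enabled and refs/tags/v1 matches neither list.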

match_user SPEC

Checks if the authenticated user (as returned by the authenticated_user method above) matches the specification, which may be given in one of the three different forms acceptable for the githooks.admin configuration option, i.e., as a username, as a @group, or as a ^regex.

im_admin

Checks if the authenticated user (again, as returned by the authenticated_user method) matches the specifications given by the githooks.admin configuration variable. This is useful to exempt "administrators" from the restrictions imposed by the hooks.

grok_acls CFG ACTIONS

This method returns a list of ACLs (Access Control Lists) grokked from the CFG.acl options, where CFG is a configuration section like githooks.checkfile.

The CFG.acl is a multi-valued option specifying rules allowing or denying specific users to perform specific actions on specific "things". (Common such things are references and files.) By default any user can perform any action on any thing. So, the rules are used to impose restrictions.

When a hook is invoked it groks all things that were affected in any way by the commits involved and tries to match each of them to a RULE to see if the action performed on it is allowed or denied.

A RULE takes three or four parts, like this:

(allow|deny) [ACTIONS]+ <spec> (by <userspec>)?

  • (allow|deny)

    The first part tells if the rule allows or denies an action.

  • [ACTIONS]+

    The second part specifies which actions are being considered by a combination of letters. The ACTIONS argument is a string containing all valid letters for the corresponding ACLs.

    See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for two examples of this.

  • <spec>

    The third part specifies which things are being considered. In its simplest form, a spec is taken as a literal string matching the thing exactly by name.

    If the spec starts with a caret (^) it's interpreted as a Perl regular expression, the caret being kept as part of the regexp. These specs match potentially many things.

    Before being interpreted as a string or as a regexp, any sub-string of it in the form {VAR} is replaced by $ENV{VAR}. This is useful, for example, to interpolate the committer's username in the spec, in order to create personal name spaces for users.

    (See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for examples of things as files and references, respectively.)

  • by <userspec>

    The fourth part is optional. It specifies which users are being considered. It can be the name of a single user (e.g. james) or the name of a group (e.g. @devs).

    If not specified, the RULE matches any user.

The RULEs are matched in the reverse of the order in which they appear in the result of the command git config CFG.acl, so that later rules take precedence. This way you can have general rules in the global context and more specific rules in the repository context, naturally.

So, the last RULE matching the action, the file, and the user, tells if the operation is allowed or denied.

If no RULE matches the operation, it is allowed by default.
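
The RULE syntax and the "last matching RULE wins" evaluation can be sketched as follows. This is an illustration under several assumptions, not the plugin's code: the function names parse_rule and is_allowed are hypothetical, action letters are assumed to be uppercase, specs are assumed to contain no spaces, an unset {VAR} is replaced by the empty string, and the user part is compared as a plain username only (groups such as @devs are not resolved).

```perl
use strict;
use warnings;

# Hypothetical parser (not the plugin's code) for one RULE string.
sub parse_rule {
    my ($rule) = @_;
    my ($verdict, $actions, $spec, $who) =
        $rule =~ /\A\s*(allow|deny)\s+([A-Z]+)\s+(\S+)(?:\s+by\s+(\S+))?\s*\z/
        or return;
    # Replace every {VAR} with the value of the environment variable VAR.
    $spec =~ s/\{(\w+)\}/defined $ENV{$1} ? $ENV{$1} : ''/ge;
    # A spec starting with a caret is a regexp (the caret is kept);
    # otherwise it must match the thing's name exactly.
    $spec = qr/$spec/ if $spec =~ /\A\^/;
    return {
        acl    => $rule,
        allow  => $verdict eq 'allow' ? 1 : 0,
        action => $actions,
        spec   => $spec,
        who    => $who,
    };
}

# Returns 1 if USER may perform ACTION on THING, 0 otherwise.
sub is_allowed {
    my ($rules, $action, $thing, $user) = @_;
    # Later rules take precedence, so scan them in reverse order.
    for my $acl (reverse map { parse_rule($_) } @$rules) {
        next unless index($acl->{action}, $action) >= 0;
        next if defined $acl->{who} && $acl->{who} ne $user;
        my $spec = $acl->{spec};
        my $hit  = ref($spec) ? $thing =~ $spec : $thing eq $spec;
        return $acl->{allow} if $hit;
    }
    return 1;    # no RULE matched: allowed by default
}

my @rules = (
    'deny AMD ^.*\.exe$',
    'allow A ^tools/.*\.exe$ by james',
    'deny D README',
);

print is_allowed(\@rules, 'A', 'tools/run.exe', 'james'), "\n";    # 1
print is_allowed(\@rules, 'A', 'tools/run.exe', 'mary'),  "\n";    # 0
print is_allowed(\@rules, 'D', 'README', 'mary'),         "\n";    # 0
print is_allowed(\@rules, 'M', 'README', 'mary'),         "\n";    # 1
```

In this example, james may add executables under tools/ because the second RULE is consulted before the first one, while mary falls through to the general deny RULE.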

In the returned list, each ACL is represented by a hash with the following keys:

  • acl

    Contains the original representation of the ACL, which is useful in producing error messages.

  • allow

    A Boolean telling if the ACL is an "allow".

  • action

    The string representation of the action (e.g. 'AMD' or 'CRUD').

  • spec

    The spec, which can be either a string or a pre-compiled regex object.

  • who

    The name of a user or of a group.

As an optimization, only ACLs matching the current user, either explicitly or by not having a WHO part, are returned in the list.

SEE ALSO

Git::Repository::Plugin, Git::Hooks.

AUTHOR

Gustavo L. de M. Chaves <gnustavo@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2023 by CPQD <www.cpqd.com.br>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.