NAME

TaskPipe::Tool::Command_TestTask - command to test an individual TaskPipe task

PURPOSE

Test an individual task by running it against test data

DESCRIPTION

test task can be used to run test data against an individual task and check the output. This effectively enables "unit testing" of tasks to make sure they are working correctly before running them as part of a plan.

A list of test data should be supplied within the task module itself by providing a test_pinterp subroutine.

What is pinterp ?

There are 3 words that are important when discussing the data going into a task. Those words are:

1. input(s)

The inputs are the raw data which the task is provided with. When running a plan, this is the data which is provided by the previous task. To give an example, let's say a previous task provides our task Scrape_Example with this set of data:

{
    url => 'http://www.example.com/some-list',
    headers => {
        Referer => 'http://www.example.com'
    },
    date => '2018-17-10'
}

This data is the input or inputs.

2. parameters (or params):

You specify parameters in your plan. For example:

task:
    _name: Scrape_Example
    url: $this
    headers:
        Referer: $this

This part of the task specification are the parameters:

url: $this
headers:
    Referer: $this

The parameters tell TaskPipe which part of the input data to accept and use.

3. "Interpolated parameters" or pinterp

The parameters are interpolated using the input data. The result is the pinterp.

The combination of the parameters and the input data results in the following data being accepted and used in the task:

url => 'http://www.example.com/some-list'
headers => {
    Referer => 'http://www.example.com'
}

These are the pinterp. Note that in the original set of inputs there was a date input. This is not included in pinterp because we didn't include a date parameter in the plan. So inputs and pinterp are different.

In fact inputs and pinterp can be really very different, because we can specify that we want to accept data from earlier tasks (e.g. instead of accepting data from the previous task, we accept it from the task previous to the previous task (ie 2 tasks before, instead of one). Consider the following parameters:

url: $this
headers:
    Referer: $this[1]{url}

These parameters are telling TaskPipe to take the url from the output named url of the previous task, but take the Referer from the output named url of the task previous to the previous task (2 tasks ago).

Specifying the url and Referer header like this is a common situation, because this mirrors how web pages progress when a human is clicking around in a web browser: the Referer is always the previous url to the one you are visiting.

Including test data in your Task module

When testing tasks, we cut to the data that the task is actually accepting - so that means supplying the pinterp directly. Making sure the inputs are correct is a job to consider when we are putting the plan together as a whole.

To give an example of how testing works, let's say our scraping task TaskPipe::Task_Scrape_Example has a test_pinterp subroutine which looks like

sub test_pinterp{[{
   url => 'http://www.example.com/list-something',
   headers => {
       Referer => 'http://www.example.com'
   }
}]}

so this subroutine returns a list containing one item of test data - the hashref

{
    url => 'http://www.example.com/list-something',
    headers => {
        Referer => 'http://www.example.com'
    }
}

Let's say our scraping task is in the module TaskPipe::Task_Scrape_Example. That means the name of the task is Scrape_Example. To test our task we would type

taskpipe test task --name=Scrape_Example --test=0

The --test=0 parameter tells TaskPipe to use the first item in the test_pinterp list to run the task over.

If you prefer to name your test data, you can write your test_pinterp subroutine so it returns a hashref:

sub test_pinterp{{

   mytest => {
       url => 'http://www.example.com/list-something',
       headers => {
           Referer => 'http://www.example.com'
       }
   }
}}

In this example we have one test set of test data named mytest and we can test the task against this data using:

taskpipe test task --test=mydata

Test output

The results of the test are normally output to a file in your log directory. The filename will be a concatenation of (file_prefix + the task name + the date and time + the file_suffix) where file_prefix and file_suffix come from the project config settings in the TaskPipe::Task::TestSettings section. test task should print the filename it produced to the terminal when you execute the command.

You can also set output in TaskPipe::Task::TestSettings to screen to echo output to the screen instead of to a file (but potentially a lot of output depending on the task), or screen,file for both.

OPTIONS

name

The "name" of the task to test. The name of a task is found from the module name via

TaskPipe::Task_<name>

or

MyProject::Task_<name>

e.g. the task corresponding to module TaskPipe::Task_Record has name="Record"

test

The name or index of the test to run. (Specify test_pinterp as a 'HashRef[HashRef]' to use names, or 'ArrayRef[HashRef]' to use indices.

AUTHOR

Tom Gracey <tomgracey@gmail.com>

COPYRIGHT AND LICENSE

Copyright (c) Tom Gracey 2018

TaskPipe is free software, licensed under

The GNU Public License Version 3