NAME
PDL::Algorithm::Center - Various methods of finding the center of a sample
VERSION
version 0.09
DESCRIPTION
PDL::Algorithm::Center is a collection of algorithms which specialize in centering datasets.
SUBROUTINES
See "TYPES" for information on the types used in the subroutine descriptions.
sigma_clip
    $results = sigma_clip(
        center      => Optional [ Center | CodeRef ],
        clip        => Optional [PositiveNum],
        coords      => Optional [Coords],
        dtol        => PositiveNum,
        iterlim     => Optional [PositiveInt],
        log         => Optional [Bool | CodeRef],
        mask        => Optional [ Undef | Piddle_min1D_ne ],
        save_mask   => Optional [Bool],
        save_weight => Optional [Bool],
        nsigma      => PositiveNum,
        weight      => Optional [ Undef | Piddle_min1D_ne ],
    );
Center a dataset by iteratively excluding data outside of a radius equal to a specified number of standard deviations. The dataset may be specified as a list of coordinates and optional weights, or as a weight piddle of shape NxM (e.g., an image). If only the weight piddle is provided, it is converted internally into a list of coordinates with associated weights.
To operate on a subset of the input data, specify the mask option.
A PDL::Algorithm::Center::Failure::parameter exception will be thrown if there is a parameter error.
The center of a data set is determined by:
1. Clip (ignore) the data whose distance to the current center is greater than a specified number of standard deviations.
2. Calculate a new center by performing a (weighted) centroid of the remaining data.
3. Calculate the standard deviation of the distance from the remaining data to the center.
4. Repeat from step 1 until either a convergence tolerance has been met or the iteration limit has been exceeded.
The initial center may be explicitly specified, or may be calculated by performing a (weighted) centroid of the data.
The initial standard deviation is calculated using the initial center and either the entire dataset or a clipped region about the initial center (see the clip option).
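The loop described above can be sketched in plain Perl, without PDL. This is an illustration only, not the module's implementation: sigma_clip_sketch is a hypothetical helper, the data are unweighted, and the "standard deviation" is assumed to be the RMS distance of the retained points from the center.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical pure-Perl helper; NOT the module's implementation.
# coords is an arrayref of points, each an arrayref of N numbers.
sub sigma_clip_sketch {
    my (%opt) = @_;
    my @pts     = @{ $opt{coords} };
    my $ndim    = scalar @{ $pts[0] };
    my $iterlim = $opt{iterlim} // 10;

    # Euclidean distance between two points.
    my $dist = sub {
        my ( $p, $q ) = @_;
        return sqrt( sum( map { ( $p->[$_] - $q->[$_] )**2 } ( 0 .. $ndim - 1 ) ) );
    };

    # Unweighted centroid of a list of points.
    my $centroid = sub {
        my @set = @_;
        return [ map { my $d = $_; sum( map { $_->[$d] } @set ) / @set } ( 0 .. $ndim - 1 ) ];
    };

    # Assumption: the spread is the RMS distance to the center.
    my $rms = sub {
        my ( $center, @set ) = @_;
        return sqrt( sum( map { $dist->( $_, $center )**2 } @set ) / @set );
    };

    # Initial center and spread from the entire data set.
    my @kept   = @pts;
    my $center = $centroid->(@kept);
    my $sigma  = $rms->( $center, @kept );

    for my $iter ( 1 .. $iterlim ) {
        # Step 1: clip against the current center, starting from the full data set.
        @kept = grep { $dist->( $_, $center ) <= $opt{nsigma} * $sigma } @pts;
        return { success => 0, iter => $iter, nelem => 0 } unless @kept;

        # Steps 2 and 3: new centroid and spread of the surviving points.
        my $new = $centroid->(@kept);
        $sigma = $rms->( $new, @kept );

        # Step 4: stop once successive centers are closer than dtol.
        my $moved = $dist->( $new, $center );
        $center = $new;
        return { success => 1, iter => $iter, center => $center,
                 sigma => $sigma, nelem => scalar @kept }
          if $moved < $opt{dtol};
    }
    return { success => 0, iter => $iterlim, center => $center };
}

# A 3x3 grid of points around the origin plus one distant outlier.
my @grid = map { my $x = $_; map { [ $x, $_ ] } ( -1 .. 1 ) } ( -1 .. 1 );
my $fit = sigma_clip_sketch(
    coords => [ @grid, [ 50, 50 ] ],
    nsigma => 2,
    dtol   => 1e-6,
);
printf "center (%g, %g) from %d points after %d iterations\n",
  @{ $fit->{center} }, $fit->{nelem}, $fit->{iter};
# prints: center (0, 0) from 9 points after 2 iterations
```

The first pass drops the outlier and moves the center from (5, 5) to the origin; the second pass leaves the center unchanged, so the dtol test succeeds.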
Options
The following options are available:
center => ArrayRef | Piddle1D_ne | coderef
The initial center. It may be one of the following:
- An array of length N. The array may contain undefined values for each dimension for which the center should be determined by finding the mean of the values in that dimension.
- A piddle with shape N (or something that can be coerced into one, see "TYPES").
- A coderef which will return the center as a piddle with shape N. The subroutine is called as
    &$center( $coords, $mask, $weight, $total_weight );
  with
  $coords - A piddle with shape NxM containing M coordinates with dimension N.
  $mask - A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
  $weight - A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
  $total_weight - A scalar which is the sum of $mask * $weight.
clip => positive number
Optional. The clipping radius used to determine the initial standard deviation.
coords => Coords
Optional. The coordinates to center. coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
weight may be specified with coords to indicate weighted data.
mask may be specified to indicate that a subset of the coordinates should be operated on.
coords is useful if the data cube is not fully populated; for dense data, use weight instead.
dtol => positive number
Optional. If specified, iteration will cease when successive centers are closer than the specified distance.
iterlim => positive integer
Optional. The maximum number of iterations to run. Defaults to 10.
log => boolean|coderef
Optional. If log is true (and not a coderef), a default logger which outputs to STDOUT will be used. If it is a coderef, it will be called before the first iteration and at the end of each iteration. It is passed a copy of the current iteration's results object; see "Sigma Clip Iterations".
mask => piddle
Optional. This is a piddle which specifies which coordinates to include in the calculations. Its values are either 0 or 1, where values of 1 indicate coordinates to be included. It defaults to a piddle of all 1's.
When used with coords, mask must be a piddle of shape M, where M is the number of data elements in coords.
If coords is not specified, mask should have the same shape as weight.
save_mask => boolean
If true, the mask used in the final iteration will be returned in the iteration result object.
save_weight => boolean
If true, the weights used in the final iteration will be returned in the iteration result object.
nsigma => scalar
The size of the clipping radius, in units of the standard deviation.
weight => piddle
Optional. Data weights. When used with coords, weight must be a piddle of shape M, where M is the number of data elements in coords. If coords is not specified, weight is a piddle of shape NxM, where N is the number of dimensions in the data and M is the number of data elements.
It defaults to a piddle of all 1's.
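As an illustration of how these options combine, the following is a minimal sketch of a call on a coordinate list. It assumes that sigma_clip can be imported by name from PDL::Algorithm::Center; the data values are made up for the example, and no particular output is implied.

```perl
use strict;
use warnings;
use PDL;
use PDL::Algorithm::Center qw( sigma_clip );    # assumption: exported on request

# Five 2-D points near (0.5, 0.5) plus one distant outlier (made-up data).
# pdl() of six [x, y] pairs has dims (2, 6), i.e. N = 2 and M = 6.
my $coords = pdl( [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0.5, 0.5 ], [ 40, 40 ] );

my $results = sigma_clip(
    coords    => $coords,
    weight    => ones(6),    # shape M; all ones is also the documented default
    nsigma    => 2,          # clip at two standard deviations
    dtol      => 1e-3,       # stop when the center moves less than this
    iterlim   => 20,
    save_mask => 1,          # keep the final inclusion mask in the result
);

if ( $results->success ) {
    print "center: ",     $results->center, "\n";
    print "final mask: ", $results->mask,   "\n";
}
else {
    print "no convergence: ", $results->error, "\n";
}
```

For dense data such as an image, the same call would omit coords and pass the image as weight instead.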
Sigma Clip Results
sigma_clip returns an object which includes all of the attributes from the final iteration object (see "Sigma Clip Iterations"), with the following additional attributes/methods:
iterations => arrayref
An array of results objects for each iteration.
success => boolean
True if the iteration converged, false otherwise.
error => error object
If convergence has failed, this will contain an error object describing the failure. See "ERRORS".
mask => piddle
If the save_mask option is true, this will be the final inclusion mask.
weight => piddle
If the save_weight option is true, this will be the final weights.
Sigma Clip Iterations
The results for each iteration are stored in an object with the following attributes/methods:
center => piddle|undef
A 1D piddle containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
iter => integer
The iteration index. An index of 0 indicates the values determined before the iterative loop was entered, and reflects the initial clipping and mask exclusion.
nelem => integer
The number of data elements used in the center.
total_weight => number
The combined weight of the data elements used to determine the center.
sigma => number|undef
The standard deviation of the clipped data. The value for the last iteration will be undefined if all of the elements have been clipped.
clip => number|undef
The clipping radius. This will be undefined for the first iteration if the clip option was not specified.
dist => number
Optional. The distance between the previous and current centers. This is defined only if the dtol option was passed.
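The per-iteration objects can be inspected after the fact through the iterations attribute of the result. The sketch below assumes sigma_clip is importable by name and uses made-up data; it reads only attributes documented above and guards the ones that may be undefined.

```perl
use strict;
use warnings;
use PDL;
use PDL::Algorithm::Center qw( sigma_clip );    # assumption: exported on request

# Made-up data: a small cluster plus one outlier, dims (2, 6).
my $coords = pdl( [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0.5, 0.5 ], [ 40, 40 ] );

my $results = sigma_clip( coords => $coords, nsigma => 2, dtol => 1e-3 );

# Index 0 holds the values determined before the loop was entered.
for my $iteration ( @{ $results->iterations } ) {
    my $sigma  = $iteration->sigma;     # may be undef if everything was clipped
    my $center = $iteration->center;    # likewise
    printf "iter %s: nelem=%s total_weight=%s sigma=%s center=%s\n",
      $iteration->iter,
      $iteration->nelem,
      $iteration->total_weight,
      defined $sigma  ? "$sigma"  : 'undef',
      defined $center ? "$center" : 'undef';
}
```

The same attributes are available inside a log coderef, which receives a copy of each iteration object as it is produced.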
iterate
    $result = iterate(
        center       => Center | CodeRef,
        initialize   => CodeRef,
        calc_center  => CodeRef,
        calc_wmask   => CodeRef,
        is_converged => CodeRef,
        coords       => Coords,
        iterlim      => PositiveInt,
        log          => Optional [CodeRef],
        mask         => Optional [Piddle1D_ne],
        save_mask    => Optional [Bool],
        save_weight  => Optional [Bool],
        weight       => Optional [Piddle1D_ne],
    );
A generic iteration loop for centering data using callbacks for calculating centers, included element masks, weight, and iteration completion.
A PDL::Algorithm::Center::Failure::parameter exception will be thrown if there is a parameter error.
Options
The following options are accepted:
center => Piddle1D_ne | coderef
The initial center. It may either be a piddle with shape N (or something that can be coerced into one, see "TYPES") or a coderef which will return the center as a piddle with shape N. The coderef is called as
    $initial_center = &$center( $coords, $mask, $weight, $total_weight );
with
$coords - A piddle with shape NxM containing M coordinates with dimension N.
$mask - A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
$weight - A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
$total_weight - A scalar which is the sum of $mask * $weight.
initialize => coderef
This subroutine provides initialization prior to entering the iteration loop. It should initialize the passed iteration object and work storage.
It is invoked as:
    &$initialize( $coords, $mask, $weight, $current, $work );
with
$coords - A piddle of shape NxM with the coordinates of each element.
$mask - A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
$weight - A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
$current - A reference to a Hash::Wrap based object containing data for the current iteration. initialize may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
  nelem - The number of included coordinates, $mask->sum.
  total_weight - The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
$work - A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
calc_center => coderef
This subroutine should return a piddle of shape N with the calculated center.
It will be called as:
    $center = &$calc_center( $coords, $mask, $weight, $current, $work );
with
$coords - A piddle of shape NxM with the coordinates of each element.
$mask - A piddle with shape M containing the current inclusion mask.
$weight - A piddle with shape M containing the current weights for the included coordinates.
$current - A reference to a Hash::Wrap based object containing data for the current iteration. calc_center may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
  nelem - The number of included coordinates, $mask->sum.
  total_weight - The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
$work - A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
calc_wmask => coderef
This subroutine should determine the current set of included coordinates and their current weights.
It will be called as:
    &$calc_wmask( $coords, $mask, $weight, $current, $work );
with
$coords - A piddle of shape NxM with the coordinates of each element.
$mask - A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->nelem if this is changed.
$weight - A piddle with shape M, essentially a copy of the initial $weight option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->total_weight if this is changed.
$current - A reference to a Hash::Wrap based object containing data for the current iteration. calc_wmask may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
  nelem - The number of included coordinates, $mask->sum. If $mask is changed this must either be updated or set to the undefined value.
  total_weight - The sum of the weights of the included coordinates, ($mask * $weight)->dsum. If $weight is changed this must either be updated or set to the undefined value.
$work - A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
is_converged => coderef
This subroutine should return a boolean value indicating whether the iteration has converged.
It is invoked as:
    $bool = &$is_converged( $coords, $mask, $weight, $last, $current, $work );
with
$coords - A piddle of shape NxM with the coordinates of each element.
$mask - A piddle with shape M containing the current inclusion mask.
$weight - A piddle with shape M containing the current weights for the included coordinates.
$last - A reference to a Hash::Wrap based object containing data for the previous iteration. is_converged may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
  nelem - The number of included coordinates.
  total_weight - The sum of the weights of the included coordinates.
$current - A reference to a Hash::Wrap based object containing data for the current iteration, with attributes as described above for $last.
$work - A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
The is_converged routine is passed references to the actual objects used by sigma_clip to keep track of the iterations. This means that the is_converged routine may manipulate the starting point for the next iteration by altering its $current parameter.
is_converged is called prior to entering the iteration loop with $last set to undef. This allows priming the $current structure, which will be used as $last in the first iteration.
coords => Coords
The coordinates to center. coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
iterlim - A positive integer specifying the maximum number of iterations.
log => coderef
Optional. A subroutine which will be called before the first iteration and at the end of each iteration. It is invoked as
    &$log( $iteration );
where $iteration is a copy of the current iteration object. The object will have at least the following fields:
  center (piddle|undef) - A piddle of shape N containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
  iter - The iteration index.
  nelem - The number of included coordinates.
  total_weight - The summed weight of the included coordinates.
There may be other attributes added by the various callbacks (calc_wmask, calc_center, is_converged). See, for example, "Sigma Clip Iterations".
mask => piddle
Optional. This is a piddle which specifies which coordinates to include in the calculations. Its values are either 0 or 1, where values of 1 indicate coordinates to be included. It defaults to a piddle of all 1's.
When used with coords, mask must be a piddle of shape M, where M is the number of data elements in coords.
If coords is not specified, mask should have the same shape as weight.
save_mask => boolean
If true, the mask used in the final iteration will be returned in the iteration result object.
save_weight => boolean
If true, the weights used in the final iteration will be returned in the iteration result object.
weight => piddle
Optional. Data weights. When used with coords, weight must be a piddle of shape M, where M is the number of data elements in coords. If coords is not specified, weight is a piddle of shape NxM, where N is the number of dimensions in the data and M is the number of data elements.
It defaults to a piddle of all 1's.
Callbacks are provided with Hash::Wrap based objects which contain the data for the current iteration. They should add data to the object's underlying hash which records particulars about their specific operation.
Work Space
Callbacks are passed Hash::Wrap based iteration objects and a reference to a $work hash. Callbacks may add elements to the iteration objects (these will be available to the caller), but should refrain from storing unnecessary data there, as each new iteration's object is copied from that for the previous iteration.
Instead, use the passed $work hash. It is shared amongst the callbacks, so use it to store data which will not be returned to the caller.
Results
iterate returns an object which includes all of the attributes from the final iteration object (see "Iteration Object"), with the following additional attributes/methods:
iterations => arrayref
An array of result objects for each iteration.
success => boolean
True if the iteration converged, false otherwise.
error => error object
If convergence has failed, this will contain an error object describing the failure. See "ERRORS".
mask => piddle
If the save_mask option is true, this will be the final inclusion mask.
weight => piddle
If the save_weight option is true, this will be the final weights.
The value of the center attribute in the last iteration will be undefined if all of the elements have been clipped.
Iteration Object
The results for each iteration are stored in an object with the following attributes/methods (in addition to those added by the callbacks).
center => piddle|undef
A 1D piddle containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
iter => integer
The iteration index. An index of 0 indicates the values determined before the iterative loop was entered, and reflects the initial clipping and mask exclusion.
nelem => integer
The number of data elements used in the center.
total_weight => number
The combined weight of the data elements used to determine the center.
Iteration Steps
Before the first iteration:
1. Extract an initial center from center.
2. Create a new iteration object.
3. Call initialize.
4. Call log.
For each iteration:
1. Create a new iteration object by copying the old one.
2. Call calc_wmask with a copy of the initial mask and weights. calc_wmask should update (in place) at least one of them.
3. Update summed weight and number of elements if calc_wmask sets them to undef.
4. Call calc_center with the current mask and weights.
5. Call is_converged with the current mask and weights.
6. Call log.
7. Go to step 1 if not converged and the iteration limit has not been reached.
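The order of the callback calls listed above can be mimicked with a small stand-in loop in plain Perl. This is a sketch only and not the module's code: iterate_sketch is a hypothetical function, plain hashes replace the Hash::Wrap objects, plain arrays replace piddles, the data are one-dimensional (N = 1), and the fixed clipping radius is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the iteration loop; NOT the module's code.
sub iterate_sketch {
    my (%opt) = @_;
    my $coords = $opt{coords};
    my $work   = {};

    my $count = sub { my $m = shift; return scalar grep { $_ } @$m };
    my $wsum  = sub {
        my ( $m, $w ) = @_;
        my $t = 0;
        $t += $m->[$_] * $w->[$_] for 0 .. $#$m;
        return $t;
    };

    # Before the first iteration: initial center, iteration object, initialize, log.
    my $current = {
        iter         => 0,
        center       => $opt{center},
        nelem        => $count->( $opt{mask} ),
        total_weight => $wsum->( $opt{mask}, $opt{weight} ),
    };
    $opt{initialize}->( $coords, $opt{mask}, $opt{weight}, $current, $work );
    $opt{log}->( {%$current} ) if $opt{log};

    my @iterations = ($current);
    my $converged  = 0;
    while ( !$converged && $current->{iter} < $opt{iterlim} ) {
        my $last = $current;

        # 1. New iteration object, copied from the old one.
        $current = { %$last, iter => $last->{iter} + 1 };

        # 2. calc_wmask works in place on copies of the initial mask and weights.
        my @mask   = @{ $opt{mask} };
        my @weight = @{ $opt{weight} };
        $opt{calc_wmask}->( $coords, \@mask, \@weight, $current, $work );

        # 3. Recompute the sums if the callback set them to undef.
        $current->{nelem}        //= $count->( \@mask );
        $current->{total_weight} //= $wsum->( \@mask, \@weight );

        # 4, 5, 6. New center, convergence test, logging.
        $current->{center} =
          $opt{calc_center}->( $coords, \@mask, \@weight, $current, $work );
        $converged =
          $opt{is_converged}->( $coords, \@mask, \@weight, $last, $current, $work );
        $opt{log}->( {%$current} ) if $opt{log};
        push @iterations, $current;
    }
    return { %$current, success => $converged ? 1 : 0, iterations => \@iterations };
}

my $result = iterate_sketch(
    coords     => [ 1, 2, 3, 4, 100 ],
    mask       => [ 1, 1, 1, 1, 1 ],
    weight     => [ 1, 1, 1, 1, 1 ],
    center     => 3,
    iterlim    => 10,
    initialize => sub { my $work = $_[4]; $work->{radius} = 10 },    # invented radius
    calc_wmask => sub {
        my ( $coords, $mask, $weight, $current, $work ) = @_;
        for my $i ( 0 .. $#$coords ) {
            $mask->[$i] = 0
              if abs( $coords->[$i] - $current->{center} ) > $work->{radius};
        }
        # The mask changed, so ask the loop to recompute the sums.
        $current->{nelem}        = undef;
        $current->{total_weight} = undef;
    },
    calc_center => sub {
        my ( $coords, $mask, $weight, $current ) = @_;
        my $sum = 0;
        $sum += $mask->[$_] * $weight->[$_] * $coords->[$_] for 0 .. $#$coords;
        return $sum / $current->{total_weight};
    },
    is_converged => sub {
        my ( $last, $current ) = @_[ 3, 4 ];
        return abs( $current->{center} - $last->{center} ) < 1e-6;
    },
);

printf "center %g from %d points after %d iterations\n",
  $result->{center}, $result->{nelem}, $result->{iter};
# prints: center 2.5 from 4 points after 2 iterations
```

Keeping the clipping radius in the shared work hash rather than in the iteration object follows the advice in "Work Space": the radius is not a result, so it should not be copied into every iteration's record.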
TYPES
In the description of the subroutines, the following types are specified:
- Center
This accepts a non-null, non-empty 1D piddle, or anything that can be converted into one (for example, a scalar, a scalar piddle, or an array of numbers).
- CodeRef
A code reference.
- PositiveNum
A positive real number.
- PositiveInt
A positive integer.
- Coords
This accepts a non-null, non-empty 2D piddle, or anything that can be converted or up-converted to it.
- Piddle_min1D_ne
This accepts a non-null, non-empty piddle with a minimum of 1 dimension.
- Piddle1D_ne
This accepts a non-null, non-empty 1D piddle.
ERRORS
Errors are represented as objects in the following classes:
- Parameter Validation
These are unconditionally thrown as PDL::Algorithm::Center::Failure::parameter objects.
- Iteration
These are stored in the result object's error attribute. The classes are:
PDL::Algorithm::Center::Failure::iteration::limit_reached
PDL::Algorithm::Center::Failure::iteration::empty
The objects stringify to a failure message.
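The two error paths can be handled as in the sketch below, which assumes sigma_clip is importable by name and uses made-up data. The first call passes a deliberately invalid nsigma, which should raise a parameter exception; whether the second call actually fails depends on the data, so it is only inspected, not assumed.

```perl
use strict;
use warnings;
use PDL;
use PDL::Algorithm::Center qw( sigma_clip );    # assumption: exported on request
use Scalar::Util qw( blessed );

# Made-up data: a small cluster plus one outlier, dims (2, 6).
my $coords = pdl( [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0.5, 0.5 ], [ 40, 40 ] );

# Parameter errors are thrown, so trap them with eval.
my $results = eval { sigma_clip( coords => $coords, nsigma => -1, dtol => 1e-3 ) };
if ( my $err = $@ ) {
    die $err
      unless blessed($err)
      && $err->isa('PDL::Algorithm::Center::Failure::parameter');
    print "parameter error: $err\n";    # the object stringifies to a message
}

# Iteration failures are not thrown; inspect the result object instead.
$results = sigma_clip( coords => $coords, nsigma => 2, dtol => 1e-9, iterlim => 1 );
if ( !$results->success ) {
    my $err = $results->error;
    if ( $err->isa('PDL::Algorithm::Center::Failure::iteration::limit_reached') ) {
        print "iteration limit reached: $err\n";
    }
    elsif ( $err->isa('PDL::Algorithm::Center::Failure::iteration::empty') ) {
        print "all elements were clipped: $err\n";
    }
}
```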
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=PDL-Algorithm-Center or by email to bug-PDL-Algorithm-Center@rt.cpan.org.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
SOURCE
The development version is on github at https://github.com/djerius/pdl-algorithm-center and may be cloned from git://github.com/djerius/pdl-algorithm-center.git
AUTHOR
Diab Jerius <djerius@cpan.org>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2017 by Smithsonian Astrophysical Observatory.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007