NAME
PDL::Algorithm::Center - Various methods of finding the center of a sample
VERSION
version 0.11
DESCRIPTION
PDL::Algorithm::Center is a collection of algorithms which
specialize in centering datasets.
SUBROUTINES
See "TYPES" for information on the types used in the subroutine descriptions.
sigma_clip
$results = sigma_clip(
    center      => Optional [ Center | CodeRef ],
    clip        => Optional [PositiveNum],
    coords      => Optional [Coords],
    dtol        => PositiveNum,
    iterlim     => Optional [PositiveInt],
    log         => Optional [Bool | CodeRef],
    mask        => Optional [ Undef | Piddle_min1D_ne ],
    save_mask   => Optional [Bool],
    save_weight => Optional [Bool],
    nsigma      => PositiveNum,
    weight      => Optional [ Undef | Piddle_min1D_ne ],
);
Center a dataset by iteratively excluding data outside of a radius equal to a specified number of standard deviations. The dataset may be specified as a list of coordinates and optional weights, or as a weight piddle of shape NxM (e.g., an image). If only the weight piddle is provided, it is converted internally into a list of coordinates with associated weights.
To operate on a subset of the input data, specify the mask option.
A PDL::Algorithm::Center::Failure::parameter exception will be thrown if there is a parameter error.
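As a rough illustration of the weight-to-coordinates conversion mentioned above, the sketch below flattens a small dense weight array into coordinate and weight lists, using plain Perl arrays rather than piddles. The helper name image_to_coords, the use of pixel indices as coordinates, and the retention of zero-weight elements are assumptions made for this example; they are not taken from the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Algorithm::Center): turn a dense
# 2-D weight array into parallel lists of [x, y] coordinates and weights.
# Assumes pixel indices serve as coordinates and keeps zero-weight pixels.
sub image_to_coords {
    my ($image) = @_;
    my ( @coords, @weight );
    for my $y ( 0 .. $#{$image} ) {
        for my $x ( 0 .. $#{ $image->[$y] } ) {
            push @coords, [ $x, $y ];
            push @weight, $image->[$y][$x];
        }
    }
    return ( \@coords, \@weight );
}

# A 2x2 "image" with non-zero weight only at (1,0) and (0,1).
my ( $coords, $weight ) = image_to_coords( [ [ 0, 2 ], [ 1, 0 ] ] );

# Weighted centroid of the flattened data.
my ( $sum_w, @centroid ) = ( 0, 0, 0 );
for my $i ( 0 .. $#{$coords} ) {
    $sum_w += $weight->[$i];
    $centroid[$_] += $weight->[$i] * $coords->[$i][$_] for 0, 1;
}
$_ /= $sum_w for @centroid;
printf "centroid: (%.3f, %.3f)\n", @centroid;
```

The flattened form is what makes a single code path possible for both sparse coordinate lists and dense images.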
The center of a data set is determined by:
1. clipping (ignoring) the data whose distance to the current center is greater than a specified number of standard deviations
2. calculating a new center by performing a (weighted) centroid of the remaining data
3. calculating the standard deviation of the distance from the remaining data to the center
4. repeating from step 1 until either a convergence tolerance has been met or the iteration limit has been exceeded
The initial center may be explicitly specified, or may be calculated by performing a (weighted) centroid of the data.
The initial standard deviation is calculated using the initial center and either the entire dataset or a clipped region about the initial center.
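The procedure above can be sketched in plain Perl for unweighted 2-D points. This is an illustration only and not the module's implementation: the function names are hypothetical, the "standard deviation" is assumed to be the RMS distance to the center, and the initial standard deviation is taken from the entire dataset.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Plain-Perl illustration of the clipping loop for unweighted 2-D points.
# All names are hypothetical; this is not the module's implementation.

sub centroid {
    my @pts = @_;
    my $n   = @pts;
    return [ map { my $d = $_; sum( map { $_->[$d] } @pts ) / $n } 0, 1 ];
}

sub dist {
    my ( $p, $q ) = @_;
    return sqrt( ( $p->[0] - $q->[0] )**2 + ( $p->[1] - $q->[1] )**2 );
}

# Assumption: the standard deviation is the RMS distance to the center.
sub sigma {
    my ( $center, @pts ) = @_;
    my $n = @pts;
    return sqrt( sum( map { dist( $_, $center )**2 } @pts ) / $n );
}

sub sigma_clip_sketch {
    my (%opt) = @_;
    my @all    = @{ $opt{points} };
    my $center = centroid(@all);            # initial center: centroid of all data
    my $sigma  = sigma( $center, @all );    # initial sigma: entire dataset

    for my $iter ( 1 .. $opt{iterlim} ) {

        # 1. clip data farther than nsigma standard deviations from the center
        my @kept = grep { dist( $_, $center ) <= $opt{nsigma} * $sigma } @all;
        return { success => 0, iter => $iter } unless @kept;

        # 2. new center; 3. new standard deviation
        my $new = centroid(@kept);
        $sigma = sigma( $new, @kept );

        # 4. stop once successive centers are closer than dtol
        my $moved = dist( $new, $center );
        $center = $new;
        return { success => 1, iter => $iter, center => $center, nelem => scalar @kept }
          if $moved < $opt{dtol};
    }
    return { success => 0, iter => $opt{iterlim}, center => $center };
}

my $result = sigma_clip_sketch(
    points  => [ [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 100, 100 ] ],
    nsigma  => 1.5,
    dtol    => 1e-3,
    iterlim => 10,
);
printf "success=%d after %d iterations\n", $result->{success}, $result->{iter};
```

In this example the outlier at (100, 100) falls outside the first clipping radius, so later iterations see only the four clustered points.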
Options
The following options are available:
- center => ArrayRef | Piddle1D_ne | coderef
  The initial center. It may be
  - An array of length N.
    The array may contain undefined values for each dimension for which the center should be determined by finding the mean of the values in that dimension.
  - A piddle with shape N (or something that can be coerced into one, see "TYPES").
  - A coderef which will return the center as a piddle with shape N. The subroutine is called as
        &$center( $coords, $mask, $weight, $total_weight );
    with
    - $coords
      A piddle with shape NxM containing M coordinates with dimension N.
    - $mask
      A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
    - $weight
      A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
    - $total_weight
      A scalar which is the sum of $mask * $weight.
- clip => positive number
  Optional. The clipping radius used to determine the initial standard deviation.
- coords => Coords
  Optional. The coordinates to center.
  coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
  weight may be specified with coords to indicate weighted data.
  mask may be specified to indicate that a subset of the coordinates should be operated on.
  coords is useful if the data cube is not fully populated; for dense data, use weight instead.
- dtol => positive number
  Optional. If specified, iteration will cease when successive centers are closer than the specified distance.
- iterlim => positive integer
  Optional. The maximum number of iterations to run. Defaults to 10.
- log => boolean|coderef
  Optional.
  If log is true (and not a coderef), a default logger which outputs to STDOUT will be used.
  If it is a coderef, it will be called before the first iteration and at the end of each iteration. It is passed a copy of the current iteration's results object; see "Sigma Clip Iterations".
- mask => piddle
  Optional. This is a piddle which specifies which coordinates to include in the calculations. Its values are either 0 or 1, where values of 1 indicate coordinates to be included. It defaults to a piddle of all 1's.
  When used with coords, mask must be a piddle of shape M, where M is the number of data elements in coords.
  If coords is not specified, mask should have the same shape as weight.
- save_mask => boolean
  If true, the mask used in the final iteration will be returned in the iteration result object.
- save_weight => boolean
  If true, the weights used in the final iteration will be returned in the iteration result object.
- nsigma => scalar
  The size of the clipping radius, in units of the standard deviation.
- weight => piddle
  Optional. Data weights. When used with coords, weight must be a piddle of shape M, where M is the number of data elements in coords. If coords is not specified, weight is a piddle of shape NxM, where N is the number of dimensions in the data and M is the number of data elements.
  It defaults to a piddle of all 1's.
Sigma Clip Results
sigma_clip returns an object which includes all of the attributes from the final iteration object (see "Sigma Clip Iterations"), with the following additional attributes/methods:
- iterations => arrayref
  An array of results objects for each iteration.
- success => boolean
  True if the iteration converged, false otherwise.
- error => error object
  If convergence has failed, this will contain an error object describing the failure. See "ERRORS".
- mask => piddle
  If the save_mask option is true, this will be the final inclusion mask.
- weight => piddle
  If the save_weight option is true, this will be the final weights.
Sigma Clip Iterations
The results for each iteration are stored in an object with the following attributes/methods:
- center => piddle|undef
  A 1D piddle containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
- iter => integer
  The iteration index. An index of 0 indicates the values determined before the iterative loop was entered, and reflects the initial clipping and mask exclusion.
- nelem => integer
  The number of data elements used in the center.
- total_weight => number
  The combined weight of the data elements used to determine the center.
- sigma => number|undef
  The standard deviation of the clipped data. The value for the last iteration will be undefined if all of the elements have been clipped.
- clip => number|undef
  The clipping radius. This will be undefined for the first iteration if the clip option was not specified.
- dist => number
  Optional. The distance between the previous and current centers. This is defined only if the dtol option was passed.
iterate
$result = iterate(
    center       => Center | CodeRef,
    initialize   => CodeRef,
    calc_center  => CodeRef,
    calc_wmask   => CodeRef,
    is_converged => CodeRef,
    coords       => Coords,
    iterlim      => PositiveInt,
    log          => Optional [CodeRef],
    mask         => Optional [Piddle1D_ne],
    save_mask    => Optional [Bool],
    save_weight  => Optional [Bool],
    weight       => Optional [Piddle1D_ne],
);
A generic iteration loop for centering data using callbacks for calculating centers, included element masks, weight, and iteration completion.
A PDL::Algorithm::Center::Failure::parameter exception will be thrown if there is a parameter error.
Options
The following options are accepted:
- center => Piddle1D_ne | coderef
  The initial center. It may either be a piddle with shape N (or something that can be coerced into one, see "TYPES") or a coderef which will return the center as a piddle with shape N. The coderef is called as
      $initial_center = &$center( $coords, $mask, $weight, $total_weight );
  with
  - $coords
    A piddle with shape NxM containing M coordinates with dimension N.
  - $mask
    A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
  - $weight
    A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
  - $total_weight
    A scalar which is the sum of $mask * $weight.
- initialize => coderef
  This subroutine provides initialization prior to entering the iteration loop. It should initialize the passed iteration object and work storage.
  It is invoked as:
      &$initialize( $coords, $mask, $weight, $current, $work );
  with
  - $coords
    A piddle of shape NxM with the coordinates of each element.
  - $mask
    A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
  - $weight
    A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
  - $current
    A reference to a Hash::Wrap based object containing data for the current iteration. initialize may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
    - nelem
      The number of included coordinates, $mask->sum.
    - total_weight
      The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
  - $work
    A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
- calc_center => coderef
  This subroutine should return a piddle of shape N with the calculated center.
  It will be called as:
      $center = &$calc_center( $coords, $mask, $weight, $current, $work );
  with
  - $coords
    A piddle of shape NxM with the coordinates of each element.
  - $mask
    A piddle with shape M containing the current inclusion mask.
  - $weight
    A piddle with shape M containing the current weights for the included coordinates.
  - $current
    A reference to a Hash::Wrap based object containing data for the current iteration. calc_center may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
    - nelem
      The number of included coordinates, $mask->sum.
    - total_weight
      The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
  - $work
    A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
- calc_wmask => coderef
  This subroutine should determine the current set of included coordinates and their current weights.
  It will be called as:
      &$calc_wmask( $coords, $mask, $weight, $current, $work );
  with
  - $coords
    A piddle of shape NxM with the coordinates of each element.
  - $mask
    A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->nelem if this is changed.
  - $weight
    A piddle with shape M, essentially a flattened copy of the initial $weight option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->total_weight if this is changed.
  - $current
    A reference to a Hash::Wrap based object containing data for the current iteration. calc_wmask may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
    - nelem
      The number of included coordinates, $mask->sum. If $mask is changed this must either be updated or set to the undefined value.
    - total_weight
      The sum of the weights of the included coordinates, ($mask * $weight)->dsum. If $weight is changed this must either be updated or set to the undefined value.
  - $work
    A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
- is_converged => coderef
  This subroutine should return a boolean value indicating whether the iteration has converged.
  It is invoked as:
      $bool = &$is_converged( $coords, $mask, $weight, $last, $current, $work );
  with
  - $coords
    A piddle of shape NxM with the coordinates of each element.
  - $mask
    A piddle with shape M containing the current inclusion mask.
  - $weight
    A piddle with shape M containing the current weights for the included coordinates.
  - $last
    A reference to a Hash::Wrap based object containing data for the previous iteration. is_converged may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
    - nelem
      The number of included coordinates.
    - total_weight
      The sum of the weights of the included coordinates.
  - $current
    A reference to a Hash::Wrap based object containing data for the current iteration, with attributes as described above for $last.
  - $work
    A hashref which may be used to store temporary data (e.g. work piddles) which will be available to all of the callback routines.
  The is_converged routine is passed references to the actual objects used by sigma_clip to keep track of the iterations. This means that the is_converged routine may manipulate the starting point for the next iteration by altering its $current parameter.
  is_converged is called prior to entering the iteration loop with $last set to undef. This allows priming the $current structure, which will be used as $last in the first iteration.
- coords => Coords
  The coordinates to center. coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
- iterlim
  A positive integer specifying the maximum number of iterations.
- log => coderef
  Optional. A subroutine which will be called
  - between the call to initialize and the start of the first iteration
  - at the end of each iteration
  It is invoked as
      &$log( $iteration );
  where $iteration is a copy of the current iteration object. The object will have at least the following fields:
  - center => piddle|undef
    A piddle of shape N containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
  - iter
    The iteration index.
  - nelem
    The number of included coordinates.
  - total_weight
    The summed weight of the included coordinates.
  There may be other attributes added by the various callbacks (calc_wmask, calc_center, is_converged). See, for example, "Sigma Clip Iterations".
- mask => piddle
  Optional. This is a piddle which specifies which coordinates to include in the calculations. Its values are either 0 or 1, where values of 1 indicate coordinates to be included. It defaults to a piddle of all 1's.
  When used with coords, mask must be a piddle of shape M, where M is the number of data elements in coords.
  If coords is not specified, mask should have the same shape as weight.
- save_mask => boolean
  If true, the mask used in the final iteration will be returned in the iteration result object.
- save_weight => boolean
  If true, the weights used in the final iteration will be returned in the iteration result object.
- weight => piddle
  Optional. Data weights. When used with coords, weight must be a piddle of shape M, where M is the number of data elements in coords. If coords is not specified, weight is a piddle of shape NxM, where N is the number of dimensions in the data and M is the number of data elements.
  It defaults to a piddle of all 1's.
Callbacks are provided with Hash::Wrap based objects which contain the data for the current iteration. They should add data to the object's underlying hash which records particulars about their specific operation.
Work Space
Callbacks are passed Hash::Wrap based iteration objects and a
reference to a $work hash. Callbacks may add elements to the
iteration objects (which will be available to the caller),
but should refrain from storing unnecessary data there, as each
new iteration's object is copied from that for the previous iteration.
Instead, use the passed $work hash. It is shared amongst the
callbacks, so use it to store data which will not be returned to
the caller.
Results
iterate returns an object which includes all of the attributes from the final iteration object (see "Iteration Object"), with the following additional attributes/methods:
- iterations => arrayref
  An array of result objects for each iteration.
- success => boolean
  True if the iteration converged, false otherwise.
- error => error object
  If convergence has failed, this will contain an error object describing the failure. See "ERRORS".
- mask => piddle
  If the save_mask option is true, this will be the final inclusion mask.
- weight => piddle
  If the save_weight option is true, this will be the final weights.
The value of the center attribute in the last iteration will be
undefined if all of the elements have been clipped.
Iteration Object
The results for each iteration are stored in an object with the following attributes/methods (in addition to those added by the callbacks).
- center => piddle|undef
  A 1D piddle containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
- iter => integer
  The iteration index. An index of 0 indicates the values determined before the iterative loop was entered, and reflects the initial clipping and mask exclusion.
- nelem => integer
  The number of data elements used in the center.
- total_weight => number
  The combined weight of the data elements used to determine the center.
Iteration Steps
Before the first iteration:
1. Extract an initial center from center.
2. Create a new iteration object.
3. Call initialize.
4. Call log.
For each iteration:
1. Create a new iteration object by copying the old one.
2. Call calc_wmask with a copy of the initial mask and weights. calc_wmask should update (in place) at least one of them.
3. Update the summed weight and number of elements if calc_wmask sets them to undef.
4. Call calc_center with the current mask and weights.
5. Call is_converged with the current mask and weights.
6. Call log.
7. Go to step 1 if not converged and the iteration limit has not been reached.
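The control flow listed above can be sketched in plain Perl. This is a hypothetical stand-in and not the module's code: plain hashes and arrays replace the Hash::Wrap objects and piddles, coordinates are 1-D, the priming call to is_converged before the loop is left out, and the callbacks in the usage example (a fixed clipping radius kept in the work hash) are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical stand-in for the loop described above: plain hashes and
# arrays replace the Hash::Wrap objects and piddles; coordinates are 1-D.
sub iterate_sketch {
    my (%opt) = @_;
    my ( $coords, $mask0, $weight0 ) = @opt{qw(coords mask weight)};
    my $wsum = sub { my ( $m, $w ) = @_; sum( map { $m->[$_] * $w->[$_] } 0 .. $#{$m} ) };
    my %work;

    # Before the first iteration: initial center, new object, initialize, log.
    my $current = {
        iter         => 0,
        center       => $opt{center},
        nelem        => sum( @{$mask0} ),
        total_weight => $wsum->( $mask0, $weight0 ),
    };
    $opt{initialize}->( $coords, $mask0, $weight0, $current, \%work );
    $opt{log}->( { %{$current} } ) if $opt{log};

    my @iterations = ($current);
    my $converged  = 0;
    while ( !$converged && $current->{iter} < $opt{iterlim} ) {
        my $last = $current;
        $current = { %{$last}, iter => $last->{iter} + 1 };    # 1. copy old object
        my @mask   = @{$mask0};                                # 2. copies of the
        my @weight = @{$weight0};                              #    initial inputs
        $opt{calc_wmask}->( $coords, \@mask, \@weight, $current, \%work );
        $current->{nelem}        //= sum(@mask);               # 3. refresh sums
        $current->{total_weight} //= $wsum->( \@mask, \@weight );
        $current->{center} =                                   # 4. new center
          $opt{calc_center}->( $coords, \@mask, \@weight, $current, \%work );
        $converged =                                           # 5. convergence test
          $opt{is_converged}->( $coords, \@mask, \@weight, $last, $current, \%work );
        $opt{log}->( { %{$current} } ) if $opt{log};           # 6. log a copy
        push @iterations, $current;
    }
    return { %{$current}, success => $converged ? 1 : 0, iterations => \@iterations };
}

my $result = iterate_sketch(
    coords     => [ 1, 2, 3, 50 ],
    mask       => [ 1, 1, 1, 1 ],
    weight     => [ 1, 1, 1, 1 ],
    center     => 14,
    iterlim    => 10,
    initialize => sub { my $work = $_[4]; $work->{radius} = 20 },
    calc_wmask => sub {
        my ( $coords, $mask, $weight, $current, $work ) = @_;
        for my $i ( 0 .. $#{$coords} ) {
            $mask->[$i] = 0 if abs( $coords->[$i] - $current->{center} ) > $work->{radius};
        }
        @{$current}{qw(nelem total_weight)} = ( undef, undef );    # ask for a refresh
    },
    calc_center => sub {
        my ( $coords, $mask, $weight, $current ) = @_;
        return sum( map { $mask->[$_] * $weight->[$_] * $coords->[$_] } 0 .. $#{$coords} )
          / $current->{total_weight};
    },
    is_converged => sub { abs( $_[4]{center} - $_[3]{center} ) < 1e-6 },
);
printf "success=%d iter=%d center=%g\n", @{$result}{qw(success iter center)};
```

Scratch data such as the radius lives in the work hash, while anything placed in the iteration object is carried into the next iteration's copy and returned to the caller.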
TYPES
In the description of the subroutines, the following types are specified:
- Center
  This accepts a non-null, non-empty 1D piddle, or anything that can be converted into one (for example, a scalar, a scalar piddle, or an array of numbers).
- CodeRef
  A code reference.
- PositiveNum
  A positive real number.
- PositiveInt
  A positive integer.
- Coords
  This accepts a non-null, non-empty 2D piddle, or anything that can be converted or up-converted to it.
- Piddle_min1D_ne
  This accepts a non-null, non-empty piddle with a minimum of 1 dimension.
- Piddle1D_ne
  This accepts a non-null, non-empty 1D piddle.
ERRORS
Errors are represented as objects in the following classes:
- Parameter Validation
  These are unconditionally thrown as PDL::Algorithm::Center::Failure::parameter objects.
- Iteration
  These are stored in the result object's error attribute.
  - PDL::Algorithm::Center::Failure::iteration::limit_reached
  - PDL::Algorithm::Center::Failure::iteration::empty
The objects stringify to a failure message.
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=PDL-Algorithm-Center
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
AUTHOR
Diab Jerius djerius@cpan.org
COPYRIGHT AND LICENSE
This software is Copyright (c) 2017 by Smithsonian Astrophysical Observatory.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007