NAME
Image::Leptonica::Func::recogtrain
VERSION
version 0.03
recogtrain.c
recogtrain.c
Training on labelled data
l_int32 recogTrainLabelled()
l_int32 recogProcessMultLabelled()
PIX *recogProcessSingleLabelled()
l_int32 recogAddSamples()
PIX *recogScaleCharacter()
l_int32 recogAverageSamples()
l_int32 pixaAccumulateSamples()
l_int32 recogTrainingFinished()
l_int32 recogRemoveOutliers()
Evaluate training status
l_int32 recogaTrainingDone()
l_int32 recogaFinishAveraging()
Training on unlabelled data
l_int32 recogTrainUnlabelled()
Padding the training set
l_int32 recogPadTrainingSet()
l_int32 *recogMapIndexToIndex()
static l_int32 recogAverageClassGeom()
l_int32 recogaBestCorrelForPadding()
l_int32 recogCorrelAverages()
l_int32 recogSetPadParams()
static l_int32 recogGetCharsetSize()
static l_int32 recogCharsetAvailable()
Debugging
l_int32 recogaShowContent()
l_int32 recogShowContent()
l_int32 recogDebugAverages()
l_int32 recogShowAverageTemplates()
PIX *recogShowMatchesInRange()
PIX *recogShowMatch()
l_int32 recogMakeBmf()
Static helpers
static char *l_charToString()
static void addDebugImage1()
static void addDebugImage2()
FUNCTIONS
pixaAccumulateSamples
l_int32 pixaAccumulateSamples ( PIXA *pixa, PTA *pta, PIX **ppixd, l_float32 *px, l_float32 *py )
pixaAccumulateSamples()
Input: pixa (of samples from the same class, 1 bpp)
pta (<optional> of centroids of the samples)
&ppixd (<return> accumulated samples, 8 bpp)
&px (<optional return> average x coordinate of centroids)
&py (<optional return> average y coordinate of centroids)
Return: 0 on success, 1 on failure
Notes:
(1) This generates an aligned (by centroid) sum of the input pix.
(2) We use only the first 256 samples; that's plenty.
(3) If pta is not input, we generate two tables, and discard
after use. If this is called many times, it is better
to precompute the pta.
recogAddSamples
l_int32 recogAddSamples ( L_RECOG *recog, PIXA *pixa, l_int32 classindex, l_int32 debug )
recogAddSamples()
Input: recog
pixa (1 or more characters)
classindex (use -1 if not forcing into a specified class)
debug
Return: 0 if OK, 1 on error
Notes:
(1) The pix in the pixa are all 1 bpp, and the character string
labels are embedded in the pix.
(2) Note: this function decides what class each pix belongs in.
When input is from a multifont pixaa, with a valid value
for @classindex, the character string label in each pix
is ignored, and @classindex is used as the class index
for all the pix in the pixa. Thus, for that situation we
use this class index to avoid making the decision through a
lookup based on the character strings embedded in the pix.
(3) When a recog is initially filled with samples, the pixaa_u
array is initialized to accept up to 256 different classes.
When training is finished, the arrays are truncated to the
actual number of classes. To pad an existing recog from
the boot recognizers, training is started again; if samples
from a new class are added, the pixaa_u array must be
extended by adding a pixa to hold them.
recogAverageSamples
l_int32 recogAverageSamples ( L_RECOG *recog, l_int32 debug )
recogAverageSamples()
Input: recog
debug
Return: 0 on success, 1 on failure
Notes:
(1) This is called when training is finished, and after
outliers have been removed.
Both unscaled and scaled inputs are averaged.
Averages must be computed before any identification is done.
(2) Set debug = 1 to view the resulting templates
and their centroids.
recogBestCorrelForPadding
l_int32 recogBestCorrelForPadding ( L_RECOG *recog, L_RECOGA *recoga, NUMA **pnaset, NUMA **pnaindex, NUMA **pnascore, NUMA **pnasum, PIXA **ppixadb )
recogBestCorrelForPadding()
Input: recog (typically the recog to be padded)
recoga (array of recogs for potentially providing the padding)
&naset (<return> of indices into the sets to be matched)
&naindex (<return> of matching indices into the best set)
&nascore (<return> of best correlation scores)
&naave (<return> average of correlation scores from each recog)
&pixadb (<optional return> debug images; use NULL for no debug)
Return: 0 if OK, 1 on error
Notes:
(1) This finds, for each class in recog, the best matching template
in the recoga. For that best match, it returns:
* the recog set index in the recoga,
* the index in that recog for the class,
* the score for the best match
(2) It also returns in @naave, for each recog in recoga, the
average overall correlation for all averaged templates to
those in the input recog. The recog with the largest average
can supply templates in cases where the input recog has
no examples.
(3) For classes in recog1 for which no corresponding class
is found in any recog in recoga, the index -1 is stored
in both naset and naindex, and 0.0 is stored in nascore.
(4) Both recog and all the recog in recoga should be generated
with isotropic scaling to the same character height (e.g., 30).
recogCorrelAverages
l_int32 recogCorrelAverages ( L_RECOG *recog1, L_RECOG *recog2, NUMA **pnaindex, NUMA **pnascore, PIXA **ppixadb )
recogCorrelAverages()
Input: recog1 (typically the recog to be padded)
recog2 (potentially providing the padding)
&naindex (<return> of classes in 2 with respect to classes in 1)
&nascore (<return> correlation scores of corresponding classes)
&pixadb (<optional return> debug images)
Return: 0 if OK, 1 on error
Notes:
(1) Use this for potentially padding recog1 with instances in recog2.
The recog have been generated with isotropic scaling to the
same fixed height (e.g., 30). The training has been "finished"
in the sense that all arrays have been computed and they
could potentially be used as they are. This is necessary
for doing the correlation between scaled images.
However, this function is called when there is a request to
augument some of the examples in classes in recog1.
(2) Iterate over classes in recog1, finding the corresponding
class in recog2 and computing the correlation score between
the average templates of the two. naindex is a LUT between
the index of a class in recog1 and the corresponding one in recog2.
(3) For classes in recog1 that do not exist in recog2, the index
-1 is stored in naindex, and 0.0 is stored in the score.
recogDebugAverages
l_int32 recogDebugAverages ( L_RECOG *recog, l_int32 debug )
recogDebugAverages()
Input: recog
debug (0 no output; 1 for images; 2 for text; 3 for both)
Return: 0 if OK, 1 on error
Notes:
(1) Generates an image that pairs each of the input images used
in training with the average template that it is best
correlated to. This is written into the recog.
(2) It also generates pixa_tr of all the input training images,
which can be used, e.g., in recogShowMatchesInRange().
recogMakeBmf
l_int32 recogMakeBmf ( L_RECOG *recog, const char *fontdir, l_int32 size )
recogMakeBmf()
Input: recog
fontdir (for bitmap fonts; typically "fonts")
size (of font; even integer between 4 and 20; default is 6)
Return: 0 if OK, 1 on error
Notes:
(1) This can be used to (re)set the size of the font used for
debug labelling.
recogPadTrainingSet
l_int32 recogPadTrainingSet ( L_RECOG **precog, l_int32 debug )
recogPadTrainingSet()
Input: &recog (to be replaced if padding or more drastic measures
are necessary; otherwise, it is unchanged.)
debug (1 for debug output saved to recog; 0 otherwise)
Return: 0 if OK, 1 on error
Notes:
(1) Before calling this, call recogSetPadParams() if you want
non-default values for the character set type, min_nopad
and max_afterpad values, and paths for labelled bitmap
character sets that can be used to augment an input recognizer.
(2) If all classes in @recog have at least min_nopad samples,
nothing is done. If the total number of samples in @recog
is very small, @recog is replaced by a boot recog from the
specified bootpath. Otherwise (the intermediate case),
@recog is replaced by one with scaling to fixed height,
where an array of recog are used to augment the input recog.
(3) If padding or total replacement is done, this destroys
the input recog and replaces it by a new one. If the recog
belongs to a recoga, the replacement is also done in the recoga.
recogProcessMultLabelled
l_int32 recogProcessMultLabelled ( L_RECOG *recog, PIX *pixs, BOX *box, char *text, PIXA **ppixa, l_int32 debug )
recogProcessMultLabelled()
Input: recog (in training mode)
pixs (if depth > 1, will be thresholded to 1 bpp)
box (<optional> cropping box)
text (<optional> if null, use text field in pix)
&pixa (<return> of split and thresholded characters)
debug (1 to display images of samples not captured)
Return: 0 if OK, 1 on error
Notes:
(1) This crops and segments one or more labelled and contiguous
ascii characters, for input in training. It is a special case.
(2) The character images are bundled into a pixa with the
character text data embedded in each pix.
(3) Where there is more than one character, this does some
noise reduction and extracts the resulting character images
from left to right. No scaling is performed.
recogProcessSingleLabelled
l_int32 recogProcessSingleLabelled ( L_RECOG *recog, PIX *pixs, BOX *box, char *text, PIXA **ppixa )
recogProcessSingleLabelled()
Input: recog (in training mode)
pixs (if depth > 1, will be thresholded to 1 bpp)
box (<optional> cropping box)
text (<optional> if null, use text field in pix)
&pixa (one pix, 1 bpp, labelled)
Return: 0 if OK, 1 on error
Notes:
(1) This crops and binarizes the input image, generating a pix
of one character where the charval is inserted into the pix.
recogRemoveOutliers
l_int32 recogRemoveOutliers ( L_RECOG *recog, l_float32 targetscore, l_float32 minfract, l_int32 debug )
recogRemoveOutliers()
Input: recog (after training samples are entered)
targetscore (keep everything with at least this score)
minfract (minimum fraction to retain)
debug (1 for debug output)
Return: 0 if OK, 1 on error
Notes:
(1) Removing outliers is particularly important when recognition
goes against all the samples in the training set, as opposed
to the averages for each class. The reason is that we get
an identification error if a mislabeled sample is a best
match for an input bitmap.
(2) However, the score values depend strongly on the quality
of the character images. To avoid losing too many samples,
we supplement a target score for retention with a minimum
fraction that we must keep. With poor quality images, we
may keep samples with a score less than the targetscore,
in order to satisfy the @minfract requirement.
(3) We always require that at least one sample will be retained.
(4) Where the training set is from the same source (e.g., the
same book), use a relatively large minscore; say, ~0.8.
(5) Method: for each class, generate the averages and match each
scaled sample against the average. Decide which
samples will be ejected, and throw out both the
scaled and unscaled samples and associated data.
Recompute the average without the poor matches.
recogScaleCharacter
PIX * recogScaleCharacter ( L_RECOG *recog, PIX *pixs )
recogScaleCharacter()
Input: recog
pixs (1 bpp, to be scaled)
Return: pixd (scaled) if OK, null on error
recogSetPadParams
l_int32 recogSetPadParams ( L_RECOG *recog, const char *bootdir, const char *bootpattern, const char *bootpath, l_int32 type, l_int32 min_nopad, l_int32 max_afterpad )
recogSetPadParams()
Input: recog (to be padded, if necessary)
bootdir (<optional> directory to bootstrap labelled pixa)
bootpattern (<optional> pattern for bootstrap labelled pixa)
bootpath (<optional> path to single bootstrap labelled pixa)
type (character set type; -1 for default; see enum in recog.h)
size (character set size; -1 for default)
min_nopad (min number in a class without padding; -1 default)
max_afterpad (max number of samples in padded classes;
-1 for default)
Return: 0 if OK, 1 on error
Notes:
(1) This is used to augment or replace a book-adapted recognizer (BAR).
It is called when the recognizer is created, and must be
called again before recogPadTrainingSet() if non-default
values are to be used.
(2) Default values allow for some padding. To disable padding,
set @min_nopad = 0.
(3) Constraint on @min_nopad and @max_afterpad guarantees that
padding will be allowed if requested.
(4) The file directory (@bootdir) and tail pattern (@bootpattern)
are used to identify serialized pixa, from which we can
generate an array of recog. These can be used to augment
an input but incomplete BAR (book adapted recognizer).
(5) If the BAR is very sparse, we will ignore it and use a generic
bootstrap recognizer at @bootpath.
recogShowAverageTemplates
l_int32 recogShowAverageTemplates ( L_RECOG *recog )
recogShowAverageTemplates()
Input: recog
Return: 0 on success, 1 on failure
Notes:
(1) This debug routine generates a display of the averaged templates,
both scaled and unscaled, with the centroid visible in red.
recogShowContent
l_int32 recogShowContent ( FILE *fp, L_RECOG *recog, l_int32 display )
recogShowContent()
Input: stream
recog
display (1 for showing template images, 0 otherwise)
Return: 0 if OK, 1 on error
recogShowMatch
PIX * recogShowMatch ( L_RECOG *recog, PIX *pix1, PIX *pix2, BOX *box, l_int32 index, l_float32 score )
recogShowMatch()
Input: recog
pix1 (input pix; several possibilities)
pix2 (<optional> matching template)
box (<optional> region in pix1 for which pix2 matches)
index (index of matching template; use -1 to disable printing)
score (score of match)
Return: pixd (pair of images, showing input pix and best template),
or null on error.
Notes:
(1) pix1 can be one of these:
(a) The input pix alone, which can be either a single character
(box == NULL) or several characters that need to be
segmented. If more than character is present, the box
region is displayed with an outline.
(b) Both the input pix and the matching template. In this case,
pix2 and box will both be null.
(2) If the bmf has been made (by a call to recogMakeBmf())
and the index >= 0, the index and score will be rendered;
otherwise their values will be ignored.
recogShowMatchesInRange
l_int32 recogShowMatchesInRange ( L_RECOG *recog, PIXA *pixa, l_float32 minscore, l_float32 maxscore, l_int32 display )
recogShowMatchesInRange()
Input: recog
pixa (of 1 bpp images to match)
minscore, maxscore (range to include output)
display (to display the result)
Return: 0 if OK, 1 on error
Notes:
(1) This gives a visual output of the best matches for a given
range of scores. Each pair of images can optionally be
labelled with the index of the best match and the correlation.
If the bmf has been previously made, it will be used here.
(2) To use this, save a set of 1 bpp images (labelled or
unlabelled) that can be given to a recognizer in a pixa.
Then call this function with the pixa and parameters
to filter a range of score.
recogTrainLabelled
l_int32 recogTrainLabelled ( L_RECOG *recog, PIX *pixs, BOX *box, char *text, l_int32 multflag, l_int32 debug )
recogTrainLabelled()
Input: recog (in training mode)
pixs (if depth > 1, will be thresholded to 1 bpp)
box (<optional> cropping box)
text (<optional> if null, use text field in pix)
multflag (1 if one or more contiguous ascii characters;
0 for a single arbitrary character)
debug (1 to display images of samples not captured)
Return: 0 if OK, 1 on error
Notes:
(1) Training is restricted to the addition of either:
(a) multflag == 0: a single character in an arbitrary
(e.g., UTF8) charset
(b) multflag == 1: one or more ascii characters rendered
contiguously in pixs
(2) If box != null, it should represent the cropped location of
the character image.
(3) If multflag == 1, samples will be rejected if the number of
connected components does not equal to the number of ascii
characters in the textstring. In that case, if debug == 1,
the rejected samples will be displayed.
recogTrainUnlabelled
l_int32 recogTrainUnlabelled ( L_RECOG *recog, L_RECOG *recogboot, PIX *pixs, BOX *box, l_int32 singlechar, l_float32 minscore, l_int32 debug )
recogTrainUnlabelled()
Input: recog (in training mode: the input characters in pixs are
inserted after labelling)
recogboot (labels the input)
pixs (if depth > 1, will be thresholded to 1 bpp)
box (<optional> cropping box)
singlechar (1 if pixs is a single character; 0 otherwise)
minscore (min score for accepting the example; e.g., 0.75)
debug (1 for debug output saved to recog; 0 otherwise)
Return: 0 if OK, 1 on error
Notes:
(1) This trains on unlabelled data, using a bootstrap recognizer
to apply the labels. In this way, we can build a recognizer
using a source of unlabelled data.
(2) The input pix can have several (non-touching) characters.
If box != NULL, we treat the region in the box as a single char
If box == NULL, use all of pixs:
if singlechar == 0, we identify each c.c. as a single character
if singlechar == 1, we treat pixs as a single character
Multiple chars are identified separately by recogboot and
inserted into recog.
(3) recogboot is a trained recognizer. It would typically be
constructed from a variety of sources, and use the average
templates for scoring.
(4) For debugging, if bmf is defined in the recog, the correlation
scores are generated and saved (by adding to the pixadb_boot
field) with the matching images.
recogTrainingFinished
l_int32 recogTrainingFinished ( L_RECOG *recog, l_int32 debug )
recogTrainingFinished()
Input: recog
debug
Return: 0 if OK, 1 on error
Notes:
(1) This must be called after all training samples have been added.
(2) Set debug = 1 to view the resulting templates
and their centroids.
(3) The following things are done here:
(a) Allocate (or reallocate) storage for (possibly) scaled
bitmaps, centroids, and fg areas.
(b) Generate the (possibly) scaled bitmaps.
(c) Compute centroid and fg area data for both unscaled and
scaled bitmaps.
(d) Compute the averages for both scaled and unscaled bitmaps
(e) Truncate the pixaa, ptaa and numaa arrays down from
256 to the actual size.
(4) Putting these operations here makes it simple to recompute
the recog with different scaling on the bitmaps.
(5) Removal of outliers must happen after this is called.
recogaFinishAveraging
l_int32 recogaFinishAveraging ( L_RECOGA *recoga )
recogaFinishAveraging()
Input: recoga
Return: 0 if OK, 1 on error
recogaShowContent
l_int32 recogaShowContent ( FILE *fp, L_RECOGA *recoga, l_int32 display )
recogaShowContent()
Input: stream
recoga
display (1 for showing template images, 0 otherwise)
Return: 0 if OK, 1 on error
recogaTrainingDone
l_int32 recogaTrainingDone ( L_RECOGA *recoga, l_int32 *pdone )
recogaTrainingDone()
Input: recoga
&done (1 if training finished on all recog; 0 otherwise)
Return: 0 if OK, 1 on error
AUTHOR
Zakariyya Mughal <zmughal@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Zakariyya Mughal.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.