NAME

Dancer::SearchApp::HTMLSnippet - HTML snippet extractor

SYNOPSIS

my @document_snippets = Dancer::SearchApp::HTMLSnippet->extract_highlights(
    html => $html,
    hl_tag => '<em>',
    hl_end => '</em>',
    snippet_length => 150,
    max_snippets => 8,
);

METHODS

`Dancer::SearchApp::HTMLSnippet->extract_highlights`

my @document_snippets = Dancer::SearchApp::HTMLSnippet->extract_highlights(
    html => $html,
    hl_tag => '<em>',
    hl_end => '</em>',
    snippet_length => 150,
    max_snippets => 8,
);

This extracts the highlight snippets and metadata from the HTML as prepared by Tika and highlighted by Elasticsearch. It returns a list of hash references, each containing a well-formed HTML snippet with the highlights, plus a page entry noting the original page number if the snippet originated within (or crosses) a <p class="page\d+"> section:

{
    html => 'this is a <b>result</b> you searched for',
    page => 42,
}
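
As a rough illustration of consuming that return value, the following sketch renders a list of snippet hashes as an HTML list. The sample data stands in for a real call to extract_highlights, and render_snippets is a hypothetical helper, not part of this module. It also assumes that the page key is simply absent when a snippet did not come from a page section.

```perl
use strict;
use warnings;

# Hypothetical sample data in the documented shape; in real use this
# list would be the return value of extract_highlights().
my @document_snippets = (
    { html => 'this is a <b>result</b> you searched for', page => 42 },
    { html => 'another <b>result</b> outside any page section' },
);

# Hypothetical helper (not part of the module): build an HTML list and
# label each item with its page number when one is present.
sub render_snippets {
    my @snippets = @_;
    my @items;
    for my $snippet (@snippets) {
        # Assumption: 'page' is missing or undef for page-less snippets
        my $label = defined $snippet->{page}
                  ? sprintf( ' <small>(page %d)</small>', $snippet->{page} )
                  : '';
        push @items, "<li>$snippet->{html}$label</li>";
    }
    return join "\n", '<ul>', @items, '</ul>';
}

print render_snippets(@document_snippets), "\n";
```

The snippet HTML is inserted unescaped because the documentation describes it as well-formed; anything else added to the output should be escaped as usual.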

`Dancer::SearchApp::HTMLSnippet->extract_highlights`

my @hits = Dancer::SearchApp::HTMLSnippet->extract_highlights(
    html => $html,
    max_length => 300,
);

for my $entry (@hits) {
    print "Match: $entry->{start} ($entry->{length} bytes)\n";
}
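
The start and length fields printed above could be used to cut the matching region out of the input. The sketch below assumes that start is a 0-based offset into the string passed as html and that length counts the same units as substr does. Neither is confirmed by this documentation, so the entry is built by hand here rather than taken from the module.

```perl
use strict;
use warnings;

my $html   = '<p>some text with a <em>match</em> inside</p>';
my $needle = '<em>match</em>';

# Hypothetical entry in the shape printed above. Assumption: 'start' is
# a 0-based offset into $html and 'length' is measured like substr().
my @hits = (
    { start => index( $html, $needle ), length => length $needle },
);

for my $entry (@hits) {
    # Cut the matched region out of the original document
    my $fragment = substr( $html, $entry->{start}, $entry->{length} );
    print "Fragment: $fragment\n";
}
```

If the offsets really are bytes, as the output label suggests, and the HTML contains non-ASCII characters, the string would need to be encoded to bytes before calling substr.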

`Dancer::SearchApp::HTMLSnippet->cleanup_tika`

my $content = Dancer::SearchApp::HTMLSnippet->cleanup_tika( $html );

Cleans up HTML output from Apache Tika.

BUG TRACKER

Please report bugs in this module via the RT CPAN bug queue at https://rt.cpan.org/Public/Dist/Display.html?Name=Dancer-SearchApp or via mail to dancer-searchapp-Bugs@rt.cpan.org.

AUTHOR

Max Maischein corion@cpan.org

COPYRIGHT (c)

LICENSE

This module is released under the same terms as Perl itself.

