Changes for version 0.30_11 - 2010-08-19

  • New features:
    • KinoSearch::Search::QueryParser now supports escaping double quotes.
  • Changed:
    • KinoSearch::Plan::FieldType
      o Removed set_boost(), set_indexed(), set_store().
      o Added sortable().
  • Moved, but compatibility stubs retained:
    • KinoSearch::Search::Similarity -> KinoSearch::Index::Similarity
    • KSx::Search::LongFieldSim -> KSx::Index::LongFieldSim
  • Bugfixes:
    • Sorting problem with complex sort specs on x86.
    • Improved segment recycling to avoid pathological state on large indexes.
    • Add missing compatibility stub for KinoSearch::Doc (now KinoSearch::Document::Doc).
    • Various esoteric bugs in QueryParser.

Changes for version 0.30_101 - 2010-05-01

  • Bugfixes:
    • r6044, r6046-6048: fixes to Build process for compiler values other than 'cc'.
    • r6078: fix BitCollector bug, where segment offset was ignored.
    • r6079-6081: Fix problem with corruption of the Perl stack by Host_callback().

Changes for version 0.30_10 - 2010-03-29

  • Bugfixes:
    • Endian portability issue solved for index sort caches.
    • Security issue solved for untrusted indexes (potential file deletion, see commit r5970).
  • Compatibility:
    • Index version bumped.

Changes for version 0.30_09 - 2010-03-26

  • New public classes:
    • KSx::Search::ProximityQuery
  • New documentation:
    • KinoSearch - new backwards compatibility policy.
    • KinoSearch::Docs::DevGuide
  • Improvements:
    • Lower memory consumption while indexing sortable fields.
    • Lower address space requirements for very large indexes with sortable fields.
    • Improved error reporting for incorrectly implemented subclasses.
  • Moved, but compatibility stubs retained:
    • KinoSearch::Schema -> KinoSearch::Plan::Schema
    • KinoSearch::Indexer -> KinoSearch::Index::Indexer
    • KinoSearch::Searcher -> KinoSearch::Search::IndexSearcher
    • KinoSearch::FieldType -> KinoSearch::Plan::FieldType
    • KinoSearch::FieldType::BlobType -> KinoSearch::Plan::BlobType
    • KinoSearch::FieldType::FullTextType -> KinoSearch::Plan::FullTextType
    • KinoSearch::FieldType::StringType -> KinoSearch::Plan::StringType
    • KinoSearch::QueryParser -> KinoSearch::Search::QueryParser
    • KinoSearch::Doc -> KinoSearch::Document::Doc
    • KinoSearch::Doc::HitDoc -> KinoSearch::Document::HitDoc
    • KinoSearch::Search::Searchable -> KinoSearch::Search::Searcher
    • KinoSearch::Search::HitCollector -> KinoSearch::Search::Collector
    • KinoSearch::Search::HitCollector::BitCollector -> KinoSearch::Search::Collector::BitCollector
  • Public API Changes:
    • KinoSearch::Highlight::Highlighter
      o new() - param "searchable" replaced by "searcher".
      o get_searchable() - replaced by get_searcher().
    • KinoSearch::Search::PolySearcher
      o new() - param "searchables" replaced by "searchers".
    • KinoSearch::Search::Query
      o make_compiler() - parameter "searchable" replaced by "searcher".
    • KinoSearch::Search::Compiler
      o new() - parameter "searchable" replaced by "searcher".
      o highlight_spans() - parameter "searchable" replaced by "searcher".

Changes for version 0.30_083 - 2010-03-03

  • Bugfixes:
    • r5835: Add missing NULL-termination to Charmonizer strdup() clone.
    • r5883, r5884: fix missing get_analyzers() method in PolyAnalyzer.

Changes for version 0.30_082 - 2010-01-30

  • Bugfixes:
    • Improve compatibility with Perl 5.11.x.

Changes for version 0.30_081 - 2010-01-29

  • Bugfixes:
    • Improve compatibility with OS X 10.4 Tiger.
  • Miscellaneous:
    • Make build process less verbose.

Changes for version 0.30_08 - 2010-01-28

  • Improvements:
    • FullTextType fields may now be sortable.
    • QueryParser's parse() method now invokes tree(), expand(), and prune() as methods, so that overriding one of them in a subclass affects parse().
  • Bugfixes:
    • Search speed improvement for AND conjunctions: the "skipping" optimization for posting lists has been fixed and re-enabled.
    • Fixed a problem where a stale, invalid "write.lock.temp" written after disk fillup could block subsequent indexing.
    • Permission problems with subfolders in indexes now trigger exceptions rather than segfaults.
  • Moved, but compatibility stubs retained:
    • KinoSearch::Architecture -> KinoSearch::Plan::Architecture
    • KinoSearch::Obj -> KinoSearch::Object::Obj
    • KinoSearch::Obj::BitVector -> KinoSearch::Object::BitVector
  • Moved:
    • KinoSearch::Index::PostingsReader -> KinoSearch::Index::PostingListReader
  • Classes with API Changes:
    • KinoSearch::Index::IndexManager
      o new() - "hostname" param replaced by "host".
      o get_hostname() - replaced by get_host().
    • KinoSearch::Store::LockFactory
      o new() - "hostname" param replaced by "host".
    • KinoSearch::Store::Lock
      o new() - "hostname" param replaced by "host".

Changes for version 0.30_072 - 2009-12-23

  • Bugfixes:
    • Update XS binding code for improved compatibility with Perl 5.11.x.

Changes for version 0.30_071 - 2009-12-16

  • Bugfixes:
    • Fix an intermittent problem with lost deletions.
    • Handle UTF-8 hash keys properly within JSON writing code, fixing serialization of non-English stoplists.
    • Fix a build-time memory error that could cause some platforms (e.g. FreeBSD 7.x) to abort the build.
    • Update Tokenizer for compatibility with Perl 5.11.
    • Improve compatibility with unrecognized compiler platforms.
    • Fix a theoretical bug with truncated offsets for index files > 2 GB.
    • Reword and de-glitch the FastUpdates cookbook entry.

Changes for version 0.30_07 - 2009-08-30

  • Bugfixes:
    • Revisit the bug in IndexManager's recycle(). The 0.30_06 fix attempt had made it manifest less often, but had not completely eliminated it.
    • Throw an error in Indexer if recycle() returns duplicated segments.

Changes for version 0.30_06 - 2009-08-17

  • Bugfixes:
    • Solve a fencepost error in IndexManager's recycle() method which could cause document duplication and lost deletions.

Changes for version 0.30_05 - 2009-08-06

  • Features:
    • Support for near-real-time indexing.
  • New public classes:
    • KinoSearch::Index::IndexManager
    • KinoSearch::Index::BackgroundMerger
    • KinoSearch::Index::DeletionsWriter
    • KinoSearch::Obj::Err
    • KinoSearch::Store::LockErr
  • New documentation:
    • KinoSearch::Docs::Cookbook::FastUpdates
  • API changes:
    • KinoSearch::Indexer
      o new() - param "lock_factory" replaced by param "manager"
    • KinoSearch::Index::IndexReader
      o open() - param "lock_factory" replaced by param "manager"
    • KinoSearch::Index::SegReader
      o get_seg_num() - added.
      o get_seg_name() - added.
    • KinoSearch::Highlight::Highlighter
      o Three dots replaced by Unicode ellipsis.
    • KinoSearch::Store::Lock
      o Now an abstract class.
      o new()
        • Now an abstract constructor.
        • param "agent_id" renamed to "hostname".
      o get_agent_id() - replaced by get_hostname().
      o request() - added.
      o shared() - added.
    • KinoSearch::Store::LockFactory
      o new() - param "agent_id" renamed to "hostname"
      o make_shared_lock() - Now returns a Lock (instead of a SharedLock).
  • Redacted:
    • KinoSearch::Store::SharedLock
  • Moved:
    • KinoSearch::Util::BitVector -> KinoSearch::Obj::BitVector. (Compatibility subclass left in place for now.)
  • Bugfixes:
    • Fields with empty strings could produce corrupt Lexicons.
    • Segment data files (cf.dat) over 2 GB could cause search-time crashes.
  • Compatibility:
    • File-format compatible with 0.30_04.

Changes for version 0.30_04 - 2009-07-05

  • Bugfixes:
    • Stemmer had been malfunctioning, producing incorrect stems in some cases and bailing out with "invalid UTF-8" errors in others. Bug found and solution proposed by Nick Wellnhofer.
  • Features:
    • Memory mapping implemented for Windows, so now that platform gets fast Searcher opens and minimized search-time process memory footprint too.
  • Compatibility:
    • Indexes that utilize Stemmer must be regenerated.

Changes for version 0.30_03 - 2009-07-03

  • Bugfixes:
    • Fix a problem in SortCollector that led to ranking errors for some documents.
    • Eliminate a symbol conflict with MSVC.

Changes for version 0.30_02 - 2009-06-29

  • API Changes:
    • KinoSearch::Indexer
      o new() - "schema" argument now required only at index creation.
    • KinoSearch::QueryParser::QueryParser - ancient compatibility stub redacted; use KinoSearch::QueryParser instead.
  • Bugfixes:
    • Various C compatibility tweaks for Solaris, PowerPC Linux, etc.
  • Compatibility:
    • Index version bumped.

Changes for version 0.30_01 - 2009-06-18

  • Highlights:
    • Many new classes and methods.
    • Improved Searcher open times and decreased process memory footprint.
    • Improved sorting support.
    • Improved subclassing support.
    • Improved indexing speed.
    • Schemas serialized and stored with indexes.
    • Improved pluggability.
    • Expanded tutorial documentation.
    • Restored Windows compatibility.
  • New public classes:
    • KinoSearch::Architecture
    • KinoSearch::Doc
    • KinoSearch::Doc::HitDoc
    • KinoSearch::Indexer (replaces InvIndexer)
    • KinoSearch::FieldType (replaces FieldSpec)
    • KinoSearch::FieldType::BlobType
    • KinoSearch::FieldType::FullTextType (replaces FieldSpec::text)
    • KinoSearch::FieldType::StringType
    • KinoSearch::Highlight::HeatMap
    • KinoSearch::Index::DataReader
    • KinoSearch::Index::DataWriter
    • KinoSearch::Index::DocReader
    • KinoSearch::Index::Lexicon
    • KinoSearch::Index::LexiconReader
    • KinoSearch::Index::PolyReader
    • KinoSearch::Index::PostingList
    • KinoSearch::Index::PostingsReader
    • KinoSearch::Index::Segment
    • KinoSearch::Index::SegReader
    • KinoSearch::Index::SegWriter
    • KinoSearch::Index::Snapshot
    • KinoSearch::Obj
    • KinoSearch::Search::ANDQuery
    • KinoSearch::Search::Compiler
    • KinoSearch::Search::HitCollector
    • KinoSearch::Search::HitCollector::BitCollector
    • KinoSearch::Search::LeafQuery
    • KinoSearch::Search::MatchAllQuery
    • KinoSearch::Search::Matcher
    • KinoSearch::Search::NoMatchQuery
    • KinoSearch::Search::NOTQuery
    • KinoSearch::Search::ORQuery
    • KinoSearch::Search::PolyQuery
    • KinoSearch::Search::RangeQuery (replaces RangeFilter)
    • KinoSearch::Search::RequiredOptionalQuery
    • KinoSearch::Search::SortRule (factored out of SortSpec)
    • KinoSearch::Search::Span
    • KinoSearch::Util::BitVector
    • KSx::Index::ByteBufDocReader
    • KSx::Index::ByteBufDocWriter
    • KSx::Index::ZlibDocReader
    • KSx::Index::ZlibDocWriter
    • KSx::Search::MockScorer
  • New/updated documentation:
    • KinoSearch::Docs::Tutorial::Simple (updated)
    • KinoSearch::Docs::Tutorial::BeyondSimple (updated)
    • KinoSearch::Docs::Tutorial::FieldType (new)
    • KinoSearch::Docs::Tutorial::Analysis (new)
    • KinoSearch::Docs::Tutorial::Highlighter (new)
    • KinoSearch::Docs::Tutorial::QueryObjects (new)
    • KinoSearch::Docs::Cookbook::CustomQuery (new)
    • KinoSearch::Docs::Cookbook::CustomQueryParser (new)
    • KinoSearch::Docs::DocIDs (new)
  • Removed/redacted/replaced:
    • KinoSearch::Analysis::Token - redacted pending API overhaul.
    • KinoSearch::Analysis::TokenBatch - redacted pending API overhaul.
    • KinoSearch::Docs::DevGuide - removed.
    • KinoSearch::FieldSpec - replaced by FieldType.
    • KinoSearch::FieldSpec::text - replaced by FullTextType and StringType.
    • KinoSearch::Highlight::Encoder - rolled into Highlighter.
    • KinoSearch::Highlight::Formatter - rolled into Highlighter.
    • KinoSearch::Highlight::SimpleHTMLEncoder - rolled into Highlighter.
    • KinoSearch::Highlight::SimpleHTMLFormatter - rolled into Highlighter.
    • KinoSearch::Index::Term - removed. Now any object can be a term.
    • KinoSearch::InvIndex - removed.
    • KinoSearch::InvIndexer - replaced by Indexer.
    • KinoSearch::Posting - redacted pending API overhaul.
    • KinoSearch::Posting::MatchPosting - redacted pending API overhaul.
    • KinoSearch::Posting::RichPosting - redacted pending API overhaul.
    • KinoSearch::Posting::ScorePosting - redacted pending API overhaul.
    • KinoSearch::Search::BooleanQuery - replaced by ANDQuery, ORQuery, NOTQuery, and RequiredOptionalQuery.
    • KinoSearch::Search::Filter - removed. Filtering can now be achieved via ANDQuery, NOTQuery, etc.
    • KinoSearch::Search::PolyFilter - removed.
    • KinoSearch::Search::QueryFilter - replaced by KSx::Search::Filter
    • KinoSearch::Search::RangeFilter - replaced by RangeQuery.
    • KinoSearch::Util::Class - removed.
    • KinoSearch::Util::ToolSet - permanently redacted.
  • Renamed:
    • KinoSearch::Analysis::LCNormalizer => KinoSearch::Analysis::CaseFolder
    • KinoSearch::Search::SearchServer => KSx::Remote::SearchServer
    • KinoSearch::Search::SearchClient => KSx::Remote::SearchClient
    • KinoSearch::Simple => KSx::Simple
    • KinoSearch::Search::MultiSearcher => KinoSearch::Search::PolySearcher
  • API Changes:
    • KinoSearch::Analysis::Analyzer
      o analyze_batch() - redacted pending API overhaul.
    • KinoSearch::Analysis::PolyAnalyzer
      o get_analyzers() - added.
    • KinoSearch::Analysis::Tokenizer
      o new() - parameter "token_re" replaced by "pattern".
    • KinoSearch::Highlight::Highlighter
      o Highlighter objects are now single-field.
      o Fields must now be marked as "highlightable" at index time via their FieldType.
      o Excerpts are now created manually rather than automatically inserted via the Hits class.
      o new() - now takes four params instead of none: "searchable", "field", "query", and "excerpt_length".
      o add_spec() - removed.
      o create_excerpt(), highlight(), encode(), set_pre_tag(), get_pre_tag(), set_post_tag(), get_post_tag(), get_searchable(), get_query(), get_compiler(), get_excerpt_length(), get_field() - added.
    • KinoSearch::Index::IndexReader
      o open() - takes an "index" (string filepath or Folder object) instead of an "invindex", plus an optional "snapshot". Always returns a PolyReader (instead of an unspecified IndexReader subclass).
      o max_doc() - replaced by doc_max(), which has slightly different semantics since doc ids now start at 1 rather than 0.
      o num_docs() - renamed to doc_count().
      o del_count(), seg_readers(), offsets(), fetch(), obtain() - added.
    • KinoSearch::Indexer (replaces KinoSearch::InvIndexer)
      o new() - parameters changed. Old: "invindex", "lock_factory". New: "schema", "index", "create", "truncate", "lock_factory".
      o add_doc() - now takes either a hash ref or a Doc object, and optionally takes labeled params.
      o finish() - refactored into commit(), prepare_commit(), and optimize().
      o add_invindexes() - replaced by add_index().
      o delete_by_term() - now takes labeled parameters rather than positional args.
      o delete_by_query() - added.
        • takes "index" (a string filepath or Folder object), "lock_factory", and
    • KinoSearch::QueryParser
      o tree(), expand(), expand_leaf(), prune(), make_term_query(), make_phrase_query(), make_and_query(), make_or_query(), make_not_query(), make_req_opt_query() - added.
    • KinoSearch::Schema
      o No longer an abstract class.
      o "%fields" hash eliminated.
      o Now gets serialized as JSON and stored with index.
      o clobber(), open(), read() - removed.
      o analyzer() - removed.
      o similarity() - removed.
      o pre_sort() - removed.
      o add_field() - replaced by spec_field(), which associates a field name with a FieldType object rather than a class name.
      o num_fields(), all_fields(), fetch_type(), fetch_sim(), architecture(), get_architecture(), get_similarity() - added.
    • KinoSearch::Search::Hits
      o fetch_hit_hashref() - replaced by next(), which returns a HitDoc by default.
      o create_excerpts() - removed.
    • KinoSearch::Search::PhraseQuery
      o new() - now takes params "field" and "terms".
      o add_term() - removed.
      o get_field(), get_terms() - added.
    • KinoSearch::Search::PolySearcher (formerly MultiSearcher)
      o Now supports SortSpec.
    • KinoSearch::Search::Query
      o make_compiler() - added.
    • KinoSearch::Search::Searchable
      o search() - renamed to hits().
      o new(), glean_query(), get_schema(), collect(), doc_max(), doc_freq(), fetch_doc() - added.
    • KinoSearch::Search::SortSpec
      o new() - takes new param "rules", an array of SortRules.
      o add() - removed.
    • KinoSearch::Search::TermQuery
      o new() - now takes "field" and "term" (which is a string rather than a Term object as before).
    • KinoSearch::Searcher
      o new() - now takes "index" (a string filepath, a Folder object, or an IndexReader object), rather than "invindex" or "reader".
      o search() - renamed to hits().
      o set_prune_factor() - removed.
      o collect(), doc_max(), doc_freq(), fetch_doc(), get_schema() - added.
  • Subclassing improvements:
    • Although KinoSearch is now implemented almost entirely in C, pure-Perl dynamic subclassing is supported. (Public methods which are overridden in pure-Perl subclasses are automatically detected and invoked as callbacks by the internal KS object engine.)
  • Significant internal changes:
    • All classes now implemented in C, with Perl and XS only where necessary.
    • Doc IDs now start at 1 rather than 0.

Changes for version 0.20_051 - 2008-01-20

  • Bug Fixes:
    • Occasionally incorrect search results fixed by disabling Skip_To optimization.

Changes for version 0.20_05 - 2007-10-27

  • API Changes:
    • KinoSearch::Search::Hits
      o seek() - Removed. (Patch by Nathan Kurz.)
    • KinoSearch::Schema::FieldSpec has become KinoSearch::FieldSpec::text.
      o The old class is retained for now as a compatibility alias.
    • KinoSearch::Schema
      o %fields hash now accepts 'text' as an alias for 'KinoSearch::FieldSpec::text'.
  • Significant Bug fixes:
    • Fix index-corrupting bug affecting deletions. Reported by Scott Beck.
    • Insecure temp file creation during test suite eliminated. Reported by Andreas Koenig as RT #28777.
    • Fix phrase matching failure due to underflow. Repeatable test scenario provided by Matthew O'Connor. Diagnosis and patch provided by Nathan Kurz.
    • RangeFilter now works with multi-segment indexes. Patch by Chris Nandor.
    • Occasional runaway memory usage curtailed.

Changes for version 0.20_04 - 2007-06-20

  • Highlights:
    • Several bug fixes.
  • New public classes:
    • KinoSearch::Simple
  • Renamed:
    • KinoSearch::QueryParser::QueryParser => KinoSearch::QueryParser
  • API Changes:
    • KinoSearch::QueryParser
      o No longer recognizes 'field:term_text' construct by default.
      o set_heed_colons() - Added.
    • KinoSearch::InvIndex
      o create() - Removed.
      o read() - Added.
      o open() - Behavior changed -- now creates an index if none detected.
    • KinoSearch::Schema
      o create() - Removed.
      o read() - Added.
      o open() - Behavior changed -- now creates an index if none detected.
  • Credits:
    • Bug reports from Henry Combrinck, Chris Nandor, and Marco Barromeo.

Changes for version 0.20_03 - 2007-05-08

  • Highlights:
    • Combining filters now possible using PolyFilter.
    • Significantly improved indexing speed.
    • Better NFS compatibility using LockFactory.
  • New public classes:
    • KinoSearch::Index::IndexReader
    • KinoSearch::Posting
    • KinoSearch::Posting::ScorePosting
    • KinoSearch::Posting::RichPosting
    • KinoSearch::Search::PolyFilter
    • KinoSearch::Store::Lock
    • KinoSearch::Store::SharedLock
    • KinoSearch::Store::LockFactory
  • New/updated documentation:
    • KinoSearch::Docs::IRTheory
    • KinoSearch::Docs::FileFormat
  • Removed:
    • KinoSearch::Docs::NFS
  • Renamed:
    • KinoSearch::Contrib::LongFieldSim => KSx::Search::LongFieldSim
  • Classes with API changes:
    • KinoSearch::Schema
      o %FIELDS must now be spelled %fields (resolving conflict with Perl core pragmas base.pm and fields.pm).
      o pre_sort() - Added. (experimental)
    • KinoSearch::Schema::FieldSpec
      o store_pos_boost() - Removed.
      o posting_type() - Added. (experimental)
    • KinoSearch::Analysis::Analyzer
      o analyze() - Removed.
      o analyze_batch() - Added.
    • KinoSearch::Analysis::Stopalizer
      o Now removes stopwords rather than turning them to empty strings.
    • KinoSearch::InvIndex
      o get_folder() - Added.
      o get_schema() - Added.
    • KinoSearch::InvIndexer
      o new() - Parameters changed.
        • host_id - Removed.
        • lock_factory - Added.
    • KinoSearch::Highlight::Highlighter
      o new() - All arguments removed.
      o add_spec() - Added, making it possible to customize multiple excerpts.
    • KinoSearch::Highlight::SimpleHTMLEncoder
      o Now uses HTML::Entities::encode_entities, so more entities are affected.
    • KinoSearch::Searcher
      o get_reader() - Added.
      o set_prune_factor() - Added. (experimental)
    • KinoSearch::Search::Hits
      o Now supports multiple highlighted excerpts per document.
      o Excerpts now use key of "excerpts" rather than "excerpt".
    • KinoSearch::Search::RangeFilter
      o Now supports "open ended searches": all above or all below a bound.
      o new() - Default values added.
  • Credits:
    • Chris Nandor was the driving force behind PolyFilter and Filter, contributing code, tests, bug reports and bug fixes.
    • Patches and failing test cases contributed by Edward Betts, Henry Combrinck, Simon Cozens, and Peter Karman.

Changes for version 0.20_02 - 2007-03-06

  • Rework Schema API.
    o Add instance method add_field(), facilitating dynamic schemas.
    o Remove init_fields().
    o Require the declaration of a %FIELDS hash.
    o Change how field names are associated with FieldSpecs.
    o Update documentation throughout KinoSearch to reflect the new API.
  • Fix crashing bug in TermListWriter/TermListReader isolated by Edward Betts.

Changes for version 0.20_01 - 2007-02-26

  • KinoSearch 0.20 is a major rewrite, adding many new features. It also breaks backwards compatibility in a number of ways.
  • Two key features, UTF-8 support and custom sorting, were not possible to implement while preserving backwards compatibility. Once the decision was made to proceed with them, breaking all existing installations, it made little sense to proceed by half measures, so the API has been given a significant overhaul.
  • KinoSearch has always carried an "alpha code" warning; it is being invoked for this release. While it will continue to carry the "alpha" warning for a short while longer, the point of jamming so many changes into one release is to cause disruption only once; once the code in 0.20 proves itself, hopefully no more backwards incompatible changes will be needed any time soon.
  • New behaviors:
    • KinoSearch now uses UTF-8 for all input and output, throughout the entire library. This affects many classes, but particularly those under Analysis, Highlight, and QueryParser.
    • The default scoring algorithm has changed subtly -- aggressive per-field boosting is no longer important or even desirable. The old behavior is available from KinoSearch::Contrib::LongFieldSim.
  • New public classes:
    • KinoSearch::Schema
    • KinoSearch::Schema::FieldSpec
    • KinoSearch::InvIndex
    • KinoSearch::Analysis::Token
    • KinoSearch::Search::RangeFilter
    • KinoSearch::Search::SortSpec
    • KinoSearch::Search::Similarity
    • KinoSearch::Contrib::LongFieldSim
  • New documentation:
    • KinoSearch::Docs::NFS
  • Removed classes:
    • KinoSearch::Document::Doc
    • KinoSearch::Document::Field
    • KinoSearch::Search::Hit
  • Renamed classes:
    • KinoSearch::Store::InvIndex => KinoSearch::Store::Folder
    • KinoSearch::Store::FSInvIndex => KinoSearch::Store::FSFolder
    • KinoSearch::Store::RAMInvIndex => KinoSearch::Store::RAMFolder
  • Updated documentation:
    • KinoSearch
    • KinoSearch::Docs::DevGuide
    • KinoSearch::Docs::FileFormat
    • KinoSearch::Docs::Tutorial
  • Classes with API changes:
    • KinoSearch::InvIndexer
      o new() - Args changed.
        • create - Removed.
        • analyzer - Removed.
        • lock_id - Added.
      o spec_field() - Removed.
      o new_doc() - Removed.
      o add_doc() - Args changed.
        • Takes a hashref rather than a Doc object.
        • Accepts optional labeled param 'boost'.
      o delete_docs_by_term() - Removed.
      o delete_by_term() - Added. (Behavior differs subtly from delete_docs_by_term()).
    • KinoSearch::Searcher
      o new() - args changed.
        • analyzer - Removed.
      o search() - Now calls Hits->seek before returning Hits object. Args changed.
        • offset - Added.
        • num_wanted - Added.
        • sort_spec - Added.
    • KinoSearch::Search::Hits
      o Now comes pre-seeked, courtesy of changes to Searcher.
      o seek() - No longer triggers new number crunching if requested values can be accommodated using results of prior search.
      o fetch_hit() - Removed.
      o create_excerpts() - Now puts multiple excerpts under $hit->{excerpts} rather than one under $hit->{excerpt}.
    • KinoSearch::Search::MultiSearcher
      o new() - Args changed.
        • schema - Added.
        • analyzer - Removed.
    • KinoSearch::Highlight::Highlighter
      o new() - Args changed.
        • fields - Added.
        • excerpt_length - Now specified in characters rather than bytes.
        • excerpt_field - Removed.
        • pre_tag - Removed.
        • post_tag - Removed.
    • KinoSearch::QueryParser::QueryParser
      o new() - Args changed.
        • schema - Added.
        • default_field - Removed.
        • analyzer - No longer required -- now used to override schema.
    • KinoSearch::Analysis::TokenBatch
      o new() - Args changed.
        • text - Added.
      o next() - Returns a Token instead of a boolean.
      o reset() - Added.
      o add_many_tokens() - Added.
      o set_text(), get_text(), set_start_offset(), get_start_offset(), set_end_offset(), get_end_offset(), set_pos_inc(), get_pos_inc() - All removed.
  • Internal changes:
    • Large-scale refactoring has taken place. The most significant changes are...
    • OO framework imposed on C code via boilerplater.pl, with KinoSearch::Util::Obj as the base class.
    • Charmonizer added.
    • perlapi functions and data structures replaced whenever possible.
    • Lots of classes, especially under KinoSearch::Index, reorganized around Schema and SegInfo.
    • Many tests added, removed, or revised to accommodate changes in the main library code.
    • C code moved to dedicated files.
    • Build.PL custom code moved to buildlib/KinoSearchBuild.pm
  • File Format:
    • Significantly redesigned. The most visible change is that the segments file is now encoded using YAML rather than an arbitrary binary format.
    • Old indexes cannot be read and must be regenerated.
  • Locking
    • write.lock files now located in the index directory rather than under /tmp.
    • Commit locks are no longer needed due to file format changes.
    • Stale write locks are now removed without warning.
