Revision history for KinoSearch

0.20_05 2007-10-27

  API Changes:

    * KinoSearch::Search::Hits 
      o seek() - Removed. (Patch by Nathan Kurz.)

    * KinoSearch::Schema::FieldSpec has become KinoSearch::FieldSpec::text.
      o The old class is retained for now as a compatibility alias.

    * KinoSearch::Schema
      o %fields hash now accepts 'text' as an alias for
        'KinoSearch::FieldSpec::text'.

  Significant Bug fixes:

    * Fix index-corrupting bug affecting deletions.  Reported by Scott Beck.
    * Insecure temp file creation during test suite eliminated. Reported by
      Andreas Koenig as RT #28777.
    * Fix phrase matching failure due to underflow.  Repeatable test scenario
      provided by Matthew O'Connor.  Diagnosis and patch provided by Nathan
      Kurz.
    * RangeFilter now works with multi-segment indexes. Patch by 
      Chris Nandor.
    * Occasional runaway memory usage curtailed.

0.20_04 2007-06-20

  Highlights: 

     * Several bug fixes.

  New public classes:

     * KinoSearch::Simple.

  Renamed:

    * KinoSearch::QueryParser::QueryParser => KinoSearch::QueryParser

  API Changes:

    * KinoSearch::QueryParser 
      o No longer recognizes 'field:term_text' construct by default.
      o set_heed_colons() - Added.
    * KinoSearch::InvIndex
      o create() - Removed.
      o read() - Added.
      o open() - Behavior changed -- now creates an index if none detected.
    * KinoSearch::Schema
      o create() - Removed.
      o read() - Added.
      o open() - Behavior changed -- now creates an index if none detected.

  Credits:

    * Bug reports from Henry Combrinck, Chris Nandor, and Marco Barromeo.


0.20_03 2007-05-08 

  Highlights:

    * Combining filters now possible using PolyFilter.
    * Significantly improved indexing speed.
    * Better NFS compatibility using LockFactory.

  New public classes:

    * KinoSearch::Index::IndexReader
    * KinoSearch::Posting
    * KinoSearch::Posting::ScorePosting
    * KinoSearch::Posting::RichPosting
    * KinoSearch::Search::PolyFilter
    * KinoSearch::Store::Lock
    * KinoSearch::Store::SharedLock
    * KinoSearch::Store::LockFactory

  New/updated documentation:

    * KinoSearch::Docs::IRTheory
    * KinoSearch::Docs::FileFormat

  Removed:

    * KinoSearch::Docs::NFS

  Renamed:
  
    * KinoSearch::Contrib::LongFieldSim => KSx::Search::LongFieldSim

  Classes with API changes:
  
    * KinoSearch::Schema
      o %FIELDS must now be spelled %fields (resolving conflict with Perl core
        pragmas base.pm and fields.pm).
      o pre_sort() - Added. (experimental)

    * KinoSearch::Schema::FieldSpec
      o store_pos_boost() - Removed.
      o posting_type() - Added. (experimental)

    * KinoSearch::Analysis::Analyzer
      o analyze() - Removed.
      o analyze_batch() - Added.

    * KinoSearch::Analysis::Stopalizer
      o Now removes stopwords rather than turning them to empty strings.
  
    * KinoSearch::InvIndex
      o get_folder() - Added.
      o get_schema() - Added.
  
    * KinoSearch::InvIndexer
      o new() - Parameters changed.
        * host_id - Removed.
        * lock_factory - Added.
  
    * KinoSearch::Highlight::Highlighter
      o new() - All arguments removed.
      o add_spec() - Added, making it possible to customize multiple excerpts.

    * KinoSearch::Highlight::SimpleHTMLEncoder
      o Now uses HTML::Entities::encode_entities, so more entities are
        affected.

    * KinoSearch::Searcher
      o get_reader() - Added.
      o set_prune_factor - Added. (experimental)

    * KinoSearch::Search::Hits
      o Now supports multiple highlighted excerpts per document.
      o Excerpts now use key of "excerpts" rather than "excerpt".

    * KinoSearch::Search::RangeFilter
      o Now supports "open ended searches": all above or all below a bound.
      o new() - Default values added.  

  Credits:

    * Chris Nandor was the driving force behind PolyFilter and Filter,
      contributing code, tests, bug reports and bug fixes.
    * Patches and failing test cases contributed by Edward Betts, Henry
      Combrinck, Simon Cozens, and Peter Karman.

0.20_02 2007-03-06
  * Rework Schema API.
    o Add instance method add_field(), facilitating dynamic schemas.
    o Remove init_fields().
    o Require the declaration of a %FIELDS hash.
    o Change how field names are associated with FieldSpecs.
    o Update documentation throughout KinoSearch to reflect the new API.
  * Fix crashing bug in in TermListWriter/TermListReader isolated by Edward 
    Betts.

0.20_01 2007-02-26
  
  KinoSearch 0.20 is a major rewrite, adding many new features.  It also
  breaks backwards compatibility in a number of ways.  
  
  Two key features, UTF-8 support and custom sorting, were not possible to
  implement while preserving backwards compatibility.  Once the decision was
  made to proceed with them, breaking all existing installations, it made
  little sense to proceed by half measures, so the API has been given a
  significant overhaul.

  KinoSearch has always carried an "alpha code" warning; it is being invoked
  for this release.  While it will continue to carry the "alpha" warning for
  a short while longer, the point of jamming so many changes into one release
  is to cause disruption only once; once the code in 0.20 proves itself,
  hopefully no more backwards incompatible changes will be needed any time
  soon.

  New behaviors:

    * KinoSearch now uses UTF-8 for all input and output, throughout the
      entire library.  This affects many classes, but particularly those under
      Analysis, Highlight, and QueryParser.
    * The default scoring algorithm has changed subtly -- aggressive 
      per-field boosting is no longer important or even desirable.  The old
      behavior is available from KinoSearch::Contrib::LongFieldSim.

  New public classes:

    * KinoSearch::Schema
    * KinoSearch::Schema::FieldSpec
    * KinoSearch::InvIndex
    * KinoSearch::Analysis::Token
    * KinoSearch::Search::RangeFilter
    * KinoSearch::Search::SortSpec
    * KinoSearch::Search::Similarity
    * KinoSearch::Contrib::LongFieldSim

  New documentation:

    * KinoSearch::Docs::NFS

  Removed classes:

    * KinoSearch::Document::Doc
    * KinoSearch::Document::Field
    * KinoSearch::Search::Hit

  Renamed classes:

    * KinoSearch::Store::InvIndex    => KinoSearch::Store::Folder
    * KinoSearch::Store::FSInvIndex  => KinoSearch::Store::FSFolder
    * KinoSearch::Store::RAMInvIndex => KinoSearch::Store::RAMFolder

  Updated documentation:

    * KinoSearch
    * KinoSearch::Docs::DevGuide
    * KinoSearch::Docs::FileFormat
    * KinoSearch::Docs::Tutorial

  Classes with API changes:

    * KinoSearch::InvIndexer
      o new() - Args changed.
        * create - Removed.
        * analyzer - Removed.
        * lock_id - Added.
      o spec_field() - Removed.
      o new_doc() - Removed.
      o add_doc() - Args changed.
        * Takes a hashref rather than a Doc object.
        * Accepts optional labeled param 'boost'.
      o delete_docs_by_term() - Removed.
      o delete_by_term() - Added.  (Behavior differs subtly from
        delete_docs_by_term()).

    * KinoSearch::Searcher
      o new() - args changed.
        * analyzer - Removed.
      o search() - Now calls Hits->seek before returning Hits object.  Args
          changed.
        * offset - Added.
        * num_wanted - Added.
        * sort_spec - Added.

    * KinoSearch::Search::Hits
      o Now comes pre-seeked, courtesy of changes to Searcher.
      o seek() - No longer triggers new number crunching if requested values
        can be accomodated using results of prior search.
      o fetch_hit() - Removed.
      o create_excerpts() - Now puts multiple excerpts under $hit->{excerpts}
        rather than one under $hit->{excerpt}.

    * KinoSearch::Search::MultiSearcher
      o new() - Args changed.
        * schema - Added.
        * analyzer - Removed.

    * KinoSearch::Highlight::Highlighter
      o new() - Args changed.
        * fields - Added.
        * excerpt_length - Now specified in characters rather than bytes.
        * excerpt_field - Removed.
        * pre_tag - Removed.
        * post_tag - Removed.

    * KinoSearch::QueryParser::QueryParser
      o new() - Args changed.
        * schema - Added.
        * default_field - Removed.
        * analyzer - No longer required -- now used to override schema.

    * KinoSearch::Analysis::TokenBatch
      o new() - Args changed.
        * text - Added.
      o next() - Returns a Token instead of a boolean.
      o reset() - Added.
      o add_many_tokens() - Added.
      o set_text(), get_text(), set_start_offset(), get_start_offset(),
        set_end_offset(), get_end_offset(), set_pos_inc(), get_pos_inc - All
        removed. 

  Internal changes:

    Large-scale refactoring has taken place.  The most significant 
    changes are...

    * OO framework imposed on C code via boilerplater.pl, with
      KinoSearch::Util::Obj as the base class.
    * Charmonizer added.
    * perlapi functions and data structures replaced whenever possible.
    * Lots of classes, especially under KinoSearch::Index, reorganized around
      Schema and SegInfo.  
    * Many tests added, removed, or revised to accomodate changes in the main
      library code.
    * C code moved to dedicated files.
    * Build.PL custom code moved to buildlib/KinoSearchBuild.pm
  
  File Format:

    * Significantly redesigned.  The most visible change is that the segments
      file is now encoded using YAML rather than an arbitrary binary format.
    * Old indexes cannot be read and must be regenerated.

  Locking

    * write.lock files now located in the index directory rather than 
      under /tmp.
    * Commit locks are no longer needed due to file format changes.
    * Stale write locks are now removed without warning.

0.15 2006-12-04
  * Remove dead lock files when possible (with a warning), rather than failing
    outright.  (Credit to Matthew O'Connor, Luke Closs, Socialtext for
    providing initial implementation and test.)
  * Fix package name glitch in SearchClient.

0.14 2006-11-12 
  * Add MultiSearcher, SearchServer and SearchClient.

0.13 2006-08-19
  * Fix "negate operator" bug in QueryParser.
  * Allow multiple fields to be spec'd for QueryParser.
  * Add Finnish stoplist.
  * Add ExtUtils::ParseXS and ExtUtils::CBuilder as prereqs, since
    Module::Build doesn't handle C code as of 0.28.

0.12 2006-06-26
  * Modify Highlighter API 
    o Deprecate pre_tag, post_tag arguments to new().
    o Now encodes some HTML entities by default.
    o Add support for new classes Encoder, SimpleHTMLEncoder, Formatter, 
      and SimpleHTMLFormatter.
  * Add new class KinoSearch::Search::Hit.
  * Add Hits::fetch_hit, which returns a Hit object.
  * Expose experimental API for TokenBatch.
  * Expose experimental API for Analyzer::analyze().
  * Fix bug with Stopalized indexes and QueryParser.
  * Fix bug: returned hits now sort secondarily on doc_num as advertised.

0.11 2006-05-17
  * Restore Stopalizer functionality.
  * Launder filenames so they pass taint check when index is initialized.
  * Restore call to optimize() in Lucene benchmarker.

0.10 2006-05-04
  * Improved Windows compatibility.
  * Make it possible to subclass some KinoSearch classes. 
  * Add InVindexer::add_invindexes().
  * Add bin/dump_index, contributed by Brian Phillips.
  * Tighten up C code for ISO C90 compliance.
  * Improved support for Russian and KOI8-R encoding.
  * Fixed bug affecting indexes with segments bigger than 4 GB.
  * Fixed bug #18899, KinoSearch and locale.

0.09 2006-04-13
  * Incremental indexing enabled 
      o delete_docs_by_term() added to InvIndexer.
      o option 'optimize' added to InvIndexer::finish.
  * Hits now returns the top 100 matches by default unless seek() has been
    called.
  * QueryFilter added.
  * Benchmarking scripts added.

0.08  2006-03-10
  * Restore ability to overwrite invindexes.

0.07  2006-03-10
  * Cut down on file descriptor requirements at search-time, eliminating "too
    many open files" error.
  * Make cleaning of invindex dir less aggressive when create => 1 is
    specified.

0.06  2006-03-02
  * Backwards incomaptible file format change (another is coming).
  * Opened up APIs for Query subclasses and QueryParser.
  * added KinoSearch::Highlight::Highlighter
  * Document, field, and query boosting enabled.
  * Behavior of KinoSearch::Search::Hits::fetch_hit_hashref modified.
  * KinoSearch::Document::Doc::to_hashref privatized.
  * Dependencies pared down.
  * Fixed bug affecting invindexes with 10 or more fields

0.05  2006-01-24
  * KinoSearch, a complete rewrite, supersedes Search::Kinosearch.