lib/XML/Comma/docs/todo.ski

- implement ->count() for storage iterators, since we have to find all 
  the docs before iterating over them anyway, so there's no performance
  overhead... WARNING: be sure to die if given a where_clause!
ski@wwe:~$ perl -le 'print time." starting..."; use XML::Comma; my $it = 
XML::Comma::Def->AllAfrica_NewsStory->get_store("post")->iterator(); my 
$i=0; while(++$it) { ++$i; print time.": ".$i unless($i % 1000) } print 
$i'
1205511459 starting...
1205511477: 1000
1205511482: 2000
1205511487: 3000
18 sec startup time + 4 sec per 1000 docs

- '$doc->bar = 123;' syntax is possible - see:
	perldoc perl56delta / perlsub:
		Lvalue subroutines

		WARNING: Lvalue subroutines are still experimental and the
		implementation may change in future versions of Perl.

	Compress-Raw-Zlib uses it...

- if first time store and #stores in def == 1, don't complain if store
  name isn't specified.

- fixup "get_index->rebuild()" requires stores argument bug

- fix comma-rebuild-index.pl and any other scripts to deal with loading
  defs from modules

- make install hack:
  in misc/install-extras.pl, exec perl -e 'use XML::Comma' to
  force compilation of Inline::C stuff for Parser?

- document select_count in guide.html, etc.

- no doc_id on new (unstored) docs makes creating docs that point at one 
  another hard

- perhaps implement iterator->to_array($n) where you are forced to
  specify a max size will get thru the comittee? other args:
  whole_enchilada => 1, reset_it_count (equivalent to it_refresh
  immediately after)

doc_is_new is confusing now.
since a new doc can be set an arbitrary number of times w/o need of 
get_lock or keep_open, in some sense a new doc is always new until you 
load it from disk. currently doc_is_new ceases to be new as soon as you 
associate the doc with some storage, which is what brian wants, but you 
can write code like this (from multi_store_doctype.t)
  my $doc = XML::Comma::Doc->new( type => $def_one_name );
  $doc->element( "foo" )->set( "foo_$_" );
  $doc->bar( "bar_$_" );
  $doc->store( store => "one" ); # updates indices 'one' and 'all'
  $doc->store( store => "two" ); # updates indices 'two' and 'all'
resolve this ambiguity.

fix Index->erase() to clear cached data as necessary
then implement rebuild(drop_first => 1);

alter table statements fail with postgres for when we change a def's 
sql_type or so:
  DBD::Pg::st execute failed: ERROR:  syntax error at or near "MODIFY"
  LINE 3:               MODIFY type VARCHAR(255)
                        ^
   [for Statement "
            ALTER TABLE TravelPo_0001
                MODIFY type VARCHAR(255)
        "] at /usr/local/share/perl/5.8.8/XML/Comma/SQL/Base.pm line 293.

would it be worth having some SQL-backend-neutral names for sql_types?
e.g. pg:
  varchar text name bytea bool date time - etc.
vs mysql:
  VARCHAR CHAR INT TINYINT - etc.
more here: 
  http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606khatri/

trying to rebuild an index_only store shouldn't work, but it should give 
a better error message than this:
tperl -e 'XML::Comma::Def->Shorten->get_index("main")->rebuild(verbose => 1)'
  beginning re-index from store main...
  error while rebuilding: UNKNOWN_ACTION -- no method or element 
  'extension' found in 'DocumentDefinition:store' at -e line 1

if i try to do <field><name>foo</name></field>, and foo is a plural
  element, i don't get indexed - nor do i get any error output!
  when is the right time to abort?
  also, is it possible that we should auto-detect by plurality
  if we want a field or collection, and have both mean the
  same thing? (perhaps not, the programmer needs to know the
  difference to create sql to actually query the thing)

- we have to rejigger the install process to have a seperate 
  dbconfig step sooner than later, as when we return from cpan currently,
  we make and make test, and die slowly w/o that info... do a basic SQL
  connect check first, perhaps, and if it fails, skip the rest.


testing cpan with a local tarball:
  TARBALL=/media/edgy/home/ski/mysrc/XML-Comma-1.993.tgz
  mkdir /root/.cpan/build
  tar xfz $TARBALL
  cd XML-Comma-*
  cpan .

convert all uses of bareword file handles to proper my $foo style
  (that is, if this is compat with 5.6)

calling ->erase() when you don't have perms to delete the store doesn't 
seem to do the right thing... (no error - try with a TravelPictureLocal)
  (just a reflection of the index errors not propagated problem we had?)

a terser, and hopefully quicker to parse data format not based on XML?
  http://en.wikipedia.org/wiki/YAML
  http://en.wikipedia.org/wiki/JSON
  - or perhaps use XML with attributes, as
    <element name="foo" />
  is a hell of a lot terser and easier to read than
    <element><name>foo</name></element>
  the selection of storage format would come in a layer right next
  to storage/location... or perhaps look at how the gzip compression
  is implemented and do something like that?

fix sync_deferred_textsearches/Pg the Right Way

- access to convenience methods like AllAfrica::Travel::Post::get_uri
  and blurb_text... for now we have to call
  AllAfrica::Travel::Post::get_uri($doc), instead of $doc->get_uri;

#it'd also be nice to say things like the below, instead of
#having a method with the same name as:
#     <textsearch>
#         <name>full_text</name>
#         <code><![CDATA[ sub {
#            return $_[0]->title."\n".
#                   join("\n", $_[0]->tags())."\n".
#                   $_[0]->body_text;
#         } ]]></code>
#      </textsearch>

shortcut methods skip set_hook (and probably read_hook) !
  (at least with a blob_element - try $doc->original("foo")
   on a travel...Picture::Local doc)

<location> for BlobElements
  this deals with elements on a different FS in a "smart" way,
    and lets you move blobs out from the same hierarchy
    as the xml docs, so you can grep -r there easily
  another way to think of this is as the inverse of
    the <extension> field (ie, <prefix>, akin to
    PREFIX=/some/path make install)

integrate some variation on Versionable into core

parameterized where_clauses and such.

default mode of destination dir should be $owner:{www-data|web}
  and mode 664 on comma.log

introspection cleanup:
  add to docs that we deprecate from top-level of DocDef:
    <required>, <plural>, <is_i*_from_hash>, these should  be in
    a properties block now:
      <properties>
        <required>qw ( foo bar )</required>
        <plural>qw ( foo )</plural>
        <doc_key>qw ( bar )</doc_key>
      </properties>
  add support for this syntax:
    <element><name>foo</name>
      <properties>plural,required</properties></element>
    <element><name>bar</name>
      <properties>doc_key,required</properties></element>
  allow support for persistent setting of this in a UI system
    or be somewhere else in a UI-settable system
    where can we store this adjunct data?
    perhaps this should wait until a file can be both a def and doc?
  also clean up the "4 different ways" problem, standardize on the
      last (while retaining compatibility):
    $def->is_{required|plural|is_ignore_for_hash}
    $el->def->is_{nested|blob}
    $macros{enum|boolean|range|timestamp|timestamp_created|timestamp_last_modified}
    $def->has_property($required_plural_blob_nested_enum_or_etc, $el_name)
  also rewrite legacy functions like is_required as calls to has_property

insights from introspection:
1 - a bug: you can load a <TravelPost>...</TravelPost> with doc on disk
    that has <tpt>...</tpt> - add a sanity check there.
2 - a possible bug: the below dies if we don't have a listing_fields (?)
  <nested_element>
      <name>listing_fields</name>
      <element><name>foo</name><macro>something_that_installs_validate_hook</macro></element>
      <required>qw ( foo )</required>
  </nested_element>

comma/postgres bug:
  need to use a superuser for now.
  later do what it suggests and change COPY to \copy ?

#complete test:
dropdb -U postgres comma ; createdb -U postgres -O comma comma; \
make distclean ; perl Makefile.PL && make && make test && sudo make install

TODO: make sure all new API stuff is in doc/guide.html and Bootstrap.pm

add to comma-2.html/Changelog:
  some sort of smart dump of svn commits to chronicle

2.1:
- apply & fix the stuff in weak/ for ->parent, ->top, etc.
- get more info on copyright bounding start dates from comma 1.21
- there is no way to get the size of the store from the index... perhaps
  this is a feature? ... but it would be nice to have an 
  $index->total_count or so
- make an Iteratable.pm or so which Storage::Iterator and
  Index::Iterator inherit from ( we probably can't use it for much
  but it will look nicer that way at least ... )

$sit->length() vs. $idx->count() ... resolve this inconsisency if poss.
consider iterator_refresh for storage iterators?
$sit->length(max => $n) would be nice for gigantic stores...

dave ideas:
  2 - (not sure if this is something we should do via comma, but it
       would help convert people who are SQL-addicts):
  this common construct is reaaaaaly slow:
        for ( @{ $doc->issue_ptr } ) {
           push @issues, $index->single( where_clause => "doc_id = $_" )
        }
        foreach my $issue (@issues) {
          ...
        }
  you hit the DB N times. you can hit it just once by building a gigantic
  where clause, e.g.
        my $wc = join ("OR", map { "doc_id = $_" } @{ $doc->issue_ptr });
        my $it = $index->iterator(where_clause => $wc);
        while(++$it) { 
          ...
        }
  or even better, using an IN statement:
        my $wc = "doc_id IN (".join (",", @{ $doc->issue_ptr }).")";
        my $it = $index->iterator(where_clause => $wc);
        while(++$it) { 
          ...
        }


Reported by tim, 2/15/2007

This doesn't work... Probably to fix this after we get the weakref
stuff in and right.
$ perl -MXML::Comma -e 'use strict ;
  my $def = XML::Comma::Def->read( name=>"AllAfrica_Publisher" ); 
  foreach my $el ( $def->def_sub_elements() ) {
    my $tag = $el->name(); 
    print $tag . "\n" if $el->element_is_plural($tag);
  }'

This does:
$ perl -MXML::Comma -e 'use strict ;
  my $def = XML::Comma::Def->read( name=>"AllAfrica_Publisher" );
  foreach my $el ( $def->def_sub_elements() ) {
    my $tag = $el->name();
    print $tag . "\n" if $def->is_plural($tag);
  }'

This is an API bug. THE other is_* have this problem as well.


- standardize/clean up the order of "use" statements, etc. in t/*.t


with postgres, we get lots of spurious warnings starting "NOTICE"
  these can be ignored, but are ugly. Pg code in general needs a cleanup

pg for dummies:
  To get postgres up and running on my machine, for use with comma, I
  just did an ubuntu install, then:
    apt-get install postgresql-8.1 postgresql-server-dev-8.1
    sudo cpan -i DBD::Pg
    sudo bash
     $EDITOR /etc/postgresql/8.1/main/pg_hba.conf
     ## change:
        local  all  postgres  ident sameuser
        local   all         all                               ident sameuser
     ## to:
        local   all         postgres                          ident sameuser
        local all all password
        #local   all         all                               ident sameuser
     /etc/init.d/postgresql-8.1 restart
     su postgres
      createuser --pwpompt comma #for now make a superuser
        #there is a bug we need to fix with COPY commands with Pg
        #relevant tests are in t/indexing.t
      createdb -O comma comma
      exit #shell as postgres user
     psql -U comma comma #make sure you can connect
     psql -U postgres #make sure you can connect
     #edit Configuration.pm accordingly

- element-specific locking for when whole body lock is unnec. and too
  heavy weight?
- maybe use ExtUtils::MakeMaker::prompt for automated testers
- it'd be nice to have "iterators" on nested plural elements
- more test cases for $blob_el->{append|set}() combinations
- make $blob_el->set() more efficient by nuking the fcopy and
  instead keep the appendage and the original data seperate
  use _set_was_called for this

- t/indexing.t sometimes fails because of an incomplete previous run, or
  cruft in the database. make it more robust.
  ### step 5. - STORE_ERROR/POST_STORE_ERROR/DOC_INDEX_ERROR (0001) 
  think: table to drop, delete entry in index_tables, delete any comma_locks
  (dangerous)

- it'd be nice if we could do some SQL-ish things with storage iterators...
  but best to read up on the progress (or is that lack of progress) with\
  SQLite first...
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)