Changes made to SenseClusters version 0.93 during development of version 0.95
Ted Pedersen tpederse@d.umn.edu
Anagha Kulkarni kulka020@d.umn.edu
Mahesh Joshi joshi031@d.umn.edu
1. Updated Toolkit/clusterstop/clusterstopping.pl : -Anagha
- changed the default cluster-stopping measure from PK2 to PK3
- changed the default crfun from h2 to i2
- formatted and added details to the error messages
- added a check to catch "NaN" values generated by the crfuns
on the expected/reference data (Gap Statistic)
- added a check for negative delta values
- updated and reorganized the documentation.
- now generates a PREFIX.gap file that contains the crfun values,
delta values and the predicted k.
- updated the logic for setting the default delta value.
- modified the redirection from >& to > for the vcluster and
scluster calls.
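The new default measure and the added sanity checks can be pictured with a minimal Python sketch (not the Perl code in clusterstopping.pl). It assumes the PK3 definition published for SenseClusters, PK3(m) = 2*I(m) / (I(m-1) + I(m+1)), where I(k) is the crfun value for k clusters. The function names `pk3_scores` and `check_delta` and the list layout are hypothetical, and the exact rule the script uses to turn PK3 scores into a predicted k is not reproduced here.

```python
import math


def pk3_scores(crfun):
    """Return {m: PK3(m)} for every m that has both neighbours.

    Hypothetical layout: crfun[k - 1] is the criterion function value
    for k clusters. PK3(m) = 2 * I(m) / (I(m - 1) + I(m + 1)).
    """
    # Reject NaN values up front, similar in spirit to the new NaN check.
    for k, value in enumerate(crfun, start=1):
        if math.isnan(value):
            raise ValueError(f"crfun value for k={k} is NaN")
    scores = {}
    # m runs from 2 to len(crfun) - 1 so that I(m - 1) and I(m + 1) exist.
    for m in range(2, len(crfun)):
        prev, cur, nxt = crfun[m - 2], crfun[m - 1], crfun[m]
        scores[m] = 2 * cur / (prev + nxt)
    return scores


def check_delta(delta):
    # Hypothetical validation mirroring the new check: negative delta is an error.
    if delta < 0:
        raise ValueError(f"delta must be non-negative, got {delta}")
    return delta
```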
2. Updated discriminate.pl : -Anagha
- changed the default #clusters from 10 to 2
- modified the program logic to catch the exit status of
clusterstopping.pl; if it has failed, the reason for the failure is
output from the *.predictions file (if present) and the default
#clusters (2) is used to proceed.
- changed the calls to vcluster and scluster so that the -showtree
option is used only if #clusters > 1.
(NOTE: The -showtree option provides an ASCII representation
of the clustering solution; however, if #clusters is 1, then
this option generates quite a few error messages that are
not related to SenseClusters functionality. Thus we are
currently not using this option when #clusters = 1. If Cluto
fixes this problem in the future, then we can go back to using
the -showtree option consistently.)
- dendrograms are now generated by vcluster or scluster only
if #clusters > 1
- updated and reorganized the documentation.
- added an error check to verify that the number of bigram
features is not 0 before proceeding with generation of
co-occurrence features.
- removed the error check that prevented --scope_train from being
used when neither the --training nor the --split option was used.
- modified messages: added angle brackets around filenames and
removed periods following filenames or parameters.
- added an error check to verify that the specified training file
exists.
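The fallback and -showtree logic described for discriminate.pl can be sketched in Python as follows (the real script is Perl). The helper names `choose_num_clusters` and `cluto_args`, the `predicted_k` parameter, and treating the whole *.predictions file as the failure reason are assumptions for illustration. The argument order follows Cluto's `vcluster [options] MatrixFile NClusters` usage (scluster takes a graph file in the same position).

```python
import os

DEFAULT_CLUSTERS = 2  # the new default #clusters


def choose_num_clusters(exit_status, predicted_k, predictions_path):
    """Return (k, message) after running clusterstopping.pl.

    exit_status: exit code of clusterstopping.pl (0 means success).
    predicted_k: the k it reported on success (hypothetical parameter).
    predictions_path: path of the *.predictions file, which may not exist.
    """
    if exit_status == 0:
        return predicted_k, None
    reason = "reason unknown"
    # Report the failure reason only if the predictions file is present.
    if os.path.exists(predictions_path):
        with open(predictions_path) as fh:
            reason = fh.read().strip() or reason
    message = (f"cluster stopping failed ({reason}); "
               f"using {DEFAULT_CLUSTERS} clusters")
    return DEFAULT_CLUSTERS, message


def cluto_args(program, input_file, num_clusters):
    """Build a vcluster/scluster argument list."""
    args = [program]
    # -showtree is requested only when more than one cluster is wanted.
    if num_clusters > 1:
        args.append("-showtree")
    args += [input_file, str(num_clusters)]
    return args
```

Keeping the fallback in one place means the downstream Cluto call never sees an undefined k, whichever way cluster stopping ends.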
3. Updated Web/SC-cgi/callwrap.pl : -Anagha
- now displays a message when SVD is not performed, or when
cluster-stopping fails and the default #clusters is used instead.
4. Updated Demos directory : -Ted
- reorganized files and directories somewhat, and added new options
to demo scripts, to reflect new functionality in the package that
has been introduced since the demos were last updated 2 years ago.
5. Updated Toolkit/preprocess/sval2/maketarget.pl : -Anagha
- the regex generated by this script with the --head option now
includes the enclosing head tags.
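To show why the enclosing tags matter, here is a minimal Python sketch of a target regex wrapped in head tags. The output format is a hypothetical reconstruction, not the exact text maketarget.pl writes: requiring `<head>...</head>` around the word forms means only the tagged target instance matches, not every occurrence of the same word elsewhere in the context.

```python
import re


def make_target_regex(head_words):
    """Match any of head_words only inside <head>...</head> tags.

    Hypothetical reconstruction; maketarget.pl's actual output may differ.
    """
    alternatives = "|".join(re.escape(w) for w in head_words)
    return re.compile(r"<head>\s*(?:" + alternatives + r")\s*</head>")
```

For example, `make_target_regex(["line", "lines"])` matches `<head>lines</head>` but not an untagged `lines` or `<head>lineup</head>`.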
6. Updated default stoplist : -Ted
- the former stoplist removed only lowercase words. The new list
includes stop words beginning with either an uppercase or a lowercase
letter. This affects the web interface, Demos, and Docs.
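One way to derive such a list is to add a capitalized variant for every lowercase entry. The Python sketch below assumes plain word entries (the regex syntax of the actual stoplist file is not shown), and `expand_case` is a hypothetical helper, not part of SenseClusters.

```python
def expand_case(stop_words):
    """Return each stop word in lowercase and capitalized form, in order."""
    expanded = []
    seen = set()
    for word in stop_words:
        for form in (word.lower(), word.lower().capitalize()):
            # Skip duplicates, e.g. if the input already had both forms.
            if form not in seen:
                seen.add(form)
                expanded.append(form)
    return expanded
```

With this expansion, a sentence-initial "The" is removed just like "the".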
7. Updated discriminate.pl : -Mahesh
- Added support for LSA context clustering using the
"--context o2 --lsa" option combination
- Modified error messages
- Updated POD and command line help with respect to LSA context
clustering
- Incremented internal version
- Updated to invoke nsp2regex.pl after wordvec.pl in SC native
order2 context clustering mode
8. Updated Toolkit/vector/order1vec.pl : -Mahesh
- Modified output of --clabel option to discard features that were
not found even once in the test data
- Added --transpose option to support output in the form of a
feature-by-context matrix similar to Latent Semantic Analysis
(LSA) representation
- Added --testregex TEST_REGEX option, which outputs only those
regular expressions from the input FEATURE_REGEX file that
matched at least once in the input SVAL2 file. The TEST_REGEX file
is required as input to order2vec.pl in LSA context clustering
mode.
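The --transpose and --testregex behaviour can be illustrated with a small Python sketch. It assumes contexts are already plain strings (SVAL2 parsing is omitted) and counts non-overlapping regex matches; the helper names are hypothetical, and order1vec.pl's exact counting may differ. The context-by-feature matrix is the usual first-order layout; transposing it gives the feature-by-context layout that LSA uses.

```python
import re


def context_by_feature(contexts, feature_regexes):
    """Rows are contexts, columns are features; cells are match counts."""
    patterns = [re.compile(r) for r in feature_regexes]
    return [[len(p.findall(ctx)) for p in patterns] for ctx in contexts]


def transpose(matrix):
    # Feature-by-context layout, as in an LSA term-by-document matrix.
    return [list(col) for col in zip(*matrix)]


def matched_regexes(contexts, feature_regexes):
    # Keep only the regexes that matched at least once in the contexts.
    counts = context_by_feature(contexts, feature_regexes)
    return [r for j, r in enumerate(feature_regexes)
            if any(row[j] > 0 for row in counts)]
```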
9. Updated Toolkit/vector/order2vec.pl : -Mahesh
- Dropped the --token TOKEN_REGEX option and the FEATURES file
argument from the command line; order2vec.pl now requires a
command line of the form:
order2vec.pl [options] SVAL2 WORDVEC FEATURE_REGEX
- Modified the regex that reads features from the features file
to accept general n-grams rather than just unigrams
- Updated POD and command line help
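As a rough picture of second-order context vectors in LSA mode, this Python sketch averages the WORDVEC rows of the features (unigram or n-gram regexes) that occur in a context. The row-to-regex alignment, the averaging (rather than, say, summing), and the helper name are assumptions here, not a description of order2vec.pl internals. In LSA mode each WORDVEC row would be a feature's profile over contexts, i.e. a row of the transposed matrix from order1vec.pl.

```python
import re


def order2_context_vector(context, feature_regexes, wordvec):
    """Average the WORDVEC rows of the features found in context.

    Assumes wordvec[j] is the vector for feature_regexes[j].
    Returns None when no feature occurs in the context.
    """
    found = [wordvec[j] for j, r in enumerate(feature_regexes)
             if re.search(r, context)]
    if not found:
        return None
    dim = len(found[0])
    # Component-wise mean of the matching feature vectors.
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]
```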
10. Added new test cases in Testing/vector/order2vec/ -Mahesh
- Added four test cases for four types of features, testing the LSA
context clustering scenario in binary and non-binary mode
11. Updated web interface files in Web/SC-cgi -Mahesh
- Modified index.cgi, first.cgi, second.cgi and callwrap.pl to support
LSA context clustering
12. Updated Docs/HTML/discriminate.html -Mahesh
- Updated with respect to POD update of discriminate.pl
13. Updated Docs/HTML/Toolkit_Docs/vector/order2vec.html -Mahesh
- Updated with respect to POD update of order2vec.pl
14. Updated Docs/Flows/SenseClusters-ContextClustering.ai/pdf -Mahesh
- Added LSA context clustering flow
15. Updated Docs/Flows/SenseClusters-WordClustering.ai/pdf -Mahesh
- Removed obsolete kocos.pl call from the flow
16. Updated SC/Toolkit/clusterlabel/clusterlabeling.pl : -Anagha
- now creates temporary files with a time-stamp in their names.
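A minimal Python sketch of time-stamped temporary file names follows. The naming pattern is hypothetical, and adding the process id is an extra assumption (the entry mentions only a time-stamp). It helps keep two runs started in the same second from colliding.

```python
import os
import time


def timestamped_name(prefix, suffix=".tmp"):
    """Build a temporary file name containing a time-stamp and the process id."""
    stamp = time.strftime("%Y%m%d%H%M%S")
    return f"{prefix}.{stamp}.{os.getpid()}{suffix}"
```

Where atomic creation matters, Python's `tempfile.mkstemp` (or Perl's File::Temp) guarantees a unique file and is a safer choice than building names by hand.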
(Changelog-v0.93to0.95 Last Updated on August 7, 2006 by Anagha)