<html>
<title>Latent Semantic Analysis</title>
<body>
<h1>Latent Semantic Analysis</h1>
</body>
</html>

Our implementation of Latent Semantic Analysis (LSA) supports both the
clustering of similar contexts and the clustering of lexical features. It
provides the same functionality as the native SenseClusters methodology,
but uses a different underlying representation.
<br><br>
Traditionally, LSA represents text using a term by document matrix. Our
implementation generalizes this to a feature by context matrix, where
terms are only one kind of feature and documents are only one kind of
context. Features may be unigrams, bigrams, co-occurrences, or target
co-occurrences. Contexts may be units of text of any length, although
they are typically sentences, paragraphs, or short articles.
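<br><br>
To make the feature by context representation concrete, here is a minimal Python sketch that builds a small count matrix in which each row is a feature and each column is a context. It assumes unigram features, sentence-sized contexts, lowercase whitespace tokenization, and raw frequency counts; the function name build_feature_context_matrix and the example sentences are hypothetical and are not part of SenseClusters, which also supports bigram and co-occurrence features.

```python
from collections import Counter

# Hypothetical sketch, not SenseClusters code. Assumes unigram features,
# lowercase whitespace tokenization, and raw frequency counts.

def build_feature_context_matrix(contexts, features):
    """Return a feature-by-context count matrix as a list of rows.

    Row i corresponds to features[i]; column j corresponds to contexts[j].
    """
    # Tokenize and count each context once, up front.
    context_counts = [Counter(ctx.lower().split()) for ctx in contexts]
    # A Counter returns 0 for a feature that never occurs in a context.
    return [[counts[f] for counts in context_counts] for f in features]


contexts = [
    "the bank raised interest rates",
    "the river bank flooded",
    "interest rates fell at the bank",
]
features = ["bank", "interest", "rates", "river"]
matrix = build_feature_context_matrix(contexts, features)
for feature, row in zip(features, matrix):
    print(feature, row)
```

Each row records the contexts a feature occurs in, so interest and rates end up with identical rows because they appear in exactly the same sentences.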
<br><br>
The basic assumption behind LSA feature clustering is that features can be
distinguished from each other and grouped into clusters based on the
contexts in which they occur: features that occur in similar contexts are
assumed to be similar to each other. A related assumption underlies LSA
context clustering: contexts made up of features that occur in similar
contexts are considered similar to each other.
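<br><br>
The following Python sketch illustrates both assumptions on a toy feature by context matrix. It is a hypothetical, standard-library-only illustration rather than the SenseClusters implementation: it obtains the right singular vectors of the matrix from the eigenvectors of A<sup>T</sup>A (computed with Jacobi rotations), projects each feature row onto the top k of them as in a truncated SVD, and compares features by cosine similarity. Representing a context as the average of the reduced vectors of the features it contains is one assumed way to realize the context-clustering idea. The function names jacobi_eigh, lsa_feature_vectors, cosine, and context_vector are illustrative.

```python
import math

# Hypothetical sketch, not SenseClusters code. Assumes a small dense
# feature-by-context count matrix held in plain Python lists.

def jacobi_eigh(S, max_sweeps=50, tol=1e-20):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column i of V is the eigenvector for eigenvalues[i].
    """
    n = len(S)
    A = [[float(x) for x in row] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new A[p][q] is zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def lsa_feature_vectors(matrix, k):
    """Project feature rows onto the top-k right singular vectors (A V_k = U_k Sigma_k)."""
    n_ctx = len(matrix[0])
    # S = A^T A is context-by-context; its eigenvectors are A's right singular vectors.
    S = [[sum(row[i] * row[j] for row in matrix) for j in range(n_ctx)] for i in range(n_ctx)]
    eigvals, V = jacobi_eigh(S)
    top = sorted(range(n_ctx), key=lambda i: eigvals[i], reverse=True)[:k]
    return [[sum(row[j] * V[j][i] for j in range(n_ctx)) for i in top] for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def context_vector(tokens, features, reduced):
    """Assumed context representation: average the reduced vectors of its features."""
    rows = [reduced[features.index(t)] for t in tokens if t in features]
    if not rows:
        return [0.0] * len(reduced[0])
    return [sum(col) / len(rows) for col in zip(*rows)]


features = ["bank", "interest", "rates", "river"]
matrix = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [0, 1, 0]]  # rows: features, columns: contexts
reduced = lsa_feature_vectors(matrix, k=2)
vec = dict(zip(features, reduced))
print(cosine(vec["interest"], vec["rates"]))
print(cosine(vec["interest"], vec["river"]))
```

Because interest and rates occur in exactly the same contexts, their reduced vectors are identical and their cosine is 1, while interest and river never share a context. Here k=2 only because this toy matrix has rank 2; in practice k is chosen much smaller than the number of contexts, and a library SVD routine would normally replace the hand-written eigen-solver.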