LATENT SEMANTIC PROCESS (LSP)

In a typical LSI (Latent Semantic Indexing) process there are three main steps: document indexing, term-document matrix construction, and document retrieval (via Singular Value Decomposition). During document indexing, filtering, tokenization, markup removal, and stemming may optionally be applied.
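The indexing and matrix-construction steps can be sketched roughly as follows. This is only an illustration: the stop-word list, the regular expressions, and the helper names (strip_markup, tokenize, crude_stem, index_document, term_document_matrix) are assumptions made for the example, and crude_stem is a toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "is"}

def strip_markup(text):
    """Replace anything that looks like an HTML/XML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def index_document(text, stem=False):
    """Markup removal, tokenization, filtering, and optional stemming."""
    tokens = [t for t in tokenize(strip_markup(text)) if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)

def term_document_matrix(docs, stem=False):
    """Return (terms, matrix) with one row per term and one column per document."""
    counts = [index_document(d, stem) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix
```

Each row of the resulting matrix holds the raw counts of one term across all documents; those counts are what the scoring step then reweights.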

Normalization, local weights, and global weights can all be used in the scoring step. Log-normalized or augmented scales can serve as local weights, and entropy can be used as a global weight. In any case, plain term frequency counts are not enough.
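As a minimal sketch of such a scoring step, the functions below use commonly cited forms of these weights: log(1 + tf) for the log scale, 0.5 + 0.5 * tf / max_tf for the augmented scale, and 1 + sum(p * log p) / log(n) for the entropy weight. The text above does not fix exact formulas, so these choices and the function names are assumptions.

```python
import math

def log_local(tf):
    """Log-scaled local weight of a raw term count."""
    return math.log(1 + tf)

def augmented_local(tf, max_tf_in_doc):
    """Augmented local weight, scaled by the largest count in the document."""
    if tf == 0:
        return 0.0
    return 0.5 + 0.5 * tf / max_tf_in_doc

def entropy_global(row):
    """Entropy-based global weight for one term (one row of the matrix).

    Close to 1 when the term is concentrated in few documents,
    close to 0 when it is spread evenly across all of them.
    """
    n_docs = len(row)
    total = sum(row)
    if total == 0 or n_docs < 2:
        return 1.0
    h = 0.0
    for tf in row:
        if tf > 0:
            p = tf / total
            h += p * math.log(p)
    return 1.0 + h / math.log(n_docs)

def log_entropy_weight(matrix):
    """Combine the log local weight with the entropy global weight."""
    return [
        [log_local(tf) * entropy_global(row) for tf in row]
        for row in matrix
    ]
```

Under this scheme a term that appears equally often in every document ends up with a weight near zero, which is one reason raw counts alone are a poor input to the decomposition.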

Whether terms are synonyms or not is irrelevant to SVD. Co-occurrence of different orders (first order, second order, third order, etc.) is what causes term clustering after the reconstruction done by Singular Value Decomposition. Latent Semantic Indexing redistributes term weights across the entire matrix representing the collection of documents. This means that any single change can cause a new redistribution of term weights in any of these documents. The whole process can be used for keyword research, query expansion, or query reformulation.
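The redistribution can be seen on a tiny example. The sketch below approximates a rank-k reconstruction with power iteration and deflation in plain Python; this is an illustrative stand-in for a proper SVD routine, and the helper names and the three-term, two-document matrix are assumptions made for the example.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def top_singular_triplet(A, iters=200):
    """Estimate the largest singular value and its vectors by power iteration."""
    At = transpose(A)
    v = [1.0 / (j + 1) for j in range(len(A[0]))]  # deterministic start vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        if n < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * len(A), v
        v = [x / n for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    if sigma < 1e-12:
        return 0.0, [0.0] * len(A), v
    return sigma, [x / sigma for x in Av], v

def truncated_reconstruction(A, k):
    """Sum the k largest rank-one components of A (a rank-k approximation)."""
    residual = [[float(x) for x in row] for row in A]
    approx = [[0.0] * len(A[0]) for _ in A]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        if sigma == 0.0:
            break
        for i in range(len(A)):
            for j in range(len(A[0])):
                component = sigma * u[i] * v[j]
                approx[i][j] += component
                residual[i][j] -= component
    return approx

# Rows: "car", "automobile", "engine"; columns: document 1, document 2.
# "car" and "automobile" never share a document, but both co-occur with "engine".
A = [[1, 0],
     [0, 1],
     [1, 1]]
rank1 = truncated_reconstruction(A, 1)
```

In the rank-1 reconstruction, "car" receives a weight of about 0.5 in document 2 even though it never appears there, because both documents share "engine" (a second-order co-occurrence). No synonym information is involved at any point.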
