Comparing Results of Similarity Measures

Learning goals

  1. Understand that different modeling choices can produce very different results.
  2. Get a feeling for how you could statistically compare the differences between the models.
  3. Know how you could extract keywords from documents with the tf-idf approach.
  4. Be able to argue which model fits best in a given scenario.
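
To make the third goal concrete, here is a minimal sketch of tf-idf keyword extraction. It assumes whitespace tokenization, raw term counts as tf, and idf = log(N / df), which is only one of several common weighting variants. The function name `tfidf_keywords` and the toy documents are made up for illustration and are not taken from the course material.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring tf-idf terms for each document.

    Hypothetical helper for illustration: tf is the raw count of a term
    in the document, idf is log(N / df) with N the number of documents
    and df the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Sort by descending score; break ties alphabetically so the
        # result is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


# Toy corpus (made up for this example).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

for doc, words in zip(docs, tfidf_keywords(docs, top_k=1)):
    print(doc, "->", words)
```

A word such as "the" is frequent in every document, so its idf is log(3 / 3) = 0 and it never ranks as a keyword, whereas a word that occurs in only one document gets the highest weight. This is the intuition behind using tf-idf to find characteristic words.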

Video

Script

Quiz

1 Which method is best suited to finding characteristic words of a text?

Jaccard
TF-IDF
TF
Language Model
Smoothed Language Model

2 Which method works well in an information retrieval setting?

Jaccard
TF-IDF
Language Model
Smoothed Language Model

3 Which method should be used when you don't have several occurrences of the same element?

Jaccard
TF-IDF
Language Model
Smoothed Language Model


Further reading

  1. tba

Discussion