Jaccard Similarity for Sets

Learning goals

  1. Understand how text documents can be modeled as sets
  2. Know the Jaccard coefficient as a similarity measure on sets
  3. Know a trick for remembering the formula
  4. Be aware of the possible outcomes of the Jaccard index
  5. As always, be able to criticize your model

Video

Script

The slides can be found at File:Jaccard-Similarity-for-Sets.pdf
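
As a companion to the slides, here is a minimal sketch of the idea in Python. It assumes the usual definition of the Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, and models a document as the set of its whitespace-separated, lowercased words. The function names word_set and jaccard, the example sentences, and the convention of returning 1.0 for two empty sets are illustrative assumptions, not taken from the slides.

```python
def word_set(document):
    """Model a document as the set of its whitespace-separated words.

    Assumption: words are lowercased and split on whitespace only.
    """
    return set(document.lower().split())


def jaccard(a, b):
    """Return the Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


d1 = word_set("the web is big")
d2 = word_set("the web is growing")
# 3 shared words out of 5 distinct words overall.
print(jaccard(d1, d2))  # 0.6
```

The result always lies between 0 (no shared words) and 1 (identical word sets). Because sets discard word order and word frequency, two documents that read very differently can still receive a high score, which is one point on which to criticize the model.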

Quiz

1. Given D1 = "a a a b" and D2 = "b b b a", what is the Jaccard coefficient of the corresponding word sets?

1/1
2/4
2/8
2/6

2. Given D1 = "a b c d e" and D2 = "e f g h", what is the Jaccard coefficient of the corresponding word sets?

0
1/7
1/8
1/9
2/8


Further reading

  1. tba

Discussion