The Zipf law for text

Learning goals

  1. Be able to name some fundamental properties of how word frequencies in texts are distributed
  2. Be a little bit more cautious about visual impressions when looking at log-log plots
  3. Know both formulations of Zipf’s law



Find the slide deck at File:Questioning_the_Zipf_law.pdf


What do you know about Zipf's law?

A plot of word rank against word frequency appears as a straight line
The rank of a word multiplied by its frequency is supposed to be roughly constant
On the Simple English Wikipedia dataset, the law only seems to hold for the top-ranked words
Zipf's law was falsified many years ago and is only taught for historical reasons
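
The statement about rank multiplied by frequency can be checked directly on any text. The following is a minimal sketch, not part of the course material: the function name `rank_frequency`, the regex tokenizer, and the sample sentence are all illustrative assumptions. A sample this small will not exhibit the law; it only shows how the table is built.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return a list of (rank, word, frequency), most frequent word first.

    Tokenization is a deliberately crude assumption: lowercase the text
    and keep runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; rank 1 is the most frequent word.
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical toy input; replace with a real corpus to test the law.
sample = "the cat and the dog and the bird saw the cat"

for rank, word, freq in rank_frequency(sample):
    # If the law held, the last column would stay roughly constant.
    print(rank, word, freq, rank * freq)
```

On a larger corpus, the same table can be plotted with rank and frequency on logarithmic axes, and the product column shows for which ranks the "roughly constant" claim is plausible.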

Further reading

  1. tba
