Dunning's log-likelihood statistic was developed by Ted Dunning in the early 1990s. It can be seen as a refinement of Pearson's chi-square test, which has been a statistical workhorse for over a century. It reports its results as if it were a chi-square test with one degree of freedom. This makes it an easy statistic to evaluate for anybody who has worked through an elementary statistics textbook. The improvements over chi-square result from more nuanced assumptions about how words are distributed and how to compute the distance between the observed and the expected.
You use Dunning's by choosing two work sets. You want to know what words are overused or underused in your 'analysis corpus' when compared with your 'reference corpus'. Your analysis corpus could be a subset of your reference corpus. You might ask what words in Bleak House are disproportionately common or rare against the background of all Dickens novels. But you can just as easily compare two corpora that have no overlap, such as novels written by men and novels written by women.
The input for Dunning's can be spellings, lemmata, POS tags, lemma bigrams, or POS trigrams, but in the current interface only spellings and lemmata are enabled. The output gives you the counts and relative frequencies for each 'feature' and the log-likelihood value, which can be read like a chi-square value with one degree of freedom. It is important to have a clear sense of the odds associated with particular chi-square values, which do not increase in a linear fashion: small increases in the chi-square value reflect huge decreases in probability, as is shown by the following table:
| Log likelihood ratio | Percentage | Odds |
| 3.84 | 5% | 1 in 20 |
| 6.63 | 1% | 1 in 100 |
| 7.9 | 0.5% | 1 in 200 |
| 10.83 | 0.1% | 1 in 1,000 |
| 15.14 | 0.01% | 1 in 10,000 |
| 19.5 | 0.001% | 1 in 100,000 |
| 23.9 | 0.0001% | 1 in a million |
| 37.3 | 0.0000001% | 1 in a billion |
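The figures in the table can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi2_df1_p_value` is made up for illustration, and it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function.

```python
import math


def chi2_df1_p_value(x: float) -> float:
    """Upper-tail probability of a chi-square value with one degree of freedom.

    Hypothetical helper for illustration. With one degree of freedom,
    X = Z**2 for a standard normal Z, so P(X > x) = P(|Z| > sqrt(x)),
    which equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Print the probability and the corresponding odds for a few table values.
for value in (3.84, 6.63, 10.83, 19.5, 37.3):
    p = chi2_df1_p_value(value)
    print(f"{value:>6}  p = {p:.2e}  (about 1 in {1 / p:,.0f})")
```

The printed odds will not be perfectly round numbers, because the table values are themselves rounded.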
At the same time, you want to be cautious in interpreting the astronomically low probabilities that are associated with quite modest actual differences. The following figures are drawn from a Dunning's test comparing 250 19th-century British novels by male and female writers. It is not a great surprise that 'she' (including the possessive use of 'her') discriminates most sharply between male and female fiction. It accounts for 2.7% of all words in women writers, compared with 1.6% in male writers. Women writers use 'she' roughly 1.7 times as often as men. The chi-square value associated with this disparity is the astronomical 43,095.
Male and female writers also differ in their use of the definite article. Here the proportional difference is on the order of 10%. Men use it a little more often (5.3%) than women (4.7%), a result consistent with the findings of a group of Israeli computational linguists about differences between male and female writing across a great variety of genres. The chi-square value for this result is 2,912, smaller but still astronomical and not easily squared with an absolute difference so small that it would be difficult or impossible for readers to sense.
The Dunning's test is sensitive to the relative frequency of words. With a very common word, proportionally small differences will produce very large chi-square values. Differences on the order of 10% create a chi-square value of ~3,000 with the approximately two million occurrences of 'the'. 'Heart' is, not surprisingly, the noun most overused by women writers. There are 30,000 occurrences of it. Women use it ~50% more often, but the chi-square value is only ~1,500. 'Affection' occurs about 6,500 times. Women use it more than twice as much as men, but the chi-square value is only ~1,200.
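The effect of raw frequency can be illustrated with a small Python sketch. It uses the standard 2x2 contingency-table form of the log-likelihood ratio for two non-overlapping corpora, which is an assumption: the tool's own implementation may differ in detail. The function name `log_likelihood_g2` and the corpus sizes and counts below are hypothetical and are not figures from the novel comparison above.

```python
import math


def log_likelihood_g2(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Log-likelihood ratio (G2) for one word in two non-overlapping corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of words in corpus A and corpus B.
    Assumes the standard 2x2 table: (word, all other words) x (A, B).
    """
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand_total - count_a - count_b]

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with zero observations contributes nothing
                expected = row_totals[i] * col_totals[j] / grand_total
                total += o * math.log(o / expected)
    return 2.0 * total


# Hypothetical corpora of one million words each.
# Both words show the same proportional difference (53 : 47) between corpora.
g2_common = log_likelihood_g2(53_000, 1_000_000, 47_000, 1_000_000)
g2_rare = log_likelihood_g2(53, 1_000_000, 47, 1_000_000)
print(f"common word: G2 = {g2_common:.1f}")
print(f"rare word:   G2 = {g2_rare:.2f}")
```

With the same proportional difference, the common word yields a value far above any threshold in the table, while the rare word stays well below the 5% threshold of 3.84. The difference comes entirely from the raw counts.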
In summary, the evaluation of Dunning's tests is not very difficult, but to make proper use of the results you must pay attention to raw counts, relative frequencies and their ratios as well as to the chi-square value.
A similar caution applies to the visualization or tag cloud that can be created from Dunning's. Here the over- and under-used words appear in black and grey. But the size of the word reflects a statistical measure, not a difference in relative frequency.
Sources:
Dunning, T. 1993. "Accurate methods for the statistics of surprise and coincidence." Computational Linguistics 19.1 (Mar. 1993), 61-74.
Griffiths, D. Head First Statistics. 2009. Second edition. O'Reilly.