In this tutorial, we will compare two worksets using Feature Frequency, Dunning's Log Likelihood, and Term Frequency/Inverse Document Frequency (TF/IDF).
On the Workbench Project page, select the Worksets Comparison tool and click continue. You can also double-click the tool icon to continue.
The Workset Feature Comparison toolset allows users to select two worksets and compare them in terms of unique word forms (spellings) or groups of related word forms (lemmas). It offers several ways of performing these comparisons. Note that you can use the Export Features button to create tab-separated (TSV) raw data for use in spreadsheets or other analytics programs.
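As an illustration of using exported data in another program, the following minimal Python sketch reads tab-separated text with the standard csv module. The column names (feature, first_workset, second_workset) and the sample values are made up for this example; the actual layout of the file produced by Export Features may differ.

```python
import csv
import io

# Made-up sample standing in for an exported file.
# The real export's column names and order are an assumption here.
SAMPLE_TSV = (
    "feature\tfirst_workset\tsecond_workset\n"
    "good\t120\t95\n"
    "noble\t40\t88\n"
)

def load_features(handle):
    """Read tab-separated rows into a list of dictionaries keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)

rows = load_features(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["feature"], row["first_workset"], row["second_workset"])
```

To read a saved file instead of the in-memory sample, pass a handle opened with open(path, newline="", encoding="utf-8").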
Select the workset "genre: play-comedy" (creating worksets is covered in the tutorial Define Worksets) in the First Workset dropdown box.
In the Second Workset dropdown box, select "genre: play-history." These pre-defined worksets respectively contain all the comedy and history plays in the MONK datastore. We will be comparing works in these two worksets, according to criteria we select below.
Before performing the comparison, we also need to select the analysis method. Select Frequency Comparison. Next, select the feature category to compare: Spelling or Lemma. For this tutorial, select Spelling. We also need to provide the Number of Features, which specifies how many of the most frequent features in each workset to include. Finally, select the feature class; for this demonstration, select Adjective.
The comparison process may take some time to finish, particularly when comparing large worksets.
The Spellings of the top n adjectives are displayed in three groups: the most frequent Spellings that appear only in the First Workset, the most frequent Spellings that appear only in the Second Workset, and the Spellings that are most frequent in both Worksets.
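The three-way grouping can be sketched in Python as follows. This is only an illustration, not the tool's actual implementation: it assumes each workset has already been reduced to a list of adjective spellings, and it assumes that "appearing only in" one workset is judged against the other workset's top-n list. The function name and sample words are hypothetical.

```python
from collections import Counter

def compare_top_features(first_tokens, second_tokens, n):
    """Split the top-n features of two worksets into three groups.

    Returns (only_first, only_second, common), each ordered by frequency.
    Assumption: uniqueness is decided by comparing the two top-n lists.
    """
    top_first = [w for w, _ in Counter(first_tokens).most_common(n)]
    top_second = [w for w, _ in Counter(second_tokens).most_common(n)]
    first_set, second_set = set(top_first), set(top_second)
    only_first = [w for w in top_first if w not in second_set]
    only_second = [w for w in top_second if w not in first_set]
    common = [w for w in top_first if w in second_set]
    return only_first, only_second, common

# Made-up adjective lists standing in for two worksets.
comedy = ["good", "sweet", "good", "fair", "merry", "sweet", "good"]
history = ["good", "noble", "royal", "noble", "good", "great"]

only_first, only_second, common = compare_top_features(comedy, history, 2)
print(only_first, only_second, common)
```

With n set to 2, "sweet" is reported only for the first list, "noble" only for the second, and "good" for both.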
Clicking on any result shows the occurrence of the result in a chart displaying frequency of usage over the time period of the two Worksets selected.
Similarly, selecting multiple results (alt-click in Windows; command-click on Macs) shows the occurrences of each selected result in the chart, allowing for comparison among the results themselves.
Select the "genre:play-comedy" workset (creating worksets is covered in the tutorial for Define Worksets) in the First Workset dropdown box.
In the Second Workset dropdown box, select the "genre:play-history" workset. We will be comparing works in these two worksets, according to criteria we select below.
Select Dunning's Log Likelihood as the analysis method, with the first workset (genre:play-comedy) as the reference set.
Select a Minimum Frequency from the drop-down. This specifies the minimum frequency a term must have to be considered in the analysis.
Other options are not available for Dunning's analysis.
Press the Compare button.
This feature comparison process takes some time, particularly when comparing larger groups of works.
The result is displayed as a word cloud. A word's color indicates whether it is overused (black) or underused (silver), and its size indicates the extent of the overuse or underuse.
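To make the overuse and underuse scores more concrete, here is a minimal Python sketch of the widely used two-term form of the log-likelihood statistic, comparing an analysis set against a reference set. It is not necessarily the exact computation the tool performs. The function names, the rule that the minimum frequency applies to counts in the analysis set, and the word counts are all assumptions made for illustration.

```python
import math

def dunning_log_likelihood(count_analysis, total_analysis,
                           count_reference, total_reference):
    """Return (g2, overused) for one word.

    g2 is the two-term log-likelihood score; overused is True when the
    word's relative frequency is higher in the analysis set.
    """
    combined = count_analysis + count_reference
    total = total_analysis + total_reference
    expected_analysis = total_analysis * combined / total
    expected_reference = total_reference * combined / total
    g2 = 0.0
    if count_analysis > 0:
        g2 += count_analysis * math.log(count_analysis / expected_analysis)
    if count_reference > 0:
        g2 += count_reference * math.log(count_reference / expected_reference)
    overused = (count_analysis / total_analysis
                > count_reference / total_reference)
    return 2 * g2, overused

def score_words(analysis_counts, reference_counts, min_frequency=1):
    """Score every word that meets the minimum frequency in the analysis set."""
    total_a = sum(analysis_counts.values())
    total_r = sum(reference_counts.values())
    scores = {}
    for word, count_a in analysis_counts.items():
        if count_a < min_frequency:  # assumed meaning of Minimum Frequency
            continue
        count_r = reference_counts.get(word, 0)
        scores[word] = dunning_log_likelihood(count_a, total_a, count_r, total_r)
    return scores

# Made-up counts: "strange" is three times as frequent in the analysis set.
analysis = {"strange": 30, "other": 970}
reference = {"strange": 10, "other": 990}
scores = score_words(analysis, reference, min_frequency=5)
print(scores["strange"])
```

In terms of the word cloud, a larger g2 would correspond to a larger word, and the overused flag to its color.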
Clicking on any of the words (in this case, strange) displays the word within the works where it occurs. You can also use "Toggle show all" to display every occurrence of that word in the workset, rather than just a few.
Select the "genre:play-comedy" workset (creating worksets is covered in the tutorial for Define Worksets) in the First Workset dropdown box.
In the Second Workset dropdown box, select "genre: play-history" (the set of all historical plays in the MONK datastore). We will be comparing works in these two worksets, according to criteria we select below.
Make selections for Feature (either Spelling or Lemma), Number of Features, and Feature Class. Select IDF (Term Frequency/Inverse Document Frequency) as the analysis method, with the first workset as the reference set. Press the Compare button.
Clicking on any feature (for example, the word children) displays its frequency of usage over the time period covered by the two selected Worksets.
Selecting multiple results (alt-click in Windows, command-click on Macs) shows the occurrences over time for all selected features, allowing for comparison.
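For readers unfamiliar with the measure, the following Python sketch computes a textbook version of TF/IDF: a term's relative frequency within a document multiplied by the logarithm of the number of documents divided by the number of documents containing the term. The tool's own weighting may differ, and the function name and sample documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute textbook TF/IDF scores.

    documents is a list of token lists; the result is a list of
    {word: score} dictionaries, one per document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Made-up miniature "works".
docs = [
    ["children", "king", "king"],
    ["children", "crown"],
    ["children", "jest", "jest", "jest"],
]
scores = tf_idf(docs)
print(scores[0])
```

A word that occurs in every document, such as "children" here, scores zero, while a word concentrated in one document scores higher.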
Click the continue link at the bottom right to view the documents. This toolstep displays the Work Selection tool and the Advanced Viewer tool. Select a document in the Work Selection tool, and the document text, along with its Table of Contents and Bibliographic Information, will appear on the right.
This tool also has a concordance search function, available under the "options" pane. The concordance search function allows you to form three different types of queries. You can search by part of speech, by lemma or by spelling. For example, to search for all nouns in a work you would form the query "* (n)". The lemma search works in the same way as the selection pane. For example, to search for the lemma love as a noun, form the query "love (n)." Spelling is simplest; just type the word you're looking for.
The concordance search also accepts the "*" and "|" operators used throughout MONK. You may not combine different types of searches in one query. For example, mixing a lemma search and a spelling search (e.g., "king (n*) | life") will not work. If you use multiple terms in a query, the concordance search will assume "or" statements between them.
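The query rules described above can be mimicked with a small Python toy. This is not MONK's parser: it assumes tokens are (spelling, lemma, part-of-speech) tuples with simplified tags such as "n" and "v", it treats "*" as a shell-style wildcard via fnmatch (which also gives "?" and "[...]" special meaning), and all function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# A term is a word, optionally followed by a part of speech in parentheses.
TERM = re.compile(r"([^\s|()]+)(?:\s*\(([^)]*)\))?")

def parse_query(query):
    """Split a query into (word, pos) terms; pos is None for spelling terms."""
    terms = [(m.group(1), m.group(2)) for m in TERM.finditer(query)]
    kinds = {pos is not None for _, pos in terms}
    if len(kinds) > 1:
        raise ValueError("cannot mix spelling terms with lemma/part-of-speech terms")
    return terms

def matches(token, terms):
    """Return True if the token satisfies any term (terms are OR-ed)."""
    spelling, lemma, pos = token
    for word, pos_pattern in terms:
        if pos_pattern is None:
            if fnmatchcase(spelling, word):
                return True
        elif fnmatchcase(lemma, word) and fnmatchcase(pos, pos_pattern):
            return True
    return False

def search(tokens, query):
    """Return the spellings of all tokens matching the query."""
    terms = parse_query(query)
    return [t[0] for t in tokens if matches(t, terms)]

# Made-up tokens: (spelling, lemma, simplified part of speech).
tokens = [("love", "love", "n"), ("loves", "love", "v"), ("kings", "king", "n")]
print(search(tokens, "love (n)"))
print(search(tokens, "* (n)"))
print(search(tokens, "kings | loves"))
```

In this toy, a mixed query such as "king (n*) | life" raises a ValueError, mirroring the restriction on combining search types.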
The advanced viewer also has a TEI header tab available under options. This gives a variety of contextual information about the document, including its provenance.
The results can be viewed according to multiple groupings.
The Score of each work in the results indicates its similarity to the works in the First Workset; a score closer to 1.0 indicates higher similarity.
The Work Selection tool displays the results of the analysis in tabular form, as well as on a timeline and in a pie chart.
The results can also be saved as a workset.