Experiments with Collocations and N-grams
This page lists some experiments I did with my collocations and N-grams data in late 2018. My aim was to give an example of how the data could be further processed to obtain new knowledge.
I did five experiments, which are summarised below. The link to the results folder below leads to a directory from which you can download the results.
The five experiments were not planned as a set. I did each one without any expectation of doing another. I have not gone back to re-edit what I wrote, so the documents in the results folder present my thoughts as they developed at the time. These documents were written as examples and were not intended for submission to a journal. I circulated them privately to a few scholars for their interest.
In the first experiment I used my formal N-grams data to try to establish which kinds of N-grams are best for authorship attribution.
In the second experiment I continued the first, adding more detail to the results and refining the conclusions.
In the third experiment I did a control test to show that the method I used in the previous experiments is not biased.
In the fourth experiment I enlarged the set of plays used in the earlier experiments to include the so-called Extended Kyd Canon defined by Sir Brian Vickers.
In the fifth and final experiment I sketched a new method of authorship attribution and did a preliminary test with it. A great deal more work would need to be done if I were to develop the method further, particularly for use with scenes and acts rather than whole plays.
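For readers unfamiliar with the term, here is a minimal sketch of what counting word N-grams and finding those shared between two texts can look like. It is a generic illustration only, not the method used in any of the experiments above. The function names word_ngrams and shared_ngrams are hypothetical, and splitting on whitespace after lower-casing is an assumed, deliberately crude form of tokenisation.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count the contiguous word n-grams in a text.

    Assumption: words are obtained by lower-casing and splitting on
    whitespace, with no handling of punctuation or spelling variants.
    """
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def shared_ngrams(text_a, text_b, n):
    """Return the set of n-grams that occur in both texts."""
    return set(word_ngrams(text_a, n)) & set(word_ngrams(text_b, n))


if __name__ == "__main__":
    # Two invented snippets, used only to exercise the functions.
    a = "the king is dead"
    b = "long live the king is here"
    print(shared_ngrams(a, b, 3))
```

A real study would need proper tokenisation and some way of judging whether a shared N-gram is rare enough to be evidence of common authorship; this sketch does neither.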