The following 151 words are treated as common words when listing N-gram and collocation matches. A bigram is excluded from the published lists if either of its words is a common word. A trigram is excluded if it contains two or more common words. Tetragrams and longer N-grams are always listed. Similarly, a collocation is listed only if it contains at least two words that are not common words.

'tis a about after against all am an and another any are as at away bar be because before both but by can close come could dare did do down enough enter every for from given go good had hath have he hence her here him his how i i'll if in into is it know let like little lord love make man many may me might more most much must my need neither never next no none nor not nothing now o of off on once one or other our out over part past see shall she should since sir so some such take than that the thee their them then there therefore these they this those thou though through thy till to too until unto up upon us was we well were what when where which while who whom whose why will with within without would yet you your

All matches are based on the lemmatized forms of words rather than on the words themselves; for example, kind hearts is matched with kind-hearted. Consistently with that, every word that has the same lemma as one of the above words is also treated as a common word. For example, although only O is listed above, Oh is also treated as a common word, since the two share a lemma.
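The listing rules and the lemma rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually used to build the published lists: the function names, the four-word subset of the common-word list, and the toy lemma table are all hypothetical, and a real pipeline would use a proper lemmatizer.

```python
# Illustrative subset of the 151 common words (assumption: the full list
# would be loaded here in practice).
COMMON_WORDS = {"the", "of", "o", "given"}

# Toy lemma table (assumption): stands in for a real lemmatizer.
TOY_LEMMAS = {"oh": "o", "given": "give", "hearts": "heart"}


def lemma(word):
    """Return the toy lemma of a word, defaulting to its lowercase form."""
    w = word.lower()
    return TOY_LEMMAS.get(w, w)


# A word counts as common if its lemma matches the lemma of a listed word.
COMMON_LEMMAS = {lemma(w) for w in COMMON_WORDS}


def count_common(words):
    """Count the words whose lemma is a common-word lemma."""
    return sum(1 for w in words if lemma(w) in COMMON_LEMMAS)


def ngram_is_listed(words):
    """Apply the N-gram rule: bigrams need zero common words, trigrams
    at most one, and tetragrams and longer are always listed."""
    n = len(words)
    if n < 2:
        raise ValueError("an N-gram needs at least two words")
    if n == 2:
        return count_common(words) == 0
    if n == 3:
        return count_common(words) < 2
    return True


def collocation_is_listed(words):
    """A collocation needs at least two words that are not common."""
    return len(words) - count_common(words) >= 2
```

With these toy tables, the bigram Oh heavens is rejected because Oh shares a lemma with O, and give heed is rejected because give shares a lemma with given, while kind hearts passes because neither word is common.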

My published web pages giving lists of matches state the number of common words as 154. That was an error: unaccountably, an was listed four times instead of once in the list I originally made.

Why 151 (or 154) words?

The original list of common words consisted of the one hundred most common words in my database of plays. I found that this was not enough: too many very common bigrams and trigrams were being admitted to the published lists, making the files even larger than they are now, and hard to navigate. I had to increase the number of common words on the list. I therefore merged my list with the separate list of one hundred function words given in the following paper: Segarra et al., 'Attributing the Authorship of the Henry VI Plays by Word Adjacency', Shakespeare Quarterly, vol. 67, no. 2 (Summer 2016), 232-256. Because the two lists already shared many words, the merged list contains 151 words. Segarra et al.'s list is, as I now realise, not the best one to use. For example, it contains given but not give; however, as both words have the same lemma, give is also treated as a common word.
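The merge described above amounts to a set union, which can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name is hypothetical, and the two short lists are invented stand-ins whose words come from the published list but whose assignment to either source list is made up. If both source lists held exactly 100 distinct words, a merged total of 151 would imply 100 + 100 - 151 = 49 shared words.

```python
def merge_word_lists(first, second):
    """Return the sorted union of two word lists, dropping duplicates."""
    return sorted(set(first) | set(second))


# Invented stand-ins for the two 100-word lists (assumption: the real
# membership of each list is not reproduced here).
own_list = ["the", "and", "lord", "love"]
function_words = ["the", "and", "given", "unto"]

merged = merge_word_lists(own_list, function_words)

# Words present in both lists: size of each set minus the size of the union.
shared = len(set(own_list)) + len(set(function_words)) - len(merged)
```

Building the merged list from sets also guards against the kind of repeated entry that inflated the earlier count of 154, since a word listed more than once is stored only once.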