The PPCME2 text samples are based largely on the Middle
English section of the Diachronic Part of the Helsinki Corpus of English
Texts (available from ICAME), with certain additions and deletions. The
PPCME2 samples, however, are considerably larger than the Helsinki Corpus samples. For the earliest
Helsinki time period, all texts are exhaustively sampled. For later
Helsinki time periods, two texts per period were expanded to 50,000
words. The remaining texts are represented by the Helsinki Corpus
sample.
The main Helsinki time periods are M1-M4, each covering
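approximately one hundred years. In addition, texts that were originally
written in one period but whose earliest manuscript dates from a later
period are given two-digit period designations: the first digit gives the
period of composition and the second the period of the manuscript (see
Table 1). An X in place of the first digit marks an unknown composition date.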
The current edition of the PPCME2 includes a total of roughly 1.2
million words of running text. Each of the 56 text samples in the
corpus is available in three forms: parsed, part-of-speech tagged, and
unannotated text. In addition, there is a file with philological and
bibliographical information about each text.
Wordcounts for the individual text samples, along with date and
genre information, are contained in the file WORDCOUNT-PPCME2 in the current directory. The
wordcounts exclude punctuation and extralinguistic material such as page
numbers or token ID numbers.
The file is a text file that is suitable for importing into any
spreadsheet program; the field separator is the space character.
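As an alternative to a spreadsheet import, a space-separated file of this kind can be read with a few lines of Python. The following is only a minimal sketch: the function name read_fields is hypothetical, the demo rows are placeholders rather than real corpus figures, and the column layout of WORDCOUNT-PPCME2 is not described here, so the code makes no assumption about what each field means. It does assume that no field contains an embedded space.

```python
import io


def read_fields(handle):
    """Split each non-blank line of a space-separated text file into fields."""
    rows = []
    for line in handle:
        # Assumption: fields never contain spaces, so splitting on
        # whitespace recovers the columns; runs of spaces count as one.
        fields = line.split()
        if fields:
            rows.append(fields)
    return rows


# Placeholder rows for illustration only, NOT real corpus data.
demo = io.StringIO("text-a M1 100\ntext-b M23 200\n")
rows = read_fields(demo)
```

To read the real file, the same function could be called on an open file handle, for example `with open("WORDCOUNT-PPCME2") as f: rows = read_fields(f)`.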
Table 1: Helsinki periods
Period designation    Composition date    Manuscript date
MX1                   unknown             1150-1250
M1                    1150-1250           1150-1250
M2                    1250-1350           1250-1350
M23                   1250-1350           1350-1420
M24                   1250-1350           1420-1500
M3                    1350-1420           1350-1420
M34                   1350-1420           1420-1500
MX4                   unknown             1420-1500
M4                    1420-1500           1420-1500
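The period designations in Table 1 follow a regular pattern that can be decoded mechanically. The sketch below is an illustration, not part of the corpus tools: the names PERIOD_DATES and decode_period are hypothetical, and the code assumes that the pattern generalizes to any combination in which the composition period precedes the manuscript period, including combinations that Table 1 does not list.

```python
# Date ranges for the four main Helsinki periods, taken from Table 1.
PERIOD_DATES = {
    "1": "1150-1250",
    "2": "1250-1350",
    "3": "1350-1420",
    "4": "1420-1500",
}


def decode_period(designation):
    """Return (composition date, manuscript date) for a period designation."""
    code = designation.upper()
    if not code.startswith("M"):
        raise ValueError(f"not a period designation: {designation!r}")
    body = code[1:]
    # One digit: composition and manuscript fall in the same period.
    if len(body) == 1 and body in PERIOD_DATES:
        return PERIOD_DATES[body], PERIOD_DATES[body]
    # Two characters: composition period (or X), then manuscript period.
    if len(body) == 2 and body[1] in PERIOD_DATES:
        if body[0] == "X":
            return "unknown", PERIOD_DATES[body[1]]
        # Assumption: the manuscript must be later than the composition.
        if body[0] in PERIOD_DATES and body[0] < body[1]:
            return PERIOD_DATES[body[0]], PERIOD_DATES[body[1]]
    raise ValueError(f"not a period designation: {designation!r}")
```

For example, decode_period("M23") yields a composition date of 1250-1350 and a manuscript date of 1350-1420, matching the M23 row of Table 1.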
Wordcount information