General introduction

As with the Penn Historical Corpora, our primary goal has been to create an annotation system that facilitates automated searches rather than to give a correct linguistic analysis of each sentence, which in many cases is unworkable and in some cases (due to structural and morphosyntactic ambiguity, inaudible material, or other reasons) downright impossible.
We have tried to plan our system so that at each stage of the annotation, information can be added in a monotonic way. That is, we want any future revisions of the bracketed structures always to add information, never to change it. This goal requires us to avoid judgments that are subjective or error-prone.
As far as possible, we have tried to avoid decisions that would be controversial, whether with regard to text interpretation or to linguistic theory. In doubtful cases, we either leave structure unspecified or use default rules to decide the case for search purposes. An example of the first strategy concerns VPs. These are generally not indicated in the corpus, since VP boundaries are normally indeterminate. This is clearly the case in Middle English, which allows scrambling and in which the internal structure of the VP is variable and changing. But even in modern English, there are many cases in which it is not clear whether a phrase attaches as a daughter of VP or higher up in the tree. An example of the second strategy concerns PP attachment. Whenever it is unclear where a PP attaches, we attach it by default as high as possible.
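The two strategies can be illustrated with a small sketch. It is not part of the corpus tooling: the nested-list tree representation, the helper names attach_high and to_brackets, the example sentence, and the node labels are all assumptions chosen for illustration. The clause is left flat (no VP node is specified), and the ambiguous PP is attached as a daughter of the highest node.

```python
# Illustrative sketch only: the tree encoding, helper names, and labels
# are assumptions, not the corpus's actual tools or tag set.

def attach_high(clause, pp):
    """Return a copy of `clause` with `pp` added as its last daughter.

    This models the default rule for an ambiguous PP: attach it to the
    highest available node (here, the clause itself).
    """
    label, *children = clause
    return [label, *children, pp]


def to_brackets(node):
    """Render a nested-list tree as a labeled bracketing."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + " ".join([label] + [to_brackets(c) for c in children]) + ")"


# A flat clause: no VP node is specified (first strategy).
clause = [
    "IP-MAT",
    ["NP-SBJ", ["PRO", "I"]],
    ["VBD", "saw"],
    ["NP-OB1", ["D", "the"], ["N", "man"]],
]

# A PP whose attachment site is unclear (verb or object noun phrase).
pp = ["PP", ["P", "with"], ["NP", ["D", "a"], ["N", "telescope"]]]

# Second strategy: attach by default as high as possible.
result = attach_high(clause, pp)
print(to_brackets(result))
```

In this sketch the PP ends up as a sister of the verb and the object rather than inside the object NP, so a search for PPs immediately dominated by the clause node would find it without anyone having to resolve the ambiguity.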