Changes to earlier annotation guidelines

Adjectival vs. verbal participles

When they modify nouns or function as predicate adjectives, departicipial adjectives (INTERESTING, INTERESTED) are now tagged ADJ (rather than VAG or VAN).

AS

When functioning as the head of a degree complement, AS is now tagged P (rather than C). Its clausal complement bears a dash tag (CP-DEG, IP-INF-DEG), parallel to the corresponding comparative case (CP-CMP).

AS WELL AS

The correlative conjunction use of AS WELL AS is now distinguished from its ordinary comparative use (which remains the default).

Comparative and degree complements

For both theoretical and practical reasons, degree clauses are now annotated analogously to comparative clauses. The complements of degree and comparative heads (including their traces when extraposed) uniformly attach in the lowest position available to them in the overt syntax.

As a result, prenominal SUCH no longer projects an ADJP dominating the trace of an extraposed degree complement.

Conjunction of words with unlike categories

Word-level conjunction is now annotated uniformly, regardless of whether the first and subsequent words share the same category.

Superseded: (NP (PRO you)
                (CONJP (CONJ and)
                       (OTHERS others)))
Current:    (NP (PRO you) (CONJ and) (OTHERS others))

Complex tags

To facilitate possible future lemmatization, we have reduced the incidence of complex tags by splitting words in more cases than in previous releases.

In compound words, complex tags in the (Early) Modern English corpora (PPCEME, PPCMBE2, PCEEC2) have generally been replaced by appropriate simple tags.

Superseded: (NP (ADJ+N gentleman))
Current:    (NP (N gentleman))

Superseded: (ADJP (ADVR+ADJ overeager))
Current:    (ADJP (ADJ overeager))

Superseded: (PP (Q+BEP+PRO albeit)
                (CP-ADV ...))
Current:    (PP (P albeit)
                (CP-ADV ...))

Superseded: (ADVP (WADV+ADV however))
Current:    (ADVP (ADV however))

Superseded: (WADJP (WADV+ADV however) (ADJ great))
Current:    (WADJP (WADV however) (ADJ great))

However, complex tags are retained if simplifying them would require splitting words or revising higher syntactic labels, as well as in certain other cases.

Unchanged:      (D+OTHER another)
Not changed to: (D an@) (OTHER @other)

Unchanged:      (PP (P+N indeed))
Not changed to: (ADVP (ADV indeed))

Unchanged:      (NP (Q+ONE everyone))
Not changed to: (NP (Q everyone))
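The replacements and retentions in the two tables above can be mirrored by a small lookup keyed on the parent phrase label and the superseded tag. The Python sketch below is illustrative only: SIMPLIFY, parse, render, base_label, and simplify are hypothetical names, the lookup contains nothing beyond the table rows, and it is not the procedure actually used to revise the corpora. Tags absent from the lookup, such as D+OTHER, P+N, and Q+ONE, pass through unchanged.

```python
import re

# Hypothetical lookup built only from the table rows above:
# (parent phrase label, superseded complex tag) -> current simple tag.
# WADV+ADV maps differently under ADVP and under WADJP.
SIMPLIFY = {
    ("NP", "ADJ+N"): "N",
    ("ADJP", "ADVR+ADJ"): "ADJ",
    ("PP", "Q+BEP+PRO"): "P",
    ("ADVP", "WADV+ADV"): "ADV",
    ("WADJP", "WADV+ADV"): "WADV",
}


def parse(text):
    """Parse one labeled bracketing into nested [label, children] lists.

    Toy parser: assumes leaves contain no literal parentheses.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return [label, children]

    return node()


def render(tree):
    """Write a parsed tree back out as a one-line bracketing."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + " ".join([label] + [render(c) for c in children]) + ")"


def base_label(label):
    """Strip dash tags and indices, e.g. NP-SBJ-1 -> NP."""
    return re.split(r"[-=]", label)[0]


def simplify(tree, parent=None):
    """Replace complex tags listed in SIMPLIFY; leave every other label as is."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    key = (base_label(parent) if parent else None, label)
    new_label = SIMPLIFY.get(key, label)
    return [new_label, [simplify(child, label) for child in children]]


for example in [
    "(NP (ADJ+N gentleman))",
    "(WADJP (WADV+ADV however) (ADJ great))",
    "(NP (Q+ONE everyone))",
]:
    print(render(simplify(parse(example))))
```

Run as is, the loop prints (NP (N gentleman)) and (WADJP (WADV however) (ADJ great)), and leaves (NP (Q+ONE everyone)) untouched. Keying on the parent label, rather than on the tag alone, is what lets the single superseded tag WADV+ADV map to two different current tags.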

ELSE and ENOUGH

When following the head they modify, ELSE and ENOUGH now consistently project phrases (ADJP or ADVP, as appropriate in context) in the PPCME2, as in the later corpora.

ENOUGH is tagged ADVR only when it modifies an adjective; otherwise, it is labeled ADJR, projecting NP-MSR if necessary.

Movement out of ADJP, NP, and PP

In the earlier guidelines, PP complements or modifiers preceding the head noun in NP were always assumed to have moved out of NP and to be daughters of IP. However, a fair number of cases could not easily be annotated this way. The current guidelines treat movement out of ADJP, NP, and PP in the same way: such movement is now indicated only if it is not string-vacuous. This treatment is not necessarily correct for the NP case, especially for later stages of the language, but it is more uniform and thus easier to apply consistently, it covers more of the cases, and it facilitates comparisons with historical French.
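The string-vacuity criterion can be stated as an adjacency test over token offsets. The Python sketch below is one possible operationalization under assumptions made here for illustration, not part of the guidelines: spans are hypothetical (start, end) offsets with exclusive ends, and host_rest stands for the material of the host ADJP, NP, or PP other than the displaced constituent. If the constituent directly abuts that material, attaching it inside or outside the host yields the same word string, so no movement is indicated.

```python
from typing import Tuple

# Hypothetical representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def is_string_vacuous(moved: Span, host_rest: Span) -> bool:
    """True if the displaced constituent directly abuts the rest of its host
    phrase, so attaching it inside or outside the host gives the same string."""
    return moved[1] == host_rest[0] or host_rest[1] == moved[0]


def annotate(moved: Span, host_rest: Span) -> str:
    # Only movement that is not string-vacuous is indicated.
    if is_string_vacuous(moved, host_rest):
        return "attach inside host"
    return "indicate movement out of host"


# Invented Modern English glosses: a PP preceding the head noun of its NP.
adjacent = ["of", "the", "king", "the", "son", "came"]   # PP (0, 3), rest of NP (3, 5)
separated = ["of", "the", "king", "came", "the", "son"]  # PP (0, 3), rest of NP (4, 6)

print(" ".join(adjacent), "->", annotate((0, 3), (3, 5)))
print(" ".join(separated), "->", annotate((0, 3), (4, 6)))
```

In the first gloss nothing intervenes between the PP and the rest of the NP, so the PP stays inside the NP; in the second, the verb intervenes, so the movement is annotated.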

Predicate adjective or noun phrases

Predicate adjective phrases are explicitly tagged ADJP-PRD (rather than bare ADJP), and predicate noun phrases are tagged NP-PRD (rather than NP-OB1).

Punctuation

Punctuation (including parentheses and brackets) is uniformly tagged as PUNC. As in earlier releases, it is attached as high as possible (not necessarily where it "belongs").

PUNC is not on the default "ignore_nodes" list of CorpusSearch; users who want punctuation ignored must add it to that list themselves, preferably in a preference file.
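Whether PUNC is visible matters for any query that depends on adjacency between sisters. The Python sketch below is not CorpusSearch code; it only approximates the effect of putting PUNC on the ignore list by removing PUNC nodes before testing whether one daughter immediately precedes another. The functions parse, strip_nodes, and immediately_precedes are hypothetical, and the toy parser assumes leaves contain no literal parentheses, so it would need extending to handle bracket punctuation.

```python
import re


def parse(text):
    """Toy parser for one labeled bracketing -> nested [label, children] lists.
    Assumes leaves contain no literal parentheses."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return [label, children]

    return node()


def strip_nodes(tree, ignored):
    """Return a copy of tree without any subtree whose label is in ignored."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    kept = [strip_nodes(c, ignored) for c in children
            if isinstance(c, str) or c[0] not in ignored]
    return [label, kept]


def immediately_precedes(tree, first, second):
    """True if a daughter labeled first is directly followed by a sister labeled second."""
    labels = [c[0] for c in tree[1] if not isinstance(c, str)]
    return any(a == first and b == second for a, b in zip(labels, labels[1:]))


tree = parse("(IP-MAT (ADVP (ADV however)) (PUNC ,) (NP-SBJ (PRO he)) (VBD came))")
print(immediately_precedes(tree, "ADVP", "NP-SBJ"))                        # False
print(immediately_precedes(strip_nodes(tree, {"PUNC"}), "ADVP", "NP-SBJ"))  # True
```

With PUNC left in, the comma blocks the adjacency between ADVP and NP-SBJ; once PUNC is ignored, the same check succeeds.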

Word tokenization

We have made word tokenization more uniform between Middle English and the later stages of the language. As a result, the corpora contain fewer "plus" tags, especially unusual ones such as D+ADJ and P+PRO. In addition, only a handful of items are now tokenized differently in the Middle English corpus than in the later ones. See "Annotation differences among the corpora" for details.
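One way to see which plus tags remain in a given file is to tally complex part-of-speech tags directly from the bracketed text. The Python sketch below assumes that every word appears as an innermost (TAG word) pair, so that any such tag containing "+" counts as a complex tag; count_plus_tags is a hypothetical helper, and the sample string is assembled from examples earlier in this section rather than taken from a corpus file.

```python
import re
from collections import Counter

# Matches innermost (TAG word) pairs, i.e. part-of-speech tags and their words.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def count_plus_tags(text):
    """Tally part-of-speech tags containing "+" in a string of bracketed trees."""
    return Counter(tag for tag, _word in LEAF.findall(text) if "+" in tag)


sample = """
(NP (D+OTHER another))
(ADVP (ADV however))
(NP (Q+ONE everyone))
(PP (P+N indeed))
(NP (D+OTHER another))
"""
print(count_plus_tags(sample))
```

To apply it to a real file, pass the file's text (read with whatever encoding the file uses) instead of the sample string.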