Changes to earlier annotation guidelines
Adjectival vs. verbal participles
When they modify nouns or serve as predicate adjectives, departicipial
adjectives (INTERESTING, INTERESTED) are now tagged ADJ (rather than as
VAG or VAN).
AS
When functioning as the head of a degree complement, AS is now tagged
P (rather than C). Its clausal complement bears a dash
tag (CP-DEG,
IP-INF-DEG), parallel to the corresponding comparative case
(CP-CMP).
AS WELL AS
The correlative conjunction use of AS
WELL AS is now distinguished from its ordinary comparative use
(which remains the default).
Comparative and degree complements
For both theoretical and practical
reasons, degree clauses are now
annotated analogously to comparative
clauses. The complements of degree and comparative heads (including
their traces when extraposed) uniformly attach in the lowest position
available to them in the overt syntax.
As a result, prenominal SUCH no longer projects an ADJP
dominating the trace of an extraposed degree complement.
Conjunction of words with unlike categories
Word-level conjunction is now annotated uniformly, regardless of whether
the first and subsequent words share the same category.
Superseded | Current
(NP (PRO you) (CONJP (CONJ and) (OTHERS others))) | (NP (PRO you) (CONJ and) (OTHER others))
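The restructuring shown above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to revise the corpora: the helper names (`parse`, `flatten_word_conj`, `show`) are hypothetical, and the sketch assumes that a CONJP counts as word-level exactly when all of its daughters are single tagged words. It only changes the bracketing and leaves the word tags themselves untouched.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...].
    A leaf is a two-element list [tag, word]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        out = [tokens[pos]]          # the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                out.append(node())   # nested constituent
            else:
                out.append(tokens[pos])  # the word of a leaf
                pos += 1
        pos += 1                     # consume ")"
        return out

    return node()

def is_leaf(n):
    return len(n) == 2 and isinstance(n[1], str)

def flatten_word_conj(n):
    """Splice a CONJP into its parent when all of its daughters are words
    (an assumed test for word-level conjunction)."""
    if is_leaf(n):
        return n
    out = [n[0]]
    for child in n[1:]:
        child = flatten_word_conj(child)
        if child[0] == "CONJP" and all(is_leaf(c) for c in child[1:]):
            out.extend(child[1:])
        else:
            out.append(child)
    return out

def show(n):
    """Render a nested-list tree back to single-line bracket notation."""
    if is_leaf(n):
        return f"({n[0]} {n[1]})"
    return "(" + n[0] + " " + " ".join(show(c) for c in n[1:]) + ")"

old = "(NP (PRO you) (CONJP (CONJ and) (OTHERS others)))"
print(show(flatten_word_conj(parse(old))))
```

A CONJP that contains a phrasal daughter, such as `(CONJP (CONJ and) (NP ...))`, is left in place by this test.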
Complex tags
To facilitate (possibly future) lemmatization, we have reduced the
incidence of complex tags by splitting
words in more cases than in previous releases.
In compound words, complex tags in the (Early) Modern English corpora
(PPCEME, PPCMBE2, PCEEC2) have generally been replaced by appropriate
simple tags.
Superseded | Current
(NP (ADJ+N gentleman)) | (NP (N gentleman))
(ADJP (ADVR+ADJ overeager)) | (ADJP (ADJ overeager))
(PP (Q+BEP+PRO albeit) (CP-ADV ...)) | (PP (P albeit) (CP-ADV ...))
(ADVP (WADV+ADV however)) | (ADVP (ADV however))
(WADJP (WADV+ADV however) (ADJ great)) | (WADJP (WADV however) (ADJ great))
However, complex tags are retained where simplifying them would require
splitting words or revising higher syntactic labels, as well as in
certain other cases.
Unchanged | Not changed to
(D+OTHER another) | (D an@) (OTHER @other)
(PP (P+N indeed)) | (ADVP (ADV indeed))
(NP (Q+ONE everyone)) | (NP (Q everyone))
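To see which complex tags remain in a given file, one can scan the bracketed trees for word-level tags containing a plus sign. The following is a minimal sketch: the function name `complex_tagged_words` is hypothetical, and it assumes that each leaf is written as `(TAG word)` with no whitespace or parentheses inside the tag or the word.

```python
import re

# A leaf is "(TAG word)"; phrase labels are followed by "(" and so never match.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def complex_tagged_words(tree_text):
    """Return the (tag, word) pairs whose tag contains '+'."""
    return [(tag, word) for tag, word in LEAF.findall(tree_text) if "+" in tag]

for tree in ["(NP (Q+ONE everyone))", "(WADJP (WADV however) (ADJ great))"]:
    print(tree, complex_tagged_words(tree))
```

Counting the returned tags across a corpus gives a rough inventory of the complex tags that are still in use.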
ELSE and ENOUGH
When following the head they modify, ELSE and ENOUGH now consistently
project phrases (ADJP or ADVP, as appropriate in
context) in the PPCME2, as in the later corpora.
ENOUGH is tagged ADVR only when it modifies an adjective;
otherwise, it is labeled ADJR, projecting NP-MSR if
necessary.
Movement out of ADJP, NP, and PP
In the earlier guidelines, PP complements or modifiers preceding the
head noun in NP were always assumed to move out of NP and to be
daughters of IP. However, a fair number of cases could not be easily
annotated this way. The current guidelines treat movement out of all
three categories in the same way; in particular, such movement is now
indicated only if it is not string-vacuous. The current treatment is
not necessarily correct for the NP case, especially for later stages of
the language, but it is more uniform and thus easier to apply
consistently, it covers more of the cases, and it also facilitates
comparisons with historical French.
Predicate adjective or noun phrases
Predicate adjective phrases are explicitly tagged ADJP-PRD
(rather than bare ADJP), and predicate noun phrases are
tagged NP-PRD (rather than NP-OB1).
Punctuation
Punctuation (including parentheses and brackets) is uniformly tagged as
PUNC. As in earlier releases, it is attached as high as possible
(not necessarily where it "belongs").
PUNC is not on the default "ignore_nodes" list of
CorpusSearch and needs to be added, if necessary, to the list by the
user (best in a preference file).
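When trees are processed outside CorpusSearch, a comparable effect can be approximated by removing PUNC leaves before further analysis. The sketch below is only an illustration and is not part of CorpusSearch: the function name `drop_punc` is hypothetical, and it assumes that punctuation leaves are written as `(PUNC x)` with a token that contains no whitespace or parentheses.

```python
import re

# Match a PUNC leaf together with any whitespace that precedes it.
PUNC_LEAF = re.compile(r"\s*\(PUNC [^\s()]+\)")

def drop_punc(tree_text):
    """Return the bracketed tree with all PUNC leaves removed."""
    return PUNC_LEAF.sub("", tree_text)

tree = "(IP-MAT (NP-SBJ (PRO it)) (BEP is) (ADJP-PRD (ADJ true)) (PUNC .))"
print(drop_punc(tree))
```

Because punctuation is attached as high as possible, removing these leaves does not leave empty phrases behind in a tree like this one.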
Word tokenization
We have made word tokenization more uniform between Middle English and
the later stages of the language, resulting in fewer "plus" tags in the
corpora, especially unusual ones
such as D+ADJ and P+PRO. In addition, there are now
only a handful of items that are tokenized differently in the Middle
English corpus than in the later ones.
See Annotation differences among the
corpora for details.