The annotation uses a limited tree representation in the form of labeled parentheses. All open parentheses have an associated label, either a part-of-speech (POS) label for words (N, Adj, etc.) or a phrase label for larger constituents (NP, ADJP, etc.). We use the terms 'label' and 'tag' interchangeably. Every terminal node is associated with a POS label, but phrase labels are not included in every case where a fully labeled tree would require them. Intermediate projections in the sense of X' theory (N', ADJ', etc.) are not generally included in our representations. By comparison to trees in current syntactic theory, the trees in our corpora are flat, and they are not required to be binary-branching.
The partial representation of phrase structure just described is not intended to make a theoretical statement, but is adopted for practical reasons. Certain phrases are generally omitted in the annotation scheme because their boundaries are too difficult to define. The prime example is VP. The problematic character of VP is particularly obvious in early Middle English, where the order of the verb and its syntactic dependents is in flux (at least on the surface). But even in Present-Day English, the attachment site of verbal adjuncts is systematically ambiguous between low attachment within VP and high attachment at the clause level. Other categories and distinctions, such as including an NP level within DP, are omitted because the cost of including them outweighs their usefulness. Intermediate projections are omitted for both reasons. In no case should the lack of any particular phrase label be taken to imply that earlier forms of English failed to include the corresponding syntactic category. The trees in the corpora are simply underspecified.
The examples in this section of the manual are constructed examples
from Modern English so as to be maximally accessible. In the remainder of
the manual, we also include examples from the corpora. The examples are
mostly from late Middle English and (Early) Modern English; examples from
early Middle English are included when they are necessary to make a
linguistic point.
General principles
As just mentioned, the structures in our corpora generally include neither
VP nor intermediate projections. As a result, IP immediately dominates
all verbs, as in the following structure:
( (IP-MAT (NP-SBJ (NPR Mary))
(HVP has)
(BEN been)
(VAG meaning)
(IP-INF (TO to)
(VB go))
(PP (P for)
(NP (D a) (N week)))
(PUNC .)))
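The labeled bracketing can be read mechanically. The following Python sketch is not part of the corpus tools; the function name `parse` and the nested (label, children) tuple layout are assumptions made for illustration. It reads the structure above and lists the daughters of IP-MAT, which shows that every verbal element is an immediate daughter of IP.

```python
import re

def parse(text):
    """Parse one labeled bracketing into nested (label, children) tuples.

    Leaves are plain strings; the unlabeled outer wrapper gets the label "".
    """
    # Parentheses are tokens of their own; everything else splits on whitespace.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume the opening parenthesis
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            tok = tokens[pos]
            if tok == ")":
                pos += 1
                return (label, children)
            if tok == "(":
                children.append(node())
            else:
                children.append(tok)
                pos += 1

    if not tokens or tokens[0] != "(":
        raise ValueError("a tree must start with an opening parenthesis")
    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree

EXAMPLE = """
( (IP-MAT (NP-SBJ (NPR Mary))
          (HVP has)
          (BEN been)
          (VAG meaning)
          (IP-INF (TO to)
                  (VB go))
          (PP (P for)
              (NP (D a) (N week)))
          (PUNC .)))
"""

wrapper = parse(EXAMPLE)
ip_mat = wrapper[1][0]  # the single daughter of the unlabeled wrapper
daughters = [child[0] for child in ip_mat[1]]
print(daughters)
# ['NP-SBJ', 'HVP', 'BEN', 'VAG', 'IP-INF', 'PP', 'PUNC']
```

Because the reader raises `ValueError` on a missing or surplus parenthesis, it can also serve as a rough well-formedness check on a hand-edited tree.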
Dash tags
NP-SBJ subject NP
ADJP-SPR secondary predicate ADJP
PP-PRN parenthetical PP
IP-INF-PRP purpose infinitive
NP-SBJ-RSP resumptive subject NP
( (IP-MAT (NP-TMP (N Yesterday)) ← temporal NP
(NP-SBJ (NPR Mary)) ← subject
(VBD told)
(NP-OB2 (NPR Jane)) ← indirect object
(CP-THT (C that)
(IP-SUB (NP-SBJ (PRO she))
(VBD studied)
(NP-MSR (QP (ADVR too) (Q much))) ← measure NP
(PP (P during)
(NP (D the) (N weekend))))) ← no dash tag
(PUNC .)))
( (IP-MAT (NP-SBJ (NPR Mary))
(ADVP (ADV happily))
(VBD put)
(NP-OB1 (D the) (N book))
(ADVP-LOC (ADV there))
(ADVP-TMP (ADV+WARD afterward))
(PUNC .)))
( (IP-MAT (NP-SBJ (NPR Mary))
(VBD put)
(NP-OB1 (D the) (N book))
(PP (P on)
(NP (D the) (N table)))
(PP (P on)
(NP (NPR Saturday)))
(PUNC .)))
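Since dash tags are appended to the basic label with hyphens, a label can be taken apart with simple string operations. The sketch below is illustrative only: the name `split_label` is hypothetical, everything after the first hyphen is treated as a dash tag (which lumps clause-type suffixes such as INF together with function tags such as PRP), and a trailing all-digit part is assumed to be a coindexation index of the kind seen in labels like WNP-1.

```python
import re

def split_label(label):
    """Split a node label into (category, dash_tags, index).

    Assumption: a final "-<digits>" is a coindexation index, not a dash tag.
    """
    index = None
    match = re.fullmatch(r"(.+)-(\d+)", label)
    if match:
        label, index = match.group(1), int(match.group(2))
    category, *dash_tags = label.split("-")
    return category, dash_tags, index

print(split_label("NP-SBJ-RSP"))  # ('NP', ['SBJ', 'RSP'], None)
print(split_label("IP-INF-PRP"))  # ('IP', ['INF', 'PRP'], None)
print(split_label("NP-OB1"))      # ('NP', ['OB1'], None)
print(split_label("WNP-1"))       # ('WNP', [], 1)
print(split_label("PP"))          # ('PP', [], None)
```

Note that OB1 is kept as a dash tag: the digit in it is part of the tag, and only a part consisting entirely of digits is read as an index.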
Structural principles
( (IP-MAT (CONJ But) ← sentential conjunction
(INTJ alas) ← single-word interjection
(, ,)
(NP-SBJ (PRO we))
(MD will) ← modal
(NEG not) ← negation
(Q all) ← floated quantifier
(VB end) ← verb
(RP up) ← particle
(PP (P with)
(NP (PRO$ our) (N favorite)))
(. .)))
( (IP-MAT (NP-SBJ (NPR Mary))
(VBD saw)
(NP-OB1 (D the) (N man))
(PP (P with) ← daughter of IP-MAT, not NP-OB1
(NP (D the) (N telescope)))
(PUNC .)))
(NP (D the)
(N story)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO they))
(VBP tell)))
(PP (P about) ← daughter of NP, not IP-SUB
(NP (D the)
(N king))))
(NP (D the)
(N story)
(RRC (BAG being)
(VAN told))
(PP (P about) ← daughter of NP, not RRC
(NP (D the)
(N king))))
Internal structure of phrases
The internal structure of all nonclausal phrases is fundamentally similar.
(NP (Q many) ← no QP
(ADJ happy) ← no ADJP
(NS children)
(PP (P on)
(NP (D the) (ADJ overcrowded) (N beach))))
(NP (ADJP (ADJ happy) (CONJ and) (ADJ excited)) ← ADJP
(NS children))
(NP (Q many) ← no QP
(ADJP (ADV very) (ADJ happy)) ← ADJP
(NS children))
(NP (QP (ADV very) (Q many)) ← QP
(ADJ happy) ← no ADJP
(NS children))
(ADJP (ADV very) (ADJ happy)) ← no ADVP
(PP (ADV right) ← no ADVP
(P up)
(NP (D the) (N street)))
(NP (ADJ various) ← no ADJP for either adjective
(ADJ black)
(NS cats))
(NP (ADJ black)
(NS cats)
(ADJP (ADJ galore))) ← ADJP for post-head modifier
(NP (ADJR enough) ← no ADJP
(N food))
(ADJP (ADJ fast)
(ADVP (ADVR enough))) ← ADVP for post-head modifier
Internal structure of clauses
IP
Various aspects of the internal structure of IPs have already been noted.
IPs generally have subjects. If the subject is not overt,
an empty subject is added.
( (IP-MAT (NP-SBJ (PRO They)) ← overt subject
(VBD came)
(PP (P at)
(NP (NUM six)))
(PUNC ,)))
( (IP-MAT (CONJ and)
(NP-SBJ *con*) ← empty subject
(VBD left)
(PP (P at)
(NP (NUM eight)))
(PUNC .)))
But subjects are not obligatory in imperatives or infinitives.
( (IP-IMP (VBI Eat) ← no subject
(NP-OB1 (PRO$ your) (NS vegetables))
(PUNC .)))
( (IP-IMP (DOI Do@)
(NEG @n't)
(NP-SBJ (PRO you)) ← only overt subject indicated
(VB dare)
(PUNC !)))
( (IP-MAT (NP-SBJ (PRO We))
(VBP expect)
(IP-INF (TO to) ← no subject
(VB win))
(PUNC .)))
( (IP-MAT (NP-SBJ (PRO We))
(VBD heard)
(IP-INF (NP-SBJ (PRO them)) ← only overt subject indicated
(VB arrive))
(PUNC .)))
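The three cases just illustrated (overt subject, empty subject, no subject) can be told apart by inspecting the daughters of an IP. The following Python sketch is only an illustration: the helper names `parse`, `leaves`, `ip_of` and `subject_status` are hypothetical, the reader assumes well-formed input, and the test for an empty subject assumes that empty categories are written with a leading asterisk, as *con* is above.

```python
import re

def parse(text):
    """Compact reader for well-formed input: nested [label, children] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    root = ["", []]
    stack = [root]
    for i, tok in enumerate(tokens):
        if tok == "(":
            node = ["", []]
            stack[-1][1].append(node)
            stack.append(node)
        elif tok == ")":
            stack.pop()
        elif tokens[i - 1] == "(":
            stack[-1][0] = tok        # first token after "(" is the label
        else:
            stack[-1][1].append(tok)  # anything else is a terminal
    return root[1][0]

def leaves(node):
    """Collect the terminal strings under a node, left to right."""
    out = []
    for child in node[1]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out

def subject_status(ip):
    """Return 'overt', 'empty', or 'none' for the subject of an IP node."""
    for child in ip[1]:
        if isinstance(child, str):
            continue
        if child[0] == "NP-SBJ" or child[0].startswith("NP-SBJ-"):
            # Assumption: empty categories start with an asterisk (e.g. *con*).
            if all(word.startswith("*") for word in leaves(child)):
                return "empty"
            return "overt"
    return "none"

def ip_of(text):
    """Strip the unlabeled wrapper and return the clause node inside it."""
    return parse(text)[1][0]

print(subject_status(ip_of("( (IP-MAT (NP-SBJ (PRO They)) (VBD came) (PUNC ,)))")))
# overt
print(subject_status(ip_of("( (IP-MAT (CONJ and) (NP-SBJ *con*) (VBD left) (PUNC .)))")))
# empty
print(subject_status(ip_of("( (IP-IMP (VBI Eat) (NP-OB1 (PRO$ your) (NS vegetables)) (PUNC .)))")))
# none
```

Only the immediate daughters of the IP are inspected, so the subject of an embedded IP-INF has to be queried on that node itself.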
(CP (C that / 0) (IP ...))
The complementizer position is always included; when not filled by an overt complementizer, it contains 0 (zero).
( (IP-MAT (NP-SBJ (PRO We))
(VBD know)
(CP-THT (C that / 0)
(IP-SUB (NP-SBJ (PRO you))
(VBP like)
(NP-OB1 (NS vegetables))))
(PUNC .)))
(CP (WXP ...) (C that / 0) (IP ...))
Both positions can be overtly filled, which was common in Middle English. Empty wh- positions and empty complementizers are both indicated by 0 (zero). The wh- phrase is coindexed with a trace of the same category within the IP. See Wh- trace for details, particularly Position of traces for the counterintuitive placement of the trace in IP-initial position.
(NP (D the)
(N people)
(CP-REL (WNP-1 (WPRO who)) ← overt wh-phrase
(C 0) ← empty complementizer
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO you))
(VBD met))))
(NP (D the)
(N people)
(CP-REL (WNP-1 0) ← empty wh-phrase
(C that) ← overt complementizer
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO you))
(VBD met))))
(NP (D the)
(N people)
(CP-REL (WNP-1 0) ← empty wh-phrase
(C 0) ← empty complementizer
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO you))
(VBD met))))
( (IP-MAT (NP-SBJ (PRO We))
(VBP understand)
(CP-QUE-SUB (WADJP-1 (WADV how) (ADJ serious)) ← overt wh-phrase
(C that / 0) ← overt / empty complementizer
(IP-SUB (ADJP-PRD *T*-1)
(NP-SBJ (D the) (N situation))
(BEP is)))
(PUNC .)))
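The coindexation between a wh-phrase and its trace can be recovered from the labels alone. The Python sketch below is an illustration under stated assumptions, not a corpus tool: the names `parse`, `trace_hosts` and `wh_links` are hypothetical, the reader assumes well-formed input, and a wh-phrase is recognized as a CP daughter whose label consists of W, a category, and a numeric index. For each such phrase it reports the label of the node that hosts the matching *T* trace and whether the two share a category once the W is stripped.

```python
import re

def parse(text):
    """Compact reader for well-formed input: nested [label, children] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    root = ["", []]
    stack = [root]
    for i, tok in enumerate(tokens):
        if tok == "(":
            node = ["", []]
            stack[-1][1].append(node)
            stack.append(node)
        elif tok == ")":
            stack.pop()
        elif tokens[i - 1] == "(":
            stack[-1][0] = tok        # first token after "(" is the label
        else:
            stack[-1][1].append(tok)  # anything else is a terminal
    return root[1][0]

def trace_hosts(node, index):
    """Return labels of nodes that directly dominate the trace *T*-index."""
    hosts = []
    for child in node[1]:
        if isinstance(child, str):
            if child == f"*T*-{index}":
                hosts.append(node[0])
        else:
            hosts.extend(trace_hosts(child, index))
    return hosts

def wh_links(cp):
    """Pair each indexed wh-phrase daughter of a CP with its trace host(s)."""
    links = []
    for child in cp[1]:
        if isinstance(child, str):
            continue
        match = re.fullmatch(r"W([A-Z]+)-(\d+)", child[0])
        if not match:
            continue
        category, index = match.group(1), match.group(2)
        for host in trace_hosts(cp, index):
            same_category = host.split("-")[0] == category
            links.append((child[0], host, same_category))
    return links

relative = parse("""
(NP (D the)
    (N people)
    (CP-REL (WNP-1 (WPRO who))
            (C 0)
            (IP-SUB (NP-OB1 *T*-1)
                    (NP-SBJ (PRO you))
                    (VBD met))))
""")
cp_rel = relative[1][2]  # third daughter of the NP
print(wh_links(cp_rel))
# [('WNP-1', 'NP-OB1', True)]
```

The same call works when the wh- position is empty, as in (WNP-1 0), because the index sits on the label rather than on the word.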
(PP (P if)
(CP-ADV (C 0) ← C, no inversion
(IP-SUB (NP-SBJ (PRO I))
(HVD had)
(VBN known))))
(CP-ADV (IP-SUB (HVD had) ← inversion, no C
(NP-SBJ (PRO I))
(VBN known)))
( (IP-MAT (NP-SBJ (PRO))
(VBD wrote)
(CP-QUE-SUB (WADVP-1 (WADV when))
(C 0) ← C, no inversion
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PRO they))
(MD will)
(VB come)))
(PUNC .)))
( (CP-QUE-MAT (WADVP-1 (WADV when)) ← inversion, no C
(IP-SUB (ADVP-TMP *T*-1)
(MD will)
(NP-SBJ (PRO they))
(VB come))
(PUNC ?)))