Introduction to syntactic annotation


The annotation uses a limited tree representation in the form of labeled parentheses. All open parentheses have an associated label, either a part-of-speech (POS) label for words (N, Adj, etc.) or a phrase label for larger constituents (NP, ADJP, etc.). We use the terms 'label' and 'tag' interchangeably. Every terminal node is associated with a POS label, but phrase labels are not included in every case where a fully labeled tree would require them. Intermediate projections in the sense of X' theory (N', ADJ', etc.) are not generally included in our representations. By comparison to trees in current syntactic theory, the trees in our corpora are flat, and they are not required to be binary-branching.

The partial representation of phrase structure just described is not intended to make a theoretical statement, but is adopted for practical reasons. Certain phrases are generally omitted in the annotation scheme because their boundaries are too difficult to define. The prime example is VP. The problematic character of VP is particularly obvious in early Middle English, where the order of the verb and its syntactic dependents is in flux (at least on the surface). But even in Present-Day English, the attachment site of verbal adjuncts is systematically ambiguous between low attachment within VP and high attachment at the clause level. Other categories and distinctions, such as including an NP level within DP, are omitted because the cost of including them outweighs their usefulness. Intermediate projections are omitted for both reasons. In no case should the lack of any particular phrase label be taken to imply that earlier forms of English failed to include the corresponding syntactic category. The trees in the corpora are simply underspecified.

The examples in this section of the manual are constructed examples from Modern English so as to be maximally accessible. In the remainder of the manual, we also include examples from the corpora. The examples are mostly from late Middle English and (Early) Modern English; examples from early Middle English are included when they are necessary to make a linguistic point.

General principles

As just mentioned, the structures in our corpora generally include neither VP nor intermediate projections. As a result, IP immediately dominates all verbs, as in the following structure:

( (IP-MAT (NP-SBJ (NPR Mary))
          (HVP has)
          (BEN been)
          (VAG meaning)
          (IP-INF (TO to)
                  (VB go))
          (PP (P for)
              (NP (D a) (N week)))
          (PUNC .)))
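
The flatness of this structure can be checked mechanically. The following Python sketch is illustrative only and is not part of the corpus distribution or its search tools; the helper names `parse` and `labels` are hypothetical. Assuming the bracketing conventions described above, it reads the tree into nested lists and lists the daughters of IP-MAT, which include every verb directly.

```python
import re

TREE = """
( (IP-MAT (NP-SBJ (NPR Mary))
          (HVP has)
          (BEN been)
          (VAG meaning)
          (IP-INF (TO to)
                  (VB go))
          (PP (P for)
              (NP (D a) (N week)))
          (PUNC .)))
"""

def parse(text):
    """Read one labeled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    tree = stack[0][0]
    while len(tree) == 1 and isinstance(tree[0], list):
        tree = tree[0]  # drop the unlabeled outer wrapper
    return tree

def labels(node):
    """Yield the label of a node and of every node below it."""
    yield node[0]
    for child in node[1:]:
        if isinstance(child, list):
            yield from labels(child)

tree = parse(TREE)
daughters = [child[0] for child in tree[1:] if isinstance(child, list)]
print(daughters)
# ['NP-SBJ', 'HVP', 'BEN', 'VAG', 'IP-INF', 'PP', 'PUNC']
print("VP" in set(labels(tree)))
# False
```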

Dash tags

Structural principles

Internal structure of phrases

The internal structure of all nonclausal phrases is fundamentally similar.

Internal structure of clauses

IP

Various aspects of the internal structure of IPs have already been noted. IPs generally have subjects. If the subject is not overt, an empty subject is added.

( (IP-MAT (NP-SBJ (PRO They))             ← overt subject
          (VBD came)
          (PP (P at)
              (NP (NUM six)))
          (PUNC ,)))

( (IP-MAT (CONJ and)
          (NP-SBJ *con*)                  ← empty subject
          (VBD left)
          (PP (P at)
              (NP (NUM eight)))
          (PUNC .)))

But subjects are not obligatory in imperatives or infinitives.

( (IP-IMP (VBI Eat)                              ← no subject
          (NP-OB1 (PRO$ your) (NS vegetables))
          (PUNC .)))

( (IP-IMP (DOI Do@)
          (NEG @n't)
          (NP-SBJ (PRO you))                     ← only overt subject indicated
          (VB dare)
          (PUNC !)))

( (IP-MAT (NP-SBJ (PRO We))
          (VBP expect)
          (IP-INF (TO to)                        ← no subject
                  (VB win))
          (PUNC .)))

( (IP-MAT (NP-SBJ (PRO We))
          (VBD heard)
          (IP-INF (NP-SBJ (PRO them))            ← only overt subject indicated
                  (VB arrive))
          (PUNC .)))
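
The subject requirement lends itself to a simple consistency check. The Python sketch below is a minimal illustration under stated assumptions: the names `parse`, `walk`, `ips_without_subject`, and `EXEMPT` are hypothetical, and the exemption list covers only the two clause types named in this section (imperatives and infinitives), not the full inventory of IP types in the scheme.

```python
import re

# Assumption: only the clause types named in this section are exempt.
EXEMPT = ("IP-IMP", "IP-INF")

def parse(text):
    """Read one labeled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    tree = stack[0][0]
    while len(tree) == 1 and isinstance(tree[0], list):
        tree = tree[0]  # drop the unlabeled outer wrapper
    return tree

def walk(node):
    """Yield a node and every node below it."""
    yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from walk(child)

def ips_without_subject(tree):
    """Return the labels of non-exempt IPs that have no NP-SBJ daughter."""
    missing = []
    for node in walk(tree):
        label = node[0]
        if label.startswith("IP") and not label.startswith(EXEMPT):
            daughters = [d[0] for d in node[1:] if isinstance(d, list)]
            if not any(d.startswith("NP-SBJ") for d in daughters):
                missing.append(label)
    return missing

empty_subject = parse("( (IP-MAT (CONJ and) (NP-SBJ *con*) (VBD left)"
                      " (PP (P at) (NP (NUM eight))) (PUNC .)))")
infinitive = parse("( (IP-MAT (NP-SBJ (PRO We)) (VBP expect)"
                   " (IP-INF (TO to) (VB win)) (PUNC .)))")
unannotated = parse("( (IP-MAT (CONJ and) (VBD left) (PUNC .)))")

print(ips_without_subject(empty_subject))  # []
print(ips_without_subject(infinitive))     # []
print(ips_without_subject(unannotated))    # ['IP-MAT']
```

The third tree is a constructed counterexample in which the empty subject has been left out; it is not a pattern drawn from the corpora.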

Non-wh CP

THAT clauses (CP-THT), degree complements (CP-DEG), and certain adverbial clauses (CP-ADV) have the following basic structure:

(CP (C that / 0)
    (IP ...))

The complementizer position is always included; when not filled by an overt complementizer, it contains 0 (zero).

( (IP-MAT (NP-SBJ (PRO We))
          (VBD know)    
          (CP-THT (C that / 0)
                  (IP-SUB (NP-SBJ (PRO you))
                          (VBP like)
                          (NP-OB1 (NS vegetables))))
          (PUNC .)))

Wh- CP

All other types of CP contain both a wh- position and a complementizer position. The schematic structure is as follows, where "WXP" ranges over wh- phrases:

(CP (WXP ...)
    (C that / 0)
    (IP ...))

Both positions can be overtly filled, which was common in Middle English. Empty wh- positions and empty complementizers are both indicated by 0 (zero). The wh- phrase is coindexed with a trace of the same category within the IP. See Wh- trace for details, particularly Position of traces for the counterintuitive placement of the trace in IP-initial position.

(NP (D the) (N people)
    (CP-REL (WNP-1 (WPRO who))                                   ← overt wh-phrase
            (C 0)                                                ← empty complementizer
            (IP-SUB (NP-OB1 *T*-1)          
                    (NP-SBJ (PRO you))
                    (VBD met))))

(NP (D the) (N people)
    (CP-REL (WNP-1 0)                                            ← empty wh-phrase
            (C that)                                             ← overt complementizer
            (IP-SUB (NP-OB1 *T*-1)          
                    (NP-SBJ (PRO you))
                    (VBD met))))

(NP (D the) (N people)
    (CP-REL (WNP-1 0)                                            ← empty wh-phrase
            (C 0)                                                ← empty complementizer
            (IP-SUB (NP-OB1 *T*-1)          
                    (NP-SBJ (PRO you))
                    (VBD met))))

( (IP-MAT (NP-SBJ (PRO We))
          (VBP understand)  
          (CP-QUE-SUB (WADJP-1 (WADV how) (ADJ serious))         ← overt wh-phrase
                      (C that / 0)                               ← overt / empty complementizer
                      (IP-SUB (ADJP-PRD *T*-1)
                              (NP-SBJ (D the) (N situation))
                              (BEP is)))
          (PUNC .)))
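
The coindexation between a wh- phrase and its trace can be verified automatically. The Python sketch below is one possible check, not a description of any existing corpus tool; `parse`, `walk`, and `check_wh_cp` are hypothetical names. It assumes that the wh- phrase label has the form W + category + index (WNP-1, WADJP-1), that the trace is written `*T*-n`, and that dash tags such as -OB1 or -PRD can be stripped to recover the category. It also expects a C position, so it applies to wh- CPs of the kind shown above, not to clauses with subject-verb inversion.

```python
import re

def parse(text):
    """Read one labeled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    tree = stack[0][0]
    while len(tree) == 1 and isinstance(tree[0], list):
        tree = tree[0]  # drop the unlabeled outer wrapper
    return tree

def walk(node):
    """Yield a node and every node below it."""
    yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from walk(child)

def check_wh_cp(cp):
    """Return a list of problems found in one wh- CP node."""
    daughters = [d for d in cp[1:] if isinstance(d, list)]
    wh = next((d for d in daughters if re.fullmatch(r"W[A-Z]+-\d+", d[0])), None)
    ip = next((d for d in daughters if d[0].startswith("IP")), None)
    if wh is None or ip is None:
        return ["missing wh- phrase or IP"]
    problems = []
    if not any(d[0] == "C" for d in daughters):
        problems.append("no C position")
    category, index = wh[0][1:].rsplit("-", 1)   # 'WNP-1' -> ('NP', '1')
    trace = "*T*-" + index
    hosts = [n[0] for n in walk(ip) if trace in n[1:]]
    if not hosts:
        problems.append("no trace " + trace + " inside IP")
    elif hosts[0].split("-")[0] != category:     # 'NP-OB1' -> 'NP'
        problems.append("trace category differs from wh- phrase")
    return problems

relative = parse("(CP-REL (WNP-1 (WPRO who)) (C 0)"
                 " (IP-SUB (NP-OB1 *T*-1) (NP-SBJ (PRO you)) (VBD met)))")
question = parse("(CP-QUE-SUB (WADJP-1 (WADV how) (ADJ serious)) (C 0)"
                 " (IP-SUB (ADJP-PRD *T*-1) (NP-SBJ (D the) (N situation)) (BEP is)))")
misindexed = parse("(CP-REL (WNP-1 0) (C that)"
                   " (IP-SUB (NP-OB1 *T*-2) (NP-SBJ (PRO you)) (VBD met)))")

print(check_wh_cp(relative))    # []
print(check_wh_cp(question))    # []
print(check_wh_cp(misindexed))  # ['no trace *T*-1 inside IP']
```

The third input is a deliberately misindexed variant of the relative clause above, constructed to show a failing case.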

Verb movement to C

Subject-verb inversion in questions, exclamatives, V1 conditionals, and other constructions is not explicitly represented as verb movement to C in our annotation scheme. The inverted verb remains a daughter of IP. However, clauses with inversion differ structurally from ones without inversion in not containing a C position.

(PP (P if)
    (CP-ADV (C 0)                        ← C, no inversion
            (IP-SUB (NP-SBJ (PRO I))
                    (HVD had)
                    (VBN known))))

(CP-ADV (IP-SUB (HVD had)                ← inversion, no C
                (NP-SBJ (PRO I))
                (VBN known)))

( (IP-MAT (NP-SBJ (PRO))
          (VBD wrote)
          (CP-QUE-SUB (WADVP-1 (WADV when))
                      (C 0)                           ← C, no inversion
                      (IP-SUB (ADVP-TMP *T*-1)
                              (NP-SBJ (PRO they))
                              (MD will)
                              (VB come)))
          (PUNC .)))

( (CP-QUE-MAT (WADVP-1 (WADV when))                   ← inversion, no C
              (IP-SUB (ADVP-TMP *T*-1)
                      (MD will)
                      (NP-SBJ (PRO they))
                      (VB come))
              (PUNC ?)))
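
Because inversion is signaled only by the absence of C, clauses with inversion can be picked out by inspecting the daughters of CP. The Python sketch below illustrates the idea under that single assumption; `parse` and `has_inversion` are hypothetical names, and the test relies on nothing beyond the convention stated in this section.

```python
import re

def parse(text):
    """Read one labeled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    tree = stack[0][0]
    while len(tree) == 1 and isinstance(tree[0], list):
        tree = tree[0]  # drop the unlabeled outer wrapper
    return tree

def has_inversion(cp):
    """True if the CP has no C daughter (the marker of inversion assumed here)."""
    daughters = [d[0] for d in cp[1:] if isinstance(d, list)]
    return "C" not in daughters

plain = parse("(CP-ADV (C 0) (IP-SUB (NP-SBJ (PRO I)) (HVD had) (VBN known)))")
inverted = parse("(CP-ADV (IP-SUB (HVD had) (NP-SBJ (PRO I)) (VBN known)))")
direct_question = parse("( (CP-QUE-MAT (WADVP-1 (WADV when))"
                        " (IP-SUB (ADVP-TMP *T*-1) (MD will)"
                        " (NP-SBJ (PRO they)) (VB come)) (PUNC ?)))")

print(has_inversion(plain))            # False
print(has_inversion(inverted))         # True
print(has_inversion(direct_question))  # True
```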