Internal structure of phrases

See Conjunction and gapping for discussion of the following labels: CONJP, ADJX, ADVX, NX, NUMX.

This section of the documentation focuses on the internal structure of phrases, abstracting away from the grammatical or semantic roles that they play within larger structures, particularly within clauses. These roles are indicated by so-called dash tags and discussed in the section on Grammatical and semantic functions. The internal structure of clauses (IPs and CPs) is also discussed separately.

The general schema

The general schema for phrases consists of a unique head, possibly accompanied by syntactic dependents. Dependents is a cover term for complements (arguments) and modifiers (adjuncts); for more discussion, see Grammatical and semantic functions. Dependents are uniformly labelled as phrases, regardless of how many words they contain. The very few exceptions follow from other annotation conventions; see Heads for discussion.

( (ADJP (ADVP (ADV very))		← ADVP = pre-head dependent
	(ADJ happy)			← head of ADJP
	(PP (P with)			← PP = post-head dependent
	    (NP (D the)
		(N outcome)))))

( (ADVP (NP-MSR (NUMP (NUM two))	← NP-MSR = pre-head dependent
		(NS years))		
	(ADV ago))) 			← head of ADVP

( (NP (D the)				← head (exceptional sister to N)
      (ADJP (ADJS best))		← ADJP = dependent of N
      (N outcome)))			← head of NP
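
The bracketed notation used in these examples can be read mechanically. The following is a minimal sketch of such a reader, not part of the annotation system itself; the function name `parse_tree` and the nested-list representation are assumptions made for illustration. It drops the explanatory margin notes introduced by an arrow and rejects trees whose parentheses do not balance.

```python
import re


def parse_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...].

    A word-level node comes out as [tag, word]. Margin notes that start
    with an arrow are discarded before parsing.
    """
    # Drop explanatory margin notes such as "← head of ADJP".
    text = "\n".join(line.split("←")[0] for line in text.splitlines())
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced tree: missing ')'")
        pos += 1  # consume the closing parenthesis
        return items

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced tree: extra material after the root")
    # The outermost wrapper "( ... )" carries no label; unwrap its single child.
    if len(tree) == 1 and isinstance(tree[0], list):
        return tree[0]
    return tree


example = """( (ADJP (ADVP (ADV very))    ← ADVP = pre-head dependent
        (ADJ happy)                     ← head of ADJP
        (PP (P with)                    ← PP = post-head dependent
            (NP (D the)
                (N outcome)))))"""
adjp = parse_tree(example)
```

Here `adjp[0]` is the phrase label and `adjp[1:]` are its daughters in surface order, so the pre-head dependent, the head, and the post-head dependent can be inspected by position.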

Heads

The following table lists the phrase types in our annotation system along with their possible heads.

Phrase        Possible heads
ADJP          ADJ, ADJR, ADJS, ADJ-COMP
ADVP          ADV, ADVR, ADVS
INTJP         INTJ
NP            N, NS, NPR, NPRS, Q+N, N-COMP;
              in the absence of the above: D, EX, PRO
NUMP          NUM, NUM-COMP
PP            P, RP
QP            Q, QR, QS
VP            MD, inflectional variants of BE, DO, GT, HV, VB
wh- phrase    same as for the corresponding ordinary phrase

As mentioned earlier, phrases each have a unique head. It should not, therefore, be possible for two categories from the preceding table to be sisters. This is generally true, but there are a very few exceptions, which follow from the fact that not all POS categories always project phrases in our system. (Recall that our annotation conventions are intended to facilitate searches and do not express a theoretical commitment to the details of a particular structure.) The ramifications are discussed in more detail in the appropriate sections.
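
The uniqueness expectation can be turned into a simple consistency check. The sketch below is illustrative only: it assumes trees are held as nested lists of the form `[label, child, ...]`, and the names `HEADS`, `NP_FALLBACK_HEADS`, and `sister_heads` are hypothetical. VP is left out because the table does not enumerate the inflectional variants of the verbal tags. The function reports every node with two or more daughters drawn from the table, which includes the known exceptions, so its output is a list of candidates for review rather than a list of errors.

```python
# Head categories copied from the table above (VP omitted; see the lead-in).
HEADS = {
    "ADJP": {"ADJ", "ADJR", "ADJS", "ADJ-COMP"},
    "ADVP": {"ADV", "ADVR", "ADVS"},
    "INTJP": {"INTJ"},
    "NP": {"N", "NS", "NPR", "NPRS", "Q+N", "N-COMP"},
    "NUMP": {"NUM", "NUM-COMP"},
    "PP": {"P", "RP"},
    "QP": {"Q", "QR", "QS"},
}
NP_FALLBACK_HEADS = {"D", "EX", "PRO"}
ALL_HEAD_TAGS = set().union(*HEADS.values()) | NP_FALLBACK_HEADS


def sister_heads(tree):
    """Return (parent_label, head_tags) for each node with 2+ head-category daughters."""
    hits = []
    label, children = tree[0], tree[1:]
    tags = [c[0] for c in children if isinstance(c, list) and c[0] in ALL_HEAD_TAGS]
    if len(tags) > 1:
        hits.append((label, tags))
    for child in children:
        if isinstance(child, list):
            hits.extend(sister_heads(child))
    return hits


adjp = ["ADJP",
        ["ADVP", ["ADV", "very"]],
        ["ADJ", "happy"],
        ["PP", ["P", "with"],
               ["NP", ["D", "the"], ["N", "outcome"]]]]
candidates = sister_heads(adjp)
```

For this tree the only candidate is the NP, where D is a sister of N.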

Empty heads. Heads of phrases can be empty (= silent). As noted in Known issues, such empty heads are not annotated consistently. At least for the moment, they are added opportunistically to clarify otherwise difficult or apparently unmotivated structures.
( (IP-MAT (NP-SBJ (PRO He))
	  (VP (BED was)
	      (ADJP-PRD (NP-MSR (NUMP (NUM twenty-one))
				(NS years))
			(ADJ old)))
	  (PUNC ,)))

( (IP-MAT (CONJ and)
	  (NP-SBJ (PRO she))
	  (VP (BED was)
	      (ADJP-PRD (NP-MSR (NUMP (NUM eighteen))
				(NS 0))
			(ADJ 0)))
	  (PUNC .)))
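
Because empty heads are added opportunistically, it can be useful to list where they occur. The following sketch assumes trees stored as nested lists `[label, child, ...]` and treats only the word `0` as an empty marker, as in the example above; other empty categories are not covered. The name `empty_leaves` is hypothetical.

```python
def empty_leaves(tree, path=()):
    """Yield (ancestor_labels, tag) for every word-level node whose word is "0"."""
    label, children = tree[0], tree[1:]
    here = path + (label,)
    for child in children:
        if isinstance(child, str):
            if child == "0":
                # `label` is the POS tag of this empty word; `path` holds its ancestors.
                yield path, label
        else:
            yield from empty_leaves(child, here)


she_was_eighteen = [
    "IP-MAT",
    ["CONJ", "and"],
    ["NP-SBJ", ["PRO", "she"]],
    ["VP", ["BED", "was"],
           ["ADJP-PRD", ["NP-MSR", ["NUMP", ["NUM", "eighteen"]],
                                   ["NS", "0"]],
                        ["ADJ", "0"]]],
    ["PUNC", "."],
]
empties = list(empty_leaves(she_was_eighteen))
```

Each result pairs the chain of dominating labels with the tag of the empty word, so the silent NS and the silent ADJ can be told apart.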

Complement

As mentioned earlier, complements (by virtue of being dependents) are always treated as phrases. The following phrasal categories can be complements:

ordinary: ADJP (with linking verb or, rarely, with P), ADVP, NP, PP, VP
clausal: CP, IP
other: FRAG, QTP, XX

NUMP and QP do not serve as complements on their own, but must be dominated by some type of NP. (Stranded IP-level QP, an apparent exception, is not a complement.)

Modifier

As mentioned earlier, modifiers (by virtue of being dependents) are treated as phrases (apart from particles in particle-verb combinations).

At the VP/IP level, modifiers generally bear a dash tag indicating a semantic function. ADVPs, however, are left bare unless they are directional (-DIR), locative (-LOC), or temporal (-TMP) modifiers, and PPs are never marked for function.

Modifiers at levels lower than the VP/IP level generally do not bear a dash tag. But measure NPs are always marked as such (-MSR), and adnominal NP modifiers are marked as possessive (-POS) or adverbial (-ADV, -DIR, -LOC, -TMP) in order to avoid confusion about the dependency relations within NP.
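
When searching for modifiers by function, it helps to separate a label's category from its dash tags. The sketch below is one possible way to do so; the function name `split_label` and the set `COMPOUND_TAGS` are assumptions for illustration. Hyphenated word-level tags such as N-COMP are kept whole, and a trailing number is treated as a coindexation index rather than a dash tag.

```python
# Word-level tags that contain a hyphen and must not be split (taken from the head table).
COMPOUND_TAGS = {"N-COMP", "ADJ-COMP", "NUM-COMP"}


def split_label(label):
    """Split a node label into (category, dash_tags, index)."""
    if label in COMPOUND_TAGS:
        return label, [], None
    parts = label.split("-")
    index = None
    # A purely numeric final part is a coindexation index, as in WNP-1.
    if len(parts) > 1 and parts[-1].isdigit():
        index = int(parts.pop())
    return parts[0], parts[1:], index


measure = split_label("NP-MSR")
temporal = split_label("ADVP-TMP")
indexed = split_label("WNP-1")
```

A search for measure NPs can then test `"MSR" in dash_tags` instead of matching on the full label string.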

Head or dependent?

It is sometimes difficult to resolve the dependency relations among phrases, particularly in cases with several spatial adverbs, particles, and prepositions. The following considerations, based on diagnostic distributional patterns, should resolve doubtful cases. The distinction between complements and modifiers is not explicitly represented in the annotation, but taking the distinction into account generally facilitates the decision process. The section on the internal structure of PPs contains further examples of difficult cases.

Ordinary phrases

ADJP

( (ADJP (ADVP (ADV very))
	(ADJ proud)
	(PP (P of)
	    (NP (D the)
		(N result)))))

( (ADJP (ADJR happier)
	(PP (P than)
	    (NP (NP-POS (PRO$ their))
		(NS competitors)))))

( (ADJP (ADJ happy)
	(CP-THT (C that)
		(IP-SUB (NP-SBJ (PRO they))
			(VP (VBD won))))))

( (PP (P for)
      (ADJP (ADJ sure))))

ADVP

( (ADVP (QP (QR more))
        (ADV happily)
        (PP (P than)
	    (ADVP (ADV before)))))

( (ADVP (NP-MSR (QP (Q many))
		(NS years))
	(ADVR later)))

ADVP is the default category for phrases with adverbial function. This is true even if it gives rise to category mismatches in connection with exocentric structures.

( (IP-MAT (NP-SBJ (PRO I))
	  (VP (VBD arrived)
	      (ADVP-TMP (CP-FRL (WNP-1 (ADVP (ADV just))
				       (WD what)
				       (N time))
				(IP-SUB (NP-SBJ (PRO he))
					(VP (NP-TMP *T*-1)
					    (VBD left))))))
	  (PUNC .)))

INTJP

The label INTJP is used in the following cases:

( (QTP (IP-MAT (NP-SBJ (N-COMP (NPR Mister) (NPR Long)))
	       (VP (VBD told)
		   (NP-OB2 (PRO me))
		   (IP-INF (TO to)
			   (VP (VB go)
			       (PP (P with)
				   (NP (N-COMP (NPR Emory) (NPR Cook)))))))
	       (PUNC ,)
	       (INTJP (PP (P by)			← BY GOD		
			  (NP (NPR God)))))
       (PUNC ,)))

( (FRAG (INTJ Mmhmm)
	(PUNC ,)
	(INTJ oh)
	(INTJP (NP (N dear)))				← DEAR
	(PUNC .)))


( (FRAG (INTJ Oh)
	(INTJP (ADJP (ADJ good))			← GOOD GRACIOUS
	       (ADJP (ADJ gracious)))
	(PUNC ,)))

( (IP-MAT (INTJ Oh)
	  (INTJP (NP (NP-POS (PRO$ my))))		← MY
	  (PUNC ,)
	  (NP-SBJ (PRO we))
	  (VP (QP (Q all))
	      (DOD did))
	  (PUNC .)))

( (FRAG (INTJP (NP (PRO$ My)
		   (NPR God)))
	(PUNC .)))

( (FRAG (INTJP (NP (PRO$ My)
		   (N goodness)))
	(PUNC .)))

( (FRAG (INTJ Oh)
	(INTJP (NP (NP-POS (PRO$ my))
		   (INTJ gosh)))
	(PUNC .)))

( (IP-MAT (NP-SBJ (D that@))
	  (VP (HVP @'s)
	      (VP (BEN been)
		  (INTJ uh)
		  (CODE {hesitating})
		  (INTJP (WNP (WPRO what)))		← WHAT
		  (PUNC ,)
		  (NP-PRD (NUMP (NUM forty-some))
			  (NS year))))
	  (PUNC ?)
	  (CODE {laughing})))

Otherwise, interjections are annotated as bare INTJ.

( (FRAG (INTJ Well)
	(PUNC ,)
	(INTJ oh)
	(INTJ gaw)
	(PUNC ,)
	(NP (N-COMP (NPR Bob) (NPR Martin)))
	(PUNC .)))

( (IP-MAT (INTJ Goodness)				← honorary INTJ
	  (PUNC ,)
	  (NP-SBJ (D that))
	  (VP (MD must)
	      (VP (HV have)
		  (VP (BEN ben)
		      (ADJP-PRD (ADJ tough)))))
	  (PUNC .)))

( (CP-EXC (INTJ God)					← honorary INTJ
	  (PUNC ,)
	  (WADJP (WADVP (WADV how))
		 (ADJ disappointing))
	  (PUNC .)))

NP

For simplicity, our annotation scheme treats noun phrases as projections of N rather than of D. (Recall that this does not imply a theoretical rejection of the DP analysis of noun phrase structure.) D, EX and PRO function as heads of NP in the absence of N.

In general, NPs do not dominate other bare NPs, except in connection with conjunction and calendar dates.

( (NP (N water)))

( (NP (NPR Paris)))

( (NP (ADJP (ADJ pretty))
      (NS pictures)))

( (NP (NUMP (NUM three))
      (QP (QR more))
      (NS examples)))

( (NP (D those)))

( (NP (EX there)))

( (NP (PRO ourselves)))
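
The rule that D, EX and PRO head an NP only in the absence of N can be sketched as a two-pass lookup. The code below is an illustration under stated assumptions: trees are nested lists `[label, child, ...]`, and the names `NOMINAL_HEADS`, `FALLBACK_HEADS`, and `np_head` are hypothetical. It only sees overt daughters, so it cannot detect a silent nominal head that is not written out in the tree.

```python
NOMINAL_HEADS = ("N", "NS", "NPR", "NPRS", "Q+N", "N-COMP")
FALLBACK_HEADS = ("D", "EX", "PRO")


def np_head(np):
    """Return the overt head daughter of an NP, or None if there is none."""
    daughters = [c for c in np[1:] if isinstance(c, list)]
    # First pass looks for a nominal head; only if that fails do D, EX, PRO qualify.
    for tagset in (NOMINAL_HEADS, FALLBACK_HEADS):
        for daughter in daughters:
            if daughter[0] in tagset:
                return daughter
    return None


with_noun = np_head(["NP", ["D", "the"], ["ADJP", ["ADJS", "best"]], ["N", "outcome"]])
bare_determiner = np_head(["NP", ["D", "those"]])
no_overt_head = np_head(["NP", ["NUMP", ["NUM", "three"]], ["QP", ["QR", "more"]]])
```

In the first call the determiner is passed over in favour of N; in the second it is returned as the head; in the third no overt head is found.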

Because noun phrases are treated as projections of N, prenominal determiners violate our convention that phrases have a unique head (or more precisely, dominate at most one word-level category).

( (NP (D a)
      (ADJP (ADVP (ADV very))
	    (ADJ nice))
      (N room)))

( (NP (D the)
      (N opportunity)
      (PP (P of)
	  (NP (D a)
	      (N lifetime)))))

( (NP (D those)
      (NS books)))

( (NP (D those)
      (NUMP (NUM two))
      (NS books)))

More than with any other phrase type, noun phrases can be headed by a silent nominal head. As mentioned in Known issues, this silent head is not always explicitly indicated, but it informs the annotation. In particular, it prevents other categories (notably, ADJ, NUM, and Q) from serving as heads of NP; rather, when immediately dominated by NP, these categories invariably function as nominal modifiers.

( (NP (D the)
      (ADJP (ADJ rich))))

( (NP (D the)
      (ADJP (ADJ English))))

( (NP (D those)
      (NUMP (NUM two))))

( (NP (QP (Q many))
      (QP (QR more))
      (PP (P of)
          (NP (D that)
	      (N type)))))

Postnominal ADJPs are generally annotated as predicates (PRD) of reduced relative clauses (IP-RRC). But postnominal ENOUGH and SUCH are annotated as bare ADJP, since they do not allow an IP-RRC paraphrase.

NUMP

The following examples illustrate simple instances of NUMP. The annotation of number expressions more generally raises special issues and is discussed in full detail in Numbers.
( (NP (NUMP (NUM one))
      (N house)))

( (NP (NUMP (NUM five))
      (NS houses)))
  
( (NP (NUMP (ADVP (ADV about))
	    (NUM five))
      (NS houses)))

( (NP (NUMP (ADVP (ADV around))
	    (NUM ten))
      (NS children)))

( (NP (NUMP (ADVP (ADV probably))
	    (NUM five))
      (NS houses)))

( (NP (NUMP (NUMP (NUM two))
	    (CONJP (CONJ or)
		   (NUMP (ADVP (ADV maybe))
			 (FP even)
			 (NUM three))))
      (NS houses)))

PP

Ordinary PPs are annotated as follows.
( (PP (P without)
      (NP (D the)
	  (N shadow)
	  (PP (P of)
	      (NP (D a)
		  (N doubt))))))

Particles are a subtype of preposition. They are tagged as P or as RP and sometimes fail to project PP.

Difficult cases. Sequences involving several adverbs, prepositions, and particles are a prime breeding ground for doubtful cases concerning dependency relations. They are resolved according to the general principles in Head or dependent? The following examples illustrate the principles in action.

  • Head + complement. These cases generally involve ablative heads such as FROM, OF, and OFF.
    ok stay in from the rain
    *  stay in from the rain
    ok stay in from the rain
    
    ( (VP (VB stay)
          (PP (RP in)
    	  (PP (P from)
    	      (NP (D the)
    		  (N rain))))))
    
    ok keep out of trouble
    *  keep out of trouble
    ok keep out of trouble
    
    ( (VP (VB keep)
          (PP (RP out)
    	  (PP (P of)
    	      (NP (N trouble))))))
    
  • Head + modifier. This case is the default when a PP headed by RP depends on a head expressing motion (most commonly an ordinary verb of motion, but more generally a verb, including BE, that is construable in context as a motion verb or a noun like PATH or WAY). The head-modifier relation is often recursive. (Even with verbs of motion, though, it is possible to have head-complement relations, depending on the diagnostic distributional patterns.)
    ok run on down the hill
    ok run on down the hill
    ok run on down the hill
    
    ( (VP (VB run)
          (PP (RP on)
              (PP (P down)
    	      (NP (D the)
    	          (N hill))))))
    
    ok jingle on home
    ok jingle on home
    ok jingle on home
    
    ( (VP (VB jingle)
          (PP (RP on)
    	  (ADVP (ADV home)))))
    
    ok run on down through the forest
    ok run on down through the forest
    ok run on down through the forest
    
    ok ... on down through the forest
    ok ... on down through the forest
    ok ... on down through the forest
    
    ( (VP (VB run)
          (PP (RP on)
              (PP (RP down)
    	      (PP (P through)
    		  (NP (D the)
    		      (N forest)))))))
    
    ok the path on down through to home
    ok the path on down through to home
    ok the path on down through to home
    
    ok ... on down through to home
    ok ... on down through to home
    ok ... on down through to home
    
    ok ... down through to home
    ok ... down through to home
    ok ... down through to home
    
    ( (NP (D the)
          (N path)
          (PP (RP on)
              (PP (RP down)
    	      (PP (RP through)
    		  (PP (P to)
    		      (ADVP (ADV home))))))))
    

  • Pre-head modifier. This is the default for PPs construed with non-motion verbs (though see above for head-complement cases involving ablative heads).
    ok live on down the hill
    ok live on down the hill
    *  live on down the hill (* on intended reading)
    
    ( (VP (VB live)
          (PP (PP (RP on))
              (P down)
    	  (NP (D the)
    	      (N hill)))))
    
    ok live on down in that area
    ok live on down in that area
    *  live on down in that area (* on intended reading)
    
    ok down in that area
    ok down in that area
    * down in that area (* on intended reading)
    
    ( (VP (VB live)
          (PP (PP (RP on))
              (PP (RP down))
    	  (P in)
    	  (NP (D that)
    	      (N area)))))
    

    QP

    ( (NP (QP (Q many))
          (NS accidents)))
    
    ( (NP (QP (Q many))
          (QP (QR more))
          (NS accidents)))
    
    ( (NP (QP (ADVP (ADV overly))
    	  (Q many))
          (NS accidents)))
    
    ( (NP (QP (ADVP (NP-MSR (QP (Q all)))
    		(ADVR too))
    	  (Q many))
          (NS accidents)))
    

    VP

    See also Logical subject (NP-LGS).

    ( (VP (MD might)
          (VP (HV have)
    	  (VP (BEN been)
    	      (VP (BAG being)
    		  (VP (VAN built)))))))
    

    Verbal modifiers attach as low as is consistent with the meaning.

    A and ASP do not project their own VP, but attach as sisters of the adjacent verb.

    ( (VP (A a=)
          (VAG hunting)))
    
    ( (IP-MAT (NP-SBJ (PRO They))
    	  (VP (ASP done)
    	      (VBD told)
    	      (NP-OB2 (PRO me)))))
    

    Wh- phrases

    Wh- phrases are annotated analogously to their non-wh counterparts. We discuss them separately here for convenience, not for any theoretical consideration.

    WADJP

    ( (WADJP (WADVP (WADV how))
      	 (ADJ beautiful)))
    

    WADVP

    ( (WADVP (WADVP (WADV how))
      	 (ADV quickly)))
    

    In wh- CPs where a trace has adverbial function and its silent antecedent could be WADVP or WNP, the default category is WADVP. This is true even if it gives rise to category mismatches across the CP.

      ( (IP-MAT (NP-SBJ (PRO I))
    	    (VP (VBP remember)
    		(NP-OB1 (D the)
    			(ADJP (ADJ first))
    			(N time)
    			(CP-REL (WADVP-1 (WADV 0))
    				(IP-SUB (NP-SBJ (PRO he))
    					(VP (ADVP-TMP *T*-1)
    					    (VBD came))))))
    	    (PUNC .)))
    

    WNP

    ( (WNP (WD which)
           (N terminal)))
    
    ( (WNP (WD what)
           (ADJP (ADJ quick))
           (N service)))
    
    ( (WNP (WPRO what)))
    

    The proper analysis of WHAT A + noun is not clear, but we annotate it as follows:

    ( (WNP (WNP (WPRO what))
           (D a)
           (N nightmare)))
    

    WNUMP

    ( (WNP (WNUMP (WQP (WADVP (WADV how))
    		   (Q many))
    	      (NUM thousand))
           (NS feet)))
    

    WPP

    ( (WPP (P at)
           (WNP (WD what)
    	    (N point))))
    
    ( (WPP (ADVP (ADV just))
           (P to)
           (WNP (WD what)
    	    (N extent))))
    
    ( (WPP (P during)
           (WNP (WPRO which))))
    

    WQP

    ( (WNP (WQP (WADVP (WADV how))
    	    (Q many))
           (NS people)))
    
    ( (WNP (WQP (WADVP (WADV how))
    	    (Q much))
           (N work)))
    

    Special phrases

    FRAG

    See also XX.

    FRAG is used to label the following types of material:

    Root VPs missing more than just the subject are treated as FRAG, and the elided material is not included in the main parse (though it may be indicated by an ELL[ipsis] comment).

    ( (FRAG (CODE {ELL:Do_you})
    	(VP (VB Wan@)
    	    (IP-INF (TO @na)
    		    (VP (VB come)
    			(ADVP (ADV along)))))
    	(PUNC ?)))
    

    Finite VPs without an overt subject are treated as FRAG if the VP is the short answer to a preceding question.

    ( (CP-QUE-MAT (IP-SUB (BEP Are)
                          (NP-SBJ (PRO you))
    		      (VP (VAG coming)))
    	      (PUNC ?)))
    
    ( (FRAG (VP (ADVP (ADV Sure))
    	    (BEP am))
    	(PUNC !)))
    

    Otherwise, finite VPs without a subject are treated as IP with an empty subject.

    ( (IP-MAT (NP-SBJ (PRO I@))
    	  (VP (MD @'ll)
    	      (VP (VB try)))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO *pro*))
    	  (VP (MD might)
    	      (NEG not)
    	      (VP (VB make)
    		  (NP-OB1 (PRO it))))
    	  (PUNC ,)
    	  (ADVP (ADV though))
    	  (PUNC .)))
    

    Quotation (QTP)

    Although the final P in its label is mnemonic for "phrase", QTP is not an endocentric category; in other words, QTP is not the projection of a QT head.

    QTP encloses direct speech and can be a root node (see Direct speech for examples). The material dominated by QTP is annotated in the ordinary way (with the result that QTP is always unary-branching).

    ( (IP-MAT (NP-SBJ (PRO She))
    	  (VP (VBD said)
    	      (PUNC ,)
    	      (QTP (IP-MAT (NP-SBJ (PRO I@))
    			   (VP (BEP @m)
    			       (VP (VAG coming))))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO She))
    	  (VP (VBD said)
    	      (PUNC ,)
    	      (QTP (FRAG (ADVP (NEG Not)
    			       (ADVP (ADVR so))
    			       (ADV fast)))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO She))
    	  (VP (VBD said)
    	      (PUNC ,)
    	      (QTP (INTJ Hello)))
    	  (PUNC .)))
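
The unary-branching property of QTP lends itself to an automatic check. The sketch below assumes trees stored as nested lists `[label, child, ...]`; the names `IGNORED`, `qtp_daughters`, and `non_unary_qtps` are hypothetical. It also assumes that punctuation and CODE nodes inside QTP do not count as branches, since the BY GOD example under INTJP shows a QTP that contains a PUNC daughter next to its IP-MAT.

```python
# Assumption: these daughters are not counted when testing for unary branching.
IGNORED = {"PUNC", "CODE"}


def qtp_daughters(qtp):
    """Return the daughters of a QTP node that count as branches."""
    return [c for c in qtp[1:] if isinstance(c, list) and c[0] not in IGNORED]


def non_unary_qtps(tree):
    """Return every QTP node that does not have exactly one counted daughter."""
    found = []
    if tree[0].split("-")[0] == "QTP" and len(qtp_daughters(tree)) != 1:
        found.append(tree)
    for child in tree[1:]:
        if isinstance(child, list):
            found.extend(non_unary_qtps(child))
    return found


she_said_hello = [
    "IP-MAT",
    ["NP-SBJ", ["PRO", "She"]],
    ["VP", ["VBD", "said"], ["PUNC", ","], ["QTP", ["INTJ", "Hello"]]],
    ["PUNC", "."],
]
suspect = ["QTP", ["INTJ", "Oh"], ["INTJ", "hello"]]
clean_result = non_unary_qtps(she_said_hello)
suspect_result = non_unary_qtps(suspect)
```

Any node returned by `non_unary_qtps` is a candidate for manual review rather than a confirmed error.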
    

    When not a root node, QTP generally functions as a complement of an ordinary verb of saying, but it can also be a complement of BE or GO. Especially in the case of BE, QTP is generally preceded by a quotative marker (ALL, LIKE), which is tagged INTJ.

    XX

    See also FRAG.

    XX is used in the following cases:

    Metalinguistic material

    CODE

    CODE is used for material in the parsed structure that is not part of the audio signal, including the time stamps that link the transcription to the audio files, annotators' comments, and so on. The unique token identifier is treated separately and given its own label (ID).

    ID

    ID is the label for the node that contains each sentence token's unique identifier. Tokens consisting only of a META node have no explicit ID node (though the ID counter is incremented as with an ordinary token).

    LS

    LS encloses list markers (metalinguistic material used to identify items on a list). The material dominated by LS is annotated as usual. Letters are tagged as N.
    ( (NP (NP (LS (N a))
    	  (D this) (N one))
          (PUNC ,)
          (CONJP (NP (LS (N b))
    		 (D that) (N one)))
          (PUNC ,)
          (CONJP (CONJ and)
    	     (NP (LS (N c))
    		 (D the)
    		 (ADJP (ADJ other))
    		 (N one)))))
    
    ( (NP (NP (LS (N number) (NUM one))
    	  (D a) (N tent))
          (PUNC ,)
          (CONJP (CONJ and)
    	     (NP (LS (N number) (NUM two))
    		 (D a)
    		 (N flag)))))
    

    META

    META is used for material that is part of the audio signal but falls into the following categories.
    List markers have a separate label (LS). For some corpora, it might be useful to give the three categories separate labels (say, NON, PARA, and META).

    Nonlinguistic material.

    ( (META {coughing}))
    

    Paralinguistic material is enclosed in META brackets and integrated into the larger context in the ordinary way. META can, but need not, have a dominating phrasal node.

    ( (IP-MAT (CONJ and)
    	  (NP-SBJ (PRO I))
    	  (VP (VBD said)
    	      (PUNC ,)
    	      (QTP (META C-A-P-E)))
    	  (PUNC .)))
    
    ( (FRAG (NP (NP (FP Just) (D the) (NS initials))
    	    (PUNC ,)
    	    (CONJP (CONJ or)
    		   (NP (META E-L-V-I-E))))
    	(PUNC ?)))
    
    ( (IP-MAT (NP-SBJ (PRO They))
              (VP (HVD had)
    	      (META (NUM one) (NUM two) (NUM three))
    	      (NP-OB1 (NUMP (NUM four))
    		      (NS children)))
    	  (PUNC .)))
    

    Metalinguistic material (material that is mentioned rather than used in the ordinary way) is annotated as usual. Bare words mentioned as words are only given their part-of-speech tag and do not project a phrase. The material is then enclosed in META brackets, analogously to how direct speech is annotated and enclosed in QTP brackets. Depending on the conventions of individual projects, the META constituent may be redundantly set off by single quotes. The function of the META constituent in the larger context is annotated as usual. This differs from the treatment of QTP, which is not explicitly annotated as, say, the direct object of a verb of saying.

    As mentioned, titles of books, songs, and so on, count as instances of mention (vs. use). Common nouns may be capitalized in accordance with standard orthographic convention in titles, but they are not tagged as proper nouns unless they would be so tagged outside of the title context.

    ( (IP-MAT (NP-SBJ (PRO I))
    	  (VP (DOP do@)
    	      (NEG @n't)
    	      (VP (VB use)
    		  (NP-OB1 (D the)
    			  (N word)
    			  (PUNC ')
    			  (ELAB (META (VB utilize))))))				← no VP
    	  (PUNC ')
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (NP-POS (PRO$ My))
    		  (N copy)
    		  (PP (P of)
    		      (PUNC ')
    		      (NP (META (NP (N Murder)					← capitalized, but tagged as common noun
    				    (PP (P on)
    					(NP (D the)
    					    (N-COMP (NPR Orient)		← compound proper noun outside of title context
    						    (NPR Express)))))))))
    	  (PUNC ')
    	  (VP (HVP has)
    	      (VP (VBN disappeared)))
    	  (PUNC .)))