Part-of-speech (POS) annotation



    Nouns and related categories

    Nouns are classified by the features common vs. proper, singular vs. plural, and possessive. The basic tag for common nouns is N. Proper nouns are tagged NPR. Plural is indicated by trailing S, and possessive by an additional trailing $.

                                  example           singular / basic  plural  possessive singular  possessive plural
    common noun                   school            N                 NS      N$                   NS$
    proper noun                   Kentucky          NPR               NPRS    NPR$                 NPRS$
    compound quantifier           everyone          Q+N                       Q+N$
    ordinary pronoun              they, their       PRO                       PRO$
    wh- pronoun                   who               WPRO                      WPRO$
    expletive with NP associate   there, it         EX
    ordinary determiner           the, that, those  D
    wh- determiner                which             WD
    Possessive head (see NP-POS)  's                POS
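
    The way noun tags are composed from the three features can be sketched in a few lines of Python. This is only an illustration of the naming scheme described above; the function name noun_tag and its parameters are hypothetical and not part of any corpus tooling.

```python
def noun_tag(proper: bool = False, plural: bool = False, possessive: bool = False) -> str:
    """Compose a noun POS tag from the features common/proper, number, and possessive."""
    tag = "NPR" if proper else "N"  # basic tag: N for common nouns, NPR for proper nouns
    if plural:
        tag += "S"                  # trailing S indicates plural
    if possessive:
        tag += "$"                  # additional trailing $ indicates possessive
    return tag


print(noun_tag())                              # N
print(noun_tag(plural=True, possessive=True))  # NS$
print(noun_tag(proper=True, possessive=True))  # NPR$
```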

    Common noun (N)

    Formally singular count nouns in clearly plural contexts are tagged as NS.
    five mile_NS out of town
    in five year_NS
    

    Bare gerunds are tagged as N when this leads to a significant simplification of the structure.

    ( (IP-MAT (NP-SBJ (PRO I))
    	  (VP (VBP enjoy)
    	      (NP-OB1 (N hunting)))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO I))
    	  (VP (VBP enjoy)
    	      (NP-OB1 (IP-PPL (VP (VAG hunting)))))
    	  (PUNC .)))
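
    For readers who want to process trees like these programmatically, the following is a minimal sketch of a reader for the labeled bracketing format. The names parse, leaves, and SAMPLE are illustrative assumptions, not part of the corpus distribution, and the sketch assumes that explanatory comments (such as the arrows used in some examples in this section) have been removed from the input.

```python
import re


def parse(text):
    """Parse one labeled bracketing into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())   # nested constituent
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                       # consume ")"
        return items

    return node()


def leaves(tree):
    """Yield (tag, word) pairs for every word-level node."""
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        yield tree[0], tree[1]
    else:
        for child in tree:
            if isinstance(child, list):
                yield from leaves(child)


SAMPLE = "( (IP-MAT (NP-SBJ (PRO I)) (VP (VBP enjoy) (NP-OB1 (N hunting))) (PUNC .)))"
print(list(leaves(parse(SAMPLE))))
# [('PRO', 'I'), ('VBP', 'enjoy'), ('N', 'hunting'), ('PUNC', '.')]
```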
    

    Proper noun (NPR)

    The following words are tagged as proper nouns when used as nouns rather than as adjectives.

    Compound quantifier (Q+N)

    Morphologically complex quantifiers beginning with ANY-, EVERY-, NO-, or SOME- are tagged as Q+N($). See Known issues.

    everything_Q+N
    someone's_Q+N$
    

    Pronoun (PRO)

    Ordinary (referential) personal pronouns are tagged as PRO or PRO$, as are the reciprocal pronouns EACH OTHER and ONE ANOTHER.

    The treatment of expletive IT depends on whether it is construed with an NP associate. See IT for details and examples.

    Wh- pronoun (WPRO)

    Wh- pronouns on their own are tagged as WPRO. By contrast, the same items preceding overt nouns are tagged as WD.

    Which_WPRO do you want ?
    the girl who_WPRO came
    

    Expletive with NP associate (EX)

    Expletive THERE (or its pronunciation variant THEY) is tagged as EX.

    When construed with an NP associate (that is, when analogous to standard existential THERE), expletive IT is also tagged EX. See IT for details.

    Ordinary determiner (D)

    Ordinary determiners, whether intransitive or transitive, are tagged as D.
    this_D and that_D
    this_D one and that_D one
    

    Wh- determiner (WD)

    Wh- determiners (WD) are homonymous with wh- pronouns (WPRO) but distinguished from them by preceding an overt noun rather than standing alone.

    Which_WD one do you want ?
    What_WD kind is most popular ?
    

    Verbs and related categories

    Verb tags consist of a basic label (for BE, DO, GET, HAVE, or an ordinary verb), with variants depending on whether the verb functions as an infinitive, present or past tense form, active or passive participle, gerund, or imperative. Verbs receive the same POS tags regardless of whether they are used as main verbs or auxiliary verbs.

    Lexical      infinitive  present  past  participle, active  participle, passive  gerund  imperative
    be           BE          BEP      BED   BEN                                      BAG     BEI
    do           DO          DOP      DOD   DON                 DAN                  DAG     DOI
    get          GT          GTP      GTD   GTN                 GAN                  GTG     GTI
    have         HV          HVP      HVD   HVN                 HAN                  HAG     HVI
    other verbs  VB          VBP      VBD   VBN                 VAN                  VAG     VBI
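
    The verb tag table can be encoded as a simple lookup, which may be convenient when building search queries. The sketch below is a direct transcription of the table; the names FORMS, VERB_TAGS, and verb_tag are hypothetical, and the use of None for BE's passive participle reflects our reading that this cell of the table is empty.

```python
FORMS = ["infinitive", "present", "past", "active participle",
         "passive participle", "gerund", "imperative"]

# One row per lexical class, in the column order of FORMS.
# None marks the empty cell (assumed: BE has no passive participle tag).
VERB_TAGS = {
    "be":    ["BE", "BEP", "BED", "BEN", None,  "BAG", "BEI"],
    "do":    ["DO", "DOP", "DOD", "DON", "DAN", "DAG", "DOI"],
    "get":   ["GT", "GTP", "GTD", "GTN", "GAN", "GTG", "GTI"],
    "have":  ["HV", "HVP", "HVD", "HVN", "HAN", "HAG", "HVI"],
    "other": ["VB", "VBP", "VBD", "VBN", "VAN", "VAG", "VBI"],
}


def verb_tag(lemma, form):
    """Return the POS tag for a verb lemma in a given form (None if the cell is empty)."""
    row = VERB_TAGS.get(lemma.lower(), VERB_TAGS["other"])  # ordinary verbs share one row
    return row[FORMS.index(form)]


print(verb_tag("have", "past"))    # HVD
print(verb_tag("hunt", "gerund"))  # VAG
```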

    Aspectual A

    Aspectual A is treated as a clitic that attaches as a sister to gerunds (whether verbal or nominal).
    ( (VP (A a=)
          (VAG coming)
          (PP (P round)
    	  (NP (D the)
    	      (N mountain)))))
    
    ( (IP-MAT (NP-SBJ (EX They))
    	  (VP (BED was)
    	      (NP-LGS (NP (A a=) (N bickering))
    		      (CONJP (CONJ and)
    			     (NP (A a=) (N fussing))))
    	      (VAG going)
    	      (RP on))
    	  (PUNC .)))
    

    For so-called intrusive A, see X.

    Aspectual marker ASP

    ASP is treated as a sister to a (default) past tense verb.
    ( (IP-MAT (NP-SBJ (PRO She))
    	  (VP (ASP done)
    	      (VBD left))
    	  (PUNC .)))
    

    FOR

    When FOR is used as a complementizer, it is tagged with its own POS tag and attaches as a sister of infinitival TO.
    ( (IP-MAT (NP-SBJ (PRO I@))
    	  (VP (MD @'d)
    	      (VP (VB prefer)
    		  (IP-INF (FOR for)
    			  (NP-SBJ (PRO them))
    			  (TO to)
    			  (VP (VB leave)))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO I@))
    	  (VP (VBP want)
    	      (IP-INF (FOR for)
    		      (TO to)
    		      (VP (VB leave))))
    	  (PUNC .)))
    

    FROM

    When FROM functions as the (implicit) head of a gerund, analogous to TO in infinitival clauses, it is tagged with its own POS tag and attaches as a sister of the gerund VP.
    ( (IP-MAT (NP-SBJ (PRO They))
    	  (VP (MD could@)
    	      (NEG @n't)
    	      (VP (VB prevent)
    		  (IP-ECM (NP-SBJ (PRO him))
    			  (FROM from)
    			  (VP (VAG losing)
    			      (NP (D the)
    				  (N job))))))
    	  (PUNC .)))
    

    MD

    Ordinary modals: CAN, COULD, MAY, MIGHT, MUST, SHALL, SHOULD, WILL, WOULD
    Pseudo-modals: DARE, NEED, OUGHT

    MD counts as the head of a verb phrase. In standard English, modals are necessarily finite, but in Appalachian English, they occur in nonfinite contexts (or at least appear to do so). In particular, in multiple modal sequences, each modal is tagged as MD for ease of retrieval (even if a multiple modal analysis turns out not to be correct).

    ( (IP-MAT (NP-SBJ (PRO You))
    	  (VP (MD can)
    	      (VP (VB try)
    		  (NP-OB1 (PRO it))
    		  (RP out)))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO she))
    	  (VP (DOD did@)
    	      (NEG @n't)
    	      (VP (MD might)
    		  (VP (HV =uv)
    		      (VP (VBN went)
    			  (PP (P to)
    			      (NP (N school)))
    			  (NP-MSR (QP (Q some)))))))
    	  (PUNC .)))
    
    ( (IP-MAT (CONJ and)
    	  (NP-SBJ (D a) (N body))
    	  (VP (MD might)
    	      (VP (MD could)
    		  (VP (VB fall))))
    	  (PUNC .)))
    

    The pseudo-modals DARE and NEED are tagged MD when they precede negation (even if they govern an infinitival clause rather than a bare VP).

    ( (IP-MAT (NP-SBJ (PRO They))
    	  (VP (MD dared)			← modal DARE
    	      (NEG not)
    	      (IP-INF (TO to)
    		      (VP (VB come))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO They))
    	  (VP (DOD did@)
    	      (NEG @n't)
    	      (VP (VB dare)			← verbal DARE
    		  (IP-INF (TO to)
    			  (VP (VB come)))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO You))
    	  (VP (MD need@)			← modal NEED
    	      (NEG @n't)
    	      (VP (VB come)))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO You))
    	  (VP (DOP do@)
    	      (NEG @n't)
    	      (VP (VB need)			← verbal NEED
    		  (IP-INF (TO to)
    			  (VP (VB come)))))
    	  (PUNC .)))
    

    The pseudo-modal OUGHT is always tagged MD.

    ( (IP-MAT (NP-SBJ (PRO You))
    	  (VP (MD ought)
    	      (IP-INF (TO to)
    		      (VP (VB try)
    			  (NP-OB1 (PRO it)))))
    	  (PUNC .)))
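
    The rule for pseudo-modals stated above can be summarized as a small decision function. This is a sketch only: the function name and its return convention are assumptions, and it covers just the cases the text describes (OUGHT, and DARE or NEED immediately preceding negation). When the function returns None, the rule does not apply; in the examples above, such tokens receive ordinary verb tags.

```python
def pseudo_modal_tag(lemma: str, precedes_negation: bool):
    """Return 'MD' where the pseudo-modal rule applies, otherwise None."""
    lemma = lemma.upper()
    if lemma == "OUGHT":
        return "MD"                       # OUGHT is always tagged MD
    if lemma in {"DARE", "NEED"}:
        # DARE and NEED are tagged MD when they precede negation
        return "MD" if precedes_negation else None
    raise ValueError(f"{lemma} is not a pseudo-modal")


print(pseudo_modal_tag("dare", precedes_negation=True))   # MD   (They dared not to come.)
print(pseudo_modal_tag("need", precedes_negation=False))  # None (You don't need to come.)
```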
    

    Silent modals. Appalachian English features notorious amounts of verbal syncretism. In particular, it is often difficult or impossible to tell whether a particular verb form is a (nonstandard) past tense form or a bare form (infinitive) licensed by the silent counterpart of WOULD, marking habitual aspect. Our conventions for annotating silent WOULD (and silent modals more generally) are as follows. If an overt modal would be appropriate in context and the larger context contains a licensing instance of that modal, we assume a silent modal in the clause at issue. The silent modal contains a reference to the licensing modal in the form of a plus or minus sign followed by a counter. The sign indicates whether the licensing modal occurs in the previous (-) or following (+) context. The counter indicates the distance in sentence tokens from the silent modal; the counter for a licensing modal in the same sentence token is zero (0). In some instances, a silent modal analysis seems clearly appropriate, but there is no licensing overt modal in the context. Such cases are indicated by appending "x" to the silent modal.

    ( (IP-MAT (NP-SBJ (PRO We@))
    	  (VP (MD @'d)
    	      (VP (VB go)
    		  (ADVP-DIR (ADV home))
    		  (ADVP-TMP (NP-MSR (QP (Q many))
    				    (NS years))
    			    (ADV ago))))
    	  (PUNC ,)))
    
    ( (IP-MAT (CONJ and)
    	  (NP-SBJ (PRO we))
    	  (VP (MD 0-1)
    	      (VP (VB celebrate)
    		  (NP-OB1 (NPR Christmas))
    		  (ADVP-LOC (ADV there))))
    	  (PUNC .)))
    
    
    ( (IP-MAT (NP-SBJ (PRO We))
              (VP (MD 0+1)
                  (VP (VB go)
                      (ADVP-DIR (ADV home))
                      (ADVP-TMP (NP-MSR (QP (Q many))
                                        (NS years))
                                (ADV ago))))
              (PUNC ,)))

    ( (IP-MAT (CONJ and)
              (NP-SBJ (PRO we@))
              (VP (MD @'d)
                  (VP (VB celebrate)
                      (NP-OB1 (NPR Christmas))
                      (ADVP-LOC (ADV there))))
              (PUNC .)))

    ( (IP-MAT (CP-ADV (C When)
                      (IP-SUB (NP-SBJ (PRO we))
                              (VP (MD 0+0)
                                  (VP (VB go)
                                      (ADVP-DIR (ADV home))
                                      (ADVP-TMP (NP-MSR (QP (Q many))
                                                        (NS years))
                                                (ADV ago))))))
              (PUNC ,)
              (NP-SBJ (PRO we@))
              (VP (MD @'d)
                  (VP (VB celebrate)
                      (NP-OB1 (NPR Christmas))
                      (ADVP-LOC (ADV there))))
              (PUNC .)))
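
    The silent modal notation can be decoded mechanically. The sketch below reads labels of the kind shown in the examples above (0-1, 0+1, 0+0). The spelling 0x for the unlicensed case is an assumption based on the description of appending "x" to the silent modal; the names SilentModal and parse_silent_modal are likewise hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SilentModal(NamedTuple):
    licensed: bool             # False when no overt licensing modal is in the context
    direction: Optional[str]   # "previous" (-) or "following" (+)
    distance: Optional[int]    # distance in sentence tokens; 0 = same sentence token


# "0" plus either a sign and a counter, or "x" (the "0x" spelling is an assumption).
PATTERN = re.compile(r"0(?:([+-])(\d+)|x)")


def parse_silent_modal(label: str) -> Optional[SilentModal]:
    """Decode a silent modal label; return None for anything else (e.g. an overt modal)."""
    m = PATTERN.fullmatch(label)
    if m is None:
        return None
    sign, count = m.groups()
    if sign is None:           # the unlicensed "x" case
        return SilentModal(False, None, None)
    direction = "previous" if sign == "-" else "following"
    return SilentModal(True, direction, int(count))


print(parse_silent_modal("0-1"))
# SilentModal(licensed=True, direction='previous', distance=1)
```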

    TO

    When TO functions as the (implicit) head of an infinitival clause (IP-ECM, IP-INF), it is tagged with its own POS tag and attaches as a sister of a nonfinite VP.
    ( (IP-INF (IP-INF (TO to)
    		  (VP (BE be)))
    	  (CONJP (CONJ or)
    		 (IP-INF (NEG not)
    			 (TO to)
    			 (VP (BE be))))))
    

    Modifiers

    Modifiers are labelled by basic syntactic category (adjective, adverb, quantifier, numeral). The first three categories have comparative and superlative variants.

                basic  comparative  superlative  wh-
    adjective   ADJ    ADJR         ADJS
    adverb      ADV    ADVR         ADVS         WADV
    quantifier  Q      QR           QS
    numeral     NUM

    Adjective

    The following expressions are tagged as ADJ:

    Adverb

    Some adverbs that are used as interjections are tagged INTJ.

    Degree heads are discussed in detail in Degree and comparative constructions.

    Numeral (see Numbers)

    Quantifier

    Quantifiers include the following items:

    ALL, EVERY, MANY, MORE, MOST, NO, NONE, SOME

    Doubly-marked comparatives are annotated as sequences of the comparative quantifier MORE and a comparative adjective or adverb.

    ( (ADJP (QP (QR more))
    	(ADJR happier)))
    
    ( (ADVP (QP (QR more))
    	(ADVR quicker)))
    

    Analogously for doubly-marked superlatives.

    ( (ADJP (QP (QS most))
    	(ADJS happiest)))
    
    ( (ADVP (QP (QS most))
    	(ADVS quickest)))
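
    Given this annotation, doubly-marked forms can be retrieved by looking for a comparative or superlative quantifier immediately followed by an adjective or adverb of the same degree. The following sketch works on a flat list of (word, tag) pairs rather than on trees; the names DOUBLE_MARKING and doubly_marked are hypothetical.

```python
# Tag pairs that signal double marking, as in the examples above.
DOUBLE_MARKING = {
    ("QR", "ADJR"): "comparative",
    ("QR", "ADVR"): "comparative",
    ("QS", "ADJS"): "superlative",
    ("QS", "ADVS"): "superlative",
}


def doubly_marked(tagged):
    """Return (quantifier, head, degree) for each adjacent doubly-marked pair."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        degree = DOUBLE_MARKING.get((t1, t2))
        if degree is not None:
            hits.append((w1, w2, degree))
    return hits


print(doubly_marked([("more", "QR"), ("happier", "ADJR")]))
# [('more', 'happier', 'comparative')]
```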
    

    Minor categories

    The annotation scheme contains the following minor categories.

    Category                  Example        POS tag
    complementizer            because, that  C
    coordinating conjunction  and, or        CONJ
    focus particle            only           FP
    foreign word              mangia         FW
    interjection              oh             INTJ
    negation                  not, n't       NEG
    particle                  down, in, up   RP
    preposition               about, in      P
    punctuation               . , ?          PUNC
    wh- complementizer        if, whether    WQ
    unknown                                  X

    Complementizer (C)

    See

    Coordinating conjunction (CONJ)

    AND, BUT, BOTH, EITHER, OR, NEITHER, NOR

    Anacoluthic clause-medial conjunctions are tagged as X.

    ( (IP-MAT (CONJ-TEMP And)
    	  (CP-ADV (C after)
    		  (IP-SUB (NP-SBJ (PRO we))
    			  (VP (GTD got)
    			      (VP (VAN organized)))))
    	  (PUNC ,)
    	  (X and)
    	  (NP-SBJ (NS things))
    	  (VP (VBD begin)
    	      (IP-INF (TO to)
    		      (VP (VB pick)
    			  (RP up))))
    	  (PUNC .)))
    
    ( (IP-MAT (PP (P through)
    	      (NP (N spring)
    		  (PP (P of)
    		      (NP (D the) (N year)))))
    	  (PAREN (IP-MAT (NP-SBJ (PRO you))
    			 (VP (MD might)
    			     (NEG not)
    			     (VP (VB believe)
    				 (NP-OB1 (PRO it))))))
    	  (X but)
    	  (NP-SBJ (PRO they))
    	  (VP (HVD had)
    	      (INTJ uh)
    	      (IP-ECM (NP-SBJ (NS men))
    		      (VP (A a=)
    			  (VAG going)
    			  (ADVP-X (ADV around)))))
    	  (PUNC ,)))
    

    Dangling conjunctions (AND, BUT, OR) are included as part of a preceding sentence token. They precede BREAK and attach as high as structurally possible (as daughter of IP, generally the root IP).

    ( (IP-MAT (NP-SBJ (PRO He))
    	  (VP (VBD looks)
    	      (NP-PRD (D the) (N part)))
    	  (PUNC ,)
    	  (CONJ but)
    	  (CODE <BREAK>)))
    

    Focus particle (FP)

    The following words can count as focus particles. BUT and JUST also have other uses.

    BUT, EVEN, JUST, ONLY

    Focus particles attach as daughters of the phrase with which they are construed.

    False start (FS)

    Foreign word (FW)

    TO BE ADDED

    Interjection (INTJ)

    INTJ is used for expressions like the following (along with spelling and pronunciation variants):

    AH, AMEN, AW, AYE,
    BOO, BYE,
    DANG, DARN, DARNIT, DOGGONE,
    EH, EW,
    GAD, GAW, GEE, GOLLY, GOOD-BYE, GOSH,
    HA, HEH, HELLO, HEY, HI, HM, HOWDY, HUSH,
    JEEZ,
    KABOOM,
    LORDY,
    MM,
    NAH, NAW, NO, NOPE, NUH-UH,
    OKAY, OKIE-DOKIE, OH, OOPS, OY,
    PHEW, PSH,
    SH, SHOO, SHUCKS,
    UH, UH-UH, UH-HUH, UM,
    WHOA, WHEW, WHIZ, WHOO, WOO, WOOPS, WOW,
    YAY, YEAH, YES, YESSIRREE, YUP

    Honorary interjections. In addition to these true interjections, the following words, when used as interjections on their own without any modification, are treated as honorary interjections and tagged INTJ. The annotation of honorary interjections forming part of a larger expression is discussed in INTJP.

    ALRIGHT, BOY, DUDE, FUCK and related expressions, GOD, GOODNESS, HEAVENS, HELL, LIKE, LORD, MAN, PLEASE, RIGHT, SHOOT, WELL, WHY

    Quotative markers. When introducing quotation phrases (QTP), ALL and LIKE are tagged as INTJ.

    Negation (NEG)

    For negation in conjunction contexts, see also Negation, ALSO, and related particles.

    Otherwise, the default is for NEG to attach as high as structurally possible.

    Particle (RP)

    Particles are a subtype of preposition and are defined as items belonging to the following list.

    ABOUT, ACROSS, BY, DOWN, FRO, IN, OFF, ON, OUT, OVER, THROUGH, (stressed) TO, UNDER, UP, WITH (in varieties that allow COME WITH)

    Depending on their syntactic context, the above items are tagged as RP or P. For details concerning individual items, follow the links in the list. For more general discussion, notably on which tag is appropriate and whether RP projects a phrase (PP) or not, see Internal structure of phrases.

    Possessive morpheme (POS)

    When the possessive morpheme 'S (or bare apostrophe) takes scope over a constituent larger than a simple NP, it is split off and tagged as POS, which then functions as the head of a possessive NP (NP-POS). See the dash tag -POS for examples and discussion.

    Preposition (P)

    Prepositions include the homonymous items listed under Particle and others (DURING, EXCEPT, SINCE, etc.).

    Wh- complementizer (WQ)

    WQ is the POS tag for IF when it heads an indirect question and for WHETHER.

    Unknown or mysterious word tag (X)

    X is used for words whose POS tag is unknown. It can also be used for words whose tag is mysterious in context.

    POS tags that are ambiguous between two (or even three) known tags are handled in a more informative manner; the possible tags are concatenated as illustrated under POS ambiguity.

    ( (IP-MAT (CONJ-TEMP and)
    	  (ADVP-TMP (ADV then))
    	  (CP-ADV (C-ADV if)
    		  (IP-SUB (NP-SBJ (PRO we))
    			  (VP (VBD took)
    			      (NP-OB1 (D the)
    				      (ADJP (ADJ whole))
    				      (N family)))))
    	  (CODE {inhaling})
    	  (PUNC ,)
    	  (X that)
    	  (NP-SBJ (PRO he@))
    	  (VP (MD @'d)
    	      (VP (VB take)
    		  (NP-OB1 (D the) (N wagon))))
    	  (PUNC .)))
    
    ( (IP-MAT (NP-SBJ (PRO She))
    	  (VP (VBD died)
    	      (X in)
    	      (ADVP-TMP (NP-MSR (NUMP (ADVP (ADV about))
    				      (NUM two))
    				(NS weeks))
    			(ADV ago)))
    	  (PUNC ,)))
    
    ( (FRAG (INTJ Yeah)
    	(PUNC ,)
    	(NP-SBJ (PRO I))
    	(XX (X like)
    	    (IP-INF (TO ta)
    		    (VP (HV =uv)
    			(VP (VBN died)
    			    (CP-ADV (C-ADV when)
    				    (IP-SUB (NP-SBJ (PRO I))
    					    (VP (VBD moved)
    						(ADVP-DIR (RP over) (ADV here)))))))))
    	(PUNC .)))
    

    Intrusive A. X is used to annotate what the OED calls intrusive A. However, THIS-A-WAY and THAT-A-WAY are treated as simple lexical items.

    ( (IP-MAT (NP-SBJ (PRO They))
    	  (VP (BED was)
    	      (NP-PRD (FP just)
    		      (ADJP (X a=) (ADJ little))
    		      (NS kids)))
    	  (PUNC .)))
    

    Compound word

    Our annotation scheme explicitly indicates compound words, which are enclosed by an appropriate POS tag followed by -COMP. Unlike the corresponding simple POS tags, -COMP labels do not indicate inflectional distinctions (singular vs. plural, proper vs. common noun, and so forth). See Joining words for a rule of thumb on when to join two orthographic words and when to treat them as a compound word.

    -COMP labels count as word-level tags. In other words, they count as heads of phrases and are always dominated by a phrasal label that matches the syntactic category of the -COMP label.

    Apart from the head (generally the rightmost element), the categories of a compound word's constituents do not necessarily match the category of the entire compound.

    ( (N-COMP (ADJ high) (N school)))
    
    ( (N-COMP (ADJ Social) (N Security)))
    
    ( (N-COMP (ADJ Social) (N Security) (N card)))
    

    Compound adjective (ADJ-COMP)

    ( (ADJ-COMP (ADJ Greek) (ADJ Orthodox)))
      
    ( (ADJ-COMP (ADJ bright) (ADJ red)))
    

    Compound noun (N-COMP)

    ( (NP (D a)
          (N-COMP (N coal) (N miner))))
    
    ( (NP (QP (Q several))
          (N-COMP (N mine) (NS inspectors))))
    
    ( (NP (D a)
          (N-COMP (NPR Christmas) (N present))))
    
    ( (NP (D a)
          (ADJP (ADJ major))
          (N-COMP (N coal) (N mining) (N operation))))
    
    ( (NP (D the)
          (N-COMP (NPR C) (NPR and) (NPR O) (N canal))))
    

    N-COMP is not recursive, but in some cases, further internal structure is indicated (notably, when a compound noun contains PP).

    ( (NP (D the)
          (N-COMP (NPR District)
                  (PP (P of)
    	          (NP (NPR Columbia))))))
    
    ( (NP (N-COMP (ADJP (ADJ Fourth))
                  (PP (P of)
    	          (NP (NPR July)))
                  (N fireworks))))
    

    Cities or counties together with their states or countries are treated as compound nouns. As usual, N-COMP is non-recursive.

    ( (NP (N-COMP (NPR Harlan) (NPR County))))
    
    ( (NP (N-COMP (NPR Louisville) (PUNC ,) (NPR Kentucky))))
    
    ( (NP (N-COMP (NPR Harlan) (NPR County) (PUNC ,) (NPR Kentucky))))
    
    ( (NP (N-COMP (NPR New) (NPR York) (NPR City))))
    
    ( (NP (N-COMP (NPR New) (NPR York) (PUNC ,) (NPR New) (NPR York))))
    
    ( (NP (N-COMP (NPR Hiroshima) (PUNC ,) (NPR Japan))))
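
    The non-recursion requirement lends itself to an automatic consistency check. The following sketch scans a labeled bracketing and reports whether any N-COMP node occurs inside another N-COMP node. The function name has_nested_ncomp is hypothetical, the input is assumed to be well-formed (balanced parentheses), and the second test string is a constructed counterexample rather than corpus data.

```python
import re


def has_nested_ncomp(bracketing: str) -> bool:
    """Return True if an N-COMP node is dominated by another N-COMP node."""
    stack = []  # labels of the nodes currently open
    # Each match is either "(" plus an optional label, or a closing ")".
    for m in re.finditer(r"\(([^\s()]*)|\)", bracketing):
        if m.group(0) == ")":
            stack.pop()
        else:
            label = m.group(1)
            if label == "N-COMP" and "N-COMP" in stack:
                return True
            stack.append(label)
    return False


ok = "( (NP (N-COMP (NPR Harlan) (NPR County) (PUNC ,) (NPR Kentucky))))"
bad = "( (NP (N-COMP (N-COMP (NPR Harlan) (NPR County)) (PUNC ,) (NPR Kentucky))))"
print(has_nested_ncomp(ok))   # False
print(has_nested_ncomp(bad))  # True
```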
    

    Compound numeral (NUM-COMP) (see Numbers)