POS tags


Noun (and related categories)

List of tags

Category          Singular   Singular possessive   Plural    Plural possessive
common noun       N          N$                    NS        NS$
proper noun       NPR        NPR$                  NPRS      NPRS$
ONE (default)     ONE        ONE$                  ONES      ONES$
OTHER (default)   OTHER      OTHER$                OTHERS    OTHERS$

Tag     Category
$       possessive marker
EX      existential THERE
MAN     indefinite pronoun MAN (Middle English)
PRO     pronoun
PRO$    pronoun, possessive
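
The noun-type tags listed above are built from a base tag plus an optional plural -S and an optional possessive $. The following is a minimal sketch of how such a tag could be decomposed; the function name `parse_noun_tag` and the constant `NOUN_BASES` are hypothetical and not part of any corpus tool.

```python
# Hypothetical helper, not part of the corpus distribution.
# Base tags taken from the noun table: N, NPR, ONE, OTHER.
NOUN_BASES = ("OTHER", "ONE", "NPR", "N")


def parse_noun_tag(tag):
    """Return (base, is_plural, is_possessive), or None for other tags."""
    possessive = tag.endswith("$")
    core = tag[:-1] if possessive else tag
    if core in NOUN_BASES:
        return core, False, possessive
    # A trailing S marks the plural only if what remains is a known base.
    if core.endswith("S") and core[:-1] in NOUN_BASES:
        return core[:-1], True, possessive
    return None


for t in ("N", "NS$", "NPRS", "ONES$", "OTHER$", "PRO$"):
    print(t, parse_noun_tag(t))
```

Tags outside the noun table, such as PRO$ or the bare $ tag, yield None in this sketch.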


Common noun (N, NS)

Singular, collective, or plural noun

Formally singular count nouns are tagged as singular (N), even when construed with a plural verb (as is possible in British usage).

The Government/N have decided to quell the mutiny.

Collective nouns like FOLK and PEOPLE are treated differently in the PPCME2 and in the later corpora.

Compass point

Forms with an overt adjectival suffix (EASTERN, NORTHERN, SOUTHERN, WESTERN, and variants) are tagged ADJ.

the northern/ADJ territories

In connection with proper nouns (SOUTHERN CROSS), the same principles apply to compass point adjectives as to ordinary adjectives.

Otherwise, compass points are tagged N, regardless of whether they precede another noun.

towards the west/N

the north/N face of the mountain

For the treatment of NORTH, SOUTH, EAST, and WEST as part of proper nouns, see Proper noun, especially the section on N + N compounds.

Unit of measure (DAY, POUND, YEAR, etc.)

Units of measure after numbers (TEN YEAR, etc.) are tagged as singular or plural depending on whether number is marked overtly. Middle English forms in -A, -EN, or -S are tagged as plural, and all others as singular.

ix c pound/N, three hondred wynter/N, vii +gere/N			

xl daies/NS, sex monthis/NS, ueale hund wintra/NS, .xx. yeres/NS, .xxx. +gera/NS 
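
The Middle English rule for units of measure can be approximated with a simple suffix test. This is only a sketch: it checks the spelling of the form rather than its morphology, so it will misfire on any singular stem that happens to end in one of these letter sequences. The name `unit_of_measure_tag` is hypothetical.

```python
# Assumption: plural marking is approximated by the string endings
# -a, -en, -s named in the guideline; real annotation looks at morphology.
PLURAL_ENDINGS = ("a", "en", "s")


def unit_of_measure_tag(form):
    """Tag a unit of measure after a number as NS (plural) or N (singular)."""
    return "NS" if form.lower().endswith(PLURAL_ENDINGS) else "N"


for form in ("pound", "wynter", "+gere", "daies", "monthis", "wintra", "yeres", "+gera"):
    print(f"{form}/{unit_of_measure_tag(form)}")
```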

Possessive or genitive noun (N$, NS$, $)

Common nouns standing in a possessive or genitive relationship with other nouns are tagged N$, NS$. As with the plural, genitive marking in early texts predates universal -S. In these cases, N$, NS$ indicate possessive or genitive function rather than any particular morphological form. Conversely, morphologically genitive nouns that do not stand in a relationship with some other noun are not tagged N$, NS$; see adverbial NPs for examples.

With overt -S marking

+te/D mannes/N$ shrifte/N   'the man's shrift'
kinges/NS$ sunes/NS         'kings' sons'
alre/Q kinge/NS$ king/N     'king of all kings'

Without overt -S marking

+te/D sowle/N$ fode/N       'the soul's food'
his/PRO$ sinne/N$ sore/N    'sorrow of his sin'

The dollar tag ($)

The $ tag generally appears directly only on nominal and pronominal tags (N(S), NPR(S), ONE(S), OTHER(S), (W)PRO), indicating their relationship with other nouns. However, in the absence of an overt noun to host the possessive marker, the $ tag can appear on NUM$ and Q$, notably in the Middle English ALRE plus superlative construction.

In addition to appearing directly on other tags, the $ tag can also appear alone. It is always used alone for HIS in the JOHN HIS BOOK construction, and it is sometimes so used for the possessive clitic ('S, S, and spelling variants), which postdates the texts in the PPCME2, although it appears occasionally in the edited texts. When the possessive clitic is spelled as a separate word, as it sometimes is, it always receives a tag of its own. When spelled together with the preceding word, it is split off in the parsed corpus if it takes scope over a larger constituent than the word to which it is attached. See Genitive/possessive modifier of N for the parsed structures corresponding to the following examples.

the Lord/N his/$ hat                        ← clitic HIS
the Lord's/N$ hat                           ← clitic with apostrophe
the Lords/N$ hat                            ← clitic without apostrophe
the Lordys/N$ hat                           ← variant spelling

the Lord of Bodmin/NPR his/$ hat            ← clitic HIS
the Lord of Bodmin@/NPR @'s/$ hat           ← clitic with apostrophe
the Lord of Bodmin@/NPR @s/$ hat            ← clitic without apostrophe
the Lord of Bodmin@/NPR @ys/$ hat           ← variant spelling

God/NPR almighty@/ADJ @'s/$ mercy           ← clitic with apostrophe
God/NPR almighty@/ADJ @s/$ mercy            ← clitic without apostrophe
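
When the clitic is split off, the two pieces are marked with @ at the point of the split, as in the examples above. A minimal sketch, assuming whitespace-separated word/TAG tokens in which untagged words may also occur, of how the original spelling could be recovered; `split_token` and `surface_words` are hypothetical names.

```python
def split_token(token):
    """Split 'word/TAG' on the last slash; untagged tokens get tag None."""
    word, sep, tag = token.rpartition("/")
    return (word, tag) if sep else (token, None)


def surface_words(tagged_text):
    """Rejoin pieces that were split with '@' (e.g. 'Bodmin@' + "@'s")."""
    words = []
    for token in tagged_text.split():
        word, _tag = split_token(token)
        if word.startswith("@") and words and words[-1].endswith("@"):
            # Drop the trailing '@' of the host and the leading '@' of the clitic.
            words[-1] = words[-1][:-1] + word[1:]
        else:
            words.append(word)
    return " ".join(words)


print(surface_words("the Lord of Bodmin@/NPR @'s/$ hat"))
print(surface_words("God/NPR almighty@/ADJ @s/$ mercy"))
```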

Proper noun (NPR, NPRS, NPR$, NPRS$)

General principles

The distinction between common nouns and proper nouns is notoriously difficult to make in a principled way, as it is fundamentally motivated by nonstructural considerations. This is especially true for named events and unique entities. The general principles below and the guidelines in the rest of this section represent our best effort to establish a system that can be implemented in a reasonably consistent and efficient way.

Many inconsistencies and outright errors likely remain with respect to the tagging of proper nouns.

Minimize use of NPR

In general, when combining with words that are proper nouns on their own, words that aren't proper nouns on their own are given their ordinary tag. For instance, prepositions are not treated as part of proper nouns except when they are not spelled separately or when they are part of foreign names.

(NP (NPR Gy)
    (PP (P of)
        (NP (NPR Marchia))))

(NP (NPR Berwick)
    (PP (P upon)
        (NP (NPR Tweed))))

(NP (NPR Stratford-upon-Avon))
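
Labelled bracketings like the ones above can be read mechanically, which makes it easy to check which words inside a name carry NPR and which keep their ordinary tag. The following sketch assumes one complete tree per string and uses hypothetical names (`parse_tree`, `leaves`); it is not the reader used by any corpus tool.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of other characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ')'
        return [label] + children

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("trailing material after tree")
    return tree


def leaves(tree):
    """Yield (tag, word) pairs for every tag that directly dominates a word."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        yield label, children[0]
    else:
        for child in children:
            if isinstance(child, list):
                yield from leaves(child)


tree = parse_tree("(NP (NPR Berwick) (PP (P upon) (NP (NPR Tweed))))")
print(list(leaves(tree)))
```

A missing closing bracket raises ValueError in this sketch, which gives a quick well-formedness check for hand-edited trees.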

Systematic exceptions to this principle occur in connection with the following:

Maximize internal structure

In general, our annotation maximizes the internal structure of noun phrases that contain proper nouns. Particularly noteworthy is the case of potential appositive structures. Following THE or possessive pronouns, noun-noun pairs are always treated as appositive structures, even though this is almost certainly the wrong analysis in some cases.

(NP (NPR Henry)					(NP (NPR Alexander)
    (NP-PRN (D the) (ADJ Eighth)))		    (NP-PRN (D the) (ADJ Great)))

(NP (NPR Henry)
    (NP-PRN (NUM VIII)))

(NP (NPR David)					(NP (NPR Iohannes)
    (NP-PRN (D the) (N prophet)))		    (NP-PRN (D +de) (N godspellere)))

(NP (PRO$ my) (N cousin)			(NP (PRO$ my) (N lorde)
    (NP-PRN (NPR Roper)))			    (NP-PRN (NPR Arthure)))

(NP (D the) (N kynge)				(NP (D the) (ADJ grete) (N Lady)
    (NP-PRN (NPR Royns))			    (NP-PRN (NPR Lyle))
    (PP (P of)					    (PP (P of)
        (NP (NPR Northe) (NPR Walis))))			(NP (NPR Avilion))))

(NP (D +te) (ADJ gentil) (N Erl)		(NP (D the) (N Reverend)
    (NP-PRN (NPR Thomas)))			    (NP-PRN (NPR Dr.) (NPR John) (NPR Donne)))

(NP (D the) (N virgin)				(NP (D the) (ADJ blessed) (N virgin)
    (NP-PRN (NPR Mary)))			    (NP-PRN (NPR Mary)))

(NP (D the) (N Castell)				(NP (D the) (N castell)
    (NP-PRN (NPR Aungel)))			    (NP-PRN (NPR Nygurmous)))

(NP (D the) (N flum)			        (NP (D the) (N water)
    (NP-PRN (NPR Iordan)))			    (NP-PRN (NPR Ponte)))

In such cases, the "name" part is tagged NPR even if it is not a noun.

(NP (D the) (N Castell)				(NP (D the) (N Sege)
    (NP-PRN (NPR Terrable)))			    (NP-PRN (NPR Perelous)))

This principle has the following exception:

Foreign name

Foreign names are tagged as proper nouns (NPR) rather than as foreign words (FW).

(NP (NPR Nova) (NPR Scotia))
(NP (NPR Sankgreall))

In contrast to closed-class items in English names, closed-class items in foreign names (DE, DU, LE, LA, etc.) are always treated as part of the name and tagged NPR.

(NP (NPR Leonardo) (NPR da) (NPR Vinci))
(NP (NPR Petir) (NPR de) (NPR Luna))
(NP (NPR Sagramour) (NPR le) (NPR Desyrus))

Plural marking

As with units of measure, plural tags are used only on items with explicit plural marking.

(NP (D the) (NPR West) (NPRS Saxons))

Words that cannot bear plural marking are tagged as adjectives, not as proper nouns. See Groups of people (ENGLISH, FRENCH) and more generally, NPs with elided heads (THE POOR, THE RICH).

Common noun or proper noun?

Cases by form

Bare noun

Bare nouns denoting offices are not proper nouns on their own. See Office for details.

Bare nouns that are names are proper nouns on their own. These include:

ADJ + N

In adjective-noun pairs, if the head noun is a name (that is, a proper noun on its own), then the adjective is tagged ADJ (in keeping with the principle of minimizing the use of NPR).

(NP (ADJ Good) (NPR Friday))		day
(NP (ADJ Holy) (NPR Saturday))

(NP (ADJ Bloody) (NPR Mary))		person

(NP (ADJ Great) (NPR Britain))		place
(NP (ADJ New) (NPR Troye))

(NP (ADJ holy) (NPR church))		unique entity

(NP (Q+ADJ almighty) (NPR God))

(NP (NPR God)
    (ADJP (Q+ADJ almighty)))

(NP (NPR Lord)
    (ADJP (Q+ADJ almighty)))

(NP (ADJ holy) (NPR scripture))

If the head noun is not a proper noun on its own, then the adjective is tagged NPR along with the noun.

(NP (D the) (NPR Holy) (NPR Land))		places
(NP (D the) (NPR Low) (NPRS Countries))
(NP (D the) (NPR New) (NPR Inn))
(NP (D the) (NPR Red) (NPR Sea))

(NP (D the) (NPR Great) (NPR Seal))		unique entities
(NP (D the) (NPR Holy) (NPR Ghost))
(NP (NPR Holy) (NPR Writ))
(NP (D the) (NPR Old) (NPR Testament))
(NP (D the) (NPR Round) (NPR Table))
(NP (D the) (NPR Southern) (NPR Cross))

D/PRO$ + N

Nouns denoting offices are not proper nouns on their own. See Office for details.

(NP (D the) (N King))

Specific epithets associated with a specific person do not count as offices. If such an epithet is used without the person's name to refer to that person, it is tagged NPR.

(NP (D the) (NPR Baptist))	← referring to John
(NP (D the) (NPR Conqueror))	← referring to William, etc.
(NP (D the) (NPR Ironside))	← referring to Edmund
(NP (D the) (NPR virgin))	← referring to Mary

By contrast, epithets used with a person's name are treated as appositives (in keeping with the principle of maximizing internal structure).

(NP (NPR John)
    (NP-PRN (N Baptist)))

(NP (NPR Edmund)
    (NP-PRN (N Ironsides)))

D/PRO$ + N + NPR

Instances of the type THE EARL THOMAS, MILORD CROMWELL are always treated as appositive structures. See Maximize internal structure for examples.

D/PRO$ + NPR + NPR

Instances of the type OUR LORD GOD are treated as flat strings, exceptionally not conforming to the principle of maximizing internal structure.

(NP (PRO$ our) (NPR Lord) (NPR God))
(NP (PRO$ our) (NPR Lord) (NPR Jesus) (NPR Christ))

THE N OF NP

In general, in phrases of the type THE N OF NP, the first noun is tagged N. See CITY, SON, TOWER for some special cases.

Nouns within the PP that are not proper nouns on their own are tagged with their ordinary tags.

Note the counterintuitive result in the following cases that no noun is tagged NPR, even though the noun phrase as a whole refers to a named event or special day. This issue awaits resolution.

(NP (D the) (N War)
    (PP (P of)
        (NP (D the) (NS Roses))))

(NP (D +te) (N feste)
    (PP (P of)
        (NP (D +te) (N camel))))

(NP (D +te) (N day)
    (PP (P of)
        (NP (N doom))))

Nouns within the PP are tagged NPR only if they are proper nouns on their own.

(NP (D the) (N feste)
    (PP (P of)
        (NP (NPR Pentecoste))))			← name of holiday

(NP (D the) (N feste)
    (PP (P of)
        (NP (NPR Ascension))))			← named event

(NP (D the) (N tropic)
    (PP (P of)
        (NP (NPR Cancer))))			← unique entity

N + N

In bare noun-noun pairs where neither of the nouns is a proper noun on its own, both parts are tagged N.

(NP (N lord) (N emperour))
(NP (N Mr.) (N Attorney))
(NP (N Mr.) (N Speaker))
(NP (N Sir) (N Knight))

Otherwise, both nouns are tagged NPR.

(NP (NPR Jhesu) (NPR Crist))
(NP (NPR Julius) (NPR Caesar))
(NP (NPR Robin) (NPR Hood))

This is true even in cases where not all of the nouns are proper nouns on their own. Such cases are exceptions to the principle of minimizing use of NPR. The order of the nouns is irrelevant (LONDON BRIDGE, MOUNT ZION).

(NP (NPR Lady) (NPR Lisle))		← title etc. exceptionally tagged NPR
(NP (NPR mrs.) (NPR Lisle))
(NP (NPR seynt) (NPR Gregory))
(NP (NPR sire) (NPR Thomas))

(NP (D the) (NPR West) (NPRS Saxons))
(NP (NPR North) (NPR Galys))            ← cf. (NP (ADJ Great) (NPR Britain))

(NP (NPR Mount) (NPR Zion))

(NP (NPR Penteney) (NPR Abbey))
(NP (NPR London) (NPR Bridge))
(NP (NPR London) (NPR town))
(NP (NPR Sussex) (NPR County))

(NP (NPR Easter) (NPR day))
(NP (NPR Lammas) (NPR term))
(NP (NPR Maundy) (NPR Thursday))	← MAUNDY on its own = N

Genitive/possessive NP + N

Genitive or possessive NPs that are part of proper nouns are tagged by function. If consisting of more than one word, they are surrounded by phrasal brackets in the ordinary way (in keeping with our principle of maximizing internal structure).

The noun modified by the genitive or possessive NP is exceptionally tagged NPR on a par with the noun-noun cases just discussed.

(NP (NPR$ Lincolns) (NPR Inne))			← INN exceptionally tagged NPR

(NP (NP-POS (NPR New) (NPR$ Year's))
      (NPR day))				← DAY exceptionally tagged NPR

(NP (NP-POS (NPR Seint) (NPR$ Edward))		← NPR$ by function
      (NPR day))					← DAY exceptionally tagged NPR

Cases by referent

Ethnic, ideological, or religious group

Words referring to groups of people (ethnic, ideological, or religious) are handled as follows. If the word has no plural form, it is tagged ADJ.

English/ADJ (and more generally, nationalities ending in -ISH)

French/ADJ

If the word has a plural form, it is tagged NPR(S).

Jew/NPR			Jews/NPRS

Spaniard/NPR		Spaniards/NPRS

Some words referring to groups of people are systematically ambiguous between a nominal and an adjectival use. If the ambiguous word is overtly marked for plural or if it occurs in a syntactic context where it could be so marked, it is tagged NPR(S). Otherwise, the word is tagged ADJ.

He is a Catholic/NPR .          (cf. They are Catholics/NPRS .)
He is Catholic/ADJ .            (cf. They are Catholic/ADJ .)
the Catholic/ADJ church

analogously: Armenian, German, Greek, etc.

Ideology or religion

Ideologies (DEISM, MARXISM, etc.) and religions (CHRISTENDOM, CHRISTIANITY, MAHOMETANISM, etc.) are tagged as common nouns. But when used in a locative sense, CHRISTENDOM is tagged NPR.

The king and his subjects accepted Christendom/N .

throughout the greater part of Christendom/NPR

Language

Names of languages are tagged ADJ when used as prenominal modifiers and NPR otherwise.

the English/ADJ language	Our native language is English/NPR .
				the langage of Englysshe/NPR
the Latin/ADJ bible		to study Latin/NPR

Named event

Named events, notably Christian ones like the following, are tagged NPR.

(NP (D +te) (NPR Assumpcioun))
(NP (D +te) (NPR incarnacion))
(NP (D the) (NPR Passion))
(NP (D the) (NPR Resurreccion))

Office

Nouns denoting offices (ARCHBISHOP, EARL, JUSTICE, KING, LORD, LADY, POPE, and the like) are treated differently depending on whether they are used on their own or as a title (that is, in conjunction with a person's name). Offices on their own are tagged N.

(NP (D the) (N King))
(NP (D the) (N Pope))
(NP (D the) (ADJ Prime) (N Minister))
(NP (N Lord) (ADJ Chief) (N Justice))

(NP (PRO$ my) (N Lord)
      (NP-PRN (ADJ Chief) (N Justice)))			← NP-PRN because of possessive pronoun

(NP (D the) (N Reverend)
      (NP-PRN (NPR Dr.) (NPR John) (NPR Donne)))	← NP-PRN because of determiner

In conjunction with a name (KING HENRY, LADY LISLE), these nouns are tagged NPR, forming a systematic exception to the principle of minimizing the use of NPR.

(NP (NPR kynge) (NPR Arthure))
(NP (NPR Pope) (NPR John) (NPR Paul))

The same distinction is also made in syntactically more complex cases, notably in ones where the expression denoting the office contains an adjective. When the noun for the office occurs on its own, any adjectives are tagged ADJ (with an accompanying ADJP if postnominal).

(NP (D the) (N Lord) (ADJ Chief) (N Justice))

(NP (N Lord) (ADJ High) (N Admiral))

(NP (D the) (N Attorney)
      (ADJP (ADJ General)))

But when the noun for the office occurs with a name, any adjectives are tagged NPR, and the entire NP is given a flat structure.

(NP (NPR Attorney) (NPR General) (NPR Brown))
(NP (NPR Lord) (NPR Chief) (NPR Justice) (NPR Scrope))
(NP (NPR Lord) (NPR High) (NPR Admiral) (NPR Calvert))

Unique entity

Names of unique entities are proper nouns. SCRIPTURE is treated as a proper noun because it can occur without a determiner.

(NP (D the) (NPR Bible))
(NP (NPR Excalibur))
(NP (ADJ Holy) (NPR Scripture))		← SCRIPTURE counts as NPR

Unique is taken in a strict sense. Nouns like the following are not necessarily proper nouns on their own, but they can be tagged NPR under the right conditions. See also ADJ + N, THE N OF NP.

CITY, GRAIL, MOON, SUN, TESTAMENT, TOWER, WRIT

In general, book titles are not treated as proper nouns, as this would violate the principle of maximizing internal structure. The apparent exceptions BIBLE and SCRIPTURE are proper nouns on their own.

CHURCH in an institutional sense is tagged NPR.

the catholic church/NPR

the church/NPR of England

Names and epithets of the DEVIL (FIEND, SATAN, UNWIHT, WURSE, etc.) are always tagged NPR.

Names and epithets of the Judeo-Christian GOD (CREATOR, LORD, etc.) are always tagged as proper nouns. This includes the TRINITY, its members (FATHER, SON, HOLY GHOST), and relevant epithets (CHRIST, HEALER, SAVIOR). LADY as an epithet for Mary is tagged NPR. In doubtful cases, the default is N. For examples of the type OUR LORD GOD, see D/PRO$ + NPR + NPR.

(NP (PRO$ Oure) (NPR Father))
(NP (PRO$ ure) (NPR helende))
(NP (PRO$ Oure) (NPR Lady))
(NP (NPR Lord))
(NP (NPR Lord) (NPR Iesu))
(NP (PRO$ Oure) (NPR Lord))
(NP (D the) (NPR Trinity))
(NP (NPR +trumnesse))

Certain common Latin liturgical texts are treated as proper nouns.

(NP (NPR Ave) (NPR Maria))
(NP (NPR Credo))
(NP (NPR Pater) (NPR Noster))
(NP (NPR Requiem))
(NP (NPR Te) (NPR Deum) (NPR Laudamus))

ZODIAC and the signs of the zodiac are treated as proper nouns; GEMINI and PISCES are treated as singular.

Pronoun (PRO, PRO$)

All pronouns are tagged PRO except pronominal MAN (also ME) and pronominal ONE.

In cases of ambiguity, which can arise in connection with mixed gerunds, HER is tagged by default as an ordinary pronoun (PRO) rather than as a possessive pronoun.

Possessive pronoun

Possessive pronouns are always tagged PRO$, regardless of whether they modify a noun.

I love my/PRO$ cat .

This book is not mine/PRO$ .

Reflexive pronoun

The first morpheme of reflexive pronouns (MYSELF, YOURSELF, etc.) is tagged PRO or PRO$, depending on the pronoun's form.

(NP (PRO$+N myself))		(NP (PRO me) (N self))
(NP (PRO+N himself))		(NP (PRO hym) (N self))
(NP (PRO+N herself))		(NP (PRO her) (N self))		← PRO by default, like HER

The second morpheme (SELF, SELVES) is tagged N regardless of number.

(NP (PRO$+N yourself))		(NP (PRO$ your) (N self))
(NP (PRO$+N yourselves))	(NP (PRO$ your) (N selues))

Indefinite pronoun MAN (MAN)

If a given text clearly uses MAN (or also ME in early texts) as a pronoun, then all unmodified uses of subject MAN are tagged MAN.

for +dan +de me/MAN nett hem to +dan a+de

+Teih me/MAN niede me to +dan a+de , me/MAN ne net me/PRO noht te forsweri+gen

For +tar man/MAN ne can his mu+des me+de

+Dis word .credo. Mon/MAN mai understonden on +tre wise

The plural MEN also has a pronominal use, but because it is too difficult to distinguish this use from other uses, it is always tagged NS.

Existential THERE (EX)

Existential THERE is tagged EX.

When ambiguous between an existential (EX) and a locative (ADV) reading, THERE is tagged EX by default.

Here/ADV are some pears , and there/ADV are some apples .

There/EX are lots of apples there/ADV .

There/EX are lots of apples .

Verb (and related categories)

List of tags

Category        Infinitive   Present   Past   Imperative   Present participle   Passive participle   Perfect participle
be              BE           BEP       BED    BEI          BAG                                       BEN
do              DO           DOP       DOD    DOI          DAG                  DAN                  DON
have            HV           HVP       HVD    HVI          HAG                  HAN                  HVN
ordinary verb   VB           VBP       VBD    VBI          VAG                  VAN                  VBN

Category                            Tag
modal                               MD
modal, untensed (Middle English)    MD0
infinitival FOR                     FOR
infinitival TO, TIL, AT             TO
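
The verb tags in the table above can be turned into a lookup from tag to verb class and form. The sketch below makes one assumption that should be checked against the corpus documentation: the row for BE lists only six tags, and BEN is taken here to be the perfect participle, with no passive participle tag for BE. The names `FORMS`, `ROWS`, `VERB_TAGS`, and `describe_verb_tag` are hypothetical.

```python
FORMS = (
    "infinitive", "present", "past", "imperative",
    "present participle", "passive participle", "perfect participle",
)

# One row per verb class; None marks a cell with no tag.
# Assumption: BEN fills the perfect participle column for BE.
ROWS = {
    "be": ("BE", "BEP", "BED", "BEI", "BAG", None, "BEN"),
    "do": ("DO", "DOP", "DOD", "DOI", "DAG", "DAN", "DON"),
    "have": ("HV", "HVP", "HVD", "HVI", "HAG", "HAN", "HVN"),
    "ordinary verb": ("VB", "VBP", "VBD", "VBI", "VAG", "VAN", "VBN"),
}

VERB_TAGS = {
    tag: (verb_class, form)
    for verb_class, tags in ROWS.items()
    for form, tag in zip(FORMS, tags)
    if tag is not None
}


def describe_verb_tag(tag):
    """Return (verb class, form) for a verb tag, or None for any other tag."""
    return VERB_TAGS.get(tag)


for t in ("BAG", "HVD", "VAN", "MD"):
    print(t, describe_verb_tag(t))
```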


Auxiliary vs. main verb (BE, DO, HAVE)

Forms of the verbs BE, DO, and HAVE are distinguished from all other verbs, but the POS tags do not distinguish their auxiliary and main verb uses. In the parsed files, auxiliary BE and HAVE can be distinguished from main verb uses by the presence of a (possibly silent) participle in the clause (a passive or perfect participle for BE, and a perfect participle for HAVE). DO is treated slightly differently in the PPCME2 and the later corpora.

When ambiguous between BEP and HVP, the contraction 'S (She's come) is tagged BEP by default.

Participle

See also Departicipial adjective.

Nominal uses of the present participle are tagged as nouns.

one meeting/N, many meetings/NS

and she graunted hem with wepynge/N it shold be done rychely

Present participle for infinitive

In some early texts (notably the Trinity Homilies), a present participle is commonly used for the infinitive. In such cases the word is tagged by function (VB) rather than by form.

Alse ge hauen giwer lichame don to/TO hersumiende/VB fule lustes ;
and unriht , alse do+d giwer lichame he+d to/TO hersumiende/VB clennesse
. and rihtwisnesse . and holinesse

Subjunctive

Subjunctive forms are not distinguished from indicative forms (unlike in the Old English corpora).

if/P you/PRO be/BEP there/ADV

if/P you/PRO were/BED there/ADV

were/BED you/PRO to/TO come/VB

if/P they/PRO played/VBD with/P you/PRO

Modal

The following items generally count as modals:

CAN, COULD, MAY, MIGHT, MOWE, MUST, SHALL, SHOULD, THARF, UTEN, WILL, WOULD

These items are tagged MD when they occur with a main verb or when there is a reading available in which a main verb is elided. Otherwise, the item is tagged as a form of VB.

therefore ye may/MD sey what ye woll/MD                 ← elided SAY

' I woll/VBP well , ' seyde Balan ' that ye so do '

Modal or verb?

DARE and NEED are tagged as modals or verbs depending on their syntactic context. When they take infinitival TO complements or occur with DO support, they are tagged as verbs. Otherwise, they are tagged as modals.

Modal                            Verb

I dare/MD say .                  I dare/VBP to say it .
I dared/MD not say it .          I do not dare/VB (to) say it .

                                 They need/VBP to eat now .
They need/MD not come .          They do not need/VB to come .
                                 It needs/VBP them not to come.
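
The decision rule for DARE and NEED can be stated as a two-way test. This is a minimal sketch with a hypothetical function name; it only decides between the modal tag and the verb family, and leaves the choice of the specific verb tag (VB, VBP, VBD, and so on) to the form of the word.

```python
def dare_need_category(takes_to_infinitive, has_do_support):
    """Return 'MD' for modal uses of DARE/NEED and 'verb' otherwise."""
    if takes_to_infinitive or has_do_support:
        return "verb"
    return "MD"


# (example, takes TO complement, occurs with DO support)
examples = [
    ("I dare say .", False, False),
    ("I dare to say it .", True, False),
    ("I do not dare (to) say it .", False, True),
    ("They need not come .", False, False),
    ("They do not need to come .", True, True),
]
for text, has_to, has_do in examples:
    print(f"{dare_need_category(has_to, has_do):5} {text}")
```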

Infinitive of modal

Untensed modal verbs (generally CAN, MOWE), which are attested into Modern English, are tagged MD0.

supposyng +tat he schuld cun/MD0 best rede +te booke

+tu xalt mown/MD0 askyn what +tu wylt

Infinitive marker (FOR, TO)

When ambiguous in infinitival contexts between FOR and P, FOR is tagged FOR.

It is difficult for/FOR me to complete the work .

It is more convenient for/P me for/FOR you to do the work .

For/P me it is difficult to complete the work .

TO is used to tag any form of the infinitive marker (including the northern Middle English forms AT and TIL).

They want to/TO eat dinner now .
+te riht +tidir at/TO cume 'the right to come thither'

+tien entent til/TO understand +tis wrytyng 'thy intent to understand this writing'

Clitic TO (enclitic on FOR or proclitic on a verb) is split off and each component receives its ordinary tag.

and fondede for@/FOR @to/TO slee Iustinianus

&/CONJ/D eadie katerine bigon for@/FOR @te/TO seggen

t@/TO @accept/VB it as his due

t@/TO @aue/HV bounde them up in barrelles

Adjective or adverb

List of tags

Category    Positive   Comparative   Superlative   Possessive
adjective   ADJ        ADJR          ADJS          ADJ$ (very rare)
adverb      ADV        ADVR          ADVS


Departicipial adjectives (INTERESTED, INTERESTING, and the like) are tagged as ADJ rather than as participles (VAG, VAN), as in earlier releases.

Doubtful cases are tagged VAG or VAN.

a shocking/ADJ revelation; a shocked/ADJ expression 

any living/ADJ creature; any creature now living/VAG

Fused forms with the distribution of PPs (ALIVE, ASLEEP) are not treated as adjectives.

Comparative adjective or adverb

a simpler/ADJR solution

Let's work smarter/ADVR, not harder/ADVR .

Degree words are treated as comparatives. See AS, SO (degree), ENOUGH, and TOO.

Positive or comparative?

Except when construed with a THAN phrase or clause, apparently comparative forms that lack a base form are tagged ADJ rather than ADJR.

FORMER, INNER, NETHER, OUTER, UPPER

LATER is tagged ADJ or ADV when not clearly comparative. For unclear cases, the default is ADJR or ADVR.

LATTER is ordinarily tagged ADJ. However, in early texts, it is tagged ADJR when it functions as the ordinary comparative of LATE (later replaced by the innovative LATER).

LOWER is tagged ADJ when contrasting with UPPER, and as ADJR when contrasting with HIGHER. For unclear cases, the default is ADJR.

Superlative adjective or adverb

in the greatest/ADJS church of London

They like this way best/ADVS .

Positive or superlative?

Apparently superlative forms that lack a base form are tagged ADJ rather than ADJS. These include FIRST, LAST and adjectives ending in -MOST (FOREMOST, INNERMOST, etc.).

Color

Color terms in adnominal or predicate position are tagged ADJ, and otherwise N.

a pale blue/ADJ egg              a very pale blue/N

a white/ADJ shirt                the white/N of an egg

a lovely red/ADJ rose            a lovely deep red/N

Ordinal number

Ordinal numbers are tagged ADJ, as are DOUBLE, TREBLE, TRIPLE, and so on. FIRST also has an adverbial use.

Cases where an ordinal might be expected but without overt ordinal marking are treated as cardinal numbers (NUM).

the .x./NUM of April

Preposition (including subordinating conjunction)

List of tags

P preposition or subordinating conjunction


Prepositions are tagged P.

the borders of/P England

on/P your way

to/P the/D king/N

with/P Merlin

Subordinating conjunction

Subordinating conjunctions (except for true complementizers) are treated as prepositions taking a clausal complement and tagged P. See Adverbial clause, Clausal complement of P for details.

after/P they arrived     ← cf. after/P their arrival

before/P they left       ← cf. before/P their departure

Prepositional A- (< IN, ON)

Depending on when in the history of the language the reduced preposition A- (< IN, ON) fused with its complement, compounds containing it are treated as unitary items or as written.

The A HUNTING construction is treated as written.

Items like ALIVE and ASLEEP are tagged P+N and treated as written rather than as unitary adjectives because they are barred from prenominal position.

Preposition with demonstrative (FOR+TAN, FOR+TI, etc.)

A preposition may be followed by a demonstrative (FOR +TAN, FOR +TI, FOR +TAT, IN +TAT, WI+T +TAN, etc.), which in turn may be followed by a complement clause or noun phrase, as in the examples below. If the preposition and demonstrative are cliticized, the combination is tagged P+D. See Preposition plus demonstrative plus clause or NP.

after/P +dan/D +de/C here herte leste , ic hem fol+gede

For+di/P+D +dat/C ich nabbe ihafd rihte ileaue

for/P +ti/D +tt/C +tu ne wilnest bute to seo mi wlite . ne speoke bute to me

We ben tau+gt in/P +tat/D +tat/C we seon in suche creatures +te wonder
werkes of vre Creatour

Punctuation

List of tags

PUNC all punctuation


PUNC is not on the default "ignore_nodes" list of CorpusSearch and needs to be added, if necessary, to the list by the user (best in a preference file).

Punctuation is attached as high as possible in the tree without regard to where it "belongs" semantically.

Dash

Dashes in the original text are represented by hyphens in the annotated corpora. At token boundaries, they go with the first token, if they are the only punctuation at the boundary. Otherwise, they go with the second token.

They are finished with the job ,/PUNC        ← with preceding punctuation

-/PUNC or so they say .                    

They are finished with the job -/PUNC        ← without preceding punctuation

or so they say .
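
The attachment rule for dashes at a token boundary can be sketched as follows. The function name `attach_dash` and the punctuation set are assumptions made for illustration; the corpus may recognize other punctuation marks as well.

```python
# Assumed set of punctuation marks that can share a boundary with a dash.
PUNCTUATION = {",", ".", ";", ":", "?", "!"}


def attach_dash(first, second):
    """Place a boundary dash: with the first token if it is the only
    punctuation at the boundary, otherwise with the second token."""
    other_punct = bool(first and first[-1] in PUNCTUATION) or bool(
        second and second[0] in PUNCTUATION
    )
    if other_punct:
        return list(first), ["-"] + list(second)
    return list(first) + ["-"], list(second)


print(attach_dash(["with", "the", "job", ","], ["or", "so", "they", "say", "."]))
print(attach_dash(["with", "the", "job"], ["or", "so", "they", "say", "."]))
```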

Period

Periods that do not serve as sentence punctuation are not separated from the word they belong to. Common cases include periods indicating abbreviations or surrounding Roman numerals and sometimes ordinary words.

Mr./NPR

.x./NUM days/NS
+Tis/D word/N .credo./NPR Mon/MAN mei/MD understonden/VB ./PUNC on/P
+tro/NUM wise/N ./PUNC CMLAMB1.75.43/ID

Ne/NEG mei/MD na/Q .Mon./N cume/VB in/RP to/P godes/NPR$ riche/N CMLAMB1.73.17/ID

Quotation mark

In general, quotation marks around continuing lines of quoted speech have been removed from the text. But those marking continuing paragraphs of quoted speech have been retained.

Quantifier

List of tags

Category     Positive   Comparative   Superlative   Possessive
quantifier   Q          QR            QS            Q$


The following words are always tagged Q.

ALL, ANY, EACH, EVERY, FEW, MANY, OUGHT (= ANYTHING), SEVERAL, SOME

FEWER and FEWEST are always tagged QR and QS, respectively.

LESS, LEAST and MUCH, MORE, MOST are not on the list above because they are treated somewhat differently in the PPCME2 and the later corpora.

Wh- word

List of tags

WADV wh-adverb
WD wh-determiner
WPRO wh-pronoun
WPRO$ wh-pronoun, possessive
WQ WHETHER (and IF when heading indirect questions)


Wh- words that are used as interjections are tagged with the appropriate tag from the above list (not as INTJ). Their function is indicated at the phrasal level; see Interjection phrase (INTJP).

Wh- adverb (WADV)

Wh- adverbs (HOW, WHEN, WHENCE, WHERE, WHITHER, WHY) are tagged WADV. WHEN can also be tagged P.

How/WADV would you do that ?
  
When/WADV are they planning to arrive ?

They remembered where/WADV to buy the tickets .

the place where/WADV they are going

Wh- determiner (WD)

WHAT and (THE) WHICH are tagged WD when not acting as the head of a wh- noun phrase, and as WPRO otherwise. Note the difference in this regard between wh- words and ordinary determiners, which are always tagged D.

(WNP (WD what) (NS horses))

(WNP (WNP (WD which) (N prophet))
     (CONJP (CONJ and)
            (NX (N law))))

(WNP (WNP (WPRO what))
     (CONJP (CONJ and)
            (WADVP (WADV where))))

(WNP (WPRO what)			← ADJP modifies WPRO
     (ADJP (ADJ else)))

For the treatment of WHAT THE DEVIL and similar expressions, see INTJP.

Otherwise, WHAT immediately preceding a determiner is tagged WD (even though this is arguably not the correct structure).

(WNP (WD what) (D a) (N nightmare))

Wh- pronoun (WPRO, WPRO$)

WPRO is assigned to WHO, to WHETHER in the meaning WHICH OF TWO, and to WHAT and (THE) WHICH when these are acting as the head of a wh- noun phrase. (Other cases of WHAT and (THE) WHICH are tagged WD, as just discussed.)

What/WPRO did you say ?

the bigger one , which/WPRO I showed you yesterday

I know not whether/WPRO of hem is come to court .

There began a new batayle, the whych/WPRO was sore and harde .

The tag for WHOSE is WPRO$.

Whose/WPRO$ is that umbrella ?

by whose/WPRO$ commandment

Wh- marker in indirect questions (WQ)

WHETHER and IF are tagged WQ when they introduce indirect questions. The treatment as WQ rather than as C is motivated by the co-occurrence of WHETHER and IF with THAT, at least in Middle English.

We want to know if/WQ that/C they are coming.

Other categories

List of tags

C complementizer
CONJ coordinating conjunction
D determiner
FP focus particle
FW foreign word
INTJ interjection
NEG negation
NUM cardinal number (except ONE)
RP adverbial particle
X unknown POS


Complementizer (C)

THAT, +TE, and variants introducing subordinate clauses, whether complements or adjuncts, are tagged C. See also AS (complementizer), AS, SO, THAN (preposition), IF.

We know that/C you would like to visit us .

the person that/C you would like to visit

Coordinating conjunction (CONJ)

The following items are tagged CONJ when used as coordinating conjunctions:

AND, NE, NEITHER, NOR, OR, OTHER (= OR)

It is possible for two coordinating conjunctions to be adjacent.

And/CONJ nor/CONJ is this the right answer .

But/CONJ neither/CONJ is this the right answer .

In instances of correlative conjunction, each conjunction is tagged CONJ.

all periods            BOTH ... AND, EITHER ... OR, NEITHER ... NOR
only Middle English    (EITHER) +GE ... +GE

both/CONJ you and/CONJ I

either/CONJ you or/CONJ I

neither/CONJ you nor/CONJ I

ai+der/CONJ +ge/CONJ hodede +ge/CONJ leawede

Determiner (D)

The following words are tagged D when used as determiners:

A(N), THAT, THE, THESE, THIS, THOSE, YON, YONDER

Demonstratives are always tagged D, whether they precede a noun or not. Wh- words behave differently in this regard: WHAT and (THE) WHICH are tagged WPRO when they head a wh- noun phrase, and WD otherwise.

+Tis/D +tat/C is i-seide in +te comyn table

Focus particle (FP)

The following items are tagged FP when used as focus particles. All of these words also have other uses; follow the links on the words.

all periods            ALONE, BUT, EVEN, ONLY
only Middle English    FORTH, ONE, YET

They alone/FP know the answer.

Not even/FP they know the answer.

Only/FP they know the answer.

+tat hie ne biholden non iuel ne non un-nut ne for+den/FP idel    ← FORTH

+De mann ne leue+d naht $be bread ane/FP                          ← ONE

hwi wi+d_dra+gest +tu +tin hont . &/ +get/FP +tin king hond       ← YET
of midde +tine bosme

Foreign word (FW)

Our operational definition of a foreign word is one that has no entry in the OED. Words that do have an OED entry are not tagged FW, even nouns that retain a foreign plural.

Certain abbreviations of foreign terms are listed in the OED, but their POS tag in English is difficult to determine. Such abbreviations are tagged X. Some examples include:

B.A., e.g., etc., i.e., M.A.

Foreign names and certain common Latin liturgical texts are treated as proper nouns. But foreign language titles of books are tagged FW.

a passage from Thomas/NPR Mann's/NPR$ Zauberberg/FW

In the Alcoranum/FW it is written

in the prologue on Regum/FW

In foreign language sequences, everything (words, symbols, numbers, etc.) except punctuation is labeled FW.

libro/FW 5=o=/FW ,/PUNC capitulo/FW 24=o=/FW
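The rule for foreign language sequences can be sketched as follows. This is only an illustration, not part of any corpus tool: the function name and the punctuation inventory are assumptions, and the sketch presupposes that the sequence has already been identified as foreign and split into tokens.

```python
# Assumed punctuation inventory; the corpus may recognize other marks.
PUNCTUATION = frozenset({",", ".", ";", ":", "?", "!"})


def tag_foreign_sequence(tokens):
    """Tag every token of a foreign-language sequence.

    Punctuation is labeled PUNC; everything else (words, symbols,
    numbers) is labeled FW.
    """
    return " ".join(
        f"{tok}/PUNC" if tok in PUNCTUATION else f"{tok}/FW"
        for tok in tokens
    )


print(tag_foreign_sequence(["libro", "5=o=", ",", "capitulo", "24=o="]))
```

Applied to the tokens of the example above, the sketch reproduces the tagging shown there.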

Interjection (INTJ)

INTJ is only used to tag words that are difficult or impossible to tag any other way, like the following:

AH, ALAS, AMEN, AYE, FAREWELL, FIE, GAR (< God), GOOD-BYE, GRAMERCY, HA, HULLO, LA, LO, NAY, NO, OH, PARDEE, POOF, WASSAIL, WELAWEI, YEA, YES, WITECRIST

Items like FORSOOTH, MARY (and spelling variants), and various wh- words (WHAT, WHY) are not tagged as INTJ, even when used as interjections, but their function can be indicated at the phrasal level. See Interjection phrase (INTJP) for examples.

NO is tagged INTJ when it is the negative equivalent of YES.

PRAY (and variants like PRITHEE) is never tagged INTJ.

Negation (NEG)

When used as simple negation, NE and NOT are tagged NEG, as are NO and NONE in WHETHER OR NO clauses. All four items also have other uses; follow the links on the words.

You should not/NEG be late for the lecture .

non senne ne/NEG mai bien idon bute +durh unhersumnesse

me/MAN ne/NEG net me/PRO noht/NEG te forsweri+gen

wheither/WQ he/PRO wol/MD doon/DO or/CONJ no/NEG

wheither it oghte nedes be doon or noon/NEG

Proclitic negation on verbs and modals in Middle English is split, as is enclitic negation in later stages of English.

I n@/NEG @el/MD neuere go hennys .

We can@/MD @not/NEG help them .

We ca@/MD @n't/NEG help them .
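The splitting convention illustrated above can be sketched in code. The helper names are hypothetical, and the sketch assumes that the caller already knows the word is a verb or modal with fused negation and supplies the tag of the host; it does not decide whether a word contains negation in the first place. The proclitic helper covers only forms where the negation is a single initial N, as in the example above.

```python
import re

# Hypothetical helpers for illustration; not part of the corpus tools.
_ENCLITIC = re.compile(r"^(.+?)(n't|not)$")


def split_enclitic_negation(word, host_tag):
    """Split fused enclitic negation, e.g. 'cannot' -> 'can@/MD @not/NEG'.

    The '@' signs mark the point where the orthographic word was split.
    """
    match = _ENCLITIC.match(word)
    if match is None:
        raise ValueError(f"no enclitic negation in {word!r}")
    host, neg = match.groups()
    return f"{host}@/{host_tag} @{neg}/NEG"


def split_proclitic_negation(word, host_tag):
    """Split proclitic negation, e.g. 'nel' -> 'n@/NEG @el/MD'.

    Assumes the negation is the single initial letter N.
    """
    if len(word) < 2 or word[0].lower() != "n":
        raise ValueError(f"no proclitic negation in {word!r}")
    return f"{word[0]}@/NEG @{word[1:]}/{host_tag}"


print(split_enclitic_negation("cannot", "MD"))
print(split_enclitic_negation("can't", "MD"))
print(split_proclitic_negation("nel", "MD"))
```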

Cardinal number (NUM)

See also Ordinal number. ONE has its own tag.

When overtly marked for plural (DOZENS, SCORES, HUNDREDS, THOUSANDS, MILLIONS, etc.), number words are tagged NS. Often, such forms take PP complements (nine millions of subjects), which clearly shows the status of the number word as a nominal head. In a very few cases, where a plural number word is immediately followed by another number word and would in modern usage be replaced by the singular (as in nine millions three-hundred thousand), the number word is tagged as NUM.

Singular number words (DOZEN, SCORE, and sometimes HUNDRED, THOUSAND, MILLION) are treated as N when followed by a PP complement.

Numbers in foreign language sequences are treated as foreign words.

Otherwise, unless used as list markers (LS), all cardinal numbers except for ONE are tagged NUM, whether spelled out, in numeral form, or in some combination of the two. In the lemmatized corpora, all numbers are represented as single orthographic words; see the lemmatization guidelines for details.

22/NUM parts

xxij./NUM parts

twenty-two/NUM parts

two_and_twenty/NUM parts

xxx=ti=/NUM parts

xxx_c/NUM parts
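The decision order for number words described in this section can be summarized in a short sketch. The function name, its parameters, and the two word lists are assumptions made for illustration; the contextual facts (PP complement, following number word, foreign sequence, list marker) are taken as given rather than detected. Since the guidelines say only that HUNDRED, THOUSAND, and MILLION are "sometimes" treated as N, the sketch applies the N rule to them whenever a PP complement is reported, which may overgeneralize.

```python
# Assumed word lists, based on the forms named in the text.
PLURAL_NUMBER_WORDS = frozenset(
    {"dozens", "scores", "hundreds", "thousands", "millions"}
)
SINGULAR_NOMINAL_NUMBER_WORDS = frozenset(
    {"dozen", "score", "hundred", "thousand", "million"}
)


def tag_number_word(word, has_pp_complement=False, next_is_number=False,
                    in_foreign_sequence=False, is_list_marker=False):
    """Return the POS tag for a cardinal number form."""
    w = word.lower()
    if in_foreign_sequence:
        return "FW"    # numbers in foreign sequences are foreign words
    if is_list_marker:
        return "LS"    # list markers are not NUM
    if w == "one":
        return "ONE"   # ONE has its own tag
    if w in PLURAL_NUMBER_WORDS:
        # Overt plural is NS, except before another number word
        # (nine millions three-hundred thousand).
        return "NUM" if next_is_number else "NS"
    if w in SINGULAR_NOMINAL_NUMBER_WORDS and has_pp_complement:
        return "N"
    return "NUM"       # numerals, spelled-out forms, and mixed forms


print(tag_number_word("millions", has_pp_complement=True))
print(tag_number_word("xxij."))
```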

TWICE and THRICE are tagged NUM, as is ONCE when analogous in meaning.

Twice/NUM , we ran out of cash . 

The horse turned around thrice/NUM .

Cases where an ordinal might be expected but without overt ordinal marking are treated as cardinal number and tagged NUM.

the ij./NUM day

the .ix./NUM chapter

Adverbial particle (RP)

The criteria for distinguishing adverbial particles (RP) from other adverbs (ADV) are difficult to make explicit in every case. Following the Brown Corpus, we tag the following words RP when they do not take a complement.

ABOUT, ACROSS, BY, DOWN, FRO, IN, OFF, ON, OUT, OVER, THROUGH, TO, UP

Humpty Dumpty fell down/RP from his proud perch .

Sir Hector tried to pull out/RP the sword .

And Sir Ralph of Beeston gave up/RP the castle to the king .

Items from the above list that modify a prepositional phrase continue to be tagged as particles as long as they are spelled as separate words (notably IN TO and UP ON, but not ADOWN, APON, INTO, UNTO, or UPON).

kneeling down/RP upon/P his knees

out/RP of/P the castle down/RP to/P the earth
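As a rough sketch of the particle criterion, the check below treats a word as RP when it is on the list above and does not take a complement. The function name is hypothetical, only the modern spellings from the list are covered (historical spelling variants are not), and whether a word takes a complement must be supplied by the annotator. Forms spelled as one word, such as UPON or INTO, are not on the list and so are never returned as particles.

```python
# The particle list given above, in modern spelling only.
PARTICLE_WORDS = frozenset({
    "about", "across", "by", "down", "fro", "in", "off",
    "on", "out", "over", "through", "to", "up",
})


def is_particle(word, takes_complement):
    """Return True if the word should be tagged RP under the list criterion."""
    return word.lower() in PARTICLE_WORDS and not takes_complement


# 'kneeling down upon his knees': DOWN modifies the PP, UPON heads it.
print(is_particle("down", takes_complement=False))
print(is_particle("upon", takes_complement=True))
```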

Some items from the above list can combine with -WARD. See -WARD for details.

Unknown POS (X)

Words with unknown POS are tagged X.

(NP-OB1 (NUM C) (N myle)
        (X li))

( (IP-MAT (CONJ And)
          (X +tet)              ← mistranslation by Dan Michel
          (PUNC /)
          (PP (P yef)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (PRO hit))
                              (NP-OB2 (PRO him))
                              (NEG ne)
                              (BEP is)
                              (NEG na+gt)
                              (ADJP-PRD (ADJ wor+t)))))
          (PUNC :)
          (NP-SBJ (PRO he))
          (NP-OB2 (PRO him))
          (VBP zay+t)
          (PUNC .)
          (PUNC ')
          (IP-IMP-SPE (VBI (VBI eth) (PUNC /) (CONJ an) (VBI drink))
                      (PP (P ase)
                          (NP (D +te) (ADJ ilke))))
          (PUNC /))
  (ID CMAYENBI-M2,54.978))

( (IP-MAT (CONJ and)
          (NP-SBJ (QP (ADVR so) (Q meny))
                  (CP-DEG *ICH*-1))
          (BED were)
          (VAN slayne)
          (PUNC ,)
          (PP (X what)                       ← likely mistranslation of French "que ... que"
              (PP (P in)
                  (NP (ONE o) (N side)))
              (CONJP (CONJ and)
                     (PP (P in)
                         (NP (D +tat) (OTHER o+tere)))))
          (PUNC ,)
          (CP-DEG-1 (C +tat)
                    (IP-SUB (NP-SBJ (PRO hit))
                            (BED was)
                            (NP-PRD (ADJ grete) (N pite)
                                    (CP-TMC (WNP-2 0)
                                            (IP-INF (IP-INF (NP-OB1 *T*-2)
                                                            (TO to)
                                                            (VB wete))
                                                    (CONJP (CONJ and)
                                                           (IP-INF (NP-OB1 *T*-2)
                                                                   (TO to)
                                                                   (VB seen))))))))
          (PUNC .)) 
  (ID CMBRUT3-M3,87.2622))

( (IP-MAT (META (NP (CODE <font>) (NPR Ga~mer) (CODE <$$font>)))
          (INTJ Alas)
          (NP-VOC (N sir))
          (PUNC ,)
          (NP-SBJ (PRO hee@))
          (MD @l)
          (BE be)
          (ADVP-LOC (ADV here))
          (ADVP-TMP (ADV anon))
          (PUNC ,)
          (FRAG (X ha)
                (BE be)
                (VAN handled)
                (ADVP (ADVR to) (ADV bad)))
          (PUNC .))
  (ID STEVENSO-1558-E1-H,57.293))

As mentioned earlier, X is also used to tag foreign abbreviations with an OED entry, but without an obvious POS tag in English (notably, ETC.).
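
Tokens tagged X can be collected from trees in the labeled-bracketing format used in the examples above. The following is a minimal sketch, not one of the corpus tools: the function names are hypothetical, and it assumes that explanatory annotations such as the "←" comments in this manual have been removed from the input. A leaf is taken to be any bracket containing exactly two strings, so ID nodes and empty categories such as (C 0) are also returned as leaves.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of
# characters containing neither whitespace nor brackets.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse labeled bracketing into nested lists of strings."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


def leaves(node):
    """Yield (tag, word) pairs for every leaf below node."""
    if isinstance(node, list):
        if len(node) == 2 and all(isinstance(x, str) for x in node):
            yield (node[0], node[1])
        else:
            for child in node:
                yield from leaves(child)


tree = parse("(NP-OB1 (NUM C) (N myle) (X li))")
print([word for tag, word in leaves(tree) if tag == "X"])
```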