Category | Singular | Singular possessive | Plural | Plural possessive |
---|---|---|---|---|
common noun | N | N$ | NS | NS$ |
proper noun | NPR | NPR$ | NPRS | NPRS$ |
ONE (default) | ONE | ONE$ | ONES | ONES$ |
OTHER (default) | OTHER | OTHER$ | OTHERS | OTHERS$ |
Tag | Category |
---|---|
$ | possessive marker |
EX | existential THERE |
MAN | indefinite pronoun MAN (Middle English) |
PRO | pronoun |
PRO$ | pronoun, possessive |
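The nominal tags in the first table are built from a base tag, an optional plural S, and an optional possessive $. The following is a minimal sketch of that composition; the regular expression and the function name `describe_nominal_tag` are illustrative assumptions and not part of the corpus tooling.

```python
import re

# Base tags taken from the table above; the pattern itself is an assumption
# made for illustration.
NOMINAL_TAG = re.compile(r"^(N|NPR|ONE|OTHER)(S?)(\$?)$")

def describe_nominal_tag(tag):
    """Split a nominal tag into base category, number, and possessive marking.

    Returns None for tags outside the nominal paradigm (for example PRO$).
    """
    match = NOMINAL_TAG.match(tag)
    if match is None:
        return None
    base, plural, possessive = match.groups()
    return {
        "category": base,
        "plural": plural == "S",
        "possessive": possessive == "$",
    }

print(describe_nominal_tag("NPRS$"))
```

Pronominal tags such as PRO$ are deliberately excluded here, since they have no plural series in the table.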
The Government/N have decided to quell the mutiny.
Collective nouns like FOLK and PEOPLE are
treated differently in the
PPCME2 and in the later corpora.
In connection with proper
nouns (SOUTHERN CROSS), the same principles apply to compass point
adjectives as to ordinary adjectives.
Otherwise, compass points are tagged N, regardless of whether
they precede another noun.
For the treatment of NORTH, SOUTH, EAST, and WEST as part of proper
nouns, see Proper noun,
especially the section on N + N
compounds.
The $ tag generally appears directly only on nominal and
pronominal tags
(N(S), NPR(S), ONE(S), OTHER(S), (W)PRO),
indicating their relationship with other nouns. However, in the absence
of an overt noun to host the possessive marker, the $ tag can
appear on NUM$ and Q$, notably in the Middle
English ALRE plus
superlative construction.
In addition to appearing directly on other tags, the $ tag can
also appear alone. It is always used alone for HIS in the JOHN HIS BOOK
construction, and it is sometimes so used for the possessive clitic ('S,
S, and spelling variants), which postdates the texts in the PPCME2,
although it appears occasionally in the edited texts. When the possessive
clitic is spelled as a separate word, as it sometimes is, it always
receives a tag of its own. When spelled together with the preceding word,
it is split off in the parsed corpus if it takes scope over a larger
constituent than the word to which it is attached.
See Genitive/possessive modifier of
N for the parsed structures corresponding to the following examples.
Compass point
Forms with an overt adjectival suffix (EASTERN, NORTHERN, SOUTHERN,
WESTERN, and variants) are tagged ADJ.
the northern/ADJ territories
towards the west/N
the north/N face of the mountain
Unit of measure (DAY, POUND, YEAR, etc.)
Units of measure after numbers (TEN YEAR, etc.) are tagged as singular or
plural depending on whether number is marked overtly. Middle English
forms in -A, -EN, or -S are tagged as plural, and all others as singular.
ix c pound/N, three hondred wynter/N, vii +gere/N
xl daies/NS, sex monthis/NS, ueale hund wintra/NS, .xx. yeres/NS, .xxx. +gera/NS
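The rule for units of measure after numbers can be approximated as a purely orthographic check. The sketch below assumes that looking at the final letters of the form is enough, which will not hold for stems that happen to end in one of these letters; `tag_unit_of_measure` is a hypothetical helper name.

```python
# Plural endings named in the guideline above (-A, -EN, -S).
PLURAL_ENDINGS = ("a", "en", "s")

def tag_unit_of_measure(form):
    """Return NS if the form shows an overt plural ending, otherwise N."""
    return "NS" if form.lower().endswith(PLURAL_ENDINGS) else "N"

for form in ["pound", "wynter", "+gere", "daies", "wintra", "+gera"]:
    print(form, tag_unit_of_measure(form))
```

The forms in the loop are taken from the examples above and come out with the same tags as in the guideline.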
Possessive or genitive noun (N$, NS$, $)
Common nouns standing in a possessive or genitive relationship with other
nouns are tagged N$, NS$. As with the plural, genitive marking
in early texts predates universal -S. In these cases, N$, NS$
indicate possessive or genitive function rather than any particular
morphological form. Conversely, morphologically genitive nouns that do
not stand in a relationship with some other noun are not tagged
N$, NS$; see adverbial NPs
for examples.
With overt -S marking
+te/D mannes/N$ shrifte/N 'the man's shrift'
kinges/NS$ sunes/NS 'kings' sons'
alre/Q kinge/NS$ king/N 'king of all kings'
Without overt -S marking
+te/D sowle/N$ fode/N 'the soul's food'
his/PRO$ sinne/N$ sore/N 'sorrow of his sin'
The dollar tag ($)
the Lord/N his/$ hat ← clitic HIS
the Lord's/N$ hat ← clitic with apostrophe
the Lords/N$ hat ← clitic without apostrophe
the Lordys/N$ hat ← variant spelling
the Lord of Bodmin/NPR his/$ hat ← clitic HIS
the Lord of Bodmin@/NPR @'s/$ hat ← clitic with apostrophe
the Lord of Bodmin@/NPR @s/$ hat ← clitic without apostrophe
the Lord of Bodmin@/NPR @ys/$ hat ← variant spelling
God/NPR almighty@/ADJ @'s/$ mercy ← clitic with apostrophe
God/NPR almighty@/ADJ @s/$ mercy ← clitic without apostrophe
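In the examples above, a clitic that has been split off is marked with @ on both sides of the split. A reader who wants the original orthographic words back can rejoin the halves. The following is a minimal sketch, assuming whitespace-separated tokens in the word/TAG notation used in this section; `join_clitics` is a hypothetical helper name.

```python
def join_clitics(tokens):
    """Rejoin words that were split at '@ ... @' clitic boundaries.

    Tags are discarded; only the orthographic words are returned.
    """
    words = []
    for token in tokens:
        # Drop the tag, if any (everything after the last slash).
        word = token.rsplit("/", 1)[0] if "/" in token else token
        if word.startswith("@") and words and words[-1].endswith("@"):
            words[-1] = words[-1][:-1] + word[1:]
        else:
            words.append(word)
    return words

print(join_clitics("the Lord of Bodmin@/NPR @'s/$ hat".split()))
```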
Proper noun (NPR, NPRS, NPR$, NPRS$)
General principles
The distinction between common nouns and proper nouns is notoriously
difficult to make in a principled way, as it is fundamentally motivated by
nonstructural considerations. This is especially true
for named events and
unique entities.
The general principles below and the guidelines in the rest of this
section represent our best effort to establish a system that can be
implemented in a reasonably consistent and efficient way.
Many inconsistencies and outright errors likely remain with respect to the tagging of proper nouns. |
(NP (NPR Gy) (PP (P of) (NP (NPR Marchia))))
(NP (NPR Berwick) (PP (P upon) (NP (NPR Tweed))))
(NP (NPR Stratford-upon-Avon))
Systematic exceptions to this principle occur in connection with the following:
(NP (NPR Henry) (NP-PRN (D the) (ADJ Eighth)))
(NP (NPR Alexander) (NP-PRN (D the) (ADJ Great)))
(NP (NPR Henry) (NP-PRN (NUM VIII)))
(NP (NPR David) (NP-PRN (D the) (N prophet)))
(NP (NPR Iohannes) (NP-PRN (D +de) (N godspellere)))
(NP (PRO$ my) (N cousin) (NP-PRN (NPR Roper)))
(NP (PRO$ my) (N lorde) (NP-PRN (NPR Arthure)))
(NP (D the) (N kynge) (NP-PRN (NPR Royns)) (PP (P of) (NP (NPR Northe) (NPR Walis))))
(NP (D the) (ADJ grete) (N Lady) (NP-PRN (NPR Lyle)) (PP (P of) (NP (NPR Avilion))))
(NP (D +te) (ADJ gentil) (N Erl) (NP-PRN (NPR Thomas)))
(NP (D the) (N Reverend) (NP-PRN (NPR Dr.) (NPR John) (NPR Donne)))
(NP (D the) (N virgin) (NP-PRN (NPR Mary)))
(NP (D the) (ADJ blessed) (N virgin) (NP-PRN (NPR Mary)))
(NP (D the) (N Castell) (NP-PRN (NPR Aungel)))
(NP (D the) (N castell) (NP-PRN (NPR Nygurmous)))
(NP (D the) (N flum) (NP-PRN (NPR Iordan)))
(NP (D the) (N water) (NP-PRN (NPR Ponte)))
In such cases, the "name" part is tagged NPR even if it is not a noun.
(NP (D the) (N Castell) (NP-PRN (NPR Terrable)))
(NP (D the) (N Sege) (NP-PRN (NPR Perelous)))
This principle has the following exception:
(NP (NPR Nova) (NPR Scotia))
(NP (NPR Sankgreall))
In contrast to closed-class items in English names, closed-class items in foreign names (DE, DU, LE, LA, etc.) are always treated as part of the name and tagged NPR.
(NP (NPR Leonardo) (NPR da) (NPR Vinci))
(NP (NPR Petir) (NPR de) (NPR Luna))
(NP (NPR Sagramour) (NPR le) (NPR Desyrus))
(NP (D the) (NPR West) (NPRS Saxons))
Words that cannot bear plural marking are tagged as adjectives, not as
proper nouns. See
Groups of people (ENGLISH,
FRENCH) and more generally, NPs
with elided heads (THE POOR, THE RICH).
Common noun or proper noun?
Cases by form
Bare noun
Bare nouns denoting offices are not proper nouns on their own. See Office for details. |
Bare nouns that are names are proper nouns on their own. These include:
In adjective-noun pairs, if the head noun is a name (that is, a proper noun on its own), then the adjective is tagged ADJ (in keeping with the principle of minimizing the use of NPR).
(NP (ADJ Good) (NPR Friday)) | day |
(NP (ADJ Holy) (NPR Saturday)) | day |
(NP (ADJ Bloody) (NPR Mary)) | person |
(NP (ADJ Great) (NPR Britain)) | place |
(NP (ADJ New) (NPR Troye)) | place |
(NP (ADJ holy) (NPR church)) | unique entity |
(NP (Q+ADJ almighty) (NPR God)) | unique entity |
(NP (NPR God) (ADJP (Q+ADJ almighty))) | unique entity |
(NP (NPR Lord) (ADJP (Q+ADJ almighty))) | unique entity |
(NP (ADJ holy) (NPR scripture)) | unique entity |
If the head noun is not a proper noun on its own, then the adjective is tagged NPR along with the noun.
(NP (D the) (NPR Holy) (NPR Land)) | places |
(NP (D the) (NPR Low) (NPRS Countries)) | places |
(NP (D the) (NPR New) (NPR Inn)) | places |
(NP (D the) (NPR Red) (NPR Sea)) | places |
(NP (D the) (NPR Great) (NPR Seal)) | unique entities |
(NP (D the) (NPR Holy) (NPR Ghost)) | unique entities |
(NP (NPR Holy) (NPR Writ)) | unique entities |
(NP (D the) (NPR Old) (NPR Testament)) | unique entities |
(NP (D the) (NPR Round) (NPR Table)) | unique entities |
(NP (D the) (NPR Southern) (NPR Cross)) | unique entities |
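The two rules for adjective-noun pairs can be summarized as a single decision on the head noun. The sketch below assumes a hand-made set of nouns that count as names on their own; the set `NAMES_ON_THEIR_OWN` and the function name are hypothetical, and plural marking (NPRS) is ignored for simplicity.

```python
# Hypothetical, deliberately tiny list of nouns that are names on their own.
NAMES_ON_THEIR_OWN = {"friday", "saturday", "mary", "britain", "god", "scripture"}

def tag_adjective_noun_pair(adjective, noun, names=NAMES_ON_THEIR_OWN):
    """Tag an adjective-noun pair following the two rules above.

    If the noun is a name on its own, only the noun is NPR.
    Otherwise the adjective is tagged NPR along with the noun.
    """
    if noun.lower() in names:
        return [(adjective, "ADJ"), (noun, "NPR")]
    return [(adjective, "NPR"), (noun, "NPR")]

print(tag_adjective_noun_pair("Good", "Friday"))
print(tag_adjective_noun_pair("Holy", "Land"))
```

In practice the difficult part is deciding membership in the set, which the guidelines settle case by case rather than mechanically.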
(NP (D the) (N King))
Specific epithets associated with a specific person do not count as offices. If such an epithet is used without the person's name to refer to that person, it is tagged NPR.
(NP (D the) (NPR Baptist)) ← referring to John
(NP (D the) (NPR Conqueror)) ← referring to William, etc.
(NP (D the) (NPR Ironside)) ← referring to Edmund
(NP (D the) (NPR virgin)) ← referring to Mary
By contrast, epithets used with a person's name are treated as appositives (in keeping with the principle of maximizing internal structure).
(NP (NPR John) (NP-PRN (N Baptist)))
(NP (NPR Edmund) (NP-PRN (N Ironsides)))
(NP (PRO$ our) (NPR Lord) (NPR God))
(NP (PRO$ our) (NPR Lord) (NPR Jesus) (NPR Christ))
In general, in phrases of the type THE N OF NP, the first noun is tagged N. See CITY, SON, TOWER for some special cases.
Nouns within the PP that are not proper nouns on their own are tagged with their ordinary tags.
Note the counterintuitive result in the following cases that no noun is tagged NPR, even though the noun phrase as a whole refers to a named event or special day. This issue awaits resolution. |
(NP (D the) (N War) (PP (P of) (NP (D the) (NS Roses))))
(NP (D +te) (N feste) (PP (P of) (NP (D +te) (N camel))))
(NP (D +te) (N day) (PP (P of) (NP (N doom))))
Nouns within the PP are tagged NPR only if they are proper nouns on their own.
(NP (D the) (N feste) (PP (P of) (NP (NPR Pentecoste)))) ← name of holiday
(NP (D the) (N feste) (PP (P of) (NP (NPR Ascension)))) ← named event
(NP (D the) (N tropic) (PP (P of) (NP (NPR Cancer)))) ← unique entity
In bare noun-noun pairs where neither of the nouns is a proper noun on its own, both parts are tagged N.
(NP (N lord) (N emperour))
(NP (N Mr.) (N Attorney))
(NP (N Mr.) (N Speaker))
(NP (N Sir) (N Knight))
Otherwise, both nouns are tagged NPR.
(NP (NPR Jhesu) (NPR Crist))
(NP (NPR Julius) (NPR Caesar))
(NP (NPR Robin) (NPR Hood))
This is true even in cases where not all of the nouns are proper nouns on their own. Such cases are exceptions to the principle of minimizing the use of NPR. The order of the nouns is irrelevant (LONDON BRIDGE, MOUNT ZION).
(NP (NPR Lady) (NPR Lisle)) ← title etc. exceptionally tagged NPR
(NP (NPR mrs.) (NPR Lisle))
(NP (NPR seynt) (NPR Gregory))
(NP (NPR sire) (NPR Thomas))
(NP (D the) (NPR West) (NPRS Saxons))
(NP (NPR North) (NPR Galys)) ← cf. (NP (ADJ Great) (NPR Britain))
(NP (NPR Mount) (NPR Zion))
(NP (NPR Penteney) (NPR Abbey))
(NP (NPR London) (NPR Bridge))
(NP (NPR London) (NPR town))
(NP (NPR Sussex) (NPR County))
(NP (NPR Easter) (NPR day))
(NP (NPR Lammas) (NPR term))
(NP (NPR Maundy) (NPR Thursday)) ← MAUNDY on its own = N
The noun modified by the genitive or possessive NP is exceptionally tagged NPR on a par with the noun-noun cases just discussed.
(NP (NPR$ Lincolns) (NPR Inne)) ← INN exceptionally tagged NPR
(NP (NP-POS (NPR New) (NPR$ Year's)) (NPR day)) ← DAY exceptionally tagged NPR
(NP (NP-POS (NPR Seint) (NPR$ Edward)) ← NPR$ by function
    (NPR day)) ← DAY exceptionally tagged NPR
Words referring to groups of people (ethnic, ideological, or religious) are handled as follows. If the word has no plural form, it is tagged ADJ.
English/ADJ (and more generally, nationalities ending in -ISH), French/ADJ
If the word has a plural form, it is tagged NPR(S).
Jew/NPR, Jews/NPRS, Spaniard/NPR, Spaniards/NPRS
Some words referring to groups of people are systematically ambiguous between a nominal and an adjectival use. If the ambiguous word is overtly marked for plural or if it occurs in a syntactic context where it could be so marked, it is tagged NPR(S). Otherwise, the word is tagged ADJ.
He is a Catholic/NPR . (cf. They are Catholics/NPRS .)
He is Catholic/ADJ . (cf. They are Catholic/ADJ .)
the Catholic/ADJ church
analogously: Armenian, German, Greek, etc.
The king and his subjects accepted Christendom/N .
throughout the greater part of Christendom/NPR
the English/ADJ language
Our native language is English/NPR .
the langage of Englysshe/NPR
the Latin/ADJ bible
to study Latin/NPR
(NP (D +te) (NPR Assumpcioun))
(NP (D +te) (NPR incarnacion))
(NP (D the) (NPR Passion))
(NP (D the) (NPR Resurreccion))
(NP (D the) (N King))
(NP (D the) (N Pope))
(NP (D the) (ADJ Prime) (N Minister))
(NP (N Lord) (ADJ Chief) (N Justice))
(NP (PRO$ my) (N Lord) (NP-PRN (ADJ Chief) (N Justice))) ← NP-PRN because of possessive pronoun
(NP (D the) (N Reverend) (NP-PRN (NPR Dr.) (NPR John) (NPR Donne))) ← NP-PRN because of determiner
In conjunction with a name (KING HENRY, LADY LISLE), these nouns are tagged NPR, forming a systematic exception to the principle of minimizing the use of NPR.
(NP (NPR kynge) (NPR Arthure))
(NP (NPR Pope) (NPR John) (NPR Paul))
The same distinction is also made in syntactically more complex cases, notably in ones where the expression denoting the office contains an adjective. When the noun for the office occurs on its own, any adjectives are tagged ADJ (with an accompanying ADJP if postnominal).
(NP (D the) (N Lord) (ADJ Chief) (N Justice))
(NP (N Lord) (ADJ High) (N Admiral))
(NP (D the) (N Attorney) (ADJP (ADJ General)))
But when the noun for the office occurs with a name, any adjectives are tagged NPR, and the entire NP is given a flat structure.
(NP (NPR Attorney) (NPR General) (NPR Brown))
(NP (NPR Lord) (NPR Chief) (NPR Justice) (NPR Scrope))
(NP (NPR Lord) (NPR High) (NPR Admiral) (NPR Calvert))
Unique is taken in a strict sense. Nouns like the following are not necessarily proper nouns on their own, but they can be tagged NPR under the right conditions. See also
(NP (D the) (NPR Bible))
(NP (NPR Excalibur))
(NP (ADJ Holy) (NPR Scripture)) ← SCRIPTURE counts as NPR
CITY, GRAIL, MOON, SUN, TESTAMENT, TOWER, WRIT |
In general, book titles are not treated as proper nouns, as this would
violate the principle
of maximizing internal
structure. The apparent exceptions BIBLE and SCRIPTURE are proper
nouns on their own.
CHURCH in an institutional sense is tagged NPR.
Names and epithets of the DEVIL (FIEND, SATAN, UNWIHT, WURSE, etc.)
are always tagged NPR.
Names and epithets of the Judeo-Christian GOD (CREATOR, LORD, etc.) are
always tagged as proper nouns. This includes the TRINITY, its members
(FATHER, SON, HOLY GHOST), and relevant
epithets (CHRIST, HEALER, SAVIOR). LADY as an epithet for Mary is
tagged NPR. In doubtful cases, the default is N. For
examples of the type OUR LORD GOD,
see D + NPR + NPR.
Certain common Latin liturgical texts are treated as proper nouns.
ZODIAC and the signs of the zodiac are treated as proper nouns; GEMINI
and PISCES are treated as singular.
the catholic church/NPR
the church/NPR of England
(NP (PRO$ Oure) (NPR Father))
(NP (PRO$ ure) (NPR helende))
(NP (PRO$ Oure) (NPR Lady))
(NP (NPR Lord))
(NP (NPR Lord) (NPR Iesu))
(NP (PRO$ Oure) (NPR Lord))
(NP (D the) (NPR Trinity))
(NP (NPR +trumnesse))
(NP (NPR Ave) (NPR Maria))
(NP (NPR Credo))
(NP (NPR Pater) (NPR Noster))
(NP (NPR Requiem))
(NP (NPR Te) (NPR Deum) (NPR Laudamus))
Pronoun (PRO, PRO$)
All pronouns are tagged PRO except pronominal
MAN (also ME) and
pronominal ONE.
In cases of ambiguity, which can arise in connection with mixed gerunds, HER is tagged by default as an ordinary pronoun (PRO) rather than as a possessive pronoun. |
I love my/PRO$ cat . This book is not mine/PRO$ .
(NP (PRO$+N myself))
(NP (PRO me) (N self))
(NP (PRO+N himself))
(NP (PRO hym) (N self))
(NP (PRO+N herself))
(NP (PRO her) (N self)) ← PRO by default, like HER
The second morpheme (SELF, SELVES) is tagged N regardless of number.
(NP (PRO$+N yourself))
(NP (PRO$ your) (N self))
(NP (PRO$+N yourselves))
(NP (PRO$ your) (N selues))
for +dan +de me/MAN nett hem to +dan a+de
+Teih me/MAN niede me to +dan a+de , me/MAN ne net me/PRO noht te forsweri+gen
For +tar man/MAN ne can his mu+des me+de
+Dis word .credo. Mon/MAN mai understonden on +tre wise
The plural MEN also has a pronominal use, but because it is too difficult
to distinguish this use from other uses, it is always tagged NS.
Existential THERE (EX)
Existential THERE is tagged EX.
When ambiguous between an existential (EX) and a locative (ADV) reading, THERE is tagged EX by default. |
Here/ADV are some pears , and there/ADV are some apples .
There/EX are lots of apples there/ADV .
There/EX are lots of apples .
Category | Infinitive | Present | Past | Imperative | Present participle | Passive participle | Perfect participle |
---|---|---|---|---|---|---|---|
be | BE | BEP | BED | BEI | BAG | — | BEN |
do | DO | DOP | DOD | DOI | DAG | DAN | DON |
have | HV | HVP | HVD | HVI | HAG | HAN | HVN |
ordinary verb | VB | VBP | VBD | VBI | VAG | VAN | VBN |
Category | Tag |
---|---|
modal | MD |
modal, untensed (Middle English) | MD0 |
infinitival FOR | FOR |
infinitival TO, TIL, AT | TO |
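The verb tag table above can be read as a lookup in both directions: from a verb class and form to a tag, and from a tag back to its class and form. The sketch below simply transcribes the table; the names `PARADIGMS`, `verb_tag`, and `describe_verb_tag` are illustrative assumptions, and `None` stands for the empty cell (BE has no passive participle).

```python
# Column order of the table above.
FORMS = [
    "infinitive", "present", "past", "imperative",
    "present participle", "passive participle", "perfect participle",
]

# Rows of the table above; None marks the empty cell.
PARADIGMS = {
    "be": ["BE", "BEP", "BED", "BEI", "BAG", None, "BEN"],
    "do": ["DO", "DOP", "DOD", "DOI", "DAG", "DAN", "DON"],
    "have": ["HV", "HVP", "HVD", "HVI", "HAG", "HAN", "HVN"],
    "ordinary verb": ["VB", "VBP", "VBD", "VBI", "VAG", "VAN", "VBN"],
}

def verb_tag(category, form):
    """Look up the tag for a verb class and form (None if the cell is empty)."""
    return PARADIGMS[category][FORMS.index(form)]

def describe_verb_tag(tag):
    """Map a verb tag back to (category, form); None for non-verb tags."""
    for category, tags in PARADIGMS.items():
        if tag is not None and tag in tags:
            return category, FORMS[tags.index(tag)]
    return None

print(verb_tag("have", "past"))
print(describe_verb_tag("VAG"))
```

Modals (MD, MD0) and the infinitive markers (FOR, TO) are outside this paradigm and are therefore not covered by the lookup.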
When ambiguous between BEP and HVP, the contraction 'S (She's come) is tagged BEP by default. |
Nominal uses of the present participle are tagged as nouns.
one meeting/N, many meetings/NS
and she graunted hem with wepynge/N it shold be done rychely
Alse ge hauen giwer lichame don to/TO hersumiende/VB fule lustes ; and unriht , alse do+d giwer lichame he+d to/TO hersumiende/VB clennesse . and rihtwisnesse . and holinesse
if/P you/PRO be/BEP there/ADV
if/P you/PRO were/BED there/ADV
were/BED you/PRO to/TO come/VB
if/P they/PRO played/VBD with/P you/PRO
CAN, COULD, MAY, MIGHT, MOWE, MUST, SHALL, SHOULD, THARF, UTEN, WILL, WOULD |
These items are tagged MD when there is a reading available in which a main verb is elided. Otherwise, the item is tagged as a form of VB.
therefore ye may/MD sey what ye woll/MD ← elided SAY
' I woll/VBP well , ' seyde Balan ' that ye so do '
Modal | Verb |
---|---|
I dare/MD say . | I dare/VBP to say it . |
I dared/MD not say it . | I do not dare/VB (to) say it . |
They need/MD not come . | They need/VBP to eat now . |
 | They do not need/VB to come . |
 | It needs/VBP them not to come . |
Infinitive of modal
Untensed modal verbs (generally CAN, MOWE), which are attested into Modern
English, are tagged MD0.
supposyng +tat he schuld cun/MD0 best rede +te booke
+tu xalt mown/MD0 askyn what +tu wylt
Infinitive marker
(FOR, TO)
When ambiguous in infinitival contexts between FOR and P, FOR is tagged FOR. |
It is difficult for/FOR me to complete the work .
It is more convenient for/P me for/FOR you to do the work .
For/P me it is difficult to complete the work .
TO is used to tag any form of the infinitive marker (including the northern Middle English forms AT and TIL).
They want to/TO eat dinner now .
+te riht +tidir at/TO cume 'the right to come thither'
+tien entent til/TO understand +tis wrytyng 'thy intent to understand this writing'
Clitic TO (enclitic on FOR or proclitic on a verb) is split off and each component receives its ordinary tag.
and fondede for@/FOR @to/TO slee Iustinianus
&/CONJ/D eadie katerine bigon for@/FOR @te/TO seggen
t@/TO @accept/VB it as his due
t@/TO @aue/HV bounde them up in barrelles
Category | Positive | Comparative | Superlative | Possessive |
---|---|---|---|---|
adjective | ADJ | ADJR | ADJS | ADJ$ (very rare) |
adverb | ADV | ADVR | ADVS | — |
Departicipial adjectives (INTERESTED, INTERESTING, and the like) are tagged as ADJ rather than as participles (VAG, VAN), as in earlier releases.
Doubtful cases are tagged VAG or VAN. |
a shocking/ADJ revelation; a shocked/ADJ expression
any living/ADJ creature; any creature now living/VAG
Fused forms with the distribution
of PPs (ALIVE, ASLEEP) are not treated as adjectives.
Degree words are treated as comparatives. See
AS, SO (degree),
ENOUGH,
and TOO.
Comparative adjective or adverb
a simpler/ADJR solution
Let's work smarter/ADVR, not harder/ADVR .
Positive or comparative?
Except when construed with a THAN phrase or clause, apparently comparative
forms that lack a base form are tagged ADJ rather than ADJR.
FORMER, INNER, NETHER, OUTER, UPPER |
LATER is tagged ADJ or ADV when not clearly comparative. For unclear cases, the default is ADJR or ADVR.
LATTER is ordinarily tagged ADJ. However, in early texts, it is tagged ADJR when it functions as the ordinary comparative of LATE (later replaced by the innovative LATER).
LOWER is tagged ADJ when contrasting with UPPER, and
as ADJR when contrasting with HIGHER. For unclear cases, the
default is ADJR.
Superlative adjective or adverb
in the greatest/ADJS church of London
They like this way best/ADVS .
Positive or superlative?
Apparently superlative forms that lack a base form are
tagged ADJ rather than ADJS. These include FIRST,
LAST and adjectives ending in -MOST (FOREMOST, INNERMOST, etc.).
Color
Color terms in adnominal or predicate position are tagged ADJ,
and otherwise N.
a pale blue/ADJ egg a very pale blue/N
a white/ADJ shirt the white/N of an egg
a lovely red/ADJ rose a lovely deep red/N
Ordinal number
Ordinal numbers are tagged ADJ, as are DOUBLE, TREBLE, TRIPLE,
and so on. FIRST also has an adverbial use.
Cases where an ordinal might be expected but without overt ordinal marking are treated as cardinal numbers (NUM).
the .x./NUM of April
P | preposition or subordinating conjunction |
Prepositions are tagged P.
the borders of/P England on/P your way to/P the/D king/N with/P Merlin
after/P they arrived ← cf. after/P their arrival
before/P they left ← cf. before/P their departure
Items like ALIVE and ASLEEP are tagged P+N and treated
as written rather than as unitary
adjectives because they are barred from prenominal position.
Preposition with demonstrative (FOR+TAN, FOR+TI, etc.)
A preposition may be followed by a demonstrative (FOR +TAN, FOR +TI, FOR
+TAT, IN +TAT, WI+T +TAN, etc.), which in turn may be followed by a
complement clause or noun phrase, as in the examples below. If the
preposition and demonstrative are cliticized, the combination is
tagged P+D.
See Preposition plus demonstrative
plus clause or NP.
after/P +dan/D +de/C here herte leste , ic hem fol+gede For+di/P+D +dat/C ich nabbe ihafd rihte ileaue for/P +ti/D +tt/C +tu ne wilnest bute to seo mi wlite . ne speoke bute to me We ben tau+gt in/P +tat/D +tat/C we seon in suche creatures +te wonder werkes of vre Creatour
PUNC | all punctuation |
PUNC is not on the default "ignore_nodes" list of CorpusSearch and needs to be added, if necessary, to the list by the user (best in a preference file). |
Punctuation is attached as high as possible in the tree without regard to
where it "belongs" semantically.
Dash
Dashes in the original text are represented by hyphens in the annotated
corpora. At token boundaries, they go with the first token, if they are
the only punctuation at the boundary. Otherwise, they go with the second
token.
They are finished with the job ,/PUNC ← with preceding punctuation
-/PUNC or so they say .
They are finished with the job -/PUNC ← without preceding punctuation
or so they say .
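The attachment rule for dashes at token boundaries comes down to one condition: whether any other punctuation is present at the boundary. The following is a minimal sketch of that decision, assuming tokens are represented as lists of word/TAG strings; `place_dash` and its parameters are hypothetical names.

```python
def place_dash(first, second, other_punct=None):
    """Attach a boundary dash to the first or the second token.

    first, second: lists of word/TAG strings for the two corpus tokens.
    other_punct:   other punctuation at the boundary (e.g. ","), or None.
    """
    if other_punct is None:
        # The dash is the only punctuation: it goes with the first token.
        return first + ["-/PUNC"], second
    # Otherwise the other punctuation closes the first token
    # and the dash opens the second one.
    return first + [other_punct + "/PUNC"], ["-/PUNC"] + second

print(place_dash(["the", "job"], ["or", "so", "they", "say", "."]))
print(place_dash(["the", "job"], ["or", "so", "they", "say", "."], ","))
```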
Period
Periods that do not serve as sentence punctuation are not separated from
the word they belong to. Common cases include periods indicating
abbreviations or surrounding Roman numerals and sometimes ordinary words.
Mr./NPR
.x./NUM days/NS
+Tis/D word/N .credo./NPR Mon/MAN mei/MD understonden/VB ./PUNC on/P
+tro/NUM wise/N ./PUNC CMLAMB1.75.43/ID
Ne/NEG mei/MD na/Q .Mon./N cume/VB in/RP to/P godes/NPR$ riche/N CMLAMB1.73.17/ID
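Because word-internal periods stay attached to the word, a fully tagged line can be split on whitespace and on the last slash of each item without special handling for periods. The following is a minimal sketch, assuming the word/TAG notation shown in the examples above; `parse_tagged_line` is a hypothetical helper name.

```python
def parse_tagged_line(line):
    """Split a line of word/TAG items into (word, tag) pairs.

    The tag is taken after the LAST slash, so words that themselves
    contain a slash are kept intact. Untagged items get tag None.
    """
    pairs = []
    for item in line.split():
        word, slash, tag = item.rpartition("/")
        if not slash:
            word, tag = item, None
        pairs.append((word, tag))
    return pairs

line = "Ne/NEG mei/MD na/Q .Mon./N cume/VB in/RP to/P godes/NPR$ riche/N CMLAMB1.73.17/ID"
for word, tag in parse_tagged_line(line):
    print(word, tag)
```

Note that `.Mon.` comes out as a single word tagged N, with no separate PUNC items, which is the behavior described above for periods that do not serve as sentence punctuation.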
Quotation mark
In general, quotation marks around continuing lines of quoted speech have
been removed from the text. But those marking continuing paragraphs of
quoted speech have been retained.
Quantifier
List of tags
Category | Positive | Comparative | Superlative | Possessive |
---|---|---|---|---|
quantifier | Q | QR | QS | Q$ |
The following words are always tagged Q.
ALL, ANY, EACH, EVERY, FEW, MANY, |
FEWER and FEWEST are always tagged QR and QS, respectively.
LESS, LEAST and MUCH, MORE, MOST are not on the list above because they
are treated somewhat differently in
the PPCME2 and the later corpora.
Wh- word
List of tags
WADV | wh-adverb |
WD | wh-determiner |
WPRO | wh-pronoun |
WPRO$ | wh-pronoun, possessive |
WQ | WHETHER (and IF when heading indirect questions) |
Wh- words that are used as interjections are tagged with the
appropriate tag from the above list (not as INTJ). Their
function is indicated at the phrasal level;
see Interjection phrase
(INTJP).
Wh- adverb (WADV)
Wh- adverbs (HOW, WHEN, WHENCE, WHERE, WHITHER, WHY) are
tagged WADV. WHEN can also be
tagged P.
How/WADV would you do that ?
When/WADV are they planning to arrive ?
They remembered where/WADV to buy the tickets .
the place where/WADV they are going
(WNP (WD what) (NS horses))
(WNP (WNP (WD which) (N prophet)) (CONJP (CONJ and) (NX (N law))))
(WNP (WNP (WPRO what)) (CONJP (CONJ and) (WADVP (WADV where))))
(WNP (WPRO what) ← ADJP modifies WPRO
     (ADJP (ADJ else)))
For the treatment of WHAT THE DEVIL and similar expressions, see INTJP.
Otherwise, WHAT immediately preceding a determiner is tagged WD (even though this is arguably not the correct structure).
(WNP (WD what) (D a) (N nightmare))
What/WPRO did you say ?
the bigger one , which/WPRO I showed you yesterday
I know not whether/WPRO of hem is come to court .
There began a new batayle, the whych/WPRO was sore and harde .
The tag for WHOSE is WPRO$.
Whose/WPRO$ is that umbrella ?
by whose/WPRO$ commandment
We want to know if/WQ that/C they are coming.
C | complementizer |
CONJ | coordinating conjunction |
D | determiner |
FP | focus particle |
FW | foreign word |
INTJ | interjection |
NEG | negation |
NUM | cardinal number (except ONE) |
RP | adverbial particle |
X | unknown POS |
We know that/C you would like to visit us .
the person that/C you would like to visit
AND, NE, NEITHER, NOR, OR, |
It is possible for two coordinating conjunctions to be adjacent.
And/CONJ nor/CONJ is this the right answer .
But/CONJ neither/CONJ is this the right answer .
In instances of correlative conjunction, each conjunction is tagged CONJ.
all periods | |
only Middle English |
both/CONJ you and/CONJ I
either/CONJ you or/CONJ I
neither/CONJ you nor/CONJ I
ai+der/CONJ +ge/CONJ hodede +ge/CONJ leawede
A(N), THAT, THE, THESE, THIS, THOSE, YON, YONDER |
Demonstratives are always tagged D, whether they precede a noun or not. Note the difference between ordinary determiners and wh- words in this regard.
+Tis/D +tat/C is i-seide in +te comyn table
all periods | ALONE, BUT, EVEN, ONLY |
only Middle English | FORTH, ONE, YET |
They alone/FP know the answer.
Not even/FP they know the answer.
Only/FP they know the answer.
+tat hie ne biholden non iuel ne non un-nut ne for+den/FP idel ← FORTH
+De mann ne leue+d naht $be bread ane/FP ← ONE
hwi wi+d_dra+gest +tu +tin hont . &/ +get/FP +tin king hond ← YET
    of midde +tine bosme
Certain abbreviations of foreign terms are listed in the OED, but their POS tag in English is difficult to determine. Such abbreviations are tagged X. Some examples include:
B.A., e.g., etc., i.e., M.A.
Foreign names and certain common Latin liturgical texts are treated as proper nouns. But foreign language titles of books are tagged FW.
a passage from Thomas/NPR Mann's/NPR$ Zauberberg/FW
In the Alcoranum/FW it is written
in the prologue on Regum/FW
In foreign language sequences, everything (words, symbols, numbers, etc.) except punctuation is labeled FW.
libro/FW 5=o=/FW ,/PUNC capitulo/FW 24=o=/FW
AH, ALAS, AMEN, AYE, FAREWELL, FIE, GAR (< God), GOOD-BYE, GRAMERCY, HA, HULLO, LA, LO, NAY, NO, OH, PARDEE, POOF, WASSAIL, WELAWEI, YEA, YES, WITECRIST |
Items like FORSOOTH, MARY (and spelling variants), and various wh- words (WHAT, WHY) are not tagged as INTJ, even when used as interjections, but their function can be indicated at the phrasal level. See Interjection phrase (INTJP) for examples.
NO is tagged INTJ when the negative equivalent of YES.
PRAY (and variants like
PRITHEE) is never tagged INTJ.
Negation (NEG)
When used as simple negation, NE
and NOT are tagged NEG, as are
NO and NONE in
WHETHER OR NO clauses. All
four items also have other uses; follow the links on the words.
You should not/NEG be late for the lecture .
non senne ne/NEG mai bien idon bute +durh unhersumnesse
me/MAN ne/NEG net me/PRO noht/NEG te forsweri+gen
wheither/WQ he/PRO wol/MD doon/DO or/CONJ no/NEG
wheither it oghte nedes be doon or noon/NEG
Proclitic negation on verbs and modals in Middle English is split, as is enclitic negation in later stages of English.
I n@/NEG @el/MD neuere go hennys .
We can@/MD @not/NEG help them .
We ca@/MD @n't/NEG help them .
When overtly marked for plural (DOZENS, SCORES, HUNDREDS, THOUSANDS,
MILLIONS, etc.), number words are tagged NS. Often, such forms
take PP complements (nine millions of subjects), which clearly
shows the status of the number word as a nominal head. In a very few
cases, where a plural number word is immediately followed by another
number word and would in modern usage be replaced by the singular (as in
nine millions three-hundred thousand), the number word is tagged
as NUM.
Singular number words (DOZEN, SCORE, and sometimes HUNDRED,
THOUSAND, MILLION) are treated as N when followed by a PP
complement.
Numbers in foreign language sequences are treated as
foreign words.
Otherwise, unless used as list markers
(LS), all cardinal numbers except for ONE
are tagged NUM, whether spelled out, in numeral form, or in
some combination of the two. In the lemmatized corpora, all numbers are
represented as single orthographic words; see the lemmatization
guidelines for details.
TWICE and THRICE are tagged NUM, as
is ONCE when analogous in meaning.
Cases where an ordinal might be expected but without overt ordinal
marking are treated as cardinal numbers
and tagged NUM.
22/NUM parts
xxij./NUM parts
twenty-two/NUM parts
two_and_twenty/NUM parts
xxx=ti=/NUM parts
xxx_c/NUM parts
Twice/NUM , we ran out of cash .
The horse turned around thrice/NUM .
the ij./NUM day
the .ix./NUM chapter
Adverbial particle (RP)
The criteria for distinguishing adverbial particles (RP) from
other adverbs (ADV) are difficult to make explicit in every
case. Following the Brown Corpus, we tag the following words
RP when they do not take a complement.
ABOUT, ACROSS, BY, DOWN, FRO, IN, OFF, ON, OUT, OVER, THROUGH, TO, UP |
Humpty Dumpty fell down/RP from his proud perch .
Sir Hector tried to pull out/RP the sword .
And Sir Ralph of Beeston gave up/RP the castle to the king .
Items from the above list that modify a prepositional phrase continue to be tagged as particles as long as they are spelled as separate words (notably IN TO and UP ON, but not ADOWN, APON, INTO, UNTO, or UPON).
kneeling down/RP upon/P his knees
out/RP of/P the castle
down/RP to/P the earth
Some items from the above list can combine with -WARD.
See -WARD for details.
Unknown POS (X)
Words with unknown POS are tagged X.
As mentioned earlier, X is also used to tag
foreign abbreviations with an OED
entry, but without an obvious POS tag in English (notably, ETC.).
(NP-OB1 (NUM C) (N myle)
(X li))
( (IP-MAT (CONJ And)
(X +tet) ← mistranslation by Dan Michel
(PUNC /)
(PP (P yef)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (PRO hit))
(NP-OB2 (PRO him))
(NEG ne)
(BEP is)
(NEG na+gt)
(ADJP-PRD (ADJ wor+t)))))
(PUNC :)
(NP-SBJ (PRO he))
(NP-OB2 (PRO him))
(VBP zay+t)
(PUNC .)
(PUNC ')
(IP-IMP-SPE (VBI (VBI eth) (PUNC /) (CONJ an) (VBI drink))
(PP (P ase)
(NP (D +te) (ADJ ilke))))
(PUNC /))
(ID CMAYENBI-M2,54.978))
( (IP-MAT (CONJ and)
(NP-SBJ (QP (ADVR so) (Q meny))
(CP-DEG *ICH*-1))
(BED were)
(VAN slayne)
(PUNC ,)
(PP (X what) ← likely mistranslation of French "que ... que"
(PP (P in)
(NP (ONE o) (N side)))
(CONJP (CONJ and)
(PP (P in)
(NP (D +tat) (OTHER o+tere)))))
(PUNC ,)
(CP-DEG-1 (C +tat)
(IP-SUB (NP-SBJ (PRO hit))
(BED was)
(NP-PRD (ADJ grete) (N pite)
(CP-TMC (WNP-2 0)
(IP-INF (IP-INF (NP-OB1 *T*-2)
(TO to)
(VB wete))
(CONJP (CONJ and)
(IP-INF (NP-OB1 *T*-2)
(TO to)
(VB seen))))))))
(PUNC .))
(ID CMBRUT3-M3,87.2622))
( (IP-MAT (META (NP (CODE <font>) (NPR Ga~mer) (CODE <$$font>)))
(INTJ Alas)
(NP-VOC (N sir))
(PUNC ,)
(NP-SBJ (PRO hee@))
(MD @l)
(BE be)
(ADVP-LOC (ADV here))
(ADVP-TMP (ADV anon))
(PUNC ,)
(FRAG (X ha)
(BE be)
(VAN handled)
(ADVP (ADVR to) (ADV bad)))
(PUNC .))
(ID STEVENSO-1558-E1-H,57.293))
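The bracketed trees used throughout this section can be read with a small recursive parser. The sketch below is not CorpusSearch and makes no claim about how the corpus tools work; it assumes well-formed brackets and plain trees without the explanatory "←" comments that appear in some examples here. The names `parse_tree` and `leaves` are hypothetical.

```python
import re

def parse_tree(text):
    """Parse one bracketed tree into nested lists of strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def read_node():
        nonlocal position
        position += 1  # skip the opening bracket
        node = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                node.append(read_node())
            else:
                node.append(tokens[position])
                position += 1
        position += 1  # skip the closing bracket
        return node

    return read_node()

def leaves(node):
    """Collect (tag, word) pairs from the terminal nodes of a parsed tree."""
    if len(node) == 2 and all(isinstance(part, str) for part in node):
        return [(node[0], node[1])]
    found = []
    for child in node:
        if isinstance(child, list):
            found.extend(leaves(child))
    return found

tree = parse_tree("(NP-OB1 (NUM C) (N myle) (X li))")
print(leaves(tree))
```

A lookup such as `[word for tag, word in leaves(tree) if tag == "X"]` then picks out the words tagged X, for instance to review them by hand.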