Known issues

We hope to address the following issues in future releases:

ANOTHER

ANOTHER is currently treated as a single lexical item, as in (i), but might better be split, as in (ii).

(i)  (D another)
(ii) (D an@) (ADJ @other)

Complex "plus" tags

Morphologically complex quantifiers, which combine one of the quantifiers ANY-, EVERY-, NO-, or SOME- with one of the nominal heads -BODY, -ONE, or -THING, are currently tagged with special "plus" tags, as in (i), but might better be split, as in (ii).

(i)  (Q+N everyone)
(ii) (Q every@) (N @one)
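
Users who prefer the split analysis in (ii) could derive it from the current tagging themselves. The following is a minimal sketch, not part of the corpus tools: the function name split_plus_tags is hypothetical, and the code assumes that each leaf is written exactly as "(Q+N word)" with a single space, as in (i).

```python
import re

# Inventory taken from the description above.
QUANTIFIERS = ("any", "every", "no", "some")
HEADS = ("body", "one", "thing")

# Matches a leaf such as "(Q+N everyone)". The exact "(TAG word)" layout
# is an assumption about how the parsed files are formatted.
PLUS_LEAF = re.compile(
    r"\(Q\+N (%s)(%s)\)" % ("|".join(QUANTIFIERS), "|".join(HEADS)),
    re.IGNORECASE,
)


def split_plus_tags(text: str) -> str:
    """Rewrite each (Q+N xy) leaf as (Q x@) (N @y), leaving other text alone."""
    return PLUS_LEAF.sub(
        lambda m: f"(Q {m.group(1)}@) (N @{m.group(2)})", text
    )


print(split_plus_tags("(Q+N everyone)"))
```

Leaves that do not match the assumed pattern (for example, a quantifier written as two words) are left unchanged, so the output should be checked by hand before it is relied on.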

Empty heads

Empty heads of phrases are not always explicitly included.

Extralinguistic material

There is no current annotation guideline for whether non-linguistic material such as coughs, laughter, and so on, forms part of an adjacent token or stands alone. In the future, we plan to impose a stand-alone default.

Extraposition

Extraposed elaborations on subjects are attached inconsistently, either as sisters of the subject (daughters of IP) or as daughters of VP.

The corpus contains sporadic *ICH* traces of extraposition.

False starts

The internal structure of short false starts is sometimes annotated, contrary to the current guidelines.

There are no current guidelines about when to annotate disfluent material as a single long false start or a sequence of several.

POS tags within false starts are more likely to contain errors than those in the main parsed structure.

According to the current guidelines, trailing hyphens are intended to mark incomplete words. In many cases, however, trailing hyphens mark complete words within false starts, notably the final word of a false start.

FRAG

Some FRAGs should likely be annotated as IPs with missing subjects, and vice versa.

Hyphenation

The conventions for hyphenation in the Praat transcripts and in the parsed corpus are not completely consistent. Users unable to find a hyphenated word from the parsed corpus in the Praat transcripts should keep this in mind and revise searches accordingly.
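
One way to revise such a search is to make it insensitive to hyphenation. The sketch below is only an illustration under stated assumptions: the function name hyphen_insensitive_pattern is hypothetical, and the code assumes that the transcript text has already been extracted from the Praat file into a plain Python string.

```python
import re


def hyphen_insensitive_pattern(word: str) -> re.Pattern:
    """Build a regex that matches `word` whether its hyphens are kept,
    dropped, or replaced by whitespace in the text being searched."""
    # Split on hyphens and escape each piece so it is matched literally.
    parts = [re.escape(p) for p in word.split("-") if p]
    # Allow at most one hyphen or whitespace character between pieces.
    return re.compile(r"\b" + r"[-\s]?".join(parts) + r"\b", re.IGNORECASE)


pattern = hyphen_insensitive_pattern("well-known")
for line in ["a well known fact", "a wellknown fact", "a well-known fact"]:
    print(bool(pattern.search(line)))
```

The pattern only relaxes the hyphens present in the query word, so it covers the case described above (a hyphenated word taken from the parsed corpus) but not a word that is hyphenated in the transcripts alone.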

Interjections

Interjections sometimes form separate sentence tokens, even where the current guidelines say they should not.

Punctuation

Punctuation in the AAPCAppE is not always consistent with the current guidelines. But since punctuation is not a part of the audio signal, even widespread inconsistency should not affect the utility of the corpus for linguistic research.

Even the documentation (in sections other than those devoted to punctuation) may be inconsistent in its punctuation.

QTP

There are likely to be instances of direct speech that are not marked as QTP.

XX

Some material tagged as XX probably contains overlooked false starts.