Hindle, Don. 1983.
Deterministic parsing of syntactic non-fluencies.
In: Proceedings of the 21st Annual Meeting of the
Association for Computational Linguistics.
123–128.
https://doi.org/10.3115/981311.981336.
FS (false start)
FS indicates a false start and is the default annotation for syntactic
disfluencies. False starts can involve incomplete words, copies of
single words and strings, and incomplete structures followed by a
reprise containing (largely) parallel structures. False starts are
determined by eliminating disfluent material until the result is a
grammatical structure. When there is more than one way to eliminate
disfluent material, as there generally is, it is early material that is
eliminated rather than late material (hence, the term "false start").
In other words, the last copy in a sequence of copies counts as the
text; the earlier copies count as false starts (Hindle
1983, p. 125, section 4.1). Material that is part of a false start
remains available for computation in other modules
(Hindle 1983, p. 126).
In practical terms, the most efficient way to determine the boundaries of a false start is to start at the end of a sentence token and work backwards.
FS can be used as a POS label to tag incomplete words, which are marked as such with a trailing hyphen. In general, however, FS encloses sequences of words, including ones labelled as FS. False starts attach as high as is structurally possible. (This is to eliminate needlessly time-consuming attachment decisions that will inevitably result in inconsistencies.)
( (IP-MAT (NP-SBJ (PRO They)) (VP (VBD said) (CP-THT (C 0) (FS (FS sh-) (PRO she)) (IP-SUB (NP-SBJ (PRO she)) (VP (HVD had) (NP-OB1 (D the) (N-COMP (N bold) (NS hives))))))) (PUNC .)))
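The backward scan and the trailing-hyphen convention described above can be sketched roughly as follows. This is only an illustrative heuristic, not the procedure used to annotate the corpus: the function name `mark_false_starts` and the `max_len` parameter are hypothetical, only exact adjacent copies are detected (not the "largely parallel" reprises that an annotator would also recognize), and interjections and punctuation are assumed to have been set aside beforehand.

```python
def mark_false_starts(words, max_len=3):
    """Return sorted indices of words that this heuristic treats as false starts.

    Assumptions (not part of the guidelines): only exact, adjacent copies of
    up to `max_len` words are detected, and comparison ignores case.
    """
    # Incomplete words (trailing hyphen) are false starts by convention.
    false_starts = {i for i, w in enumerate(words) if w.endswith("-")}
    # Indices still counted as text, in their original order.
    live = [i for i in range(len(words)) if i not in false_starts]

    changed = True
    while changed:
        changed = False
        # Start at the end of the sentence token and work backwards.
        for end in range(len(live), 1, -1):
            for n in range(min(max_len, end // 2), 0, -1):
                late = [words[i].lower() for i in live[end - n:end]]
                early = [words[i].lower() for i in live[end - 2 * n:end - n]]
                if late == early:
                    # The later copy is the text; the earlier copy is eliminated.
                    false_starts.update(live[end - 2 * n:end - n])
                    del live[end - 2 * n:end - n]
                    changed = True
                    break
            if changed:
                break
    return sorted(false_starts)


words = "They said sh- she she had she had the bold hives".split()
fs = mark_false_starts(words)
text = [w for i, w in enumerate(words) if i not in fs]
print(fs)
print(" ".join(text))
```

On this input the sketch marks `sh-`, the first two instances of `she`, and the first `had` as false-start material, leaving "They said she had the bold hives" as the text. Because every exact adjacent copy is treated as a false start, deliberate repetitions would need to be excluded by hand.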
Interjections do not count as part of false starts.
( (IP-MAT (NP-SBJ (PRO They)) (VP (VBD said) (CP-THT (C 0) (FS (FS sh-) (PRO she)) (PUNC ,) (INTJ uh) (PUNC ,) (FS (PRO she) (HVD had)) (IP-SUB (NP-SBJ (PRO she)) (VP (HVD had) (NP-OB1 (D the) (N-COMP (N bold) (NS hives))))))) (PUNC .)))
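Because false starts are enclosed in FS nodes, the fluent text can be recovered by dropping those nodes when reading off the leaves of a tree. The following is a minimal sketch under stated assumptions: `parse_tree` and `fluent_words` are hypothetical helper names, the reader handles only the plain bracketing shown in these examples, and leaves equal to `0` or beginning with `*` are assumed to be empty categories and are skipped.

```python
import re


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = None
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def fluent_words(tree, skip_labels=("FS",)):
    """Collect leaf words, skipping subtrees whose label is in skip_labels."""
    label, children = tree
    if label in skip_labels:
        return []
    out = []
    for child in children:
        if isinstance(child, tuple):
            out.extend(fluent_words(child, skip_labels))
        elif child != "0" and not child.startswith("*"):
            # Assumption: "0" and "*..." leaves are empty categories.
            out.append(child)
    return out


tree_text = (
    "( (IP-MAT (NP-SBJ (PRO They)) (VP (VBD said) (CP-THT (C 0) "
    "(FS (FS sh-) (PRO she)) (PUNC ,) (INTJ uh) (PUNC ,) "
    "(FS (PRO she) (HVD had)) (IP-SUB (NP-SBJ (PRO she)) (VP (HVD had) "
    "(NP-OB1 (D the) (N-COMP (N bold) (NS hives))))))) (PUNC .)))"
)
result = fluent_words(parse_tree(tree_text))
print(" ".join(result))
```

The interjection `uh` survives the filter because it sits outside both FS nodes, which mirrors the rule that interjections do not count as part of false starts.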
With some exceptions in the case of very long false starts, the internal constituent structure of false starts is not annotated.
( (IP-MAT (INTJ Uh) (PUNC ,) (FS (NP-SBJ (PRO I)) (VP (VBP think) (CP-THT (C 0) (IP-SUB (NP-SBJ (D the) (N story) (PP (P of) (NP (CP-FRL (WNP-1 (WPRO what@)) (IP-SUB (NP-SBJ *T*-1) (VP (HVP @'s) (VP (VBN happened) (PP (P to) (NP (PRO me)))))))))) (VP (MD will) (VP (VB make) (NP-OB1 (D a) (ADJP (ADVP (ADV very)) (FS in-))))))))) (INTJ oh) (PUNC ,) (NP-SBJ (PRO I)) (VP (VBP know) (CP-THT (C 0) (IP-SUB (NP-SBJ (PRO it@)) (VP (MD @'ll) (VP (VB make) (NP-OB1 (D a) (ADJP (ADVP (ADV very)) (ADJ interesting)) (N book))))))) (PUNC .)))
( (IP-MAT (NP-SBJ (PRO They)) (VP (MD $could) (NEG $n't) (VP (VB believe) (CP-THT (C 0) (IP-SUB (NP-SBJ (PRO we)) (VP (VBD bought) (NP-OB1 (D a) (CODE <BREAK>))))))))) ← token-final BREAK
( (IP-MAT (NP-SBJ (D the) (N-COMP (N church) (N house))) (VP (BED was) (NP-MSR (FP just) (D a) (ADJP (ADJ little)) (CODE <BREAK>)) ← token-internal BREAK (CODE <$$RNapier_xmax=746.16>) (CODE) (ADVP-LOC (CP-FRL (WADVP-1 (WADV where)) (IP-SUB (NP-SBJ (PRO I)) (VP (BED =uz) (VP (ADVP-LOC *T*-1) (VAN raised))))))) (PUNC .)))
Ordinary ellipsis is not annotated with BREAK.
( (IP-MAT (NP-SBJ (PRO I)) (VP (VBD said) (CP-THT (C that) (IP-SUB (NP-SBJ (PRO I)) (VP (MD would) (VP (VB help)))))) (PUNC ,)))
( (IP-MAT (CONJ and) (NP-SBJ (PRO I)) (VP (DOD did)) ← no explicit indication of elided main verb (PUNC ,)))
Elaborations attach as daughters of the constituent they are construed with if that is structurally possible. Otherwise, they attach as low and as close to that constituent as they can. Eventually, the annotation will include an *ICH* trace.
It is sometimes difficult to distinguish between elaborations and conjuncts. In general, conjunction structures are marked with at least one overt conjunction (if only one, generally introducing the last conjunct). In other words, asyndetic phrases are more likely to be elaborations than conjuncts, and so, unless a phrase is easily interpreted as a non-final conjunct, the default for cases that are ambiguous between elaboration and conjunction structure is ELAB.
( (IP-MAT (NP-SBJ (PRO It@)) (VP (BEP @s) (NP-PRD (D a) (N problem) (ELAB (NP (D a) (ADJP (ADJ real)) (N problem))))) (PUNC .)))
( (IP-MAT (CONJ and) (NP-TMP (D a) (N lot) (PP (P of) (NP (NS times)))) (NP-SBJ (PRO hit@)) (VP (MD @'ll) (VP (VP (VB fall)) (CONJP (CONJ and) (VP (VB kill) (NP-OB1 (D the) (NS people)))) (PUNC ,) (ELAB (VP (GT get) (ADJP-PRD (ADJ loose)))))) (PUNC ,)))
( (IP-MAT (INTJ Well) (PUNC ,) (PP (P for) (NP (D a) (N while))) (PUNC ,) (NP-SBJ (PRO they@)) (VP (HVD @'d) (VP (A a=) (VBN used) (NP-OB1 (NS mules)) (PUNC ,) (ELAB (VP (VBN pulled) (NP-OB1 (PRO it)) (RP out) (PP (P with) (CODE <BREAK>)))))) (PUNC ,)))
( (IP-MAT (INTJ Hmm) (PUNC ,) (INTJ well) (PUNC ,) (CP-ADV (C after) (IP-SUB (NP-SBJ (PRO they)) (VP (VBD quit) (NP-OB1 (N work))))) (PUNC ,) (NP-SBJ (PRO I)) (VP (VBP imagine) (CP-THT (C 0) (IP-SUB (NP-SBJ (D the) (QP (QS most)) (PP (P of) (NP (PRO them)))) (VP (VBD drifted) (ADVP-DIR (ADV away)) (PUNC ,) (ELAB (VP (VBD left))))))) (PUNC .)))
( (IP-MAT (NP-SBJ (NPR Indianapolis) (PUNC ,) (ELAB (PP (P to) (NP (NPR Indiana))))) (PUNC ,) (PAREN (IP-MAT (NP-SBJ (PRO I)) (VP (VBP imagine)))) (VP (MD =ud) (VP (BE be) (NP-PRD (ADVP (ADV about)) (D the) (ADJP (ADJS farthest)) (N place)))) (PUNC .)))
The repetition must be exact; near repetitions are annotated as elaborations (ELAB). Otherwise, the conventions for false starts (FS) are applied.
The attachment of repetitions obeys the same rules as for elaborations.
In order to enable searches for exact sentential repetition, clauses can be enclosed in REP.
( (IP-MAT (NP-SBJ (PRO It@)) (VP (BEP @s) (NP-OB1 (D a) (N problem) (PUNC ,) (REP (D a) (N problem)) (PUNC ,) (REP (D a) (N problem)))) (PUNC .)))
( (IP-MAT (NP-SBJ (PRO I)) (VP (DOP do)) (PUNC ,) (REP (PRO I) (DOP do)) (PUNC .)))
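Since the purpose of REP is to make exact repetitions searchable, a simple text-level count of such labels in a bracketed tree might look like the sketch below. It is a rough illustration only: `count_labels` and `DISFLUENCY_LABELS` are hypothetical names, and the pattern assumes that a label is immediately preceded by an opening parenthesis and followed by whitespace or a closing parenthesis.

```python
import re
from collections import Counter

# Labels discussed in this section; the tuple itself is an illustrative choice.
DISFLUENCY_LABELS = ("FS", "ELAB", "REP", "PAREN")


def count_labels(tree_text, labels=DISFLUENCY_LABELS):
    """Count how often each of the given node labels opens a constituent."""
    alternatives = "|".join(re.escape(label) for label in labels)
    # Require whitespace or ")" after the label so that only whole labels match.
    pattern = r"\((" + alternatives + r")(?=[\s)])"
    return Counter(re.findall(pattern, tree_text))


tree_text = (
    "( (IP-MAT (NP-SBJ (PRO It@)) (VP (BEP @s) (NP-OB1 (D a) (N problem) "
    "(PUNC ,) (REP (D a) (N problem)) (PUNC ,) (REP (D a) (N problem)))) "
    "(PUNC .)))"
)
counts = count_labels(tree_text)
print(counts["REP"], counts["FS"])
```

For the first tree above, the count is two REP nodes and no FS nodes.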
I DON'T KNOW, I WOULD SAY, IT SEEMS (LIKE), LET'S SEE, LOOK, SEE, WAIT, YOU KNOW, YOU SEE
PAREN is not the default annotation; in other words, when verbs in these expressions can be interpreted as taking ordinary complements, the default is to annotate them that way.
( (IP-MAT (NP-SBJ (PRO It)) (VP (VBP seems) (CP-THT (C 0) ← default - no PAREN (IP-SUB (NP-SBJ (D that@)) (VP (BEP @'s) (NP-PRD (D a) (N problem)))))) (. .)))
( (IP-MAT (NP-SBJ (D that@)) (VP (BEP @'s) (NP-PRD (D a) (N problem))) (, ,) (PAREN (IP-MAT (NP-SBJ (PRO it)) (VP (VBP seems)))) (. .)))
( (IP-MAT (NP-SBJ (D that)) (, ,) (PAREN (IP-MAT (NP-SBJ (PRO it)) (VP (VBP seems)))) (, ,) (VP (BEP is) (NP-PRD (D a) (N problem))) (. .)))
PAREN also encloses the following constructions. Again, the list is intended to be illustrative rather than exhaustive.
( (IP-MAT (NP-SBJ (PRO they@)) (VP (MD @'d) (VP (VB issue) (NP-OB1 (D that)) (PUNC ,) (PAREN (IP-MAT (NP-SBJ (PRO you)) (VP (VBP know)))) (PUNC ,) (PP (P at) (NP (D the) (N office))))) (PUNC .)))
( (CP-QUE-MAT (CONJ And) (PUNC ,) (PAREN (IP-IMP (VP (VBI let@) (IP-ECM (NP-SBJ (PRO @'s)) (VP (VB see)))))) (PUNC ,) (WNP-1 (WD what) (ADJP (ADJ other)) (N-COMP (N childhood) (NS diseases))) (IP-SUB (IGNORE-BEP-2 are) (NP-SBJ (EX there)) (VP (BEP *-2) (NP-LGS *T*-1))) (PUNC ?)))
Instances of PAREN are generally clauses (IP or CP) or instances of parenthetical gapping. There are a handful of exceptions, which are arguably better treated as elaborations (ELAB).
( (PP (PP (P to) (NP (NPR Verda))) (PUNC ,) (CONJP (CONJ or) (NP (ADJP (ADJR further)) (PUNC ,) (ELAB (PP (ADVP (ADV almost)) (PAREN (NP (QP (Q some)) (PP (P of) (NP (PRO them))))) (P from) (NP (NUMP (ADVP (ADV about)) (NUM seven)) (NS mile))))))))
( (IP-MAT (NP-SBJ (PRO They)) (VP (MD 0) (VP (FP just) (CODE <$$MJohnson_xmax=347.34>) (SPCH (CODE <JReynolds_xmin=345.9>) (INTJ Right) (PUNC ,) (CODE <$$JReynolds_xmax=346.18>)) (CODE <$$overlap>) (CODE <MJohnson_xmin=347.95>) (VB take) (NP-OB1 (D the) (N test)))) (PUNC ,)))