Disfluencies


The following labels and dash tags are used to annotate disfluencies (Hindle 1983) where necessary. Disfluencies are almost entirely restricted to plays in the PPCMBE2.

Hindle, Don. 1983. Deterministic parsing of syntactic non-fluencies. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, 123–128.

BREAK

BREAK indicates that a phrase or sentence breaks off or is otherwise left unfinished.

( (IP-MAT (NP-SBJ (PRO They))
          (MD could@)
          (NEG @n't)
          (VB believe)
          (CP-THAT (C 0)
                   (IP-SUB (NP-SBJ (PRO we))
                           (VBD saw)
                           (NP-OB1 (D a) (BREAK 0))))
          (PUNC ...)))

-ELAB

The dash tag -ELAB indicates that a parenthetical constituent (generally already bearing a -PRN dash tag) elaborates on its host. Exact copies of a constituent are instead annotated as repetitions (REP).

( (IP-MAT (NP-SBJ (PRO It))
          (BEP is)
          (NP-PRD (D a) (N trifle)
                  (PUNC ,)
                  (NP-PRN-ELAB (D a) (ADJ mere) (N trifle)))
          (PUNC .)))

FS

FS indicates a false start. It is the default annotation for disfluencies.

( (IP-MAT (FS (PRO They) (PRO they) (PRO they))              ← like this, with FS
          (NP-SBJ (PRO they))
          (ADVP (ADV really))
          (VBD meant)
          (NP-OB1 (PRO it))
          (PUNC ?)))

( (IP-MAT (NP-SBJ (PRO They))
          (REP (PRO they) (PRO they) (PRO they))             ← not like this, with REP
          (ADVP (ADV really))
          (VBD meant)
          (NP-OB1 (PRO it))
          (PUNC ?)))

Internal structure within false starts may be annotated, but breaks within them are not explicitly marked.

( (IP-MAT (FS (IP-MAT (NP-SBJ (PRO I))
                      (DOD did@)
                      (NEG @n't)
                      (VB know)))
          (PUNC -)
          (FS (IP-MAT (NP-SBJ (CP-FRL (WNP-1 (WPRO what))
                                      (C 0)
                                      (IP-SUB (NP-SBJ (PRO I))
                                              (VBP mean)
                                              (IP-INF (NP-OB1 *T*-1)
                                                      (TO to)))))))
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP-PRD (ADV very) (ADJ sorry)
                    (IP-INF (TO to)
                            (VB hear)
                            (PP (P of)
                                (NP (PRO$ your) (N loss)))))
          (PUNC .)))
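
The convention for false starts can be checked mechanically. The following Python sketch assumes that the rule means no BREAK node should occur anywhere inside an FS node. That reading, the function name breaks_inside_fs, and the use of raw tree text as input are assumptions made for illustration; none of this is part of the corpus tooling.

```python
import re

def breaks_inside_fs(tree_text):
    """Return label paths of BREAK nodes that have an FS ancestor.

    Hypothetical helper: scans the bracketed tree token by token and
    keeps a stack of the labels of the currently open nodes.
    """
    stack, hits = [], []
    expect_label = False
    for tok in re.findall(r"\(|\)|[^\s()]+", tree_text):
        if tok == "(":
            stack.append("")          # label is filled in by the next token, if any
            expect_label = True
        elif tok == ")":
            stack.pop()
            expect_label = False
        elif expect_label:
            stack[-1] = tok           # first token after "(" is the node label
            expect_label = False
            if tok == "BREAK" and "FS" in stack[:-1]:
                hits.append("/".join(label for label in stack if label))
        # any other token is a word or empty category and is ignored
    return hits

# A false start that (contrary to the assumed convention) contains a BREAK.
bad = "( (IP-MAT (FS (IP-MAT (NP-SBJ (PRO I)) (VB know) (BREAK 0))) (NP-SBJ (PRO I)) (BEP am)))"
print(breaks_inside_fs(bad))   # ['IP-MAT/FS/IP-MAT/BREAK']
```

A BREAK outside any FS, as in the BREAK example earlier in this section, produces an empty list.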

REP

REP (in contrast to the dash tag -ELAB) is a full-fledged tag indicating the exact repetition of a constituent, chiefly for rhetorical reasons.

In cases that are ambiguous between REP and FS, FS is the default, as illustrated earlier.

Near repetitions are annotated as elaborations (-ELAB).

( (IP-MAT (NP-SBJ (PRO It))
          (BEP is)
          (NP-PRD (D a) (N trifle)
                  (PUNC ,)
                  (REP (NP (D a) (N trifle))))
	  (PUNC .)))
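
The distinction between exact repetition (REP) and near repetition (-ELAB) can be illustrated with a short Python sketch. It assumes that the repeated constituent is the last daughter of its host, that punctuation and empty categories are ignored, and that the comparison is case-insensitive. These assumptions and the helper names parse, words and expected_label are hypothetical, and the sketch does not attempt the FS-versus-REP decision, for which FS remains the default.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip the opening "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                          # skip the closing ")"
        return items

    return node()

def words(node):
    """Lower-cased words under a node, skipping PUNC, 0 and traces."""
    if len(node) == 2 and all(isinstance(x, str) for x in node):
        tag, word = node
        if tag == "PUNC" or word == "0" or word.startswith("*"):
            return []
        return [word.lower()]
    out = []
    for child in node:
        if isinstance(child, list):
            out.extend(words(child))
    return out

def expected_label(host):
    """'REP' if the last daughter exactly copies the preceding words, else '-ELAB'."""
    children = [c for c in host if isinstance(c, list)]
    *before, last = children
    original = [w for c in before for w in words(c)]
    return "REP" if words(last) == original else "-ELAB"

exact = parse("(NP-PRD (D a) (N trifle) (PUNC ,) (REP (NP (D a) (N trifle))))")
near = parse("(NP-PRD (D a) (N trifle) (PUNC ,) (NP-PRN-ELAB (D a) (ADJ mere) (N trifle)))")
print(expected_label(exact))   # REP
print(expected_label(near))    # -ELAB
```

The check compares only word strings, so it can suggest a label for cases like the two trifle examples in this section but cannot replace an annotator's judgment about whether a repetition is rhetorical.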