"Labels" are the all-uppercase tags inserted by the linguists who prepared the corpus (e.g., "IP", "CONJ", "N"). "Words" are the mostly lowercase original words of the text (e.g., "so", "hit"). Every node in the sentence tree has a label, and the leaf nodes also have words. CorpusSearch can search on labels or on words. In practice, the vast majority of searches look for labels only.
CorpusSearch matches search-function arguments against strings in the input using case-sensitive, character-by-character string matching. Spelling variants and upper-case/lower-case variants must therefore be listed explicitly (usually with an argument list). For instance, this query searches for a complementizer whose associated text is "that" or "That":
(C iDominates that|That)
and finds sentences such as this:
/~*
and he shalle do yow remedy, that youre herte shal be pleasyd. '
(CMMALORY,3.47)
*~/
/*
12 CP-ADV: 13 C that
*/
(NODE (12 CP-ADV (13 C that)
                 (14 IP-SUB (15 NP-SBJ (16 PRO$ youre) (17 N herte))
                            (18 MD shal)
                            (19 BE be)
                            (20 VAN pleasyd)))
      (ID CMMALORY,3.47))
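The matching rule described above can be illustrated with a minimal Python sketch. This is not CorpusSearch's implementation; the function name `matches` is hypothetical, and the sketch assumes an argument list is simply a set of alternatives separated by `|`, each compared to the input string exactly and case-sensitively. Any other features of search-function arguments are deliberately left out.

```python
def matches(argument: str, text: str) -> bool:
    """Return True if `text` equals one of the '|'-separated alternatives.

    Hypothetical helper: the comparison is exact and case-sensitive,
    so every spelling or capitalization variant must be listed.
    """
    return text in argument.split("|")


print(matches("that|That", "that"))  # True
print(matches("that|That", "That"))  # True
print(matches("that", "That"))       # False: capitalization differs
print(matches("that|That", "THAT"))  # False: variant was not listed
```

The last two calls show why the query lists both "that" and "That": a variant that is not spelled out in the argument list is not found.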
For the purposes of dominance, a word and its associated node label are considered separate objects. Thus, in the sentence below, "PRO" dominates "hit". For the purposes of precedence, a word and its associated label are considered to be one object. Thus, "that" sister-precedes "rocche" in this sentence, because the labels associated with "that" and "rocche" are sisters.
/~*
and so hit londid undir that rocche.
(CMMALORY,667.4861)
*~/
/*
1 IP-MAT: 11 D that, 12 N rocche
*/
(0 (1 IP-MAT (2 CONJ and)
             (3 ADVP (4 ADV so))
             (5 NP-SBJ (6 PRO hit))
             (7 VBD londid)
             (8 PP (9 P undir)
                   (10 NP (11 D that) (12 N rocche)))
             (13 E_S .))
   (ID CMMALORY,667.4861))
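The distinction between dominance and precedence can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not CorpusSearch code: the names `Node`, `parse`, `find`, `i_dominates`, and `sister_precedes` are all hypothetical, and the input is a simplified fragment of the tree above with the node numbers left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # every node has a label
    word: Optional[str] = None      # only leaf nodes carry a word
    children: List["Node"] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse a bracketed tree such as '(NP (D that) (N rocche))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        pos += 1                    # skip "("
        node = Node(label=tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                    # skip ")"
        return node

    return read()


def find(node: Node, label: str) -> Optional[Node]:
    """Return the first node (depth-first) whose label equals `label`."""
    if node.label == label:
        return node
    for child in node.children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def i_dominates(node: Node, target: str) -> bool:
    """Dominance: a label and its word are separate objects, so a leaf
    label immediately dominates its own word."""
    return node.word == target or any(c.label == target for c in node.children)


def sister_precedes(parent: Node, first: str, second: str) -> bool:
    """Precedence: a word and its label count as one object, so a leaf
    can be named by either its label or its word."""
    names = [(c.label, c.word) for c in parent.children]
    return any(
        first in pair and any(second in later for later in names[i + 1:])
        for i, pair in enumerate(names)
    )


# Simplified fragment of the example sentence (node numbers omitted).
tree = parse("(IP-MAT (NP-SBJ (PRO hit)) (VBD londid) "
             "(PP (P undir) (NP (D that) (N rocche))))")

print(i_dominates(find(tree, "PRO"), "hit"))                 # True
print(sister_precedes(find(tree, "NP"), "that", "rocche"))   # True
print(sister_precedes(find(tree, "NP"), "rocche", "that"))   # False
```

In this sketch, `i_dominates` treats "hit" as an object below "PRO", while `sister_precedes` lets "that" stand in for its label "D", mirroring the two conventions described above.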