Note the absence of a hyphen in all variants of the name of this search function.
"x cCommands y" is true if and only if:
B c-commands C and F. Both C and F c-command B, D and E. D and E c-command only each other. A c-commands no node because, being the root of the tree, it dominates all of the other nodes.
      A
     / \
    B   C
   / \   \
  D   E   F
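The relations listed for the sample tree can be checked mechanically. The following Python sketch is illustrative only and is not part of CorpusSearch. It assumes the usual first-branching-node definition of c-command (neither node dominates the other, and the first branching node above x dominates y); the names CHILDREN, PARENT, dominates, c_commands and c_commanded_by are hypothetical.

```python
# Toy tree from the text: A -> B C, B -> D E, C -> F.
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
            "D": [], "E": [], "F": []}
PARENT = {child: node for node, kids in CHILDREN.items() for child in kids}


def dominates(x, y):
    """True if x is a proper ancestor of y."""
    node = PARENT.get(y)
    while node is not None:
        if node == x:
            return True
        node = PARENT.get(node)
    return False


def c_commands(x, y):
    """Assumed definition: neither node dominates the other, and the
    first branching node above x dominates y."""
    if x == y or dominates(x, y) or dominates(y, x):
        return False
    node = PARENT.get(x)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)  # skip non-branching ancestors
    return node is not None and dominates(node, y)


def c_commanded_by(x):
    """Sorted list of the nodes that x c-commands."""
    return sorted(y for y in CHILDREN if c_commands(x, y))
```

Under this assumption, F inherits its c-command domain from the first branching node above it (A), because its mother C does not branch.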
Example query:
query: (NP-SBJ* cCommands PP*)
Example output:
/* 1 IP-MAT: 2 NP-SBJ, 13 PP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
query: (CODING-IP-MAT column 4 !x|y|z)
Example output:
/* 1 IP-MAT: 2 CODING-IP-MAT, 3 a:b:c:d:e */ (0 (1 IP-MAT (2 CODING-IP-MAT a:b:c:d:e) (4 NP-SBJ (5 D The) (7 ADJ quick) (9 ADJ brown) (11 N fox)) (13 VBP jumps) (15 PP (16 P over) (18 NP (19 D the) (21 ADJ lazy) (23 N dog))) (25 . .)) (27 ID SAMPLE,1))
Example query:
(IP-MAT* dominates ADJ)
Example output:
/*
1 IP-MAT: 1 IP-MAT, 5 ADJ
1 IP-MAT: 1 IP-MAT, 7 ADJ
1 IP-MAT: 1 IP-MAT, 19 ADJ
*/
(0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Text can serve as the second argument of "dominates". But searches where
text serves as the first argument of "dominates" are not sensible, since
text consists of terminal nodes and terminal nodes by definition do not
dominate a subtree. CorpusSearch performs such searches without issuing
a warning, but they return no hits.
domsWords (variants: DomsWords, domswords)
"domsWords" matches nodes that dominate the specified number of
words. For instance, "domsWords 4" means "dominates 4 words". A
word is defined as a terminal that is not on the
ignore_words list.
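The word count that "domsWords" relies on can be pictured as follows. This is a minimal Python sketch, not CorpusSearch code: the sample sentence is written as nested tuples, and IGNORE_TAGS is a hypothetical stand-in for the ignore_words list that contains only punctuation tags. All function names are assumptions made for illustration.

```python
# Sample sentence as nested tuples: (label, child, ...) for phrases,
# (tag, word) for terminals.
TREE = ("IP-MAT",
        ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox")),
        ("VBP", "jumps"),
        ("PP", ("P", "over"),
         ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))),
        (".", "."))

# Hypothetical stand-in for ignore_words: punctuation tags only.
IGNORE_TAGS = {".", ","}


def is_terminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def count_words(node):
    """Number of terminals under node that are not ignored."""
    if is_terminal(node):
        return 0 if node[0] in IGNORE_TAGS else 1
    return sum(count_words(child) for child in node[1:])


def nodes_dominating(node, prefix, n):
    """Labels (in pre-order) of phrasal nodes whose label starts with
    prefix and that dominate exactly n words."""
    if is_terminal(node):
        return []
    hits = [node[0]] if node[0].startswith(prefix) and count_words(node) == n else []
    for child in node[1:]:
        hits += nodes_dominating(child, prefix, n)
    return hits
```

With these assumptions, nodes_dominating(TREE, "NP", 3) finds only the object of the preposition, because the subject NP dominates four words.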
Example query:
node: IP-MAT* query: (NP* domsWords 3)
Example output:
/* 16 NP: 16 NP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
node: IP-MAT* query: (NP domsWords< 3)
Example output:
The sample sentence doesn't match the query.
domsWords> (variants: DomsWords>, domswords>)
"domsWords>" (read "domsWordsMoreThan") is just
like domsWords except that it returns nodes
that dominate strictly more than the given number of words.
Example query:
node: IP-MAT* query: (NP* domsWords> 3)
Example output:
/* 2 NP-SBJ: 2 NP-SBJ */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
node: IP-MAT* query: (VBP exists)
Example output:
/* 1 IP-MAT: 11 VBP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
(NP* hasLabel NP-SBJ)
Example output:
/* 2 NP-SBJ: 2 NP-SBJ */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
A common error is to assume that "x hasSister y" implies that x precedes y. It does not, as the example output makes clear. Any precedence relations must be stated separately.
Example query:
node: IP-MAT* query: (N hasSister ADJ)
Example output:
/*
1 IP-MAT: 9 N, 5 ADJ
1 IP-MAT: 9 N, 7 ADJ
1 IP-MAT: 21 N, 19 ADJ
*/
(0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
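The point that sisterhood carries no ordering can be illustrated with a small Python sketch. It is not CorpusSearch code. For brevity it only pairs terminal daughters of the same mother, and the tuple encoding of the sentence and the name sister_pairs are hypothetical.

```python
# Sample sentence as nested tuples: (label, child, ...) for phrases,
# (tag, word) for terminals.
TREE = ("IP-MAT",
        ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox")),
        ("VBP", "jumps"),
        ("PP", ("P", "over"),
         ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))),
        (".", "."))


def is_terminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def sister_pairs(node, tag_x, tag_y):
    """(x_word, y_word) for every pair of distinct terminal daughters of
    one mother where x has tag_x and y has tag_y, in either order."""
    if is_terminal(node):
        return []
    kids = node[1:]
    pairs = [(x[1], y[1])
             for x in kids for y in kids
             if x is not y and is_terminal(x) and is_terminal(y)
             and x[0] == tag_x and y[0] == tag_y]
    for kid in kids:
        pairs += sister_pairs(kid, tag_x, tag_y)
    return pairs
```

In every pair returned for sister_pairs(TREE, "N", "ADJ"), the adjective comes before the noun in the sentence, yet the pair still counts as a match.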
Example query:
query: (IP-MAT* iDominates NP*)
Example output:
/* 1 IP-MAT: 1 IP-MAT, 2 NP-SBJ */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
If the asterisk were missing after "NP" in the query, the query would return no hits, since the only bare NP in the sentence is a daughter of PP, not of IP-MAT. Compare the bare NP query just discussed to the example query for iDomsMod.
Example query:
node: IP-MAT* query: (NP-SBJ iDomsFirst D)
Example output:
/* 1 IP-MAT: 2 NP-SBJ, 3 D */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
node: IP-MAT* query: (IP-MAT* iDomsLast PP)
Example output:
/* 1 IP-MAT: 1 IP-MAT, 13 PP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
node: IP-MAT* query: (IP-MAT* iDomsMod ADVP|PP NP)
Example output:
/* 1 IP-MAT: 1 IP-MAT, 16 NP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Replacing "NP" in the example query by "NP*" would result in two matches - the object of the preposition (with intervening PP), as above, and the subject NP (without an intervener).
A very common use of this function is in connection with conjunction structures, allowing non-first conjuncts to be treated on a par with first conjuncts.
Example query:
node: IP-MAT* query: (NP-SBJ iDomsMod NP*|CONJ* PRO)
Example output:
/* 1 IP-MAT: 2 NP-SBJ, 12 PRO */ (0 (1 IP-MAT (2 NP-SBJ (3 NP (4 D The) (6 N king)) (8 CONJP (9 CONJ and) (11 NP (12 PRO I)))) (14 VBD danced) (16 NP-MSR (17 Q all) (19 N night)) (21 . .)) (23 ID SAMPLE,2))
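The behaviour described for "iDomsMod" in the two examples above can be sketched in Python as a walk that only descends through nodes matching the modifier pattern. This is an illustration under assumptions, not the CorpusSearch implementation: label patterns are approximated with the standard fnmatch module, and the tuple encoding and the names label_matches and idoms_mod are hypothetical.

```python
from fnmatch import fnmatchcase

# (label, child, ...) for phrases, (tag, word) for terminals.
FOX = ("IP-MAT",
       ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox")),
       ("VBP", "jumps"),
       ("PP", ("P", "over"),
        ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))),
       (".", "."))

KING_SUBJECT = ("NP-SBJ",
                ("NP", ("D", "The"), ("N", "king")),
                ("CONJP", ("CONJ", "and"), ("NP", ("PRO", "I"))))


def children(node):
    """Daughters of a phrasal node; terminals have none."""
    is_terminal = len(node) == 2 and isinstance(node[1], str)
    return [] if is_terminal else list(node[1:])


def label_matches(label, pattern):
    """pattern is a |-separated list of alternatives with * wildcards."""
    return any(fnmatchcase(label, alt) for alt in pattern.split("|"))


def idoms_mod(node, mod_pattern, target_pattern):
    """Labels of descendants matching target_pattern that are reached
    through zero or more intervening nodes, all matching mod_pattern."""
    hits = []
    for child in children(node):
        if label_matches(child[0], target_pattern):
            hits.append(child[0])
        if label_matches(child[0], mod_pattern):
            hits += idoms_mod(child, mod_pattern, target_pattern)
    return hits
```

For instance, idoms_mod(FOX, "ADVP|PP", "NP*") returns both the subject (no intervener) and the object of the preposition (PP intervening), while the bare "NP" target returns only the latter.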
Example query:
node: IP-MAT* query: (NP iDomsNumber 2 ADJ)
Example output:
/* 1 IP-MAT: 16 NP, 19 ADJ */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
If "NP" in the query were replaced by "NP*", the query would also match the subject NP.
Example query:
(NP* iDomsOnly N)
Example output:
The sample sentence doesn't match the query, since all of the NP nodes
have more than one child.
iDomsTotal (variants: idomstotal)
"iDomsTotal" returns structures containing nodes with the specified
number of daughters.
Though traces and "0" are on the default ignore_words list, they are not on the default ignore_nodes list and need to be added, if necessary, with add_to_ignore.
Example query:
(NP* iDomsTotal 3)
Example output:
/* 1 IP-MAT: 16 NP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
The sample sentence matches the following query because punctuation is ignored by default.
Example query:
(IP-MAT* iDomsTotal 3)
Example output:
/* 1 IP-MAT: 1 IP-MAT */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
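The effect of ignored nodes on the daughter count can be seen in a short Python sketch. It is illustrative only: IGNORE_TAGS is a hypothetical stand-in for the ignore_nodes list and holds just punctuation tags, and daughter_count is an assumed helper name that expects a phrasal node.

```python
# (label, child, ...) for phrases, (tag, word) for terminals.
NP_SBJ = ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox"))
NP_OBJ = ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))
TREE = ("IP-MAT",
        NP_SBJ,
        ("VBP", "jumps"),
        ("PP", ("P", "over"), NP_OBJ),
        (".", "."))

# Hypothetical stand-in for ignore_nodes: punctuation tags only.
IGNORE_TAGS = frozenset({".", ","})


def daughter_count(node, ignore_tags=frozenset()):
    """Number of daughters of a phrasal node whose label is not ignored."""
    return sum(1 for child in node[1:] if child[0] not in ignore_tags)
```

IP-MAT has four daughters in the bracketing, but daughter_count(TREE, IGNORE_TAGS) gives three once the final punctuation is skipped, which is why the sample sentence matches "iDomsTotal 3".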
Example query:
(NP* iDomsTotal< 4)
Example output:
/* 1 IP-MAT: 16 NP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
(NP* iDomsTotal> 4)
Example output:
The sample sentence doesn't match the query.
iDomsViaTrace (variants: idomsviatrace)
Node x immediately dominates node y via trace t (with antecedent z) if and
only if:
The syntactic categories of x and z must match (apart from any
indices). CorpusSearch considers any string enclosed in asterisks and
followed by a shared index as a trace. (In corpora following the
annotation guidelines for the Penn Parsed Corpora of Historical English,
only "*T*" or "*ICH*", followed by an index, count as legal traces.) The
label of the trace is specified in the query, using ordinary regular
expression syntax; see the example query for details. The node boundary
(IP-MAT* in the example below) must include both the trace and the
coindexed constituent.
This function is useful to facilitate searches dealing with extraposed
constituents. We begin with a single line of a query, which we then embed
in a more realistic search context.
Example query:
node: IP-MAT*
query: (CP-REL iDomsViaTrace \*ICH* IP-SUB*)
In the example output, x is CP-REL (node 23), y is IP-SUB (node 32), t is
the trace *ICH*-1, and z is its antecedent CP-REL-1 (node 28).
Example output:
/~*
The quick brown fox jumped over the sleeping dog quickly who woke up with a
start.
(SAMPLE,3)
*~/
/*
1 IP-MAT: 23 CP-REL, 32 IP-SUB
*/
(0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
             (11 VBD jumped)
             (13 PP (14 P over)
                    (16 NP (17 D the)
                           (19 ADJ sleeping)
                           (21 N dog)
                           (23 CP-REL *ICH*-1)))
             (25 ADVP (26 ADV quickly))
             (28 CP-REL-1 (29 WNP-2 (30 WPRO who))
                          (32 IP-SUB (33 NP-SBJ *T*-2)
                                     (35 VBD woke)
                                     (37 RP up)
                                     (39 PP (40 P with)
                                            (42 NP (43 D a) (45 N start)))))
             (47 . .))
   (49 ID SAMPLE,3))
The following query calls "iDomsViaTrace" in a more realistic context. x,
y, z, and t remain unchanged from above.
Example query:
node: IP-MAT*
query: (NP iDoms CP-REL)
AND (CP-REL iDomsViaTrace \*ICH* IP-SUB*)
AND (IP-SUB* iDomsMod NP-SBJ* \*T*)
Example output:
/*
1 IP-MAT: 16 NP, 23 CP-REL, 32 IP-SUB, 34 *T*-2
*/
(0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
             (11 VBD jumped)
             (13 PP (14 P over)
                    (16 NP (17 D the)
                           (19 ADJ sleeping)
                           (21 N dog)
                           (23 CP-REL *ICH*-1)))
             (25 ADVP (26 ADV quickly))
             (28 CP-REL-1 (29 WNP-2 (30 WPRO who))
                          (32 IP-SUB (33 NP-SBJ *T*-2)
                                     (35 VBD woke)
                                     (37 RP up)
                                     (39 PP (40 P with)
                                            (42 NP (43 D a) (45 N start)))))
             (47 . .))
   (49 ID SAMPLE,3))
inID (variants: InID)
"inID" is true of substrings of the ID node. This function is
necessary because the ID node, being outside the parsed sentence,
cannot serve as an ordinary search-function argument. For instance,
"(ID iDoms *)" returns no hits.
Example query:
query: (SAMP* inID)
Example output:
/*
0 : 25 ID, 26 SAMPLE,1
*/
(0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
             (11 VBP jumps)
             (13 PP (14 P over)
                    (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
             (23 . .))
   (25 ID SAMPLE,1))
The inID command is especially useful in coding queries, where it can be used to "import"
information known to be associated with a particular file (date of
composition, dialect, information about author, etc.) into a coding
string. For instance:
Example columns from coding query:
// date of author's birth
11: {
    \1490: (ABOTT-E1* inID)
    \1630: (ALHATTON2-E3* inID)
    \1680: (ALHATTON-E3* inID)
    ...
}
// author's sex
13: {
    f: (ABOTT*|ALHATTON* inID)
    m: ELSE
}
iPrecedes (variants: iprecedes, iPres, ipres)
"iPrecedes" is true if and only if x does not dominate y, and x comes
immediately before y in the string.
The algorithm for "x iPrecedes y" runs as follows:
A common error is to assume that "x iPrecedes y" implies that x and y are sisters. It does not, as the example output makes clear. Any sisterhood relations must be stated separately.
Example query:
query: (N iPrecedes VBP)
Example output:
/* 1 IP-MAT: 9 N, 11 VBP */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
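One way to picture the prose definition of "iPrecedes" is in terms of the stretch of terminals that each node covers: x immediately precedes y when x's stretch ends exactly where y's begins. The Python sketch below follows that reading only. It is not the CorpusSearch algorithm, it ignores nothing (punctuation counts as a terminal), and the tuple encoding and the names collect_spans and i_precedes are hypothetical.

```python
# (label, child, ...) for phrases, (tag, word) for terminals.
TREE = ("IP-MAT",
        ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox")),
        ("VBP", "jumps"),
        ("PP", ("P", "over"),
         ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))),
        (".", "."))


def is_terminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def collect_spans(node, start, out):
    """Append (label, first, end) to out for node and all descendants,
    where [first, end) is the range of terminal positions covered.
    Returns the end position."""
    if is_terminal(node):
        out.append((node[0], start, start + 1))
        return start + 1
    slot = len(out)
    out.append(None)  # reserve the slot so entries stay in pre-order
    end = start
    for child in node[1:]:
        end = collect_spans(child, end, out)
    out[slot] = (node[0], start, end)
    return end


def i_precedes(tree, label_x, label_y):
    """(start_of_x, start_of_y) for each x/y pair where x ends exactly
    where y begins. A node that dominates y can never satisfy this."""
    spans = []
    collect_spans(tree, 0, spans)
    return [(sx, sy)
            for lx, sx, ex in spans if lx == label_x
            for ly, sy, _ in spans if ly == label_y and ex == sy]
```

Here the noun "fox" (position 3) immediately precedes the verb (position 4) although the two are not sisters, and the whole subject NP immediately precedes the verb as well.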
Example query:
// boundary node must be specified, but is ignored
node: NP
query: (IP* isRoot)
Example output:
/* 0 : 1 IP-MAT */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
(VBP precedes N)
Example output:
/* 1 IP-MAT: 11 VBP, 21 N */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBP jumps) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ lazy) (21 N dog))) (23 . .)) (25 ID SAMPLE,1))
Example query:
node: IP-MAT* query: (NP* sameIndex CP*) AND (NP* iDoms \*exp*)
Example output:
/* 1 IP-MAT: 2 NP-SBJ-1, 3 *exp*, 9 CP-THT-1 */ (0 (1 IP-MAT (2 NP-SBJ-1 *exp*) (4 NP-OB2 (5 PRO hym)) (7 VBD thought) (9 CP-THT-1 (10 C 0) (12 IP-SUB (13 NP-SBJ-2 (14 EX there)) (16 BED was) (18 VBN com) (20 PP (21 P into) (23 NP (24 PRO$ hys) (26 N londe))) (28 NP-2 (29 NS gryffens) (31 CONJ and) (33 NS serpentes)))) (35 E_S ,)) (37 ID CMMALORY,33.1031))
When searching for antecedents of traces, it is the trace itself (not the category dominating it) that is the argument of sameIndex.
Example query:
node: IP-MAT* query: (CP-REL* sameIndex \*ICH*)
Example output:
/* 1 IP-MAT: 28 CP-REL-1, 24 *ICH*-1 */ (0 (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox)) (11 VBD jumped) (13 PP (14 P over) (16 NP (17 D the) (19 ADJ sleeping) (21 N dog) (23 CP-REL *ICH*-1))) (25 ADVP (26 ADV quickly)) (28 CP-REL-1 (29 WNP-2 (30 WPRO who)) (32 IP-SUB (33 NP-SBJ *T*-2) (35 VBD woke) (37 RP up) (39 PP (40 P with) (42 NP (43 D a) (45 N start))))) (47 . .)) (49 ID SAMPLE,3))
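The comparison behind "sameIndex" can be sketched as extracting the numeric suffix from each label or trace and checking that both are present and equal. The Python below is a rough illustration, not CorpusSearch code; it assumes indices are written as a trailing hyphen plus digits, as in the examples above, and the names index_of and same_index are hypothetical.

```python
import re

# Assumed index format: a hyphen followed by digits at the end of the label.
INDEX_RE = re.compile(r"-(\d+)$")


def index_of(label):
    """Trailing numeric index of a label or trace, or None if absent."""
    match = INDEX_RE.search(label)
    return match.group(1) if match else None


def same_index(a, b):
    """True if both labels carry an index and the indices are equal."""
    index_a, index_b = index_of(a), index_of(b)
    return index_a is not None and index_a == index_b
```

Note that the trace itself ("*ICH*-1") carries the index, not the CP-REL node that dominates it, which is why the trace is the argument in the query above.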