Search functions


General considerations

In the examples below, the output sentence is generally the same, but it matches the example queries for different reasons, as is evident from the indices in the result block.

cCommands (variants: ccommands, CCommands)

Note the absence of a hyphen in all variants of the name of this search function.

"x cCommands y" is true if and only if:

In the following tree,
    A
   / \
  B   C
 / \   \
D   E   F
B c-commands C and F. Both C and F c-command B, D and E. D and E c-command only each other. A c-commands no node because, being the root of the tree, it dominates all of the other nodes.
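
The relations listed for this tree can be checked mechanically. The following Python sketch is only an illustration, not CorpusSearch code: it assumes that c-command means "neither node dominates the other, and the first branching node above x dominates y", and the names CHILDREN, PARENT, dominates, c_commands and c_commanded_by are hypothetical.

```python
# Hypothetical encoding of the example tree: each node maps to its list of children.
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

def dominates(x, y):
    """True if y is a proper descendant of x."""
    node = PARENT.get(y)
    while node is not None:
        if node == x:
            return True
        node = PARENT.get(node)
    return False

def c_commands(x, y):
    """Assumed definition: neither node dominates the other, and the first
    branching node above x dominates y."""
    if x == y or dominates(x, y) or dominates(y, x):
        return False
    node = PARENT.get(x)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)  # skip non-branching ancestors such as C above F
    return node is not None and dominates(node, y)

def c_commanded_by(x):
    """All nodes that x c-commands, in alphabetical order."""
    return sorted(y for y in CHILDREN if c_commands(x, y))

for name in "ABCDEF":
    print(name, c_commanded_by(name))
```

Under this assumed definition, the loop reports C and F for B, the nodes B, D and E for both C and F, only each other for D and E, and nothing for the root A.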

Example query:

query: (NP-SBJ* cCommands PP*)

Example output:

/*
1 IP-MAT:  2 NP-SBJ, 13 PP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

column (variants: Column, col, Col)

"column" searches columns of
CODING nodes, or any other leaf whose text contains characters separated by a colon.

Example query:

query:  (CODING-IP-MAT column 4 !x|y|z)

Example output:

/*
1 IP-MAT:  2 CODING-IP-MAT, 3 a:b:c:d:e
*/
(0  (1 IP-MAT (2 CODING-IP-MAT a:b:c:d:e)
	      (4 NP-SBJ (5 D The) (7 ADJ quick) (9 ADJ brown) (11 N fox))
	      (13 VBP jumps)
	      (15 PP (16 P over)
		     (18 NP (19 D the) (21 ADJ lazy) (23 N dog)))
	      (25 . .))
    (27 ID SAMPLE,1))
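
The column test can be pictured as splitting the leaf text on colons and comparing one field against the pattern. The Python sketch below is a rough illustration only. It assumes that columns are numbered from 1, that a leading "!" negates the whole list of alternatives, and that "*" behaves as a shell-style wildcard; the function name column_matches is hypothetical.

```python
import fnmatch

def column_matches(leaf_text, column, pattern):
    """Check the n-th colon-separated field against '|'-separated alternatives.

    Assumptions: columns count from 1, and a leading '!' negates the match.
    """
    fields = leaf_text.split(":")
    if not 1 <= column <= len(fields):
        return False
    negated = pattern.startswith("!")
    alternatives = (pattern[1:] if negated else pattern).split("|")
    hit = any(fnmatch.fnmatchcase(fields[column - 1], alt) for alt in alternatives)
    return hit != negated

# The example query: column 4 of a:b:c:d:e is "d", which is none of x, y, z.
print(column_matches("a:b:c:d:e", 4, "!x|y|z"))
```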

dominates (variants: Dominates, doms, Doms)

"x dominates y" means that y is contained in the subtree rooted in x, no matter now deeply embedded.

Example query:

(IP-MAT* dominates ADJ)

Example output:

/*
1 IP-MAT:  1 IP-MAT, 5 ADJ
1 IP-MAT:  1 IP-MAT, 7 ADJ
1 IP-MAT:  1 IP-MAT, 19 ADJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

Text can serve as the second argument of "dominates". But searches where text serves as the first argument of "dominates" are not sensible, since text consists of terminal nodes and terminal nodes by definition do not dominate a subtree. CorpusSearch performs such searches without issuing a warning, but they return no hits.

domsWords (variants: DomsWords, domswords)

"domsWords" matches nodes that dominate the specified number of words. For instance, "domsWords 4" means "dominates 4 words". A word is defined as a terminal that is not on the ignore_words list.

Example query:

node: IP-MAT*

(NP* domsWords 3)

Example output:

/*
16 NP:  16 NP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))
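
The word count behind "domsWords" and its "<" and ">" variants can be sketched as a recursive count of terminals. This Python fragment is an illustration under stated assumptions, not the CorpusSearch implementation: the tuple encoding and the names is_ignored and count_words are hypothetical, and the ignore test (traces and "0") is only a stand-in for the real ignore_words list.

```python
# Hypothetical encoding: (label, child, child, ...) for phrases, (label, "text") for leaves.
NP_SBJ = ("NP-SBJ", ("D", "The"), ("ADJ", "quick"), ("ADJ", "brown"), ("N", "fox"))
NP = ("NP", ("D", "the"), ("ADJ", "lazy"), ("N", "dog"))

def is_ignored(text):
    # Assumption: traces such as *T*-2 and the empty element "0" do not count as words.
    return text == "0" or text.startswith("*")

def count_words(node):
    """Count the terminals under node that are not ignored."""
    children = node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return 0 if is_ignored(children[0]) else 1
    return sum(count_words(child) for child in children)

print(count_words(NP_SBJ), count_words(NP))
```

With these counts, the object NP satisfies "domsWords 3", and only the subject NP satisfies "domsWords> 3".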

domsWords< (variants: DomsWords<, domswords<)

"domsWords<" (read "domsWordsLessThan") is just like
domsWords except that it returns nodes that dominate strictly less than the given number of words.

Example query:

node: IP-MAT*

query: (NP domsWords< 3)

Example output:

The sample sentence doesn't match the query.

domsWords> (variants: DomsWords>, domswords>)

"domsWords>" (read "domsWordsMoreThan") is just like domsWords except that it returns nodes that dominate strictly more than the given number of words.

Example query:

node: IP-MAT*

query: (NP* domsWords> 3)

Example output:

/*
2 NP-SBJ:  2 NP-SBJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

exists (variants: Exists)

"exists" searches for labels or orthographic words anywhere within the boundary node. (If the boundary node were NP* in the example query, the search would yield no hits.)

Example query:

node: IP-MAT*

query: (VBP exists)

Example output:

/*
1 IP-MAT:  11 VBP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

hasLabel (variants: HasLabel, haslabel)

"x hasLabel y" is true if and only if the label of node x is the string y.

Example query:

(NP* hasLabel NP-SBJ)

Example output:

/*
2 NP-SBJ:  2 NP-SBJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

hasSister (variants: HasSister, hassister)

"x hasSister y" is true if and only if x and y have the same mother. The linear order of x and y is irrelevant.

A common error is to assume that "x hasSister y" implies that x precedes y. It does not, as the example output makes clear. Any precedence relations must be stated separately.

Example query:

node: IP-MAT*

query: (N hasSister ADJ)

Example output:

/*
1 IP-MAT:  9 N, 5 ADJ
1 IP-MAT:  9 N, 7 ADJ
1 IP-MAT:  21 N, 19 ADJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

iDominates (variants: idominates, iDoms, idoms)

"iDominates" means "immediately dominates". That is, x iDominates y if x is the mother of y.

Example query:

query:     (IP-MAT* iDominates NP*)

Example output:

/*
1 IP-MAT:  1 IP-MAT, 2 NP-SBJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

If the asterisk were missing after "NP" in the query, the query would return no hits, since the only bare NP in the sentence is a daughter of PP, not of IP-MAT. Compare the bare NP query just discussed to the example query for iDomsMod.

iDomsFirst (variants: idomsfirst)

"iDomsFirst" means "immediately dominates as the first child."

Example query:

node: IP-MAT*

query: (NP-SBJ iDomsFirst D)

Example output:

/*
1 IP-MAT:  2 NP-SBJ, 3 D
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

iDomsLast (variants: idomslast)

"iDomsLast" means "immediately dominates as the last child."

Example query:

node: IP-MAT*

query: (IP-MAT* iDomsLast PP)

Example output:

/*
1 IP-MAT:  1 IP-MAT, 13 PP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

iDomsMod (variants: idomsmod)

"iDomsMod" is true if and only if x dominates y, and any nodes intervening on the path from x to y are instances of z, which can be a single label or a list. "iDomsMod" is also satisfied if there are no interveners between x and y; that is, if x immediately dominates y.

Example query:

node: IP-MAT*

query: (IP-MAT* iDomsMod ADVP|PP NP)

Example output:

/*
1 IP-MAT:  1 IP-MAT, 16 NP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

Replacing "NP" in the example query by "NP*" would result in two matches - the object of the preposition (with intervening PP), as above, and the subject NP (without an intervener).
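
The path condition can be made concrete with a small Python sketch. It is an illustration only: the NODES table is a hypothetical encoding of part of the sample sentence (node index mapped to label and mother), "*" is assumed to behave as a shell-style wildcard, and the function names are not part of CorpusSearch.

```python
import fnmatch

# Hypothetical encoding of part of the sample sentence: index -> (label, index of mother).
NODES = {1: ("IP-MAT", None), 2: ("NP-SBJ", 1), 11: ("VBP", 1),
         13: ("PP", 1), 14: ("P", 13), 16: ("NP", 13)}

def label_matches(label, pattern):
    """True if the label matches any of the '|'-separated alternatives."""
    return any(fnmatch.fnmatchcase(label, alt) for alt in pattern.split("|"))

def idoms_mod(x, y, z_pattern):
    """True if x dominates y and every node strictly between them matches z_pattern."""
    node = NODES[y][1]
    while node is not None and node != x:
        if not label_matches(NODES[node][0], z_pattern):
            return False
        node = NODES[node][1]
    return node == x

def search(x_pattern, z_pattern, y_pattern):
    """All (x, y) index pairs that satisfy 'x iDomsMod z y'."""
    return [(x, y) for x in NODES for y in NODES
            if label_matches(NODES[x][0], x_pattern)
            and label_matches(NODES[y][0], y_pattern)
            and idoms_mod(x, y, z_pattern)]

print(search("IP-MAT*", "ADVP|PP", "NP"))
print(search("IP-MAT*", "ADVP|PP", "NP*"))
```

The first call pairs IP-MAT only with node 16; the second also pairs it with node 2, the subject NP, which has no intervener.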

A very common use of this function is in connection with conjunction structures, allowing non-first conjuncts to be treated on a par with first conjuncts.

Example query:

node: IP-MAT*

query: (NP-SBJ iDomsMod NP*|CONJ* PRO)

Example output:

/*
1 IP-MAT:  2 NP-SBJ, 12 PRO
*/
(0  (1 IP-MAT (2 NP-SBJ (3 NP (4 D The) (6 N king))
			(8 CONJP (9 CONJ and)
				 (11 NP (12 PRO I))))
	      (14 VBD danced)
	      (16 NP-MSR (17 Q all) (19 N night))
	      (21 . .))
    (23 ID SAMPLE,2))

iDomsNumber (variants: idomsnumber, iDomsNum, idomsnum)

"iDomsNumber" means "immediately dominates as the n-th child" (in other words, x immediately dominates y and y is the n-th child of x). "iDomsNumber 1" is synonymous with "iDomsFirst".

Example query:

node: IP-MAT*

query: (NP iDomsNumber 2 ADJ)

Example output:

/*
1 IP-MAT:  16 NP, 19 ADJ
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

If "NP" in the query were replaced by "NP*", the query would also match the subject NP.

iDomsOnly (variants: idomsonly)

"iDomsOnly" means "immediately dominates as an only child".

Example query:

(NP* iDomsOnly N)

Example output:

The sample sentence doesn't match the query, since all of the NP nodes have more than one child.

iDomsTotal (variants: idomstotal)

"iDomsTotal" returns structures containing nodes with the specified number of daughters.

Though traces and "0" are on the default ignore_words list, they are not on the default ignore_nodes list and need to be added, if necessary, with add_to_ignore.

Example query:

(NP* iDomsTotal 3)

Example output:

/*
1 IP-MAT:  16 NP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

The sample sentence matches the following query because punctuation is ignored by default.

Example query:

(IP-MAT* iDomsTotal 3)

Example output:

/*
1 IP-MAT:  1 IP-MAT
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))
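
The effect of ignored nodes on the daughter count can be sketched as follows. This Python fragment is illustrative only: IGNORED_LABELS is a hypothetical stand-in containing just two punctuation labels, not the actual default ignore_nodes list, and count_daughters is not a CorpusSearch function.

```python
# Assumption: a tiny stand-in for the ignore_nodes list (two punctuation labels only).
IGNORED_LABELS = {".", ","}

def count_daughters(daughter_labels, ignored=IGNORED_LABELS):
    """Count the daughters whose labels are not on the ignore list."""
    return sum(1 for label in daughter_labels if label not in ignored)

# IP-MAT has four daughters in the tree, but the final punctuation is not counted.
print(count_daughters(["NP-SBJ", "VBP", "PP", "."]))
```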

iDomsTotal< (variants: idomstotal<)

"iDomsTotal<" (read "iDomsTotalLessThan") is like
iDomsTotal except that it returns structures containing nodes that immediately dominate strictly less than the given number of daughters.

Example query:

(NP* iDomsTotal< 4)

Example output:

/*
1 IP-MAT:  16 NP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

iDomsTotal> (variants: idomstotal>)

"iDomsTotal>" (read "iDomsTotalMoreThan") is like
iDomsTotal except that it returns structures containing nodes that immediately dominate strictly more than the given number of daughters.

Example query:

(NP* iDomsTotal> 4)

Example output:

The sample sentence doesn't match the query.

iDomsViaTrace (variants: idomsviatrace)

Node x immediately dominates node y via trace t (with antecedent z) if and only if:

  1. x immediately dominates t,
  2. t is coindexed with z, and
  3. z immediately dominates y.

The syntactic categories of x and z must match (apart from any indices). CorpusSearch treats any string enclosed in asterisks and followed by a shared index as a trace. (In corpora following the annotation guidelines for the Penn Parsed Corpora of Historical English, only "*T*" or "*ICH*", followed by an index, count as legal traces.) The label of the trace is specified in the query, using ordinary regular expression syntax; see the example query for details. The node boundary (IP-MAT* in the example below) must include both the trace and the coindexed constituent.

This function facilitates searches involving extraposed constituents. We begin with a single-line query, which we then embed in a more realistic search context.

Example query:

node: IP-MAT*

query:     (CP-REL iDomsViaTrace \*ICH* IP-SUB*)

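In the example output, x is CP-REL (node 23), t is the trace *ICH*-1, z is its antecedent CP-REL-1 (node 28), and y is IP-SUB (node 32).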

Example output:

/~*
The quick brown fox jumped over the sleeping dog quickly who woke up with a
start.
(SAMPLE,3)
*~/
/*
1 IP-MAT:  23 CP-REL, 32 IP-SUB
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBD jumped)
	      (13 PP (14 P over)
		     (16 NP (17 D the)
			    (19 ADJ sleeping)
			    (21 N dog)
			    (23 CP-REL *ICH*-1)))
	      (25 ADVP (26 ADV quickly))
	      (28 CP-REL-1 (29 WNP-2 (30 WPRO who))
			   (32 IP-SUB (33 NP-SBJ *T*-2)
				      (35 VBD woke)
				      (37 RP up)
				      (39 PP (40 P with)
					     (42 NP (43 D a) (45 N start)))))
	      (47 . .))
    (49 ID SAMPLE,3))

The following query calls "iDomsViaTrace" in a more realistic context. x, y, z, and t remain unchanged from above.

Example query:

node: IP-MAT*

query:     (NP iDoms CP-REL)
       AND (CP-REL iDomsViaTrace \*ICH* IP-SUB*)
       AND (IP-SUB* iDomsMod NP-SBJ* \*T*)

Example output:

/*
1 IP-MAT:  16 NP, 23 CP-REL, 32 IP-SUB, 34 *T*-2
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBD jumped)
	      (13 PP (14 P over)
		     (16 NP (17 D the)
			    (19 ADJ sleeping)
			    (21 N dog)
			    (23 CP-REL *ICH*-1)))
	      (25 ADVP (26 ADV quickly))
	      (28 CP-REL-1 (29 WNP-2 (30 WPRO who))
			   (32 IP-SUB (33 NP-SBJ *T*-2)
				      (35 VBD woke)
				      (37 RP up)
				      (39 PP (40 P with)
					     (42 NP (43 D a) (45 N start)))))
	      (47 . .))
    (49 ID SAMPLE,3))

inID (variants: InID)

"inID" is true of substrings of the ID node. This function is necessary because the ID node, being outside the parsed sentence, cannot serve as an ordinary search-function argument. For instance, "(ID iDoms *)" returns no hits.

Example query:

query:  (SAMP* inID)

Example output:

/*
0 :  25 ID, 26 SAMPLE,1
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

The inID command is especially useful in coding queries, where it can be used to "import" information known to be associated with a particular file (date of composition, dialect, information about author, etc.) into a coding string. For instance:

Example columns from coding query:

// date of author's birth
11: {
      \1490:  (ABOTT-E1* inID)
      \1630:  (ALHATTON2-E3* inID)
      \1680:  (ALHATTON-E3* inID)
      ...
}

// author's sex
13: {
      f: (ABOTT*|ALHATTON* inID)
      m:  ELSE
}

iPrecedes (variants: iprecedes, iPres, ipres)

"iPrecedes" is true if and only if x does not dominate y, and x comes immediately before y in the string.

The algorithm for "x iPrecedes y" runs as follows:

  1. Find x.
  2. If x has an immediately following sister, then that sister and all its leftmost descendants (that is, the first child of the sister, the first child of the first child, and on as far as the tree goes) are candidates for y.
  3. If x has no immediately following sister, replace x with its mother and apply step (2) recursively.
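
The three steps above can be sketched in Python as follows. This is an illustration of the algorithm as described, not the CorpusSearch source: the TREE encoding (index, label, children) and the function names are hypothetical, word text is omitted, and ignore lists are not modelled.

```python
# Hypothetical encoding of the sample sentence: (index, label, children).
TREE = (1, "IP-MAT", [
    (2, "NP-SBJ", [(3, "D", []), (5, "ADJ", []), (7, "ADJ", []), (9, "N", [])]),
    (11, "VBP", []),
    (13, "PP", [(14, "P", []),
                (16, "NP", [(17, "D", []), (19, "ADJ", []), (21, "N", [])])]),
    (23, ".", []),
])

def find_path(node, index, path=()):
    """Return the chain of nodes from the root down to the node with this index."""
    path = path + (node,)
    if node[0] == index:
        return path
    for child in node[2]:
        found = find_path(child, index, path)
        if found:
            return found
    return None

def ipreceded_candidates(root, index):
    """Indices of all nodes y such that the node with this index iPrecedes y."""
    path = find_path(root, index)                # step 1: find x
    if path is None:
        return []
    for depth in range(len(path) - 1, 0, -1):
        node, mother = path[depth], path[depth - 1]
        siblings = mother[2]
        position = siblings.index(node)
        if position + 1 < len(siblings):         # step 2: a following sister exists
            candidates, sister = [], siblings[position + 1]
            while True:                          # the sister and its leftmost descendants
                candidates.append(sister[0])
                if not sister[2]:
                    return candidates
                sister = sister[2][0]
        # step 3: no following sister, so move up to the mother and try again
    return []

print(ipreceded_candidates(TREE, 9))   # candidates for y when x is node 9 (N)
print(ipreceded_candidates(TREE, 11))  # candidates for y when x is node 11 (VBP)
```

For node 9 (N) the only candidate is node 11 (VBP), even though the two are not sisters; for node 11 the candidates are the PP at node 13 and its first child, the P at node 14.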

A common error is to assume that "x iPrecedes y" implies that x and y are sisters. It does not, as the example output makes clear. Any sisterhood relations must be stated separately.

Example query:

query:     (N iPrecedes VBP)

Example output:

/*
1 IP-MAT:  9 N, 11 VBP
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

isRoot (variants: IsRoot, isroot)

"isRoot" searches for the label at the root of the parsed token. As with any CorpusSearch query, the query must contain a boundary node, but for the purposes of this search function, it is ignored.

Example query:

// boundary node must be specified, but is ignored
node: NP

query: (IP* isRoot)

Example output:

/*
0 :  1 IP-MAT
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

precedes (variants: Precedes, pres, Pres)

"x precedes y" means "x does not dominate y, and x comes before y in the string". Precedence does not imply sisterhood, as is evident from the example output.

Example query:

(VBP precedes N)

Example output:

/*
1 IP-MAT:  11 VBP, 21 N
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBP jumps)
	      (13 PP (14 P over)
		     (16 NP (17 D the) (19 ADJ lazy) (21 N dog)))
	      (23 . .))
    (25 ID SAMPLE,1))

sameIndex (variants: SameIndex, sameindex)

"x sameIndex y" finds structures containing constituents that share indices.

Example query:

node: IP-MAT*

query:    (NP* sameIndex CP*)
      AND (NP* iDoms \*exp*) 

Example output:

/*
1 IP-MAT:  2 NP-SBJ-1, 3 *exp*, 9 CP-THT-1
*/
(0  (1 IP-MAT (2 NP-SBJ-1 *exp*)
	      (4 NP-OB2 (5 PRO hym))
	      (7 VBD thought)
	      (9 CP-THT-1 (10 C 0)
			  (12 IP-SUB (13 NP-SBJ-2 (14 EX there))
				     (16 BED was)
				     (18 VBN com)
				     (20 PP (21 P into)
					    (23 NP (24 PRO$ hys) (26 N londe)))
				     (28 NP-2 (29 NS gryffens) (31 CONJ and) (33 NS serpentes))))
	      (35 E_S ,))
    (37 ID CMMALORY,33.1031))

When searching for antecedents of traces, it is the trace itself (not the category dominating it) that is the argument of sameIndex.

Example query:

node: IP-MAT*

query: (CP-REL* sameIndex \*ICH*)

Example output:

/*
1 IP-MAT:  28 CP-REL-1, 24 *ICH*-1
*/
(0  (1 IP-MAT (2 NP-SBJ (3 D The) (5 ADJ quick) (7 ADJ brown) (9 N fox))
	      (11 VBD jumped)
	      (13 PP (14 P over)
		     (16 NP (17 D the)
			    (19 ADJ sleeping)
			    (21 N dog)
			    (23 CP-REL *ICH*-1)))
	      (25 ADVP (26 ADV quickly))
	      (28 CP-REL-1 (29 WNP-2 (30 WPRO who))
			   (32 IP-SUB (33 NP-SBJ *T*-2)
				      (35 VBD woke)
				      (37 RP up)
				      (39 PP (40 P with)
					     (42 NP (43 D a) (45 N start)))))
	      (47 . .))
    (49 ID SAMPLE,3))
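
Index sharing can be pictured as comparing the numeric suffixes of two labels or traces. The Python sketch below is only an illustration: it assumes that an index is a hyphen followed by digits at the very end of the string, and the names index_of and same_index are hypothetical.

```python
import re

def index_of(label_or_trace):
    """Return the trailing numeric index of a label or trace, or None if there is none."""
    match = re.search(r"-(\d+)$", label_or_trace)
    return int(match.group(1)) if match else None

def same_index(a, b):
    """True if both strings carry an index and the two indices are equal."""
    index = index_of(a)
    return index is not None and index == index_of(b)

print(same_index("NP-SBJ-1", "CP-THT-1"))   # the *exp* example
print(same_index("CP-REL-1", "*ICH*-1"))    # antecedent and trace
print(same_index("NP-SBJ-2", "CP-THT-1"))   # different indices
```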