finite_verb: BE[DP]|DO[DP]|HV[DP]|VB[DP]|MD
ERROR! In Meat.CrankThrough: Exception: String index out of range: 0 String index out of range: 0 java.lang.StringIndexOutOfBoundsException: String index out of range: 0 java.lang.StringIndexOutOfBoundsException: String index out of range: 0
regular expression | matches input string                | does not match input string
\*T\*              | *T*                                 | *T*-1, *T239, T, The, ATE, VAT
\*T\**             | *T*, *T*-1                          | *T239, T, The, ATE, VAT
\*T*               | *T*, *T*-1, *T239                   | T, The, ATE, VAT
*T*                | *T*, *T*-1, *T239, T, The, ATE, VAT | (none)
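You can check the table mechanically. The Python sketch below translates a pattern into a regular expression. It assumes only the rules the table illustrates: `*` stands for any string, including the empty string; a backslash makes the next character literal; and the pattern must match the whole input string. `|` is treated as a disjunction of alternatives. `label_regex` and `label_matches` are hypothetical names, not CorpusSearch's own code.

```python
import re

def label_regex(alternative):
    """Translate one wildcard pattern into a compiled regular expression.

    Assumed rules: '*' matches any string (possibly empty) and a backslash
    makes the following character literal. Not CorpusSearch's own code.
    """
    parts = []
    i = 0
    while i < len(alternative):
        ch = alternative[i]
        if ch == "\\" and i + 1 < len(alternative):
            parts.append(re.escape(alternative[i + 1]))  # escaped: literal character
            i += 2
        elif ch == "*":
            parts.append(".*")  # wildcard: any string
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))

def label_matches(pattern, label):
    """True if the whole label matches any '|'-separated alternative."""
    return any(label_regex(alt).fullmatch(label) for alt in pattern.split("|"))

# Rows of the table: pattern -> the input strings it should match.
TABLE = {
    r"\*T\*": {"*T*"},
    r"\*T\**": {"*T*", "*T*-1"},
    r"\*T*": {"*T*", "*T*-1", "*T239"},
    r"*T*": {"*T*", "*T*-1", "*T239", "T", "The", "ATE", "VAT"},
}
INPUTS = ["*T*", "*T*-1", "*T239", "T", "The", "ATE", "VAT"]

for pattern, expected in TABLE.items():
    got = {s for s in INPUTS if label_matches(pattern, s)}
    print(pattern, got == expected)  # each row prints True
```

The last row shows why an unescaped leading `*` is rarely what you want. It matches any label that contains a T anywhere.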
CS query.q file.psd >& ERR
A command like the one just above will either go to completion or result in an error message. If the latter, you know that the mismatched paren is somewhere between where you started the command and the expected end.
C-u 400 C-x C-f
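A small script can also narrow down where a mismatched paren is. The Python sketch below assumes that every token begins with `(` in column 1 and that all other lines are indented. This is conventional in Penn Treebank .psd files but not guaranteed. `find_unbalanced` is a hypothetical helper, not part of CorpusSearch.

```python
def find_unbalanced(text):
    """Return (line, column, message) for the first paren problem, or None.

    Assumes each token starts with '(' in column 1 and that continuation
    lines are indented, so an opening paren in column 1 inside an
    unfinished token means that token was never closed.
    """
    stack = []  # (line, column) of every currently open paren
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "(":
                if col == 1 and stack:
                    start_line, start_col = stack[0]
                    return (start_line, start_col, "token is not closed before the next token")
                stack.append((lineno, col))
            elif ch == ")":
                if not stack:
                    return (lineno, col, "unmatched ')'")
                stack.pop()
    if stack:
        start_line, start_col = stack[0]
        return (start_line, start_col, "token is not closed at end of file")
    return None

sample = (
    "( (IP-MAT (NP-SBJ (PRO they))\n"
    "  (VBD left)\n"
    "  (. .))\n"  # one ')' short: the wrapper paren is never closed
    "( (IP-MAT (NP-SBJ (PRO we))\n"
    "  (VBD stayed)\n"
    "  (. .)))\n"
)
print(find_unbalanced(sample))  # (1, 1, 'token is not closed before the next token')
```

The script reports only the first problem it finds. After fixing that problem, run it again until it returns None.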
Missing wrapper parens
Wrapper parens are the unlabeled parens that delimit each token in Penn
Treebank format, as shown in (1a). If the wrapper parens are missing, the
token looks like (1b), and you will have to add the relevant parens in
order to meet CorpusSearch's
compatibility requirements.
(1) a. ( (IP-MAT ... ))
    b. (IP-MAT ... )
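If many tokens in a file lack wrapper parens, adding them by hand is tedious. The Python sketch below assumes that the file's parens are already balanced and that each top-level bracketed expression is one token. `top_level_trees` and `add_wrapper` are hypothetical helper names.

```python
import re

def top_level_trees(text):
    """Split text into its top-level bracketed expressions (assumes balanced parens)."""
    trees, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                trees.append(text[start:i + 1])
    return trees

def add_wrapper(token):
    """Turn '(IP-MAT ...)' into '( (IP-MAT ...))'; leave wrapped tokens alone."""
    # A wrapped token opens with '(' followed, after optional whitespace, by another '('.
    if re.match(r"\(\s*\(", token):
        return token
    return "( " + token + ")"

text = "(IP-MAT (VBD left) (. .))\n\n( (IP-MAT (VBD stayed) (. .)))\n"
fixed = "\n\n".join(add_wrapper(t) for t in top_level_trees(text))
print(fixed)
```

The first token gains its wrapper parens. The second token already has them and is left unchanged.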
(2) a. ( (IP-MAT (NP-SBJ (PRO$ My) (N neighbor))
                 (VBD told)
                 me                          ← bare word (missing preterminal)
                 (. .)))
    b. ( (IP-MAT (NP-SBJ (PRO$ My) (N neighbor))
                 (VBD told)
                 (NP-OB2 (PRO m e))          ← terminal contains space
                 (. .)))
    c. ( (IP-MAT (NP-SBJ (PRO$ My            ← preterminal isn't unary-branching
                                (N neighbor)))
                 (VBD told)
                 (NP-OB2 (PRO me))
                 (. .)))
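The malformations in (2) can be detected automatically. The Python sketch below parses one token into nested (label, children) pairs. It then flags two kinds of node: any node that mixes words with subconstituents, as in (2a) and (2c), and any node that has more than one word, as in (2b). The heuristic and the helper names are assumptions for illustration, not CorpusSearch's own checks. The sketch also assumes that the token's parens are balanced.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")

def parse(text):
    """Parse one bracketed token into nested (label, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip '('
        label = ""  # the wrapper paren has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ')'
        return (label, children)

    return node()

def problems(tree):
    """List suspicious nodes in a parsed token."""
    label, children = tree
    words = [c for c in children if isinstance(c, str)]
    subtrees = [c for c in children if isinstance(c, tuple)]
    found = []
    if words and subtrees:
        found.append(f"{label}: words {words} mixed with subconstituents")
    elif len(words) > 1:
        found.append(f"{label}: more than one word {words}")
    for sub in subtrees:
        found.extend(problems(sub))
    return found

a = "( (IP-MAT (NP-SBJ (PRO$ My) (N neighbor)) (VBD told) me (. .)))"
b = "( (IP-MAT (NP-SBJ (PRO$ My) (N neighbor)) (VBD told) (NP-OB2 (PRO m e)) (. .)))"
c = "( (IP-MAT (NP-SBJ (PRO$ My (N neighbor))) (VBD told) (NP-OB2 (PRO me)) (. .)))"
for token in (a, b, c):
    print(problems(parse(token)))
```

A well-formed token such as `( (IP-MAT (NP-SBJ (PRO$ My) (N neighbor)) (VBD told) (NP-OB2 (PRO me)) (. .)))` produces an empty list.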
When you want to refer to all the variants of a label except for one or two, you have to explicitly list all the desired variants. For instance, if you are interested in all instances of "ADVP*" except for "ADVP-DIR", you must use a disjunction like the following:
ADVP|ADVP-LOC*|ADVP-TMP*
In such cases, best practice is to include the disjunction in a
definition file.
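The explicit disjunction has to cover every variant that actually occurs. One way to build it is from the labels attested in the corpus. The Python sketch below assumes that you have already collected the set of labels. The definition name `advp_not_dir` is hypothetical, and the `name: value` line format is modeled on the `finite_verb` definition shown in this document. Wildcard matching here uses Python's `fnmatch`, which is a stand-in for, not a copy of, CorpusSearch's matching.

```python
from fnmatch import fnmatchcase

def definition_excluding(name, labels, include, exclude):
    """Build a definition line listing every attested label that matches
    `include` but not `exclude` (shell-style '*' wildcards via fnmatch)."""
    keep = sorted({lab for lab in labels
                   if fnmatchcase(lab, include) and not fnmatchcase(lab, exclude)})
    return f"{name}: {'|'.join(keep)}"

attested = ["ADVP", "ADVP-LOC", "ADVP-TMP", "ADVP-TMP-1", "ADVP-DIR", "ADVP-DIR-2", "NP-SBJ"]
print(definition_excluding("advp_not_dir", attested, "ADVP*", "ADVP-DIR*"))
# advp_not_dir: ADVP|ADVP-LOC|ADVP-TMP|ADVP-TMP-1
```

Unlike the hand-written disjunction, this lists exact labels rather than wildcard patterns. It therefore has to be regenerated if new variants are added to the corpus.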
Missing "define"
A common error is to intend to use a definition file, but to omit the
requisite "define" command in
the preamble.
CorpusSearch issues no warning message, but the search will not yield the
intended output because CorpusSearch interprets the strings intended as
definitions as literal strings, which generally do not match anything in
the input.
For more details and suggestions for troubleshooting,
see Definition file.
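CorpusSearch gives no warning in this case, so a quick pre-flight check can help. The Python sketch below makes two format assumptions: a query file names its definition file on a line beginning `define:`, and a definition file contains lines of the form `name: value`, like the `finite_verb` line in this document. Verify both formats, and treat the helper names as hypothetical.

```python
import re

def read_definitions(def_text):
    """Map each definition name to its value, e.g. 'finite_verb' -> 'BE[DP]|...'."""
    defs = {}
    for line in def_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            defs[name.strip()] = value.strip()
    return defs

def missing_define_warning(query_text, def_text):
    """Return a warning if the query uses definition names but has no 'define:' line."""
    if any(line.strip().startswith("define:") for line in query_text.splitlines()):
        return None
    used = sorted(
        name for name in read_definitions(def_text)
        # match the name only as a whole label, not inside a longer one
        if re.search(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", query_text)
    )
    if used:
        return "query uses " + ", ".join(used) + " but has no define: line"
    return None

defs = "finite_verb: BE[DP]|DO[DP]|HV[DP]|VB[DP]|MD\n"
query = "query: (IP* iDomsFirst ADVP*) AND (finite_verb iPrecedes NP-SBJ*)\n"
print(missing_define_warning(query, defs))                       # warns about finite_verb
print(missing_define_warning("define: my.def\n" + query, defs))  # None
```

In the second call, `my.def` is a placeholder file name.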
Missing prefix indices
A very common error is to forget to add prefix indices to arguments of a
search function (in other words, to unintentionally impose same-instance).
This is the chief cause of a baffling absence of hits. Here is an example
of a query intended to find clauses in which both the subject and the
object are pronouns.
query: (NP-SBJ* iDoms PRO)
AND (NP-OB1* iDoms PRO)
This query can never return any hits, because the two instances of PRO
are interpreted by default as referring to the same node. But that
configuration is not a possible tree structure, as one and the same node
cannot be simultaneously dominated by the subject and the object. The
two instances of PRO must be distinguished with prefix indices:
query: (NP-SBJ* iDoms [1]PRO)
AND (NP-OB1* iDoms [2]PRO)
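A toy model makes the effect of same-instance concrete. The Python sketch below hard-codes a hypothetical indexed tree for "They saw us." and treats each `iDoms` clause as the set of PRO nodes it could bind. Under same-instance, one node must be in both sets. With distinct prefix indices, each clause may bind its own node. This is a simplification for illustration, not how CorpusSearch evaluates queries.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed tree for "They saw us.":
# ( (IP-MAT (NP-SBJ (PRO they)) (VBD saw) (NP-OB1 (PRO us)) (. .)))
LABELS = {1: "IP-MAT", 2: "NP-SBJ", 3: "PRO", 5: "VBD", 7: "NP-OB1", 8: "PRO", 10: "."}
PARENT = {2: 1, 3: 2, 5: 1, 7: 1, 8: 7, 10: 1}

def idoms_hits(mother_pat, daughter_pat):
    """Indices of daughters matching daughter_pat whose mother matches mother_pat."""
    return {d for d, m in PARENT.items()
            if fnmatchcase(LABELS[d], daughter_pat) and fnmatchcase(LABELS[m], mother_pat)}

subj_pro = idoms_hits("NP-SBJ*", "PRO")  # {3}
obj_pro = idoms_hits("NP-OB1*", "PRO")   # {8}

# Same instance: a single PRO node would have to satisfy both clauses.
same_instance_hit = bool(subj_pro & obj_pro)
# Distinct instances ([1]PRO, [2]PRO): each clause may bind its own PRO node.
distinct_instance_hit = bool(subj_pro) and bool(obj_pro)
print(same_instance_hit, distinct_instance_hit)  # False True
```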
Missing same-instance
Errors due to missing prefix indices
are cases of unintentionally overusing same-instance. Same-instance can
also be underused, as in the following query, intended to retrieve
instances of V2 with clause-initial adverb phrases.
query: (IP* iDomsFirst ADVP*) AND (finite_verb iPrecedes NP-SBJ*)
The intended instances will be retrieved, but so will unintended tokens like the following, where the finite verb and the subject are not clausemates of the adverb phrase, as clearly indicated by the node indices in the result block.
/* 1 IP-MAT:  1 IP-MAT, 2 ADVP-TMP, 16 BEP, 18 NP-SBJ */
(0 (1 IP-MAT (2 ADVP-TMP (3 ADV Yesterday))
             (5 NP-SBJ (6 PRO they))
             (8 VBD asked)
             (10 , ,)
             (12 " ")
             (14 CP-QUE-MAT-SPE
                 (15 IP-SUB-SPE (16 BEP Are)
                                (18 NP-SBJ (19 PRO you))
                                (21 VAG coming)
                                (23 PP (24 P with)
                                       (26 NP (27 PRO us)))))
             (29 . ?)
             (31 " ")))
The reason for this error is that the query fails to impose the clausemate condition. The solution is to "tie" the constituents in the second clause of the query to one or more constituents in the first clause of the query by exploiting same-instance. Here is one way of doing that:
query: (IP* iDomsNumber 1 ADVP*) AND (IP* iDomsNumber 2 finite_verb) AND (IP* iDomsNumber 3 NP-SBJ*)
Here is another way:
query: (IP* iDomsFirst ADVP*) AND (ADVP* hasSister finite_verb) AND (finite_verb hasSister NP-SBJ*) AND (ADVP* iPrecedes finite_verb) AND (finite_verb iPrecedes NP-SBJ*)
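You can also check the node indices in the result block mechanically. The Python sketch below uses hypothetical helper names. It builds a map from each node index to the index of its mother, assuming that every node in the result block opens with its index, as in the output shown in this section. It then tests whether a set of nodes are sisters. In the unintended hit, ADVP-TMP (2) has a different mother from BEP (16) and NP-SBJ (18). Both rewritten queries rule out exactly this configuration.

```python
import re

# Result block from the unintended hit above (whitespace is irrelevant).
RESULT = '''(0 (1 IP-MAT (2 ADVP-TMP (3 ADV Yesterday)) (5 NP-SBJ (6 PRO they))
 (8 VBD asked) (10 , ,) (12 " ") (14 CP-QUE-MAT-SPE (15 IP-SUB-SPE (16 BEP Are)
 (18 NP-SBJ (19 PRO you)) (21 VAG coming) (23 PP (24 P with) (26 NP (27 PRO us)))))
 (29 . ?) (31 " ")))'''

def parent_map(text):
    """Map each node index to the index of its mother."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    parents, stack = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            index = int(tokens[i + 1])  # every node opens with its index
            if stack:
                parents[index] = stack[-1]
            stack.append(index)
            i += 2
        elif tok == ")":
            stack.pop()
            i += 1
        else:
            i += 1  # labels and words
    return parents

def clausemates(parents, indices):
    """True if all the given nodes share the same mother (i.e. are sisters)."""
    return len({parents[i] for i in indices}) == 1

parents = parent_map(RESULT)
print(clausemates(parents, [2, 16, 18]))  # False: 16 and 18 sit under IP-SUB-SPE (15)
print(clausemates(parents, [2, 5, 8]))    # True: all three sit under IP-MAT (1)
```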
The following query is not ill-formed, but it is inefficient.
query: (NP-SBJ* exists) AND (ADJ exists) AND (NP-SBJ* iDoms ADJ)
The final clause in the query implies the two preceding ones, and the same effect can therefore be obtained more simply with:
query: (NP-SBJ* iDoms ADJ)
Redundant clauses of this kind are a common cause of searches that take an unexpectedly long time. (You can check whether a search is still running with the Unix/Linux "jobs" command.)
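You can prune redundant `exists` clauses of this kind automatically. The Python sketch below takes the query body without the `query:` prefix. It assumes that the query is a plain conjunction of clauses joined by `AND`, with no `OR` or negation, because under negation the pruning would no longer be safe. It also assumes that any other clause naming X as an argument already requires X to exist. `drop_redundant_exists` is a hypothetical helper.

```python
import re

EXISTS = re.compile(r"\(\s*(\S+)\s+exists\s*\)")

def drop_redundant_exists(query):
    """Remove '(X exists)' clauses when X is already an argument of another clause."""
    clauses = [c.strip() for c in re.split(r"\s+AND\s+", query.strip())]
    args = set()
    for clause in clauses:
        if EXISTS.fullmatch(clause):
            continue
        tokens = clause.strip("()").split()
        # Skip the search-function name (second token); keep the arguments.
        args.update(tokens[:1] + tokens[2:])
    kept = []
    for clause in clauses:
        m = EXISTS.fullmatch(clause)
        if m and m.group(1) in args:
            continue  # implied by another clause
        kept.append(clause)
    return " AND ".join(kept)

print(drop_redundant_exists("(NP-SBJ* exists) AND (ADJ exists) AND (NP-SBJ* iDoms ADJ)"))
# (NP-SBJ* iDoms ADJ)
```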