In the course of doing research on, say, the verbal syntax of English, you might find yourself making reference to finite verbs over and over again. You might start out with a disjunction like this:
BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]
Then you realize that, given the complexity of the annotation system you're using, you need to include versions of the items above with prefixed particles or other material:
*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
So now your list of search terms includes both the simple and the prefixed variants.
BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
The items on the list have to be separated by a single pipe symbol ("|") to be interpreted correctly. Two adjacent pipe symbols will cause CorpusSearch to crash, but the inadvertent absence of a pipe symbol is more insidious: CorpusSearch will run, but it will not give you the correct results. For instance, the following disjunction will fail to find any instances of simple finite BE or finite DO. Can you see why? (The answer is given further down, in the paragraph headed "Answer to Can you see why?".)
BE[DP]DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
If you have several queries that are intended to refer to the same search term, it can become a job in itself to ensure that those terms are defined consistently across all your queries. Wouldn't it be nice if there were a single place to define all your search terms, so that when you make revisions, they affect all queries making reference to those terms in a uniform way?
There is such a place. It's called a definition file.
A definition file is simply a file containing the various labels that you
want to group together, each group paired with a definition (an
abbreviation or alias) that allows you to refer to it. An example is
given under "Content of definition files" below.
An entire query using such definitions is shown under "Calling definition
files" below.
The "define" command instructs CorpusSearch where to find the definitions
for "finite_verb" and "nonfinite_verb". Without this command,
"finite_verb" and "nonfinite_verb" would be read as literal strings and
not be replaced by their definitions. Most likely, this would result in
CorpusSearch reporting no hits for the search (since most parsed corpora
will not contain instances of the literal strings "finite_verb" or
"nonfinite_verb").
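The substitution step can be pictured with a minimal Python sketch. It assumes that each definition name is replaced wherever it occurs as a whole word (letters, digits, underscores, and hyphens count as name characters). It illustrates the idea only and is not CorpusSearch's own code; the function name expand_query is hypothetical.

```python
# Illustrative sketch only; not CorpusSearch's own code.
import re

def expand_query(query: str, definitions: dict[str, str]) -> str:
    """Replace each whole-word definition name in a query with its label list.

    With no definitions (no "define" line), names such as "finite_verb" are
    left as literal strings, which is why such a search finds nothing.
    """
    if not definitions:
        return query
    # Try longer names first and refuse matches inside a longer word, so
    # "finite_verb" is never replaced inside "nonfinite_verb".
    names = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])"
    )
    # A function replacement inserts the labels verbatim (no escape processing).
    return pattern.sub(lambda m: definitions[m.group(1)], query)


ppche = {
    "finite_verb": "BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]",
    "nonfinite_verb": "BE|DO|HV|MD0|VB|*+BE|*+DO|*+HV|*+MD0|*+VB",
}
print(expand_query("(finite_verb precedes nonfinite_verb)", ppche))
print(expand_query("(finite_verb precedes nonfinite_verb)", {}))  # unchanged
```

The first call yields the same expanded query that appears in the sample output preface further down. The second call shows what a search is left with when the "define" line is missing.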
It is also possible to call a definition file from your preference file.
As usual, specifications in a query file override corresponding
specifications in a preference file.
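The override rule amounts to a merge in which the query file has the last word. The Python sketch below assumes both files have already been read into key/value pairs. The function name and the sample settings are hypothetical.

```python
# Illustrative sketch only; hypothetical settings, not CorpusSearch's own code.
def effective_settings(preferences: dict[str, str], query_file: dict[str, str]) -> dict[str, str]:
    """Combine settings so that any key given in the query file wins."""
    merged = dict(preferences)   # start from the preference file
    merged.update(query_file)    # query-file entries override matching keys
    return merged


prefs = {"define": "ppche.def", "node": "IP*"}  # hypothetical preference file
query = {"define": "later-english.def", "query": "(subj iPrecedes finite_be)"}
print(effective_settings(prefs, query)["define"])  # later-english.def
```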
A directory must not contain more than one preference file, but it can
contain multiple definition files. This is very useful in connection with
running the same queries on corpora with different annotation labels;
see "Some reasons to use definition files" below.
If a search reports no hits, or suspiciously few hits, for a particular
search term, this is likely due to:
If you can never remember whether you refer to finite verbs in your
queries as "finite_verb" or "Vfin", you can use recursive definitions to
render the question irrelevant: add an alias line such as
"Vfin: $finite_verb" to your definition file, best placed right after the
non-recursive, basic definition (see "Recursive definitions" below).
As we mentioned, definition files offer a powerful assist in enforcing
consistency across queries of all sorts (whether ordinary, revision, or
coding queries). Any revisions that you make to your search terms are
made only once, in the definition file.
Definition files can greatly facilitate comparative searches across
corpora from various languages or linguistic stages that use different
annotation labels for the same (or very similar) linguistic concepts. For
instance, in conducting research on Old English and later stages of
English using the York corpora of Old English and the Penn Parsed
Historical Corpora of English, one can set up distinct definition files
because of the divergent annotation guidelines, but use query files that
are identical in every respect but the "define" line.
Definition files can be used as a "poor person's lemmatizer" and "poor
person's verb classifier"; an example appears at the end of "Some reasons
to use definition files" below. (The entries there are very simplified;
there are many more spelling variants to be considered in historical
texts.)
Answer to "Can you see why?": There's a pipe symbol missing after BE[DP].
Unlike the correct query, which has the pipe symbol, the query with the
error instructs CorpusSearch to search for the expression "BE[DP]DO[DP]",
which expands to BEDDOD, BEDDOP, BEPDOD, and BEPDOP. None of these labels
exist in the corpus.
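The expansion described in the answer can be checked mechanically. The minimal Python sketch below is not CorpusSearch's own code. It assumes that a bracketed group such as [DP] stands for exactly one of its characters, expands a label pattern into every label it covers, and flags items in a disjunction that are empty (a doubled pipe) or cover no label in a given inventory. The function names and the small label inventory are hypothetical, and items containing the * wildcard are skipped rather than checked.

```python
# Illustrative sketch only; not CorpusSearch's own matching code.
import re
from itertools import product

# A bracketed set such as [DP], or a literal run of characters.
_TOKEN = re.compile(r"\[([^\]]+)\]|([^\[]+)")

def expand_label(pattern: str) -> list[str]:
    """Expand each bracketed set (one character from the set) into concrete labels."""
    choices = [list(chars) if chars else [literal]
               for chars, literal in _TOKEN.findall(pattern)]
    return ["".join(parts) for parts in product(*choices)]

def check_disjunction(disjunction: str, known_labels: set[str]) -> list[str]:
    """Warn about empty items (doubled pipes) and items matching no known label.

    Items containing the * wildcard are not checked by this sketch.
    """
    warnings = []
    for item in disjunction.split("|"):
        if item == "":
            warnings.append("empty item: doubled or stray pipe symbol")
        elif "*" not in item and not set(expand_label(item)) & known_labels:
            warnings.append(f"{item!r} matches no known label")
    return warnings


# Hypothetical mini-inventory of finite-verb labels.
known = {"BED", "BEP", "DOD", "DOP", "HVD", "HVP", "MD", "VBD", "VBP"}
print(expand_label("BE[DP]DO[DP]"))
# ['BEDDOD', 'BEDDOP', 'BEPDOD', 'BEPDOP']
print(check_disjunction("BE[DP]DO[DP]|HV[DP]|MD|VB[DP]", known))
# ["'BE[DP]DO[DP]' matches no known label"]
```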
Content of definition files
A definition file for Middle English might look like this:
// definitions for Middle English
finite_verb: BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
nonfinite_verb: BE|DO|HV|MD0|VB|*+BE|*+DO|*+HV|*+MD0|*+VB
The definition on the left must be an orthographic word (that is, it must
not contain spaces). It is followed by a colon and then the list of labels
that it stands for. Each definition must be associated with a list unique
to it. So the following definition file is not good, because the
definition "finite_verb" is ambiguous:
finite_verb: BE[DP]
finite_verb: VB[DP]
However, a list on the right can be associated with more than one
definition, as in the following example:
obj: NP-OB1*|NP-OB2*
object: NP-OB1*|NP-OB2*
This can be useful, as discussed further under "Recursive definitions".
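These rules are easy to check before running a search. Here is a minimal Python sketch of a validator. It assumes one definition per line, // comment lines as in the Middle English example, and a name that is a single run of non-space characters. It illustrates the rules only and is not how CorpusSearch itself reads the file; parse_definitions is a hypothetical name.

```python
# Illustrative sketch only; not CorpusSearch's own parser.
import re

# A name (no spaces), a colon, then a non-empty list of labels.
_ENTRY = re.compile(r"(\S+?):\s*(\S.*)")

def parse_definitions(text: str) -> dict[str, str]:
    """Read 'name: label|label|...' lines and enforce one list per name."""
    definitions: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and // comments
        match = _ENTRY.fullmatch(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected 'name: label|label|...'")
        name, labels = match.groups()
        if name in definitions:
            # This sketch rejects any second definition of the same name.
            raise ValueError(f"line {lineno}: {name!r} is defined more than once")
        definitions[name] = labels.strip()
    return definitions


middle_english = """// definitions for Middle English
finite_verb: BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
nonfinite_verb: BE|DO|HV|MD0|VB|*+BE|*+DO|*+HV|*+MD0|*+VB
"""
print(parse_definitions(middle_english))
```

Two names sharing one list (obj and object) pass, while a name given two different lists, or a name containing a space, raises an error.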
Calling definition files
Definition files must have the extension .def, and they must be
stored in the same directory as the command file. They are called by
including a line like the following in a query file:
define: ppche.def
A complete query file might then read as follows:
node: IP*
define: ppche.def
query: (finite_verb hasSister nonfinite_verb)
AND (finite_verb precedes nonfinite_verb)
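The two requirements stated above (a .def extension, and a file in the same directory as the command file) can be sketched in Python as follows. This only illustrates the rule, and the function names are hypothetical. In particular, the sketch rejects anything but a bare file name on the "define" line, which may be stricter than CorpusSearch itself.

```python
# Illustrative sketch only; hypothetical helpers, not CorpusSearch's own code.
from pathlib import Path
from typing import Optional

def define_value(query_text: str) -> Optional[str]:
    """Return the file named on the query file's 'define:' line, if any."""
    for line in query_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "define":
            return value.strip()
    return None

def definition_path(command_file: str, def_name: str) -> Path:
    """Locate a definition file beside the command file, enforcing the .def rule."""
    if Path(def_name).suffix != ".def":
        raise ValueError(f"{def_name!r}: definition files must end in .def")
    if Path(def_name).name != def_name:
        raise ValueError(f"{def_name!r}: expected a bare file name")
    return Path(command_file).resolve().parent / def_name


search_q = """node: IP*
define: ppche.def
query: (finite_verb hasSister nonfinite_verb)
AND (finite_verb precedes nonfinite_verb)
"""
name = define_value(search_q)
print(name, definition_path("search.q", name))
```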
Reference to definition files in output files
The output from a search that calls a definition file will include a
preface along the following lines:
/*
PREFACE:
CorpusSearch copyright Beth Randall 2000.
Date: Thu Apr 13 08:57:07 EDT 2000
command file: search.q
output file: search.out
definition file: ppche.def
node: IP*
query: (BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]|*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP] precedes BE|DO|HV|MD0|VB|*+BE|*+DO|*+HV|*+MD0|*+VB)
*/
The entry under "definition file" gives the name of the definition file
that CorpusSearch used when running the query in search.q. Since you
might have made changes to the file between running the query and
reviewing the output of the search, CorpusSearch also reports the actual
query that it ran (that is, the query that resulted when it expanded the
definition at runtime). This can be helpful in troubleshooting.
Recursive definitions
Definitions may be recursive, allowing complex definitions to be built
up out of more basic ones. For instance:
finite_verb_simple: BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]
finite_verb_complex: *+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]
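finite_verb: $finite_verb_simple|$finite_verb_complex
An alias for an existing definition is built the same way. To let "Vfin"
stand for "finite_verb", add a line like the following, best placed right
after the basic definition:
Vfin: $finite_verb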
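One way to picture how nested definitions are resolved is the recursive expansion below. It is a minimal Python sketch, not CorpusSearch's implementation. It assumes that a reference is a $ followed by a definition name, expands references depth-first, and reports a circular definition instead of looping forever. The function name resolve is hypothetical.

```python
# Illustrative sketch only; not CorpusSearch's own expansion code.
import re

_REFERENCE = re.compile(r"\$([\w-]+)")  # "$" followed by a definition name

def resolve(name: str, definitions: dict[str, str], _chain: tuple[str, ...] = ()) -> str:
    """Return the label list for name, expanding $references recursively."""
    if name in _chain:
        raise ValueError("circular definition: " + " -> ".join(_chain + (name,)))
    if name not in definitions:
        raise KeyError(f"undefined name: ${name}")
    chain = _chain + (name,)
    return _REFERENCE.sub(lambda m: resolve(m.group(1), definitions, chain),
                          definitions[name])


defs = {
    "finite_verb_simple": "BE[DP]|DO[DP]|HV[DP]|MD|VB[DP]",
    "finite_verb_complex": "*+BE[DP]|*+DO[DP]|*+HV[DP]|*+MD|*+VB[DP]",
    "finite_verb": "$finite_verb_simple|$finite_verb_complex",
    "Vfin": "$finite_verb",
}
print(resolve("Vfin", defs))
```

"Vfin" ends up with exactly the same label list as the one-line finite_verb definition for Middle English given earlier.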
Some reasons to use definition files
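As mentioned at the outset, definition files are an extremely useful tool,
powerful and flexible at once, and we urge CorpusSearch users to use them
whenever the search terms in their queries become even the slightest bit
complex. The following sample files illustrate the comparative searches
described above: one definition file per corpus, and query files that
differ only in their "define" line.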
// old-english.def
subj: NP-NOM*
dir-obj: NP-ACC*
indir-obj: NP-DTV*
finite_be: BE[DP]|BE[DP][IS]|BEPH
// later-english.def
subj: NP-SBJ*
dir-obj: NP-OB1*
indir-obj: NP-OB2*
finite_be: BE[DP]
// sample-search-old-english.q
node: IP*
define: old-english.def
query: (subj hasSister finite_be)
AND (subj iPrecedes finite_be)
// sample-search-later-english.q
Same as the previous query file except for the "define" line, which reads "define: later-english.def".
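Because the query files differ only in their "define" line, they can be generated from a single template. The Python sketch below does this for the two sample queries. The template and function name are assumptions for illustration, and the .def files themselves would still need to sit in the same directory as the generated query files.

```python
# Illustrative sketch only; hypothetical template and helper.
import tempfile
from pathlib import Path

TEMPLATE = """node: IP*
define: {def_file}
query: (subj hasSister finite_be)
AND (subj iPrecedes finite_be)
"""

def write_query_files(out_dir: Path, def_files: dict[str, str]) -> list[Path]:
    """Write one query file per corpus; only the define line differs between them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for query_name, def_file in def_files.items():
        path = out_dir / query_name
        path.write_text(TEMPLATE.format(def_file=def_file))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for path in write_query_files(Path(tmp), {
        "sample-search-old-english.q": "old-english.def",
        "sample-search-later-english.q": "later-english.def",
    }):
        print(path.name, "->", path.read_text().splitlines()[1])
# sample-search-old-english.q -> define: old-english.def
# sample-search-later-english.q -> define: later-english.def
```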
A definition file along these lines could serve as a lemmatizer and verb
classifier:
give: [gG][eiy][uv]e
gave: [gG]ave
given: [gG][eiy][uv]en
send: [sS]end|[sS]ende
sent: [sS]ent|[sS]ente
GIVE: $give|$gave|$given
SEND: $send|$sent
double-object-verb: $GIVE|$SEND
In conjunction with corpus revision queries, these entries could be used
to associate lemmas with verb forms. The following revision query gives
output in the style of IcePAHC:
query: (V* iDoms {1}GIVE)
append_label{1}: =give
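Outside CorpusSearch, the same entries can be tried out on individual words. The Python sketch below is not part of CorpusSearch. It assumes that [..] is a one-character choice, * a wildcard, and | a separator between alternatives, and it translates each entry into a Python regular expression. It then tags a form with its lemma, roughly as the revision query above is meant to (gave becomes gave=give). The function names are hypothetical.

```python
# Illustrative sketch only; not CorpusSearch's own matching code.
import re
from typing import Optional

# Entries copied from the definition file above; LEMMAS mirrors GIVE and SEND.
FORMS = {
    "give": "[gG][eiy][uv]e",
    "gave": "[gG]ave",
    "given": "[gG][eiy][uv]en",
    "send": "[sS]end|[sS]ende",
    "sent": "[sS]ent|[sS]ente",
}
LEMMAS = {"give": ("give", "gave", "given"), "send": ("send", "sent")}
DOUBLE_OBJECT_VERBS = {"give", "send"}  # mirrors double-object-verb: $GIVE|$SEND

def to_regex(pattern: str) -> str:
    """Translate a CorpusSearch-style pattern: keep [..], map * to .*, escape the rest."""
    alternatives = []
    for alternative in pattern.split("|"):
        pieces = re.findall(r"(\[[^\]]+\])|(\*)|([^\[*]+)", alternative)
        alternatives.append("".join(
            bracket or (".*" if star else re.escape(text))
            for bracket, star, text in pieces))
    return "(?:" + "|".join(alternatives) + ")"

def lemma_of(word: str) -> Optional[str]:
    """Return the lemma whose forms match the whole word, or None."""
    for lemma, form_names in LEMMAS.items():
        if any(re.fullmatch(to_regex(FORMS[f]), word) for f in form_names):
            return lemma
    return None

def annotate(word: str) -> str:
    """Append =lemma to recognized forms and leave other words unchanged."""
    lemma = lemma_of(word)
    return word if lemma is None else f"{word}={lemma}"


print([annotate(w) for w in ("yaf", "Gave", "geuen", "sende", "walked")])
# ['yaf', 'Gave=give', 'geuen=give', 'sende=send', 'walked']
```

The first form, yaf, is left untagged. As noted above, real historical texts need many more spelling variants than these simplified entries cover.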