CorpusSearch allows output files from previous searches to serve as source files for subsequent searches.
Multiple source files can be searched in a single run.
Command file
The command file minimally contains a query, which specifies the
structures being searched for, and a boundary node, which specifies the
syntactic domain within which the search is to take place.
Beyond that, the command file can also include further specifications,
mainly concerning the format of the output.
Output of CorpusSearch
Ordinary search output
An ordinary output
file contains the structures from the source file(s) that match the
specifications in the search query. It is possible to include information
pinpointing where the structures were found, which is useful in the case
of very long sentences. Statistics are kept detailing the number of
structures matching the search query ("hits"), the number of sentence
tokens ("tokens") containing hits, and the total number of tokens in the
file.
The number of hits may change depending on the definition of the boundary node.
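The relationship between the three statistics can be illustrated with a small sketch. The per-token hit counts below are invented for illustration, and the function name `summarize` is hypothetical; this is not CorpusSearch's own code or output format.

```python
# Hypothetical data: each sentence token ID maps to the number of
# structures in that token that matched the query. Values are invented.
hits_per_token = {"tok1": 2, "tok2": 0, "tok3": 1, "tok4": 0}


def summarize(hits_per_token):
    """Return total hits, tokens containing at least one hit, and total tokens."""
    total_hits = sum(hits_per_token.values())
    tokens_with_hits = sum(1 for n in hits_per_token.values() if n > 0)
    return {
        "hits": total_hits,
        "tokens_with_hits": tokens_with_hits,
        "total_tokens": len(hits_per_token),
    }


summary = summarize(hits_per_token)
print(summary)
```

A single token can contribute more than one hit, which is why the hit count and the count of tokens containing hits are reported separately.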
For instance, the value in the first column of a coding string might
encode the syntactic category of a sentence's first constituent. The
second and third columns might encode the same information for the
sentence's second and third constituents, respectively. The information
from all three columns could then be used to calculate the frequency of
basic word order patterns in the corpus ("SVO", "SOV", etc.). (In
principle, the same statistics could also be obtained from the output
of multiple ordinary search queries, but that process would be much more
laborious and prone to human error.)
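The word order calculation described above could be sketched as follows. The sketch assumes that each coding string has already been extracted as a plain string whose columns are separated by colons, and that the first three columns hold one-letter codes such as "s", "v" and "o"; the separator, the codes and the function name `word_order_counts` are assumptions made for illustration.

```python
from collections import Counter


def word_order_counts(coding_strings, sep=":"):
    """Count word order patterns formed by the first three coding columns."""
    counts = Counter()
    for coding in coding_strings:
        columns = coding.split(sep)
        if len(columns) < 3:
            continue  # skip strings that lack the three columns we need
        pattern = "".join(columns[:3]).upper()
        counts[pattern] += 1
    return counts


# Invented sample data, one coding string per sentence token.
sample = ["s:v:o", "s:o:v", "s:v:o", "v:s:o"]
counts = word_order_counts(sample)
for pattern, n in counts.most_common():
    print(pattern, n)
```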
Corpus revision output
CorpusSearch can produce a copy of a corpus in which certain structures
are automatically revised according to user specifications. This feature
can be used in order to:
Frames output
CorpusSearch can generate the set of local frames for given words. These
frames are defined as the syntactic sisters of the POS tag of the word in
question. This might be helpful in constructing word classes, for
instance in comparing the distribution of
double-object verbs (The children gave their parents a present) and
double-complement verbs (The children gave a present to their
parents).
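The definition of a frame as the sisters of a word's POS tag can be made concrete with a small sketch. It assumes trees in a labeled bracketing with one `(TAG word)` pair per word; the node labels in the examples (`IP`, `NP-SBJ`, `NP-OB1`, `NP-OB2`, `VBD` and so on) and the helper names are illustrative assumptions, and the result does not reproduce CorpusSearch's actual frames output format.

```python
def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip ")"
        return [label] + children

    return node()


def is_pos_node(n, word):
    """True if n is a (TAG word) pair for the given word."""
    return isinstance(n, list) and len(n) == 2 and n[1] == word


def frames(tree, word):
    """Collect the labels of the sisters of each POS node dominating word."""
    found = []

    def visit(n):
        if not isinstance(n, list):
            return
        kids = n[1:]
        for i, kid in enumerate(kids):
            if is_pos_node(kid, word):
                sisters = [k[0] for j, k in enumerate(kids)
                           if j != i and isinstance(k, list)]
                found.append(tuple(sisters))
            visit(kid)

    visit(tree)
    return found


double_object = parse(
    "(IP (NP-SBJ (D The) (N children)) (VBD gave)"
    " (NP-OB2 (PRO$ their) (N parents)) (NP-OB1 (D a) (N present)))"
)
double_complement = parse(
    "(IP (NP-SBJ (D The) (N children)) (VBD gave)"
    " (NP-OB1 (D a) (N present)) (PP (P to) (NP (PRO$ their) (N parents))))"
)
print(frames(double_object, "gave"))
print(frames(double_complement, "gave"))
```

Under these assumptions the two uses of "gave" yield different frames, which is the kind of contrast that could separate the two verb classes.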
Lexicon output
CorpusSearch can generate a lexicon for a corpus. The output is a list of
every word in the corpus along with the number of times it occurs under
each POS tag that it can have.
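The structure of such a lexicon can be sketched as a mapping from each word to per-tag counts. The input is assumed to be a list of (word, POS tag) pairs already extracted from the corpus; the sample pairs, the tag names and the function name `build_lexicon` are invented for illustration and do not reflect CorpusSearch's own output layout.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_words):
    """Map each word to a Counter of the POS tags it occurs under."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_words:
        lexicon[word][tag] += 1
    return lexicon


# Invented sample of (word, tag) pairs.
tagged = [("run", "VB"), ("run", "N"), ("run", "VB"), ("the", "D")]
lexicon = build_lexicon(tagged)
for word in sorted(lexicon):
    print(word, dict(lexicon[word]))
```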