CorpusSearch is a program that searches for linguistic structures in a corpus of parsed, labelled sentences.
CorpusSearch needs two pieces of information: which source file (or files) to search, and a command file that says what to search for.
A source file is any file that contains parsed, labelled sentences. This could be a file from the Middle English (or other) corpus, an output file from a previous search, or perhaps a file of sentences that the user has cut and pasted together. Any number of source files can be searched with one run of CorpusSearch.
The command file contains a query, which describes the structures being searched for, and possibly additional material. This additional material may specify the node boundaries in which to search, and may choose various options for printing the output.
CorpusSearch always prints a standard output file and, optionally, a complement file.
The output file contains the sentences found to contain the searched-for structure, along with comments describing where the structures were found. Statistics are kept on the number of distinct boundary nodes containing the structure ("hits"), the number of sentences containing hits, and the total number of sentences in the file. Note that the number of hits may change depending on how the boundary node is defined.
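The dependence of the hit count on the boundary node can be illustrated with a small Python sketch. This is not how CorpusSearch is implemented. The nested-tuple tree format, the function names, the label-prefix match for boundary nodes, and the simplified "contains a node labelled X" test are all assumptions made for illustration.

```python
# Toy parsed sentences: (label, child, child, ...); leaves are plain strings.
# This representation is hypothetical and is not the CorpusSearch file format.

def subtrees(tree):
    """Yield the tree itself and every labelled subtree below it."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def contains_label(node, target):
    """True if some node strictly below `node` carries the label `target`."""
    return any(
        sub[0] == target
        for child in node[1:]
        if isinstance(child, tuple)
        for sub in subtrees(child)
    )

def count_hits(sentences, boundary_prefix, target):
    """Return (hits, sentences containing hits, total sentences).

    A hit is a distinct boundary node (label starts with `boundary_prefix`)
    that contains a node labelled `target`.
    """
    hits = 0
    sentences_with_hits = 0
    for sentence in sentences:
        n = sum(
            1
            for node in subtrees(sentence)
            if node[0].startswith(boundary_prefix) and contains_label(node, target)
        )
        hits += n
        if n:
            sentences_with_hits += 1
    return hits, sentences_with_hits, len(sentences)

sentences = [
    ("IP-MAT",
     ("NP-SBJ", "she"),
     ("VBD", "said"),
     ("CP-THT", ("IP-SUB", ("NP-SBJ", "he"), ("VBD", "left")))),
    ("IP-MAT", ("VBI", "go")),
]

print(count_hits(sentences, "IP-MAT", "NP-SBJ"))  # (1, 1, 2)
print(count_hits(sentences, "IP", "NP-SBJ"))      # (2, 1, 2)
```

With the narrower boundary only the matrix clause counts as a hit. With the broader one the embedded clause counts too, so the number of hits rises while the number of sentences containing hits stays the same.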
A complement file is produced if the command file contains this line:
print_complement: true
The complement file, if there is one, contains all the sentences in the source file that do not contain the searched-for structure. The output file and complement file are complementary sets that together contain all the sentences in the source file.
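The relationship between the two files can be sketched in Python as a simple partition. This is an illustration only: the function name, the bracketed example strings, and the substring test standing in for a real query are assumptions, not part of CorpusSearch.

```python
def split_output_and_complement(sentences, has_structure):
    """Partition sentences into (output, complement) using a match predicate."""
    output, complement = [], []
    for sentence in sentences:
        # Every sentence lands in exactly one of the two lists.
        (output if has_structure(sentence) else complement).append(sentence)
    return output, complement

# Hypothetical bracketed sentences; a substring test stands in for the query.
sentences = [
    "(IP-MAT (NP-SBJ she) (VBD left))",
    "(IP-IMP (VBI go))",
    "(IP-MAT (NP-SBJ he) (VBD came))",
]

output, complement = split_output_and_complement(
    sentences, lambda s: "(NP-SBJ " in s
)

print(len(output), len(complement))  # 2 1
```

Because each sentence goes to exactly one list, the two lists never overlap and together they cover the whole source.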