Coding is used to create input for multivariate analysis programs such as varbrul or datadesk. If you are not using such a program, you don't need to read this chapter.
The development of the coding portion of CorpusSearch has been funded under a grant from the English Arts and Humanities Research Board to Anthony Warner and Susan Pintzuk at the University of York, England.
Here is a sample coding file:

define: Ann.def

1: {
    s: (IP-SPE* iDoms NP-OB*)
    n: ELSE
}

2: {
    m: (IP-MAT* iDoms NP-OB*)
    s: (IP-SUB* iDoms NP-OB*)
    i: (IP-INF* iDoms NP-OB*)
    e: ELSE
}

3: {
    t: ((IP* iDoms NEG) AND (NEG iDoms !ne))
    p: (IP* iDoms !NEG)
    n: ELSE
}
In general, coding files have this form:
column_number: {
    label: condition
    label: condition
    . . .
}
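Because this layout is regular, a coding file can be read into a simple table of column numbers and labelled conditions. The Python sketch below is only an illustration, not CorpusSearch's own parser: it assumes that each label is a single token followed by ": ", that conditions contain no colons or braces, and that the define: line can simply be skipped. The function name parse_coding_file is hypothetical.

```python
import re

# A column block: "<number>: { ... }". Conditions are assumed to contain no braces.
COLUMN_RE = re.compile(r"\b(\d+)\s*:\s*\{(.*?)\}", re.DOTALL)
# A label: a token directly followed by ": ", preceded by whitespace or the block start.
LABEL_RE = re.compile(r"(?:^|\s)([^\s:{}]+):\s")


def parse_coding_file(text):
    """Return {column_number: [(label, condition), ...]}, keeping file order."""
    columns = {}
    for col_match in COLUMN_RE.finditer(text):
        number = int(col_match.group(1))
        body = col_match.group(2)
        labels = list(LABEL_RE.finditer(body))
        entries = []
        for i, label_match in enumerate(labels):
            # A condition runs from the end of its label to the start of the next label.
            end = labels[i + 1].start() if i + 1 < len(labels) else len(body)
            condition = body[label_match.end():end].strip()
            entries.append((label_match.group(1), condition))
        columns[number] = entries
    return columns


SAMPLE = """define: Ann.def
1: { s: (IP-SPE* iDoms NP-OB*) n: ELSE }
2: { m: (IP-MAT* iDoms NP-OB*) s: (IP-SUB* iDoms NP-OB*) i: (IP-INF* iDoms NP-OB*) e: ELSE }
"""

for number, entries in parse_coding_file(SAMPLE).items():
    print(number, entries)
```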
So, in the example above, column 1 of the coding node will contain "s" if IP-SPE* iDoms NP-OB*, that is, if an IP-SPE* node immediately dominates an NP-OB* node. Otherwise, the column will contain "n".
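How a column gets its label can be modelled as trying each condition in turn. In the Python sketch below, a node is reduced to its own label plus the labels of its immediate daughters, the * wildcard is matched with fnmatch, and the first condition that holds supplies the label, with ELSE catching everything else. The first-match rule, the toy node representation, and the helper names (idoms, else_, code_node) are assumptions for illustration, not CorpusSearch internals; column 3 is left out because it would also need a model of negation.

```python
from fnmatch import fnmatchcase


def idoms(parent_pat, child_pat):
    """Hypothetical stand-in for 'parent_pat iDoms child_pat' on a toy node."""
    def test(node):
        label, daughters = node
        return fnmatchcase(label, parent_pat) and any(
            fnmatchcase(daughter, child_pat) for daughter in daughters)
    return test


def else_(node):
    """ELSE: always true, so it catches anything the earlier conditions missed."""
    return True


# Columns 1 and 2 of the sample coding file, as (label, condition) pairs.
CODING = {
    1: [("s", idoms("IP-SPE*", "NP-OB*")), ("n", else_)],
    2: [("m", idoms("IP-MAT*", "NP-OB*")),
        ("s", idoms("IP-SUB*", "NP-OB*")),
        ("i", idoms("IP-INF*", "NP-OB*")),
        ("e", else_)],
}


def code_node(node, coding):
    """Return the colon-separated coding string, one label per column."""
    labels = []
    for column in sorted(coding):
        # Assumed semantics: the first condition that holds supplies the label.
        labels.append(next(lab for lab, cond in coding[column] if cond(node)))
    return ":".join(labels)


# The IP-SUB of the sample output sentence, reduced to its immediate daughters.
node = ("IP-SUB", ["NP-SBJ", "VBD", "NP-OB1"])
print(code_node(node, CODING))  # n:s
```

The result, n:s, agrees with the first two columns of the coding node in the sample output sentence in this chapter.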
A coding file takes the place of a query file, so to code a file, use this command:
java CorpusSearch <coding_file> <file_to_code>
Output files produced by coding have the extension ".cod". They contain every sentence or node of the input file, each with a coding node inserted. Here is a sentence from the output file produced by the coding file above:
/~*
knewe kyndes & complexciones of men & of bestus
(CMHORSES,85.2)
*~/

(0 NODE (0 CODING n:s:p)
        (1 IP-SUB (2 NP-SBJ *T*-1)
                  (3 VBD knewe)
                  (4 NP-OB1 (5 NS kyndes)
                            (6 CONJ &)
                            (7 NS complexciones)
                            (8 PP (9 PP (10 P of)
                                        (11 NP (12 NS men)))
                                  (13 CONJP (14 CONJ &)
                                            (15 PP (16 P of)
                                                   (17 NP (18 NS bestus)))))))
        (19 ID CMHORSES,85.2))
The coding node occupies a position like that of the ID node: it is outside of the parsed sentence but inside the "wrapper", the extra set of parentheses surrounding the sentence or node.
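To make the position of the coding node concrete, the Python sketch below reads the sample output sentence into nested lists and lists the labels of the wrapper's daughters. It is an illustration only, not part of CorpusSearch; it assumes every node is written as an opening parenthesis, an index number, a label, and its contents, as in the output above, and the function names are hypothetical.

```python
import re

# Tokens: "(", ")", or any run of other non-space characters
# (index numbers, labels, and words such as "*T*-1" or "CMHORSES,85.2").
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse labelled bracketing into nested lists, e.g. ['0', 'CODING', 'n:s:p']."""
    stack = [[]]
    for token in TOKEN_RE.findall(text):
        if token == "(":
            stack.append([])             # start a new node
        elif token == ")":
            finished = stack.pop()       # close the current node...
            stack[-1].append(finished)   # ...and attach it to its parent
        else:
            stack[-1].append(token)
    return stack[0][0]


def daughter_labels(node):
    """Labels of a numbered node's daughters (index 0 is the number, 1 the label)."""
    return [child[1] for child in node if isinstance(child, list)]


SAMPLE = ("(0 NODE (0 CODING n:s:p) (1 IP-SUB (2 NP-SBJ *T*-1) (3 VBD knewe) "
          "(4 NP-OB1 (5 NS kyndes) (6 CONJ &) (7 NS complexciones) "
          "(8 PP (9 PP (10 P of) (11 NP (12 NS men))) (13 CONJP (14 CONJ &) "
          "(15 PP (16 P of) (17 NP (18 NS bestus))))))) (19 ID CMHORSES,85.2))")

wrapper = parse_tree(SAMPLE)
print(daughter_labels(wrapper))             # ['CODING', 'IP-SUB', 'ID']
print("CODING" in daughter_labels(wrapper[3]))  # False: CODING is outside the IP
```

CODING and ID are both daughters of the wrapper, while the parsed sentence (IP-SUB) does not contain the coding node.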
Coding nodes can be searched with column. For instance, to find all sentences whose coding node contains "m" or "p" in the 7th column, use this query:
query: (CODING column7 m|p)
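The column search can be pictured as splitting the coding string at its colons and comparing one field with the listed labels. The Python function below is a hypothetical model of that matching, assuming 1-based column numbers (as in the coding file) and "|" meaning "or"; it is not CorpusSearch's own code.

```python
def column_matches(coding_string, column, alternatives):
    """Rough model of (CODING columnN a|b) applied to one coding string.

    column is 1-based, like the column numbers in a coding file;
    alternatives is a '|'-separated list of labels such as "m|p".
    """
    fields = coding_string.split(":")
    if not 1 <= column <= len(fields):
        return False  # the string has no such column
    return fields[column - 1] in alternatives.split("|")


print(column_matches("n:s:p", 3, "m|p"))  # True: column 3 holds "p"
print(column_matches("n:s:p", 2, "m|p"))  # False: column 2 holds "s"
```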
Susan Pintzuk has written the following Perl script to extract the coding information from CorpusSearch output:
#!/usr/local/bin/perl
#Usage: make_cs inputfile > outputfile
#this script takes a coded CorpusSearch file and outputs
#only the coding strings in the following format:
# (f:f:f:f:
# (f:f:f:f:
#the outputfile should then be imported to a word processor
#and the colons removed for varbrul or replaced by
#tabs for datadesk

while (<>) {
    if (/\(\d CODING/) {
        /CODING\s([^\)]+)\)/;
        $string = $1;
        print "($string\n";
    }
}
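If you would rather skip the word-processor step, the Python sketch below does the same extraction and then the final formatting. It is not part of CorpusSearch or of the script above: the script name, the varbrul/datadesk argument, and the exact output layouts (a leading "(" with the colons removed for varbrul; plain tab-separated fields with no parenthesis for datadesk) are assumptions drawn from the comments in the Perl script.

```python
import re
import sys

# Matches a coding node such as "(0 CODING n:s:p)" and captures "n:s:p".
# Unlike the Perl version, \d+ also accepts multi-digit node indices.
CODING_RE = re.compile(r"\(\d+ CODING\s([^)]+)\)")


def extract_coding_strings(lines):
    """Yield the coding string from every line that holds a coding node."""
    for line in lines:
        match = CODING_RE.search(line)
        if match:
            yield match.group(1)


def for_varbrul(coding):
    """Varbrul token: a leading '(' and the colons removed, e.g. '(nsp'."""
    return "(" + coding.replace(":", "")


def for_datadesk(coding):
    """Data Desk row (assumed layout): one tab-separated field per column."""
    return "\t".join(coding.rstrip(":").split(":"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_coding.py file.cod [varbrul|datadesk]
    style = sys.argv[2] if len(sys.argv) > 2 else "varbrul"
    convert = for_datadesk if style == "datadesk" else for_varbrul
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for coding in extract_coding_strings(handle):
            print(convert(coding))
```

Searching once and reusing the match avoids the Perl script's second regex, whose $1 would be stale if it ever failed to match.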