LING 521
Corpus Phonetics
Spring 2008
1. Get the file fe_03_p1_calldata.tbl, which is taken from the Fisher English Training
Speech Transcripts corpus (LDC2004T19).
The file contains information on 5850 conversations; its format is described in doc_calldata.tbl.
2. Write Python module(s) to do the following:
- Count the female speakers of American English dialects, female speakers of other English dialects, male speakers of
American English dialects, and male speakers of other English dialects. Your output should follow the format below, or something fancier (please
refer to this Python documentation for output formatting):
|        | American English dialects | Other English dialects |
| Female | ?                         | ?                      |
| Male   | ?                         | ?                      |
- Randomly select 100 female speakers and 100 male speakers from the original file, and write the subject ID, sex, and dialect of each speaker
to a new file called ‘selected.tbl’ (hint: use the sample function in the random module; see this documentation).
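
The two tasks above can be sketched roughly as follows. This is only a minimal illustration, not a reference solution: it assumes the speakers have already been read from the file into (subject ID, sex, dialect) tuples, and the codes "f"/"m" and "a"/"o", the helper names, and the comma-separated output are all hypothetical. The real column layout and codes must be taken from doc_calldata.tbl.

```python
import random
from collections import Counter

# Hypothetical records: (subject_id, sex, dialect).
# Assumed codes: sex "f"/"m"; dialect "a" (American English) or "o" (other).
# Check doc_calldata.tbl for the actual fields and codes.

def unique_speakers(speakers):
    """Keep one record per subject ID (a speaker may appear in many calls)."""
    return list({sid: (sid, sex, dialect) for sid, sex, dialect in speakers}.values())

def count_speakers(speakers):
    """Count speakers in each (sex, dialect) cell."""
    return Counter((sex, dialect) for _sid, sex, dialect in speakers)

def format_table(counts):
    """Lay the four counts out as an aligned two-by-two table."""
    rows = [f"{'':<8}{'American English dialects':>27}{'Other English dialects':>25}"]
    for label, sex in (("Female", "f"), ("Male", "m")):
        rows.append(f"{label:<8}{counts[(sex, 'a')]:>27}{counts[(sex, 'o')]:>25}")
    return "\n".join(rows)

def select_speakers(speakers, n, rng=random):
    """Pick n female and n male speakers, each group without replacement."""
    females = [s for s in speakers if s[1] == "f"]
    males = [s for s in speakers if s[1] == "m"]
    return rng.sample(females, n) + rng.sample(males, n)

def write_selected(selected, path):
    """Write subject ID, sex, and dialect, one speaker per line."""
    with open(path, "w") as out:
        for sid, sex, dialect in selected:
            out.write(f"{sid},{sex},{dialect}\n")

# Small made-up data set; the last record repeats subject "0".
demo = [(str(i), "f" if i % 2 else "m", "a" if i % 3 else "o") for i in range(12)]
demo.append(("0", "m", "o"))

speakers = unique_speakers(demo)
counts = count_speakers(speakers)
print(format_table(counts))
```

For the assignment itself, the call would look something like write_selected(select_speakers(speakers, 100), "selected.tbl"). Note that random.sample raises ValueError if a group has fewer than n members, and that deduplicating by subject ID is a design choice here; whether it is needed depends on how the file lists speakers.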