LING 521
Corpus Phonetics
Spring 2008
1. The purpose of this lab is to explore the factors that affect speech rate in conversational speech using the Fisher
transcripts corpus (LDC2004T19).
2. There is a copy of the Fisher corpus on Harris: /corpora/Fisher_English_Training_Transcripts_Part1/. Read the
documentation in the docs directory, starting with the file fe_03_readme.txt. Which file contains
speaker information? The transcripts you will be using are in ./package/fe_03_p1_tran/data/trans/.
(Note: if you are new to Unix/Linux, here is a nice tutorial for beginners:
http://www.ee.surrey.ac.uk/Teaching/Unix/).
3. Write one or more Python programs to do the following:
- Calculate the average speech rate, i.e., the number of words per minute, of each speaker in the corpus. Some tokens
that appear in the transcripts are not words, for example, ((, g-, [cough], etc. It’s up to you how to treat
those tokens.
- Calculate the mean and standard deviation of speech rate for each of the following factors: sex (male and female),
age (it’s up to you how to group ages), and where the speaker was raised (it’s up to you how to group the
places).
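The two calculations in step 3 can be sketched as follows. This is only a minimal illustration, not a required design. It assumes that each transcript line has the form "start end channel: text" with times in seconds, and that speech rate is the number of words divided by total speaking time; check both assumptions against the corpus documentation. The function names, the token filter, and the sample lines are all made up for illustration, and the filter shows just one of several reasonable ways to treat non-word tokens.

```python
import re
import statistics
from collections import defaultdict

# Assumed line layout: "<start> <end> <channel>: <text>", times in seconds.
LINE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+([AB]):\s*(.*)$")


def is_word(token):
    """One possible filter for non-word tokens (a choice, not a rule)."""
    if token.startswith("[") and token.endswith("]"):
        return False  # bracketed noises such as [cough]
    if token in ("((", "))"):
        return False  # double-parenthesis markers
    if token.endswith("-"):
        return False  # cut-off words such as g-
    return any(ch.isalpha() for ch in token)


def speech_rates(lines):
    """Return {channel: words per minute} for the lines of one transcript."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip comment lines and blank lines
        start, end, channel, text = match.groups()
        words[channel] += sum(is_word(tok) for tok in text.split())
        seconds[channel] += float(end) - float(start)
    return {ch: 60.0 * words[ch] / seconds[ch] for ch in words if seconds[ch] > 0}


def group_stats(rates, groups):
    """Return {label: (mean, sample standard deviation)}.

    rates maps a speaker key to a rate; groups maps the same key to a label.
    """
    by_group = defaultdict(list)
    for key, rate in rates.items():
        by_group[groups[key]].append(rate)
    return {
        label: (statistics.mean(vals),
                statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for label, vals in by_group.items()
    }


# Hypothetical transcript lines, not real corpus data.
sample = [
    "# made-up example",
    "",
    "0.00 3.00 A: hello [laughter] how are you",
    "3.00 6.00 B: i am g- good (( thanks ))",
    "6.00 9.00 A: that is great",
]
rates = speech_rates(sample)
print(rates)
print(group_stats(rates, {"A": "f", "B": "m"}))
```

The sketch keys rates by channel within a single transcript. Mapping each channel of each call to a speaker, and each speaker to sex, age, and where they were raised, depends on the speaker information file described in the corpus documentation and is left out here.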
4. Write a lab report that includes the following:
- Your results and findings on which factors affect (or do not affect) speech rate, and a brief discussion of your findings.
- The technical issues you encountered and how you solved them, for example, non-word tokens, errors in the speaker information, etc.
- A printout of your code and a sample of its output.