Problem Set #3 -- COGS 501

(due 10/18/2010)

0. Download the file pb.Table1 to a place accessible to Matlab. (This is the data from Peterson & Barney, "Control Methods Used in a Study of the Vowels", JASA 24 (2), 1952.)

The table columns are:

1: m=man, w=woman, c=child
2: m=male, f=female
3: N=speaker number
4: V=vowel name (heed/iy, hid/ih, head/eh, had/ae, hod/aa, hawed/ao, hood/uh, who'd/uw, hud/ah, heard/er)
5: IPA
6: F0 in Hz
7: F1 in Hz
8: F2 in Hz
9: F3 in Hz

You can read it into Matlab via a sequence like

fid = fopen('pb.Table1');
PB = textscan(fid, '%s %s %d %s %s %f %f %f %f', 10000);
fclose(fid);

The result of this (in PB) is a "cell array". Here's some code for turning pieces of this into a more useful form:

% Convert column 1 (man, woman, child) from cell array to string array:
MWC=char(PB{1});
% Make some useful binary vectors:
isman=(MWC=='m');
iswoman=(MWC=='w');
ischild=(MWC=='c');

Note, by the way, that we can count the rows in any category by summing its binary vector, e.g.

sum(iswoman)

And continuing with the conversions:

% Convert column 2 (male, female) to string array:
MF=char(PB{2});
% More useful binary vectors:
ismale=(MF=='m');
isfemale=(MF=='f');
% Make binary vectors from column 4 (vowel ID), one per vowel
% Note that options are 'iy','ih','eh', 'ae','aa','ao','uh','uw','ah','er'
isiy = strcmp('iy',PB{4});
isih = strcmp('ih',PB{4});
iseh = strcmp('eh',PB{4});
isae = strcmp('ae',PB{4});
isaa = strcmp('aa',PB{4});
isao = strcmp('ao',PB{4});
isuh = strcmp('uh',PB{4});
isuw = strcmp('uw',PB{4});
isah = strcmp('ah',PB{4});
iser = strcmp('er',PB{4});

Now we can (for example) extract all the F0, F1, F2, F3 data:

F0123 = [PB{6} PB{7} PB{8} PB{9}];

Or all the /ae/ data from the women:

F0123wae = F0123(iswoman & isae,:);
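
The same masking pattern works for any combination of the binary vectors. Here is a minimal sketch on a few invented rows. The variables grp, vowel and F are stand-ins for PB{1}, PB{4} and F0123, and the numbers are made up rather than taken from the table:

```matlab
% Toy stand-ins for PB{1}, PB{4} and F0123 (values invented for illustration)
grp   = {'m'; 'w'; 'w'; 'c'; 'w'};
vowel = {'ae'; 'ae'; 'iy'; 'ae'; 'ae'};
F = [130  660 1720 2410;
     210  860 2050 2850;
     235  310 2790 3310;
     270 1010 2320 3320;
     220  850 2000 2800];

iswoman = (char(grp) == 'w');    % logical column vector
isae    = strcmp('ae', vowel);   % logical column vector

sel  = iswoman & isae;           % rows that satisfy both conditions
nsel = sum(sel);                 % how many rows were selected
Fsel = F(sel, :);                % nsel-by-4 matrix of the selected rows
mu   = mean(Fsel, 1);            % 1-by-4 mean vector (dim given so one row also works)
```

Passing the dimension to mean explicitly avoids a surprise when only one row is selected, since mean of a single row vector would otherwise average across the columns.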

1. "Cheating" classification (within training set)

For each of the 10 vowels, calculate the mean vector and covariance matrix for all 76 speakers taken together. (Thus you'll get 10 mean vectors and 10 covariance matrices.) Also get the mean vector and covariance matrix for each of the categories men, women and children (without distinguishing vowels -- thus you'll have three mean vectors and three covariance matrices).

Using the Mahalanobis distance as the measure, how well can you classify each of the 1520 vowels as to whether it was spoken by a man, woman or child? How well can you classify them as to vowel category?

(Note that "classify" here means "find the category with minimum distance".)
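
As a reminder of the mechanics, the squared Mahalanobis distance from a row vector x to a category with mean m and covariance S is (x-m)*inv(S)*(x-m)'. The following is only a sketch of minimum-distance classification on two invented 2-D classes, not on the Peterson & Barney data. The names A, B, D2 and guess are made up for the illustration:

```matlab
% Two invented 2-D classes, six points each (not Peterson & Barney data)
A = [1 2; 2 1; 2 3; 3 2; 1 1; 3 3];
B = [7 8; 8 7; 8 9; 9 8; 7 7; 9 9];
X = [A; B];                             % all points to classify
truth = [ones(6,1); 2*ones(6,1)];       % true class labels

mu = {mean(A,1), mean(B,1)};            % one mean vector per class
S  = {cov(A), cov(B)};                  % one covariance matrix per class

D2 = zeros(size(X,1), 2);               % squared distances, one column per class
for k = 1:2
    dev = bsxfun(@minus, X, mu{k});     % deviation of every row from the class mean
    D2(:,k) = sum((dev / S{k}) .* dev, 2);   % row-wise dev*inv(S)*dev'
end

[dmin, guess] = min(D2, [], 2);         % nearest class for each row
accuracy = mean(guess == truth);        % proportion classified correctly
```

The expression dev / S{k} solves against the covariance matrix rather than forming its inverse explicitly. If the Statistics Toolbox happens to be installed, its mahal function should give the same squared distances, but the loop above needs only base Matlab.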

Extra credit: Can you do better if you first classify man/woman/child, and then use vowel-specific covariance matrices based on the man/woman/child classification?

2. "Honest" classification (separate training and test sets)

From among the 33 men, 28 women, and 15 children, pick 2 men, 2 women, and 1 child to use for testing. Make an array of the F0, F1, F2, F3 data for the remaining 71 speakers.

Now do the same thing that you did in (1), except you should calculate the mean vectors and covariance matrices based on the 71 training speakers, and then classify the 100 individual vowels of the 5 test speakers.
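
One way to make the split is with a binary vector over speaker numbers. This sketch assumes that column 3 gives each speaker a distinct number, which is worth checking against the table. The data and the choice of test speakers below are invented:

```matlab
% Toy stand-in for PB{3}: 4 speakers with 3 rows each (invented)
spk = int32([1 1 1 2 2 2 3 3 3 4 4 4])';   % textscan's %d yields int32
F   = (1:12)' * [1 10];                    % 12-by-2 matrix of invented measurements

testspk = int32([2 4]);                    % hypothetical held-out speakers
istest  = ismember(spk, testspk);          % true for rows from test speakers

Ftrain = F(~istest, :);                    % rows used to estimate means and covariances
Ftest  = F(istest, :);                     % rows to classify afterwards
```

The same istest vector can index the category vectors (isman, isiy and so on), which keeps the labels aligned with the training and test rows.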

Extra credit: Try the two-stage classification in this case.

Extra extra credit (to be done if all the above is obvious and easy to you -- note that we haven't taught you any of the following yet, though we're happy to talk with you about it if you're curious):

A. Recast the classification task in (2) in probabilistic terms.
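
One possible reading, assuming each category is modeled as a multivariate Gaussian (the problem does not say which model to use): the log-likelihood of x under category k is -0.5*(d2 + log(det(S)) + p*log(2*pi)), where d2 is the squared Mahalanobis distance and p is the number of dimensions. Adding a log prior and normalizing gives posterior probabilities. The means, covariances and priors below are invented for illustration:

```matlab
% Invented two-category example (not Peterson & Barney data)
x     = [2 3];                               % point to classify
mu    = {[2 2], [8 8]};                      % category means
S     = {[0.8 0.4; 0.4 0.8], [0.8 0.4; 0.4 0.8]};   % category covariances
prior = [0.5 0.5];                           % assumed prior probabilities

p = numel(x);
logpost = zeros(1,2);
for k = 1:2
    dev = x - mu{k};
    d2  = (dev / S{k}) * dev';               % squared Mahalanobis distance
    loglik = -0.5 * (d2 + log(det(S{k})) + p*log(2*pi));   % Gaussian log-density
    logpost(k) = loglik + log(prior(k));     % unnormalized log posterior
end

logpost = logpost - max(logpost);            % guard against underflow before exp
post = exp(logpost) / sum(exp(logpost));     % posterior probabilities, summing to 1
[pmax, guess] = max(post);                   % most probable category
```

With equal priors and equal covariance determinants, choosing the largest posterior picks the same category as choosing the smallest Mahalanobis distance. The two rules can differ once the priors or the covariances vary across categories.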

C. Try some other classification techniques on this same data (ones you may already know about, or can look up on the web): Gaussian mixture models, support vector machines, or whatever else you like.

D. See what happens if you use F0, F1, F2, F3 measurements from some other source. For example, you could download a podcast and use Praat or WaveSurfer to make formant measurements.