LING5250/LING2250
Computational Analysis and Modeling of Biological Signals and Systems
Homework 5

Due: 3/29/2023

Install Matlab R2022b

The "deep learning" toolbox is convenient for this exercise, so install the latest version of Matlab if you haven't. Information about how to do so is here. You'll need some of the toolboxes, so you might as well install all of them.

(If your machine doesn't seem to be up to this, there are lots of labs around campus that have Matlab installed -- ask and we'll find a convenient option for you.)

Background

Download and run homework5.m. (You'll also need nhist.m downloaded into the same directory.) The homework5.m Matlab script creates (data for) a simple XOR classification problem.

The script then shows that a linear discriminant (and therefore a traditional "perceptron") doesn't work -- as Minsky and Papert explained -- but a simple multilayer "feedforward" perceptron, the earliest and simplest kind of "deep net", does the job.

[At least sometimes... You should run it several times, since the iterative "learning" process starts by setting the system's weights to random values, and this changes the outcome -- which is sometimes nearly perfect and sometimes pretty bad.]
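If you want to see the moving parts outside the script, here's a minimal sketch of that kind of training, using the Deep Learning Toolbox's feedforwardnet(). The variable names (NN, X, Class, net) and sizes here are assumptions, not necessarily what homework5.m uses:

% A minimal XOR training sketch -- not homework5.m's exact code
NN = 200;                                 % number of training points
X = 2*rand(NN,2) - 1;                     % random points in [-1,1]^2
Class = xor(X(:,1)>0, X(:,2)>0);          % true XOR labels
net = feedforwardnet(4);                  % one hidden layer, 4 units
net = train(net, X', double(Class)');     % examples go in as columns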

Some Small Steps Onward

That script calculated the distribution of classification errors on the training data -- so you should generate some new random data and try it:

NewX = 2*rand(NN,2)-1;                        % NN new random points in [-1,1]^2
NewClass = xor(NewX(:,1)>0,NewX(:,2)>0);      % true XOR labels
NewX1=NewX(NewClass,:); NewX2=NewX(~NewClass,:);  % split by class, for plotting
%
NewClass1 = net(NewX');                       % the trained net's outputs
NewResults = [abs(NewClass1)' NewClass];      % hypothesis vs. truth, side by side
histogram(abs(NewResults(:,1)-NewResults(:,2)));
xlim([0 1]);
xlabel('Hypothesis - Truth')
title('Feed Forward Net: Performance on XOR Test Data')
sum(abs(NewResults(:,1)-NewResults(:,2))>0.5) % count of test errors

Again, the results will vary from trial to trial, so repeat the whole training-and-testing process a few times. If the net worked well on the training data, it will usually work well on the new test data too.

Now try a quadratic classifier, using Matlab's fitcdiscr() function. Does that work? Graph the resulting classification.
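Here's a hedged sketch of one way to do that, assuming the training points and labels are still in the workspace as X and Class (adjust to the script's actual variable names), and reusing NewX and NewClass from above:

% Quadratic discriminant via fitcdiscr() -- variable names assumed
qmodel = fitcdiscr(X, Class, 'DiscrimType', 'quadratic');
QPred = predict(qmodel, NewX);            % classify the new test points
sum(QPred ~= NewClass)                    % count of test errors
gscatter(NewX(:,1), NewX(:,2), QPred);    % plot the predicted classes
title('Quadratic Discriminant: Predicted Classes on XOR Test Data')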

Optional: If you're ambitious (or at least curious), try a decision-tree classifier, and/or a nearest-neighbor classifier, and/or a random-forest classifier...
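All three have one-line fitting functions in the Statistics and Machine Learning Toolbox; under the same assumptions about X and Class, a sketch:

tmodel = fitctree(X, Class);                    % decision tree
kmodel = fitcknn(X, Class, 'NumNeighbors', 5);  % 5-nearest-neighbor
rmodel = TreeBagger(50, X, Class);              % random forest of 50 trees
% Then predict(tmodel, NewX) etc. and compare with NewClass as above.
% (Note: TreeBagger's predict() returns labels as a cell array of strings.)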

Some (Slightly) More Serious Generalizations of the Problem

The good thing about "deep nets" is that some version often performs well on difficult problems, given enough training data. The trick is to choose an architecture and a set of "hyperparameters" that work.
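For instance, here's a hedged sketch of a minimal hyperparameter scan over hidden-layer size, reusing the XOR training and test variables from above (names assumed):

hsizes = [2 4 8 16 32];                       % candidate hidden-layer sizes
errs = zeros(size(hsizes));
for k = 1:numel(hsizes)
    net = feedforwardnet(hsizes(k));
    net.trainParam.showWindow = false;        % suppress the training GUI
    net = train(net, X', double(Class)');
    pred = net(NewX')';                       % outputs on the test data
    errs(k) = sum(abs(pred - double(NewClass)) > 0.5);
end
disp([hsizes; errs])                          % test errors per architecture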

Depending on your interest, time, and skills, explore this space by creating (and trying to solve) some problems that are a bit more challenging.

For example:

  1. A "one-class" classification, where the point is to distinguish a compact cloud of "red" points from everything else. Try this in spaces of various dimensionality, e.g. 1, 2, 10, ...
  2. A 2-D checkerboard pattern of alternating "red" and "blue" regions (see the data-generation sketch after this list).
  3. Try some other tessellations of the plane -- or analogous things in different numbers of dimensions, starting with segmentations of a line...
  4. What about N-ary parity rather than XOR, still with random numerical data rather than binary data? Now the classification depends on whether an odd number of the values is greater than zero (also sketched below). What if the length of each example varies (i.e., the input vector might have 3 elements, or 20, or 50)?
  5. What about various simple generalizations of such training sets? For example, alternating unit-length "red" and "blue" regions in one dimension, with training data (say) from -10 to 10 -- where you then check generalization beyond that range?
  6. A "Swiss roll", a spiral, etc.
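To make a couple of these concrete, here's a hedged sketch of data generation for the checkerboard (item 2) and N-ary parity (item 4); the names and sizes are illustrative, not prescribed:

% Checkerboard: a 4x4 board of unit cells on [-2,2] x [-2,2]
NN = 2000;
CX = 4*rand(NN,2) - 2;
CClass = xor(mod(floor(CX(:,1)),2)==0, mod(floor(CX(:,2)),2)==0);
gscatter(CX(:,1), CX(:,2), CClass)
title('Checkerboard Training Data')
%
% N-ary parity: label is true iff an odd number of the N values is > 0
N = 10;
PX = 2*rand(NN,N) - 1;
PClass = mod(sum(PX > 0, 2), 2) == 1;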

In each case, you can also explore how much training data is needed, and the effects of various kinds of scaling -- multiplicative, additive, polynomial, logarithmic, and so on.

If you're feeling ambitious, try dealing with data in which the classification criteria vary slowly over "time", say in a sinusoidal pattern with a long period (like 1000 or 5000 samples). Possible solutions include a network architecture that's sensitive to history (like an LSTM). An older idea comes from the "linear learning" theories of probability-matching experiments, foraging models, etc., as described here -- which raises the question of whether the system's goal should be "maximizing" (being right as often as possible) or "matching" (where the distribution of choices matches the distribution of results).
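One hedged way to generate such slowly drifting data -- an illustration, not a prescribed setup -- is to let the XOR thresholds follow a slow sinusoid over the sample index:

NN = 5000;
t = (1:NN)';                                  % sample index as "time"
TX = 2*rand(NN,2) - 1;
thresh = 0.5*sin(2*pi*t/1000);                % threshold drifts, period 1000
TClass = xor(TX(:,1) > thresh, TX(:,2) > thresh);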

etc.
