Fun with Facebook Data in R

(Due 9/25/2014)

As background, take a quick look at two blog posts, "Sex and pronouns" (8/24/2014) and "Sex, age, and pronouns on Facebook" (9/19/2014).

Now, in R, try out FacebookPronouns.R. If you like, you can let R download the script for you (after making sure that you and R are in agreement about what folder you should be working in), using the command:

> download.file("http://ling.upenn.edu/courses/ling005/FacebookPronouns.R", "FacebookPronouns.R")

Now you can execute that script via

> source("FacebookPronouns.R")
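If the download or the source() call fails, a common cause is that R's working directory is not the folder you think it is. A minimal sketch of how to check, using only base R functions; the path in the commented-out setwd() line is a made-up placeholder, not a course requirement:

```r
# Show the folder R is currently working in
getwd()

# If that is not the folder you want, switch to it.
# "~/ling005" is a hypothetical path; substitute your own.
# setwd("~/ling005")

# Source the script only if it is actually present in the working folder
if (file.exists("FacebookPronouns.R")) {
  source("FacebookPronouns.R")
} else {
  message("FacebookPronouns.R not found in ", getwd())
}
```

If the message appears, either rerun the download.file() command or use setwd() to move to the folder where the script was saved.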

Read through the script and be sure that you understand how it does what it does.

To carry out this assignment, write a new script called "YourNameFB.R". (Assume that the script FacebookPronouns.R has already been sourced, so that you don't need to reload the dataset or recreate the boolean variables, etc.)

Questions for you are in red text on this page. For questions that require only an answer in English text, put the question and the answer in your script as comments, e.g.

# (1) Q: What do the life-cycle changes in first-person singular vs. first-person plural references mean?
# (1) A: As people get older, their set of acquaintances grows, and so they're more likely 
# to refer to their participation in group activities.

(N.B. You should be able to think of better answers than this...)

FacebookPronouns.R starts by showing several different ways of graphing the relative frequencies of First Person Singular and First Person Plural pronouns, by the age of the writer. The first way just plots the frequencies separately:


The second way of doing it plots both frequencies on the same scale:

And the third way of doing it plots the ratio between them:
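As a rough sketch of the three approaches described above, here is a base-R version built on invented numbers. It is not the code from FacebookPronouns.R and does not use the Facebook dataset; the names age, fps, and fpp and all of their values are assumptions made up for illustration only.

```r
# Toy data only: these numbers are invented and are NOT the Facebook dataset.
# The variable names age, fps, fpp are hypothetical.
age <- 13:60
fps <- 0.060 - 0.0005 * (age - 13)   # pretend first-person-singular rate
fpp <- 0.005 + 0.0002 * (age - 13)   # pretend first-person-plural rate

# Way 1: two separate panels, each with its own y-axis scale
op <- par(mfrow = c(1, 2))
plot(age, fps, type = "l", xlab = "Age", ylab = "Relative frequency",
     main = "First person singular")
plot(age, fpp, type = "l", xlab = "Age", ylab = "Relative frequency",
     main = "First person plural")
par(op)  # restore the previous single-panel layout

# Way 2: both series in one panel on a shared y-axis
plot(age, fps, type = "l", col = "blue", ylim = range(c(fps, fpp)),
     xlab = "Age", ylab = "Relative frequency",
     main = "Singular and plural, same scale")
lines(age, fpp, col = "red")
legend("right", legend = c("Singular", "Plural"),
       col = c("blue", "red"), lty = 1)

# Way 3: the ratio of the two frequencies
ratio <- fps / fpp
plot(age, ratio, type = "l", xlab = "Age", ylab = "Singular / plural",
     main = "Ratio of singular to plural")
```

Running the three versions side by side on the real data is a useful starting point for question (2), since each one emphasizes a different aspect of the same numbers.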

(1) What do the life-cycle changes in first-person singular vs. first-person plural references mean?
If you can think of more than one hypothesis, what additional data might help to choose among them?

(2) Which of the plots is most effective in depicting the pattern in the data? Why?

Next, the FacebookPronouns.R script plots same-sex vs. cross-sex reference as a function of age. By "same-sex reference" we mean male Facebookers using masculine third-singular pronouns and female Facebookers using feminine third-singular pronouns; by "cross-sex reference" we mean male posters using feminine pronouns and female posters using masculine pronouns.
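To make the definition concrete, here is a small sketch of how same-sex and cross-sex rates could be computed from per-group pronoun counts, pooled over both sexes within each age. The data frame, its column names (sex, age, he, she, words), and every number in it are invented for illustration; they need not match the variables that FacebookPronouns.R actually creates.

```r
# Toy counts, invented for illustration; column names are hypothetical.
toy <- data.frame(
  sex   = c("M", "M", "F", "F"),
  age   = c(20, 40, 20, 40),
  he    = c(30, 25, 40, 20),      # masculine third-singular pronoun counts
  she   = c(35, 15, 30, 35),      # feminine third-singular pronoun counts
  words = c(10000, 10000, 10000, 10000)
)

# Same-sex: men using "he", women using "she".
# Cross-sex: men using "she", women using "he".
is_male   <- toy$sex == "M"
toy$same  <- ifelse(is_male, toy$he,  toy$she)
toy$cross <- ifelse(is_male, toy$she, toy$he)

# Pool both sexes within each age, then convert counts to rates
by_age <- aggregate(cbind(same, cross, words) ~ age, data = toy, FUN = sum)
by_age$same_rate  <- by_age$same  / by_age$words
by_age$cross_rate <- by_age$cross / by_age$words
by_age
```

The ifelse() step is the whole definition: which pronoun counts as "same" depends on the sex of the writer, so the two columns swap roles between the male and female rows.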

(3) What are some hypotheses about what this pattern means? How might you bring other evidence from this same Facebook pronoun-frequency dataset to bear on the question?

(4) Now calculate the rates of same-sex and cross-sex reference, by age, separately for male and female writers. Devise an effective way to present your results graphically.

(5) What do you think this pattern -- the difference between male and female Facebookers in frequency of same-sex vs. cross-sex references -- means? Does it change your opinion about question (3)?