## assignment for lecture 8. To hand in before Friday November 18th, 18.00
## (1)
# I made the following code to create another fake data set.
# The data set is my free interpretation of another Diederik Stapel experiment:
# there are three situations: a clean, a messy and a very messy environment (e.g. a school
# yard).
# Hypotheses:
# 1) The first hypothesis is that when an environment is a mess, people will litter more
# (make more mess).
# 2) The second hypothesis is that a mess makes people more racist, no matter how much mess.
# Below, I create data according to these hypotheses:
# =========================================================================================
# make a participant number
subj = 1:150 # 150 participant numbers
# Randomly make them male or female
sex = sample(c('male','female'), 150, replace = TRUE)
# Give them a random age
age = round(rnorm(n = 150, mean = 40, sd = 10))
# Give them a random IQ
IQ = round(rnorm(150, 100, 15))
# create a condition indicator
condition = rep(c('clean', 'messy', 'very_messy'), each = 50)
# Note that these variables are not yet "infected" by our hypotheses. All just random.
# Now, we create a dependent variable "litter" that tells whether each participant
# did or did not litter. So, this variable should be infected by our hypotheses:
# H1: The probability to litter depends on condition.
litter = numeric(150)
litter[condition == 'clean'] = sample(c(0, 1), 50, replace = TRUE, prob = c(.8, .2))
# So, in the clean condition, no litter (0) has prob .8 and litter (1) has prob .2;
# something similar for the other conditions, but then with increased litter probabilities:
litter[condition == 'messy'] = sample(c(0, 1), 50, replace = TRUE, prob = c(.7, .3))
litter[condition == 'very_messy'] = sample(c(0, 1), 50, replace = TRUE, prob = c(.5, .5))
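# A quick sanity check on sample() with prob (just a sketch; 'check.draws' is a made-up
# name that is not used anywhere else in this assignment): with many draws, the proportion
# of 1s should come close to the probability given for 1, here .2.
check.draws = sample(c(0, 1), 10000, replace = TRUE, prob = c(.8, .2))
mean(check.draws) # proportion of 1s, should be near .2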
# H2: Participants get more racist when an environment is not clean:
racism = ifelse(condition == 'clean', 0, 10) + # an effect of 0 in the clean condition,
# otherwise 10.
runif(150, 50, 70) # for each pp some noise, sampled from a uniform distribution
# between 50 and 70.
#
# Now that the effects are created, let's put variables together in a data.frame
d = data.frame(condition, subj, age, sex, IQ, racism, litter)
# have a look at d:
head(d)
tail(d)
dim(d)
table(d$condition, d$sex)
# look at the means:
tapply(d$racism, d$condition, mean)
## Question:
# It is actually not so plausible to assume that racism is independent of IQ.
# Adjust the lines of code constructing the racism variable such that for each IQ point,
# .2 point is subtracted from the racism variable. So, someone with an IQ of 100 gets a
# racism value that is attenuated by 20 points.
# =========================================================================================
# =========================================================================================
## (2)
# We didn't include any interaction effects. However, we can of course test for them,
# since we act as if we are dealing with real data.
# In the "formula" of a lm() command in R, an interaction between two predictors
# plus all the main effects is defined using "*" as follows:
# lm(dependent ~ predictor1 * predictor2)
# Thus, your model looks like: d$racism ~ d$condition * d$IQ.
# a) Perform this regression, store the output in lmout, and show the ANOVA table using
# anova() on the output. Did it work as you expected?
# b) Now fit a model with just the additive effects that we included:
# lmout.add = lm(d$racism ~ d$condition + d$IQ)
# c) Now type: BIC(lmout, lmout.add). This gives you a model comparison of the two linear
# models. The one with the lowest BIC is the best. Did this test lead to the right
# conclusion? So, that was your first model comparison in R.
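# A small sketch of the same ideas on made-up toy data (the names 'toy', 'y', 'g' and 'x'
# are hypothetical and not part of the assignment). In a formula, y ~ g * x is shorthand
# for y ~ g + x + g:x, where g:x is the interaction term on its own:
toy = data.frame(g = rep(c('a', 'b'), each = 20), x = rnorm(40))
toy$y = ifelse(toy$g == 'a', 0, 2) + 0.5 * toy$x + rnorm(40)
toy.int = lm(y ~ g * x, data = toy)        # main effects plus interaction
toy.int2 = lm(y ~ g + x + g:x, data = toy) # the same model written out in full
all.equal(coef(toy.int), coef(toy.int2))   # the coefficients should be identical
toy.add = lm(y ~ g + x, data = toy)        # main effects only
anova(toy.int)                             # one row per term, plus the residuals
BIC(toy.int, toy.add)                      # a data frame with a df and a BIC column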
# =========================================================================================
# a)
# b)
# c)
# =========================================================================================
###### That was the new simulation stuff.
###### Now, some rehearsals!
## (3)
# Create the following variables
score.test1 = sample(1:10, 10, replace = TRUE)
score.test2 = sample(1:10, 10, replace = TRUE)
score.test3 = sample(1:10, 10, replace = TRUE)
score.test4 = sample(1:10, 10, replace = TRUE)
# Say, these are the scores of ten participants on 4 different tests
# Bind these variables together into a matrix called 'test.scores'
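# As an illustration on made-up vectors (hypothetical names 'v1', 'v2' and 'toy.mat'):
# cbind() puts vectors of equal length next to each other as the columns of a matrix,
# and uses the vector names as column names.
v1 = c(1, 2, 3)
v2 = c(4, 5, 6)
toy.mat = cbind(v1, v2)
dim(toy.mat)      # 3 rows, 2 columns
colnames(toy.mat) # "v1" "v2"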
# =========================================================================================
# =========================================================================================
## (4)
# Say, you like the variables in a different order:
# Make test.scores.2, that contains all variables of test.scores, but then first
# score.test2 and 3, and then score.test1 and 4. *Do this by indexing the answer from
# question 3!*
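# A sketch of column indexing on a made-up matrix (hypothetical name 'm'): m[, k] selects
# columns, and a vector of column numbers or names returns the columns in that order.
m = cbind(p = 1:3, q = 4:6, r = 7:9)
m[, c(3, 1)]     # column r first, then column p
m[, c('r', 'p')] # the same selection by name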
# =========================================================================================
# =========================================================================================
## (5)
# Make a little for loop that checks for each participant in test.scores.2, whether he or
# she scored a seven on any of the tests and prints 'yes' or 'no' accordingly. You will want
# to use the any() function here, which tests whether any of its arguments is TRUE.
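# How any() behaves, shown on a made-up vector (hypothetical name 'x.toy'):
x.toy = c(3, 7, 5)
any(x.toy == 7) # TRUE, because the second element equals 7
any(x.toy == 9) # FALSE, because no element equals 9
# if/else can turn such a single TRUE or FALSE into 'yes' or 'no':
if (any(x.toy == 7)) print('yes') else print('no')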
# =========================================================================================
# =========================================================================================
## (6)
# Do the same more easily with one of the apply functions. The easiest way is to transform
# the whole test.scores.2 matrix into a matrix of TRUEs and FALSEs by checking whether it
# equals 7. On the resulting logical matrix, apply the any() function. (This gives TRUEs
# and FALSEs instead of yes's and no's, but that's ok.)
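# A sketch of apply() on a made-up matrix (hypothetical name 'm.toy'): the second argument
# says over which dimension the function is applied, 1 for rows and 2 for columns.
m.toy = rbind(c(1, 7), c(2, 3))
m.toy == 7                # a logical matrix with the same shape as m.toy
apply(m.toy == 7, 1, any) # per row: TRUE FALSE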
# =========================================================================================
# =========================================================================================