# Solutions to assignment lecture 3.
## (1)
# Some rehearsal:
# we measure 7 people's heights: heights = c(156, 167, 184, NA, 156, 164, 177)
# One measurement (the 4th participant's) was clearly lost. You can select only the
# available values (the non-NA's) by saying:
# heights[!is.na(heights)]
# You get the same result when you do
# heights[which(!is.na(heights))]
# a) Explain why the second method is a detour for the same result.
# b) Which of the two methods uses logical indexing and which uses numeric indexing?
# =========================================================================================
# a) The second is a detour: !is.na(heights) gives you a logical vector that you can
# immediately use for logical indexing. which() converts that logical vector into the
# positions where it is TRUE, so in the end you are doing numeric indexing via a detour.
# b) So, the first method uses logical indexing, the second numeric indexing.
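# A small sketch to check this answer. The heights vector is the one from the question;
# the names available and positions are only chosen for illustration.
heights = c(156, 167, 184, NA, 156, 164, 177)
available = !is.na(heights)    # logical vector: TRUE where a value is present
positions = which(available)   # numeric vector: the positions of those TRUE's
heights[available]             # logical indexing
heights[positions]             # numeric indexing, same values
identical(heights[available], heights[positions])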
# =========================================================================================
## (2)
# Set the working directory to the directory where you saved data_assignment3.Rdata.
# Show how you load that Rdata file. If it works, you'll see you have the object 'dat' in
# your workspace. (type ls() )
# Have a look at dat by typing head(dat). These are the running times of runners who did
# or did not take doping (variable cond), both before taking the doping (running.time.pre)
# and after taking the doping (running.time.post). It also shows whether they got caught
# using doping (1) or not (0).
# Do a t-test to figure out whether the runners who took doping were quicker after taking
# the doping.
# t.test(dat$running.time.post ~ dat$cond)
# What do you conclude? (just basic statistics question)
# =========================================================================================
# for example:
setwd('/Users/gdutilh/Dropbox/teaching/R/R2015/lecture_3/materials_assignment_3')
# that line of course depends on where you save your stuff.
# Mind well: in R the folder separator '/' works on pc and mac. "\\" only works on pc.
getwd()
dir()
load('data_assignment3.Rdata')
dat = as.data.frame(dat)
t.test(dat$running.time.post ~ dat$cond)
# The two-sided test (Welch by default) suggests that the runners are quicker with doping:
# t = -2.1755, df = 35.349, p-value = 0.03637
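# A minimal sketch of the same kind of test on made-up numbers. toy.dat, its values and
# the level name 'control' are hypothetical, not the assignment data. It passes the data
# frame through the data argument, so the formula needs no dat$ prefixes, and it stores
# the result so that single values can be extracted afterwards.
toy.dat = data.frame(
  cond = rep(c('control', 'doping'), each = 5),
  running.time.post = c(11.2, 10.9, 11.5, 11.1, 11.4, 10.6, 10.8, 10.4, 10.9, 10.5)
)
toy.test = t.test(running.time.post ~ cond, data = toy.dat)
toy.test$p.value     # the p-value as a plain number
toy.test$estimate    # the two group means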
# =========================================================================================
## (3)
# a) Show how you select only those runners who took doping. (that is, those rows where
# cond == 'doping') Remember:
# indexing a matrix goes like dat[rows, columns]
# b) Create dat.nona which contains dat without the rows that contain NA's. To do so,
# use na.omit(dat). Look up ?na.omit to see what this does. (not really a question,
# but nonetheless show how you do it)
# c) Let R select the runner who ran a world record after taking the doping (time < 9.58).
# Did he get away with it?
# =========================================================================================
# a)
select.dat = dat[dat$cond == 'doping', ]
# b)
dat.nona = na.omit(dat)
# c)
# which() keeps only the TRUE rows, so a missing time does not show up as an NA row
dat[which(dat$running.time.post < 9.58), ]
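# How NA's behave in this kind of row selection, sketched on made-up numbers (toy.times
# is hypothetical): a logical index that contains NA returns an all-NA row, whereas
# wrapping the comparison in which() keeps only the positions that are TRUE.
toy.times = data.frame(runner = 1:4, running.time.post = c(10.1, NA, 9.5, 10.4))
toy.times[toy.times$running.time.post < 9.58, ]          # 2 rows, one of them all NA
toy.times[which(toy.times$running.time.post < 9.58), ]   # only runner 3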
# =========================================================================================
## (4)
# Save this dat.nona in a .Rdata file with an appropriate name, e.g., "cleandat.Rdata" by
# using save().
# Check whether it worked by writing:
# rm(list = ls()) # removes all variables from workspace
# ls() will then give an empty character vector (since no names of objects available)
# load('thefilenameyouchoose.Rdata')
# and see that you have dat.nona again available.
# =========================================================================================
save(dat.nona, file = 'dat_without_NAs.Rdata')
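# A sketch of the whole save/load round trip on a throwaway object. toy.obj and the
# temporary file toy.rdata.file are hypothetical, so the real workspace stays untouched.
toy.obj = data.frame(x = 1:3)
toy.rdata.file = tempfile(fileext = '.Rdata')
save(toy.obj, file = toy.rdata.file)
rm(toy.obj)
exists('toy.obj')      # FALSE: the object is gone
load(toy.rdata.file)
exists('toy.obj')      # TRUE: load() restored it under its original name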
# =========================================================================================
## (5)
# Read in the example data (practice_RT_data.txt) using read.table where header is
# TRUE and tab is the separator (sep = '\t'). Of course, you first have to set the
# working directory to the right folder.
# The result is a ready to use data.frame. Assign it to a variable called mydata. This
# data set is one of my own lexical decision data sets (speeded word - nonword decisions).
# The data contains, among other things:
# -- Response Times (RT),
# -- correctness of response (correct)
# -- the word frequency (freq)
# -- whether the stimulus was a word or nonword (wnw, 1:word, 2: nonword)
# -- response button: word or nonword (resp)
# Check whether all worked well by asking head(mydata)
# =========================================================================================
setwd("wherever/you/saved/your/data")
mydata = read.table('practice_RT_data.txt', header = TRUE, sep = '\t')
head(mydata)
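# A self-contained sketch of read.table on a tab-separated file. The file content and the
# names toy.txt.file and toy.rt are made up; only the column names RT and correct are
# taken from the description above.
toy.txt.file = tempfile(fileext = '.txt')
writeLines(c('RT\tcorrect', '512\t1', '1203\t0', '745\t1'), toy.txt.file)
toy.rt = read.table(toy.txt.file, header = TRUE, sep = '\t')
str(toy.rt)    # a data.frame with 3 rows and the columns RT and correct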
# =========================================================================================
## (6)
# Create a logical vector named tooslow that contains TRUE's for all RTs over 1000 ms
# and FALSE's otherwise. Add this tooslow variable to mydata using $
# (answer: two short lines of code)
# =========================================================================================
tooslow = mydata$RT > 1000
mydata$tooslow = tooslow
# =========================================================================================
## (7)
# Show how you calculate the mean of mydata$RT, only for those RTs that are not too slow.
# To do so, you should index mydata$RT with the tooslow variable that you just created.
# =========================================================================================
mean(mydata$RT[!tooslow])
# or
mean(mydata$RT[!mydata$tooslow])
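# The same computation on made-up response times (toy.RT and toy.slow are hypothetical):
toy.RT = c(450, 1200, 620, 980, 1500)
toy.slow = toy.RT > 1000
mean(toy.RT[!toy.slow])         # mean of 450, 620 and 980
mean(toy.RT[toy.RT <= 1000])    # same result without the intermediate vector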
# =========================================================================================