This repository was archived by the owner on Jun 16, 2023. It is now read-only.

Clean Data

Savitha edited this page Apr 10, 2017 · 9 revisions
Outliers = unlist(Outliers)
print(length(Outliers))
12
print(summary(Outliers))
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  66.63   70.11   79.20  123.70  128.60  408.20

ExperimentClean = unlist(ExperimentClean)
print(length(ExperimentClean))
198
print(summary(ExperimentClean))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.501   3.028   8.866  14.210  21.320  58.170 

Before running ANOVA, we have to check two of its assumptions: normality and sphericity.

1. Normality

hist(ExperimentClean, breaks="FD")
qqnorm(ExperimentClean)
qqline(ExperimentClean)

[Figures: histogram and normal Q-Q plot of ExperimentClean]

From the above graphs we see that the data does not satisfy the normality condition. We also check this formally with the Shapiro-Wilk and Kolmogorov-Smirnov tests for normality.

shapiro.test(ExperimentClean)
print(ks.test(ExperimentClean, "pnorm", mean=mean(ExperimentClean), sd=sd(ExperimentClean)))

Shapiro-Wilk normality test

data:  ExperimentClean
W = 0.84025, p-value = 1.767e-13

> print(ks.test(ExperimentClean, "pnorm", mean=mean(ExperimentClean), sd=sd(ExperimentClean)))

	One-sample Kolmogorov-Smirnov test

data:  ExperimentClean
D = 0.17418, p-value = 1.211e-05
alternative hypothesis: two-sided

The Shapiro-Wilk p-value (1.767e-13) is far below 0.05, and the Kolmogorov-Smirnov test agrees (p = 1.211e-05), so we conclude that the data is not normally distributed. To bring the data closer to normal, we apply a log transform and then repeat the normality tests.
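To see why a log transform helps with right-skewed timing data, here is a minimal sketch on synthetic data. The sample is drawn from a log-normal distribution purely for illustration; it is not the experiment's data, and the names `raw`, `before` and `after` are made up for this example.

```r
# Synthetic, right-skewed stand-in for selection times (not the study's data).
set.seed(1)
raw <- rlnorm(200, meanlog = 2, sdlog = 1)

# Shapiro-Wilk test before and after the log transform.
before <- shapiro.test(raw)
after  <- shapiro.test(log(raw))

# W closer to 1 means the sample looks more normal.
print(c(W_raw = unname(before$statistic), W_log = unname(after$statistic)))
print(c(p_raw = before$p.value, p_log = after$p.value))
```

Real measurements are rarely exactly log-normal, so the improvement on actual data is usually smaller than in this idealized case.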

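# Work on a copy so that ExperimentClean stays untouched
ExperimentCorrected <- ExperimentClean
# Log transform to reduce the right skew
ExperimentCorrected = log(ExperimentCorrected)
# Shift by the absolute value of the minimum, summary(...)[1]
ExperimentCorrected = ExperimentCorrected + abs(summary(ExperimentCorrected)[1])
print(summary(ExperimentCorrected))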

  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8126  1.5140  2.5880  2.5600  3.4660  4.4700 
hist(ExperimentCorrected, breaks="FD")
qqnorm(ExperimentCorrected)
qqline(ExperimentCorrected)
shapiro.test(ExperimentCorrected)
print(ks.test(ExperimentCorrected, "pnorm", mean=mean(ExperimentCorrected), sd=sd(ExperimentCorrected)))

[Figures: histogram and normal Q-Q plot of ExperimentCorrected]


Shapiro-Wilk normality test

data:  ExperimentCorrected
W = 0.93184, p-value = 5.38e-08

> print(ks.test(ExperimentCorrected, "pnorm", mean=mean(ExperimentCorrected), sd=sd(ExperimentCorrected)))

	One-sample Kolmogorov-Smirnov test

data:  ExperimentCorrected
D = 0.12498, p-value = 0.004116
alternative hypothesis: two-sided

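After the log transform, both tests still reject normality at the 0.05 level, but the departure is much smaller: the Shapiro-Wilk W rises from 0.84025 to 0.93184, and the Kolmogorov-Smirnov p-value rises from 1.211e-05 to 0.004116. The graphs also show that the transformed data is closer to normal.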
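Sphericity, the second assumption named earlier, is not checked on this page. Below is a minimal sketch of how it could be tested with Mauchly's test from base R (`mauchly.test`), assuming a balanced design with one mean per participant per condition. The data is random, and the 12 subjects and the condition labels "A", "B", "C" are made up for illustration.

```r
# Hypothetical stand-in for per-participant means: 12 subjects x 3 conditions.
set.seed(42)
long <- expand.grid(Subject = factor(1:12), Condition = factor(c("A", "B", "C")))
long$CorrectedSelectionTime <- rnorm(nrow(long), mean = 2.5, sd = 0.8)

# Reshape to a subjects-by-conditions matrix, one row per participant.
wide <- with(long, tapply(CorrectedSelectionTime, list(Subject, Condition), mean))

# Fit an intercept-only multivariate model and run Mauchly's test on the
# within-subject contrasts (X = ~1 removes the overall mean).
fit <- lm(wide ~ 1)
sphericity <- mauchly.test(fit, X = ~1)
print(sphericity)
```

A small p-value here would indicate that the sphericity assumption is violated.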

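# Remove wrong (negative) selection times from the data frame
data <- data[!(data$SelectionTime < 0), ]
# Remove outliers from the data frame
data <- data[!(data$SelectionTime > threshold.upper | data$SelectionTime < threshold.lower), ]

# Attach the log-transformed times; this assumes ExperimentCorrected has one
# value per remaining row, in the same order
data$CorrectedSelectionTime <- ExperimentCorrected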

head(data, nrow(data))
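The filtering and the transformed column can also be done in one pass on the data frame itself, which keeps the new column aligned with the surviving rows. This is a minimal sketch on a toy data frame with made-up values. How `threshold.lower` and `threshold.upper` were actually computed is not shown on this page, so the Tukey fences (1.5 times the IQR) used below are an assumption, not the original method.

```r
# Toy data frame standing in for `data` (values are made up).
data <- data.frame(
  Subject = rep(1:2, each = 5),
  SelectionTime = c(2.1, 3.4, -1.0, 5.2, 250.0, 1.8, 4.4, 6.1, 3.3, 2.9)
)

# Drop invalid (negative) times first.
data <- data[data$SelectionTime >= 0, ]

# ASSUMPTION: Tukey fences; the page does not show the real thresholds.
q <- unname(quantile(data$SelectionTime, c(0.25, 0.75)))
iqr <- IQR(data$SelectionTime)
threshold.lower <- q[1] - 1.5 * iqr
threshold.upper <- q[2] + 1.5 * iqr

# Keep only the rows inside the fences.
keep <- data$SelectionTime >= threshold.lower & data$SelectionTime <= threshold.upper
data <- data[keep, ]

# Log transform and shift by |min|, computed on the rows that survived,
# so the new column always matches the data frame row for row.
logged <- log(data$SelectionTime)
data$CorrectedSelectionTime <- logged + abs(min(logged))
print(data)
```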

Calculate the mean selection times and the total selection errors per condition per participant

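# Accumulators, one entry per participant/condition pair
Subject = list()
Condition = list()
SelectionTime = list()
SelectionError = list()
CorrectedSelectionTime = list()

for (p in unlist(participants))
{
    # `cond` rather than `c`, so the loop variable does not shadow base::c()
    for (cond in unlist(conditions))
    {
        # Trials of participant p under condition cond
        pcData = data[data$Subject == p & data$Condition == cond, ]

        Subject = c(Subject, p)
        Condition = c(Condition, cond)
        # Mean times, but total (summed) errors
        SelectionTime = c(SelectionTime, mean(unlist(pcData["SelectionTime"])))
        SelectionError = c(SelectionError, sum(unlist(pcData["SelectionError"])))
        CorrectedSelectionTime = c(CorrectedSelectionTime, mean(unlist(pcData["CorrectedSelectionTime"])))
    }
}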

Subject = unlist(Subject)
Condition = unlist(Condition)
SelectionTime = unlist(SelectionTime)
SelectionError = unlist(SelectionError)
CorrectedSelectionTime=unlist(CorrectedSelectionTime)

cleanData = data.frame(Subject, Condition, SelectionTime, SelectionError, CorrectedSelectionTime)

head(cleanData, nrow(cleanData))
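The same per-participant, per-condition summary can be written without an explicit loop. This is a minimal sketch using base R's `aggregate` and `merge` on a toy data frame; the values and the labels "P1", "P2", "A", "B" are made up, and the resulting column order differs from the loop version.

```r
# Toy trial-level data standing in for `data` (values are made up).
data <- data.frame(
  Subject = rep(c("P1", "P2"), each = 4),
  Condition = rep(c("A", "A", "B", "B"), times = 2),
  SelectionTime = c(2, 4, 6, 8, 1, 3, 5, 7),
  SelectionError = c(0, 1, 0, 0, 1, 1, 0, 1),
  CorrectedSelectionTime = c(0.5, 1.5, 2.0, 3.0, 0.2, 0.8, 1.0, 2.0)
)

# Mean of both time columns per participant and condition.
means <- aggregate(cbind(SelectionTime, CorrectedSelectionTime) ~ Subject + Condition,
                   data = data, FUN = mean)

# Error count per participant and condition (a sum, not a mean).
errors <- aggregate(SelectionError ~ Subject + Condition, data = data, FUN = sum)

# Join the two summaries on the grouping columns.
cleanData <- merge(means, errors, by = c("Subject", "Condition"))
print(cleanData)
```

Note that the formula interface of `aggregate` drops rows with missing values by default, which the loop does not do, so the two versions only agree when the data has no `NA` entries.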