This repository was archived by the owner on Jun 16, 2023. It is now read-only.
Clean Data
Savitha edited this page Apr 10, 2017 · 9 revisions
Outliers = unlist(Outliers)
print(length(Outliers))
12
print(summary(Outliers))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  66.63   70.11   79.20  123.70  128.60  408.20
ExperimentClean = unlist(ExperimentClean)
print(length(ExperimentClean))
198
print(summary(ExperimentClean))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.501   3.028   8.866  14.210  21.320  58.170
hist(ExperimentClean, breaks="FD")
qqnorm(ExperimentClean)
qqline(ExperimentClean)
The histogram and Q-Q plot above show that the data does not satisfy the normality assumption. We confirm this with the Shapiro-Wilk and Kolmogorov-Smirnov tests for normality.
shapiro.test(ExperimentClean)
print(ks.test(ExperimentClean, "pnorm", mean=mean(ExperimentClean), sd=sd(ExperimentClean)))
Shapiro-Wilk normality test
data: ExperimentClean
W = 0.84025, p-value = 1.767e-13
> print(ks.test(ExperimentClean, "pnorm", mean=mean(ExperimentClean), sd=sd(ExperimentClean)))
One-sample Kolmogorov-Smirnov test
data: ExperimentClean
D = 0.17418, p-value = 1.211e-05
alternative hypothesis: two-sided
The Shapiro-Wilk p-value is far below 0.05, so we conclude that the data is not normally distributed. To bring the data closer to normal, we apply a log transform and then test for normality again.
ExperimentCorrected <- ExperimentClean
ExperimentCorrected = log(ExperimentCorrected)
ExperimentCorrected = ExperimentCorrected + abs(summary(ExperimentCorrected)[1])
print(summary(ExperimentCorrected))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8126  1.5140  2.5880  2.5600  3.4660  4.4700
hist(ExperimentCorrected, breaks="FD")
qqnorm(ExperimentCorrected)
qqline(ExperimentCorrected)
shapiro.test(ExperimentCorrected)
print(ks.test(ExperimentCorrected, "pnorm", mean=mean(ExperimentCorrected), sd=sd(ExperimentCorrected)))
Shapiro-Wilk normality test
data: ExperimentCorrected
W = 0.93184, p-value = 5.38e-08
> print(ks.test(ExperimentCorrected, "pnorm", mean=mean(ExperimentCorrected), sd=sd(ExperimentCorrected)))
One-sample Kolmogorov-Smirnov test
data: ExperimentCorrected
D = 0.12498, p-value = 0.004116
alternative hypothesis: two-sided
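After the log transform, the one-sample Kolmogorov-Smirnov p-value (0.004116) is much closer to 0.05 than before (1.211e-05), and the histogram and Q-Q plot look closer to normal. Both tests still reject normality at the 0.05 level, so the corrected data is only approximately normal.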
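To illustrate why a log transform helps with right-skewed timing data, here is a minimal sketch on synthetic data. The sample below is drawn from a log-normal distribution, and the names `synthetic`, `before` and `after` are made up for this example. It does not reproduce the experiment's numbers.

```r
# Sketch on synthetic data (not the experiment's data): log-normal samples
# are right-skewed, and their logarithm is normal by construction.
set.seed(42)                                      # arbitrary seed for reproducibility
synthetic <- rlnorm(198, meanlog = 2, sdlog = 1)  # hypothetical stand-in for selection times

before <- shapiro.test(synthetic)       # test on the raw, skewed values
after  <- shapiro.test(log(synthetic))  # test on the log-transformed values

# A W statistic closer to 1 and a larger p-value mean less evidence against normality.
print(c(W_before = unname(before$statistic), W_after = unname(after$statistic)))
print(c(p_before = before$p.value, p_after = after$p.value))
```

One caveat when reading the Kolmogorov-Smirnov results on this page: R's documentation for `ks.test` notes that the parameters of the reference distribution should be pre-specified rather than estimated from the data, so those p-values are best treated as approximate.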
#Remove wrong selection times from data frame
data<-data[!(data$SelectionTime < 0),]
#Remove outliers from data frame
data<-data[!(data$SelectionTime > threshold.upper | data$SelectionTime < threshold.lower),]
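#Attach the log-transformed times (assumes ExperimentCorrected has one value per remaining row, in the same order)
data$CorrectedSelectionTime<-ExperimentCorrected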
head(data, nrow(data))
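The filter above relies on `threshold.lower` and `threshold.upper`, which are not defined on this page. As a rough sketch of how such thresholds could be chosen, the example below uses the common 1.5 * IQR rule on a toy data frame. The rule, the `toy` data frame and its values are assumptions for illustration, not necessarily what this analysis used.

```r
# Hypothetical example: the 1.5 * IQR rule is an assumption, not the rule
# used to compute threshold.lower / threshold.upper in this analysis.
toy <- data.frame(SelectionTime = c(2.1, 3.4, 5.0, 8.9, 12.3, 21.0, 30.5, 250.0))  # made-up times

q <- quantile(toy$SelectionTime, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
threshold.lower <- q[1] - 1.5 * iqr
threshold.upper <- q[2] + 1.5 * iqr

# Same filtering pattern as above; drop = FALSE keeps a one-column data frame a data frame.
toy <- toy[!(toy$SelectionTime > threshold.upper | toy$SelectionTime < threshold.lower), , drop = FALSE]
print(toy)
```

With these made-up values only the 250.0 entry falls outside the thresholds, so seven rows remain.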
Subject=list()
Condition=list()
SelectionTime=list()
SelectionError=list()
CorrectedSelectionTime=list()
for (p in unlist(participants))
{
  for (c in unlist(conditions))
  {
    pcData = data[data$Subject == p & data$Condition == c, ]
    Subject = c(Subject, p)
    Condition = c(Condition, c)
    SelectionTime = c(SelectionTime, mean(unlist(pcData["SelectionTime"])))
    SelectionError = c(SelectionError, sum(unlist(pcData["SelectionError"])))
    CorrectedSelectionTime = c(CorrectedSelectionTime, mean(unlist(pcData["CorrectedSelectionTime"])))
  }
}
Subject = unlist(Subject)
Condition = unlist(Condition)
SelectionTime = unlist(SelectionTime)
SelectionError = unlist(SelectionError)
CorrectedSelectionTime=unlist(CorrectedSelectionTime)
cleanData = data.frame(Subject, Condition, SelectionTime, SelectionError, CorrectedSelectionTime)
head(cleanData, nrow(cleanData))
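The per-participant, per-condition summary built by the loop above can also be sketched with base R's `aggregate` and `merge`. The `toy` data frame below reuses the column names from this page, but its values are made up, and `means`, `errors` and `cleanToy` are names introduced only for this example.

```r
# Toy data with the same column names as the data frame on this page (values are made up).
toy <- data.frame(
  Subject = c(1, 1, 1, 1, 2, 2, 2, 2),
  Condition = c("A", "A", "B", "B", "A", "A", "B", "B"),
  SelectionTime = c(2, 4, 10, 20, 3, 5, 8, 12),
  SelectionError = c(0, 1, 0, 0, 1, 1, 0, 2),
  CorrectedSelectionTime = c(1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5)
)

# Mean of both time columns for each Subject x Condition cell.
means <- aggregate(cbind(SelectionTime, CorrectedSelectionTime) ~ Subject + Condition,
                   data = toy, FUN = mean)

# Total number of errors for each cell.
errors <- aggregate(SelectionError ~ Subject + Condition, data = toy, FUN = sum)

# Join the two summaries on the grouping columns.
cleanToy <- merge(means, errors, by = c("Subject", "Condition"))
print(cleanToy)
```

Unlike the loop, `aggregate` only returns rows for Subject and Condition combinations that actually occur in the data, and the column order of the merged result differs from `cleanData`.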
So science, much wiki.