markcom/getdata-031

Getting and Cleaning Data - Course Project

This repo was created for the Coursera Getting and Cleaning Data course.

It contains 4 files:

  • README.md : Basic description of the assignment and the run_analysis.R script
  • run_analysis.R : The R script used to generate the tidy data set
  • result.txt : The tidy data set itself
  • Codebook : Description of the data set

Description of the run_analysis.R script

Prerequisites: the UCI HAR Dataset must be available in the workspace. Alternatively, it can be downloaded and unzipped by uncommenting lines 5-9 of the script.

  • First, all data required for the analysis are read into memory using the read.table function
  • The Test and Training data are merged (dataX, dataSubject, dataY)
  • The columns of the dataX data frame are renamed according to the features, and all columns except those containing "std" or "mean" are removed
  • The column names of the dataX data frame are then cleaned up to be more readable
  • The activity numbers in dataY are replaced by activity names
  • The activities (dataY) are merged into dataX and the new column is renamed
  • The subjects (dataSubject) are merged into dataX and the new column is renamed
  • The mean of each dataX column is calculated (grouped by subject and activity) and stored in a new tidy data frame, resultDF
  • resultDF is saved to "result.txt"
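
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the contents of run_analysis.R: the tiny data frames stand in for what the script reads with read.table, the two column names and the two-entry activityLabels table are made up for the example, and the use of rbind, match, and aggregate is an assumption about how such steps could be written in base R.

```r
# Toy stand-ins for the Training and Test files (hypothetical values).
dataXTrain       <- data.frame(V1 = c(1, 2), V2 = c(10, 20))
dataXTest        <- data.frame(V1 = c(3, 4), V2 = c(30, 40))
dataYTrain       <- data.frame(V1 = c(1, 2))
dataYTest        <- data.frame(V1 = c(1, 2))
dataSubjectTrain <- data.frame(V1 = c(1, 1))
dataSubjectTest  <- data.frame(V1 = c(1, 1))

# Merge Test and Training data by stacking the rows.
dataX       <- rbind(dataXTrain, dataXTest)
dataY       <- rbind(dataYTrain, dataYTest)
dataSubject <- rbind(dataSubjectTrain, dataSubjectTest)

# Give the measurement columns readable names (illustrative names only).
names(dataX) <- c("tBodyAccMeanX", "tBodyAccStdX")

# Replace activity numbers by activity names (toy two-entry lookup table).
activityLabels <- data.frame(id = c(1, 2), name = c("WALKING", "SITTING"),
                             stringsAsFactors = FALSE)
dataX$activity <- activityLabels$name[match(dataY$V1, activityLabels$id)]

# Attach the subject to every measurement row.
dataX$subject <- dataSubject$V1

# Mean of every measurement column, grouped by subject and activity.
resultDF <- aggregate(. ~ subject + activity, data = dataX, FUN = mean)

# Save the tidy data set (a temporary file here; the script writes "result.txt").
outFile <- tempfile(fileext = ".txt")
write.table(resultDF, file = outFile, row.names = FALSE)
```

With this toy input, resultDF has one row per subject and activity pair, and each measurement column holds the group mean.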

Used data frames:

  • data_features: contains all the feature names
  • dataXTrain: Training data
  • dataXTest: Test data
  • dataX: dataXTest merged with dataXTrain
  • dataYTest: activity labels for the Test data
  • dataYTrain: activity labels for the Training data
  • dataY: dataYTest merged with dataYTrain
  • dataSubjectTest: subjects for the Test data
  • dataSubjectTrain: subjects for the Training data
  • dataSubject: dataSubjectTest merged with dataSubjectTrain
  • resultDF: means calculated for dataX, stored in a data frame

Used variables:

  • currentDir: current directory
  • dataDir: root directory of the UCI HAR Dataset
  • featuresToKeep: vector of columns, which are to be kept
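
As an illustration of how featuresToKeep might be built and applied, here is a small sketch. The five feature names are a made-up excerpt in the style of the UCI HAR features file, and the grep and gsub patterns are assumptions, not necessarily the ones used in run_analysis.R.

```r
# Hypothetical excerpt of the feature names.
data_features <- data.frame(
  id   = 1:5,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
           "tBodyAcc-max()-X", "fBodyGyro-mean()-Z"),
  stringsAsFactors = FALSE
)

# Toy measurement data with one column per feature.
dataX <- as.data.frame(matrix(1:10, nrow = 2))
names(dataX) <- data_features$name

# Indices of the columns whose name contains "mean" or "std".
featuresToKeep <- grep("mean|std", data_features$name)

# Keep only those columns.
dataX <- dataX[, featuresToKeep]

# Make the names "nicer" by dropping parentheses and hyphens.
names(dataX) <- gsub("[()-]", "", names(dataX))
```

Note that a plain "mean" pattern also matches the meanFreq() features in the full data set; whether those are kept depends on the pattern the script actually uses.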
