diff --git a/README.md b/README.md index d4c0d752a9e..f3e94b7e035 100644 --- a/README.md +++ b/README.md @@ -1,114 +1,114 @@ -## Introduction - -This assignment uses data from -the UC Irvine Machine -Learning Repository, a popular repository for machine learning -datasets. In particular, we will be using the "Individual household -electric power consumption Data Set" which I have made available on -the course web site: - - -* Dataset: Electric power consumption [20Mb] - -* Description: Measurements of electric power consumption in -one household with a one-minute sampling rate over a period of almost -4 years. Different electrical quantities and some sub-metering values -are available. - - -The following descriptions of the 9 variables in the dataset are taken -from -the UCI -web site: - -
    -
-1. Date: Date in format dd/mm/yyyy
-2. Time: time in format hh:mm:ss
-3. Global_active_power: household global minute-averaged active power (in kilowatt)
-4. Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
-5. Voltage: minute-averaged voltage (in volt)
-6. Global_intensity: household global minute-averaged current intensity (in ampere)
-7. Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
-8. Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
-9. Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
- -## Loading the data - - - - - -When loading the dataset into R, please consider the following: - -* The dataset has 2,075,259 rows and 9 columns. First -calculate a rough estimate of how much memory the dataset will require -in memory before reading into R. Make sure your computer has enough -memory (most modern computers should be fine). - -* We will only be using data from the dates 2007-02-01 and -2007-02-02. One alternative is to read the data from just those dates -rather than reading in the entire dataset and subsetting to those -dates. - -* You may find it useful to convert the Date and Time variables to -Date/Time classes in R using the `strptime()` and `as.Date()` -functions. - -* Note that in this dataset missing values are coded as `?`. - - -## Making Plots - -Our overall goal here is simply to examine how household energy usage -varies over a 2-day period in February, 2007. Your task is to -reconstruct the following plots below, all of which were constructed -using the base plotting system. - -First you will need to fork and clone the following GitHub repository: -[https://github.com/rdpeng/ExData_Plotting1](https://github.com/rdpeng/ExData_Plotting1) - - -For each plot you should - -* Construct the plot and save it to a PNG file with a width of 480 -pixels and a height of 480 pixels. - -* Name each of the plot files as `plot1.png`, `plot2.png`, etc. - -* Create a separate R code file (`plot1.R`, `plot2.R`, etc.) that -constructs the corresponding plot, i.e. code in `plot1.R` constructs -the `plot1.png` plot. Your code file **should include code for reading -the data** so that the plot can be fully reproduced. You should also -include the code that creates the PNG file. - -* Add the PNG file and R code file to your git repository - -When you are finished with the assignment, push your git repository to -GitHub so that the GitHub version of your repository is up to -date. There should be four PNG files and four R code files. 
- - -The four plots that you will need to construct are shown below. - - -### Plot 1 - - -![plot of chunk unnamed-chunk-2](figure/unnamed-chunk-2.png) - - -### Plot 2 - -![plot of chunk unnamed-chunk-3](figure/unnamed-chunk-3.png) - - -### Plot 3 - -![plot of chunk unnamed-chunk-4](figure/unnamed-chunk-4.png) - - -### Plot 4 - -![plot of chunk unnamed-chunk-5](figure/unnamed-chunk-5.png) - +## Introduction + +This assignment uses data from +the UC Irvine Machine +Learning Repository, a popular repository for machine learning +datasets. In particular, we will be using the "Individual household +electric power consumption Data Set" which I have made available on +the course web site: + + +* Dataset: Electric power consumption [20Mb] + +* Description: Measurements of electric power consumption in +one household with a one-minute sampling rate over a period of almost +4 years. Different electrical quantities and some sub-metering values +are available. + + +The following descriptions of the 9 variables in the dataset are taken +from +the UCI +web site: + +
    +
+1. Date: Date in format dd/mm/yyyy
+2. Time: time in format hh:mm:ss
+3. Global_active_power: household global minute-averaged active power (in kilowatt)
+4. Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
+5. Voltage: minute-averaged voltage (in volt)
+6. Global_intensity: household global minute-averaged current intensity (in ampere)
+7. Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
+8. Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
+9. Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
+ +## Loading the data + + + + + +When loading the dataset into R, please consider the following: + +* The dataset has 2,075,259 rows and 9 columns. First +calculate a rough estimate of how much memory the dataset will require +in memory before reading into R. Make sure your computer has enough +memory (most modern computers should be fine). + +* We will only be using data from the dates 2007-02-01 and +2007-02-02. One alternative is to read the data from just those dates +rather than reading in the entire dataset and subsetting to those +dates. + +* You may find it useful to convert the Date and Time variables to +Date/Time classes in R using the `strptime()` and `as.Date()` +functions. + +* Note that in this dataset missing values are coded as `?`. + + +## Making Plots + +Our overall goal here is simply to examine how household energy usage +varies over a 2-day period in February, 2007. Your task is to +reconstruct the following plots below, all of which were constructed +using the base plotting system. + +First you will need to fork and clone the following GitHub repository: +[https://github.com/rdpeng/ExData_Plotting1](https://github.com/rdpeng/ExData_Plotting1) + + +For each plot you should + +* Construct the plot and save it to a PNG file with a width of 480 +pixels and a height of 480 pixels. + +* Name each of the plot files as `plot1.png`, `plot2.png`, etc. + +* Create a separate R code file (`plot1.R`, `plot2.R`, etc.) that +constructs the corresponding plot, i.e. code in `plot1.R` constructs +the `plot1.png` plot. Your code file **should include code for reading +the data** so that the plot can be fully reproduced. You should also +include the code that creates the PNG file. + +* Add the PNG file and R code file to your git repository + +When you are finished with the assignment, push your git repository to +GitHub so that the GitHub version of your repository is up to +date. There should be four PNG files and four R code files. 
+ + +The four plots that you will need to construct are shown below. + + +### Plot 1 + + +![plot of chunk unnamed-chunk-2](figure/unnamed-chunk-2.png) + + +### Plot 2 + +![plot of chunk unnamed-chunk-3](figure/unnamed-chunk-3.png) + + +### Plot 3 + +![plot of chunk unnamed-chunk-4](figure/unnamed-chunk-4.png) + + +### Plot 4 + +![plot of chunk unnamed-chunk-5](figure/unnamed-chunk-5.png) + diff --git a/plot1.R b/plot1.R new file mode 100644 index 00000000000..d7a0c108772 --- /dev/null +++ b/plot1.R @@ -0,0 +1,25 @@ +temp <- tempfile() +download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip",temp) +power <- read.table(unz(temp,"household_power_consumption.txt"), + sep=";", + header = T, + na="?", + colClasses = c("character", + 'character', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric')) + +unlink(temp) +power <- power[which(power$Date == '2/2/2007' | power$Date=='1/2/2007'),] + +power$POSIX <-as.POSIXlt.character(paste(power$Date,power$Time),format = "%d/%m/%Y %H:%M:%S") + +#plot.1 +png(filename="plot1.png",width=480, height=480) +hist(power$Global_active_power, col = 'red', main = 'Global Active Power', xlab = 'Global Active Power (kilowatts)') +dev.off() diff --git a/plot1.png b/plot1.png new file mode 100644 index 00000000000..db485ff3905 Binary files /dev/null and b/plot1.png differ diff --git a/plot2.R b/plot2.R new file mode 100644 index 00000000000..6ff0efc94e6 --- /dev/null +++ b/plot2.R @@ -0,0 +1,25 @@ +temp <- tempfile() +download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip",temp) +power <- read.table(unz(temp,"household_power_consumption.txt"), + sep=";", + header = T, + na="?", + colClasses = c("character", + 'character', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric')) + +unlink(temp) +power <- power[which(power$Date == '2/2/2007' | power$Date=='1/2/2007'),] + +power$POSIX 
<-as.POSIXlt.character(paste(power$Date,power$Time),format = "%d/%m/%Y %H:%M:%S") + +#plot2 +png(filename="plot2.png",width=480, height=480) +plot(x=power$POSIX ,y=power$Global_active_power, type = 'l', xlab='',ylab = 'Global Active Power (kilowatts)') +dev.off() diff --git a/plot2.png b/plot2.png new file mode 100644 index 00000000000..08381f8befb Binary files /dev/null and b/plot2.png differ diff --git a/plot3.R b/plot3.R new file mode 100644 index 00000000000..23d50fd658e --- /dev/null +++ b/plot3.R @@ -0,0 +1,29 @@ +temp <- tempfile() +download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip",temp) +power <- read.table(unz(temp,"household_power_consumption.txt"), + sep=";", + header = T, + na="?", + colClasses = c("character", + 'character', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric')) + +unlink(temp) +power <- power[which(power$Date == '2/2/2007' | power$Date=='1/2/2007'),] + +power$POSIX <-as.POSIXlt.character(paste(power$Date,power$Time),format = "%d/%m/%Y %H:%M:%S") + + +#plot3 +png(filename="plot3.png",width=480, height=480) +plot(x=power$POSIX,y=power$Sub_metering_1, type='l', col = 'black', ylab = 'Energy sub metering', xlab = '') +lines(x=power$POSIX,y=power$Sub_metering_2, col='red') +lines(x=power$POSIX,y=power$Sub_metering_3, col='blue') +legend('topright', legend = c('Sub_metering_1',"Sub_metering_2","Sub_metering_3"), col = c('black','red','blue'), lty = 1) +dev.off() \ No newline at end of file diff --git a/plot3.png b/plot3.png new file mode 100644 index 00000000000..38ac5a873ae Binary files /dev/null and b/plot3.png differ diff --git a/plot4.R b/plot4.R new file mode 100644 index 00000000000..35dba57393f --- /dev/null +++ b/plot4.R @@ -0,0 +1,33 @@ +temp <- tempfile() +download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip",temp) +power <- read.table(unz(temp,"household_power_consumption.txt"), + sep=";", + 
header = T, + na="?", + colClasses = c("character", + 'character', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric', + 'numeric')) + +unlink(temp) +power <- power[which(power$Date == '2/2/2007' | power$Date=='1/2/2007'),] + +power$POSIX <-as.POSIXlt.character(paste(power$Date,power$Time),format = "%d/%m/%Y %H:%M:%S") + + +#plot4 +png(filename="plot4.png",width=480, height=480) +par(mfrow=c(2,2)) +plot(x=power$POSIX ,y=power$Global_active_power, type = 'l', xlab='',ylab = 'Global Active Power') +plot(x=power$POSIX ,y=power$Voltage, type = 'l', xlab='datetime',ylab = 'Voltage') +plot(x=power$POSIX,y=power$Sub_metering_1, type='l', col = 'black', ylab = 'Energy sub metering', xlab = '') +lines(x=power$POSIX,y=power$Sub_metering_2, col='red') +lines(x=power$POSIX,y=power$Sub_metering_3, col='blue') +legend('topright', legend = c('Sub_metering_1',"Sub_metering_2","Sub_metering_3"), col = c('black','red','blue'), lty = 1, bty = "n") +plot(x=power$POSIX ,y=power$Global_reactive_power, type = 'l', xlab='datetime',ylab = 'Global_reactive_power') +dev.off() diff --git a/plot4.png b/plot4.png new file mode 100644 index 00000000000..8d42c225a45 Binary files /dev/null and b/plot4.png differ
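
The README in this diff asks for a rough memory estimate before loading, notes that missing values are coded as `?`, and suggests converting the Date and Time columns to date/time classes. The following is a minimal sketch of those three points, not the code used in the scripts above. It assumes every column costs 8 bytes per value (a simplification, since Date and Time are read as character), and it runs on a small made-up sample (`sample_txt`, `power_sample`, and `est_mib` are illustrative names) so that no download is needed.

```r
# Rough memory estimate: rows x columns x 8 bytes per numeric value.
# Assumption: all 9 columns are treated as 8-byte numerics.
n_rows <- 2075259
n_cols <- 9
est_bytes <- n_rows * n_cols * 8
est_mib <- est_bytes / 2^20
cat(sprintf("Estimated size: %.1f MiB\n", est_mib))

# Made-up sample using the same separator, date format, and '?' missing-value
# code as household_power_consumption.txt (only 4 of the 9 columns shown).
sample_txt <- "Date;Time;Global_active_power;Voltage
31/1/2007;23:59:00;1.250;241.10
1/2/2007;00:00:00;0.326;243.15
1/2/2007;00:01:00;?;243.00
2/2/2007;23:59:00;3.680;240.00
3/2/2007;00:00:00;3.614;240.43"

power_sample <- read.table(text = sample_txt,
                           sep = ";",
                           header = TRUE,
                           na.strings = "?",
                           colClasses = c("character", "character",
                                          "numeric", "numeric"))

# Parse the date once, then filter on Date objects rather than raw strings.
power_sample$Date <- as.Date(power_sample$Date, format = "%d/%m/%Y")
keep <- power_sample$Date >= as.Date("2007-02-01") &
        power_sample$Date <= as.Date("2007-02-02")
power_sample <- power_sample[keep, ]

# Combine date and time into one POSIXct column suitable for the x axis.
power_sample$datetime <- as.POSIXct(paste(power_sample$Date, power_sample$Time),
                                    format = "%Y-%m-%d %H:%M:%S",
                                    tz = "UTC")
```

Filtering on parsed dates avoids depending on whether the file writes `1/2/2007` or `01/02/2007`. Because the same download-and-read block is repeated in all four scripts, a shared helper along these lines could be sourced from each of them, although the assignment asks that every script remain runnable on its own.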