The U.S. Department of Transportation tracks flights taking off from and landing at airports in the US and publishes the data through the TranStats system at https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ&QO_fu146_anzr=b0-gvzr.
Files covering three months of flights (November 2024 through January 2025) have been added to the `data` folder. Each month's flights are contained in a separate compressed (zipped) CSV file:
```r
dir("data")
#> [1] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2024_11.zip"
#> [2] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2024_12.zip"
#> [3] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2025_1.zip"
```
Descriptions of each variable, and explanations of the levels of categorical variables, are available on the same website.
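Because each month arrives as a zip archive, the files can be read without unzipping them by hand. The following is a minimal base-R sketch, assuming each archive holds one CSV file (possibly alongside other files) and that all three months share the same columns; `read_zipped_csv()` and `read_all_months()` are hypothetical helper names, not functions from any package.

```r
# Hypothetical helper: read the CSV stored inside one zip archive.
# Assumption: the archive contains at least one file ending in .csv.
read_zipped_csv <- function(zipfile) {
  # List the archive's contents without extracting anything to disk
  contents <- utils::unzip(zipfile, list = TRUE)$Name
  # Keep the first CSV, in case the archive also holds e.g. a readme file
  csv_name <- contents[grepl("\\.csv$", contents, ignore.case = TRUE)][1]
  # Read the CSV directly from the archive through an unz() connection
  utils::read.csv(unz(zipfile, csv_name), stringsAsFactors = FALSE)
}

# Hypothetical helper: read every monthly zip file in a folder and stack the rows.
# Assumption: all months have identical column names.
read_all_months <- function(dir = "data") {
  files <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  do.call(rbind, lapply(files, read_zipped_csv))
}
```

With the repository's `data` folder, `flights <- read_all_months("data")` followed by `table(flights$Month)` and `ncol(flights)` would give the monthly counts and the number of features, assuming the files include a `Month` column. For files of this size, `readr::read_csv()`, which can read a zip archive holding a single file directly, is considerably faster than `read.csv()`.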
- Accept the assignment in Canvas, follow the link to create a repository, and clone this repository to your local machine.
- Create a file named `index.qmd` and add it to the repository. This file should contain your code, results, and interpretations. Include enough detail that your work is fully reproducible.
- Import the flights for all three months into a single object `flights`. Report the number of flights in each month and the number of features (variables) recorded for the flights.
- Determine a key for the data set. Make sure to show that it fulfills the requirements of a key.
- Give three different examples of transitive dependencies in the data.
- Airport information is included for both `Origin` and `Dest`. Create a new data set called `airports` in which this information is included only once, then remove all but the required airport information from the `flights` object.
- Create a summary of the number of flights on each day and plot it. Color the points by day of the week, and make sure to provide labels for the days of the week. Describe the general pattern you see in the number of flights. Which days do not follow this pattern?
- The variable `Cancelled` indicates whether a flight was cancelled. Why can we not assume that flight cancellations occur completely at random? Identify at least two factors contributing to non-random flight cancellations and visualize your findings.
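The key and transitive-dependency tasks both come down to checking functional dependencies: a key determines every other column, and a transitive dependency is a chain such as key → `Origin` → state, where the middle column does not determine the key. A minimal base-R sketch on a small hypothetical data frame follows; the column names (`Carrier`, `FlightNum`, `OriginState`, ...) are stand-ins that need not match the TranStats headers, and `is_key()` and `determines()` are invented helper names.

```r
# Hypothetical toy data; column names are stand-ins, not the TranStats headers.
toy <- data.frame(
  FlightDate  = c("2024-11-01", "2024-11-01", "2024-11-02", "2024-11-02"),
  Carrier     = c("AA", "DL", "AA", "DL"),
  FlightNum   = c(100, 200, 100, 200),
  Origin      = c("DSM", "ORD", "DSM", "ATL"),
  OriginState = c("IA", "IL", "IA", "GA"),
  stringsAsFactors = FALSE
)

# A set of columns is a key if it has no missing values and no two rows
# share the same combination of values.
is_key <- function(df, cols) {
  keys <- df[cols]
  !any(is.na(keys)) && anyDuplicated(keys) == 0
}

# lhs -> rhs holds if each combination of lhs values appears together with
# exactly one combination of rhs values.
determines <- function(df, lhs, rhs) {
  pairs <- unique(df[c(lhs, rhs)])
  anyDuplicated(pairs[lhs]) == 0
}

is_key(toy, c("FlightDate", "Carrier", "FlightNum"))  # TRUE
is_key(toy, c("Carrier", "FlightNum"))                # FALSE: AA 100 flies on both days
determines(toy, "Origin", "OriginState")              # TRUE
determines(toy, c("Carrier", "FlightNum"), "Origin")  # FALSE: DL 200 leaves from ORD and ATL
# Origin does not determine the key, so key -> Origin -> OriginState is transitive
determines(toy, "Origin", c("FlightDate", "Carrier", "FlightNum"))  # FALSE
```

The same two checks apply to the real `flights` object, but which columns form its key has to be established from the data itself; the toy key above is not a claim about the TranStats files.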
Ensure that the file `index.qmd` renders without errors. Read through the rendered document to check for consistency. Remove excessive printouts. Add all relevant(!) files to the repository, commit, and push!