
stat-assignments/clean-flights


Flights across the US

The Department of Transportation tracks every flight that takes off from or lands at an airport in the US and publishes this information through the TranStats system at https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ&QO_fu146_anzr=b0-gvzr.

Files covering three months of flights (November 2024 through January 2025) have been added to the data folder. Each month's flights are contained in its own compressed CSV file:

dir("data")
#> [1] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2024_11.zip"
#> [2] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2024_12.zip"
#> [3] "On_Time_Marketing_Carrier_On_Time_Performance_Beginning_January_2018_2025_1.zip"
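Because each month arrives as a zip archive holding a CSV, the files can be read without unpacking them first. The sketch below is one base-R approach, not a prescribed solution. It assumes that every archive contains one CSV (other entries, such as a readme, are skipped), that all three CSVs share the same columns so they can be stacked with rbind(), and that the month is stored in a column named Month. The helper names read_flights_zip() and describe_flights() are hypothetical.

```r
# Minimal base-R sketch. Assumptions: each zip in data/ holds one CSV,
# all CSVs share the same columns, and a `Month` column exists.

# Read the first CSV stored inside a single zip archive (hypothetical helper).
read_flights_zip <- function(path) {
  entries <- utils::unzip(path, list = TRUE)$Name
  csv <- entries[grepl("\\.csv$", entries, ignore.case = TRUE)][1]
  utils::read.csv(unz(path, csv))
}

# Count flights per month and the number of features (hypothetical helper).
describe_flights <- function(flights) {
  list(
    flights_per_month = table(flights$Month),
    n_features = ncol(flights)
  )
}

# Only runs when the data folder is present.
if (dir.exists("data")) {
  files <- list.files("data", pattern = "\\.zip$", full.names = TRUE)
  flights <- do.call(rbind, lapply(files, read_flights_zip))
  print(describe_flights(flights))
}
```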

Descriptions of each variable and explanations of its levels can be found on the same website.

TODO

Prepping

  1. Accept the assignment in Canvas, follow the link to create a repository, and clone this repository to your local machine.
  2. Create a file named index.qmd and add it to the repository. This is the file that should contain your code, results, and interpretations. Make sure to include enough detail that your work is fully reproducible.

To Do Items

  1. Import the flights for all three months into a single object called flights. Report the number of flights per month and the number of features (variables) recorded for each flight.

  2. Determine a key for the data set and show that it fulfills the requirements of a key.

  3. Give three different examples of transitive dependencies in the data.

  4. Airport information is included for both Origin and Dest. Create a new data set called airports that contains this information only once for each airport, then remove all but the required airport information from the flights object.

  5. Summarize the number of flights on each day and plot the result. Color the points by day of the week and label the days of the week. Describe the general pattern in the number of flights. Which days do not follow this pattern?

  6. The variable Cancelled indicates whether a flight was cancelled. Why can we not assume that flight cancellations occur completely at random? Identify at least two factors that contribute to non-random flight cancellations and visualize your findings.
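For item 2, a set of columns qualifies as a key when no two rows share the same combination of values and none of those values is missing. A minimal base-R check is sketched below. The helper is_key() is hypothetical, and the toy columns (date, carrier, flight_no, origin) are made-up stand-ins rather than the actual BTS variable names, so which columns form the key still has to be worked out from the data.

```r
# Check whether `cols` can serve as a key for `df` (hypothetical helper):
# no missing values and no repeated combination of values.
is_key <- function(df, cols) {
  keys <- df[, cols, drop = FALSE]
  !anyNA(keys, recursive = TRUE) && anyDuplicated(keys) == 0
}

# Hypothetical toy data; column names are stand-ins, not the BTS names.
toy <- data.frame(
  date      = c("2024-11-01", "2024-11-01", "2024-11-01"),
  carrier   = c("AA", "AA", "DL"),
  flight_no = c(100, 100, 100),
  origin    = c("ORD", "DFW", "ORD")
)
is_key(toy, c("date", "carrier", "flight_no"))            # FALSE: AA 100 repeats
is_key(toy, c("date", "carrier", "flight_no", "origin"))  # TRUE
```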
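For item 3, a transitive dependency has the form key → A → B, where the non-key attribute A on its own determines the non-key attribute B. One way to support such a claim is to check that every value of A occurs together with exactly one value of B. The helper determines() and the toy columns below are hypothetical.

```r
# Check the functional dependency x -> y: every value of x occurs with
# exactly one value of y (hypothetical helper; NA values of x are ignored).
determines <- function(df, x, y) {
  n_distinct <- tapply(df[[y]], df[[x]], function(v) length(unique(v)))
  all(n_distinct == 1)
}

# Hypothetical toy data with made-up column names.
toy <- data.frame(
  origin      = c("ORD", "ORD", "DFW"),
  origin_city = c("Chicago, IL", "Chicago, IL", "Dallas/Fort Worth, TX"),
  dest        = c("LGA", "DFW", "LGA")
)
determines(toy, "origin", "origin_city")  # TRUE: each code has one city
determines(toy, "origin", "dest")         # FALSE: ORD flies to two airports
```

A passing check only shows that the dependency holds in the rows at hand; it does not prove that it holds in general.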
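For item 4, one approach is to stack the origin columns on top of the destination columns under common names, keep the distinct rows, and then drop the descriptive columns from flights while keeping the airport codes. The sketch uses hypothetical column names (origin, origin_city, dest, dest_city) and a hypothetical helper split_airports(); the downloaded files contain several airport columns for each side, and deciding which ones are required in flights is part of the task.

```r
# Build one airport table from the origin and destination columns, then
# drop the duplicated descriptive columns from flights (hypothetical helper).
split_airports <- function(flights) {
  origin_info <- setNames(flights[, c("origin", "origin_city")], c("code", "city"))
  dest_info   <- setNames(flights[, c("dest", "dest_city")], c("code", "city"))
  airports <- unique(rbind(origin_info, dest_info))
  rownames(airports) <- NULL
  # Each airport code should now appear exactly once.
  stopifnot(anyDuplicated(airports$code) == 0)
  keep <- setdiff(names(flights), c("origin_city", "dest_city"))
  list(flights = flights[, keep, drop = FALSE], airports = airports)
}

# Hypothetical toy data with made-up column names.
toy <- data.frame(
  origin      = c("ORD", "DFW", "ORD"),
  origin_city = c("Chicago, IL", "Dallas/Fort Worth, TX", "Chicago, IL"),
  dest        = c("DFW", "LGA", "LGA"),
  dest_city   = c("Dallas/Fort Worth, TX", "New York, NY", "New York, NY")
)
res <- split_airports(toy)
res$airports        # three rows: ORD, DFW, LGA
names(res$flights)  # "origin" "dest"
```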
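For item 5, the daily counts can be computed with table() and the weekday derived from each date. The sketch below uses the ISO weekday number from format(date, "%u") (1 = Monday, 7 = Sunday) rather than weekdays(), whose output depends on the system locale. The helpers daily_counts() and plot_daily() are hypothetical, and the usage line assumes a FlightDate column stored as YYYY-MM-DD text.

```r
# Count flights per day and attach a labeled day of the week
# (hypothetical helper).
daily_counts <- function(dates) {
  dates <- as.Date(dates)
  counts <- as.data.frame(table(dates), stringsAsFactors = FALSE)
  names(counts) <- c("date", "n")
  counts$date <- as.Date(counts$date)
  # ISO weekday number: 1 = Monday, ..., 7 = Sunday (locale independent).
  counts$weekday <- factor(as.integer(format(counts$date, "%u")), levels = 1:7,
    labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
  counts
}

# Scatter plot of daily counts, one color per day of the week.
plot_daily <- function(counts) {
  plot(counts$date, counts$n, col = as.integer(counts$weekday), pch = 19,
       xlab = "Date", ylab = "Number of flights")
  legend("topleft", legend = levels(counts$weekday), col = 1:7, pch = 19,
         title = "Day of week")
}

# Usage (assumes a FlightDate column): plot_daily(daily_counts(flights$FlightDate))
```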
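For item 6, comparing the share of cancelled flights across groups such as carrier, origin airport, or date is one way to look for non-random structure: if cancellations occurred completely at random, these shares would differ only by chance. The sketch assumes Cancelled is coded 1 for a cancelled flight and 0 otherwise; cancel_rate(), plot_cancel_rate(), and the toy carrier column are hypothetical.

```r
# Share of cancelled flights within each level of `group`, largest first.
# Assumes Cancelled is coded 1 = cancelled, 0 = not cancelled.
cancel_rate <- function(df, group) {
  rates <- tapply(df$Cancelled, df[[group]], mean, na.rm = TRUE)
  sort(rates, decreasing = TRUE)
}

# Bar chart of the cancellation shares (hypothetical helper).
plot_cancel_rate <- function(rates, group_label) {
  barplot(rates, xlab = group_label, ylab = "Share of flights cancelled")
}

# Hypothetical toy data.
toy <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL"),
  Cancelled = c(1, 0, 0, 0)
)
cancel_rate(toy, "carrier")  # AA 0.5, DL 0.0
```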

Submission

Ensure that the file index.qmd renders without errors. Read through the rendered document to check for consistency. Remove excessive printouts. Add all relevant(!) files to the repository, commit, and push!

