imdccal is an R Package designed to help users work with water chemistry lab deliverables from the Cooperative Chemical Analytical Laboratory (CCAL) at Oregon State University.
Functions such as read_ccal()
and format_equis_results()
help users
extract data from .xlsx files provided by CCAL, convert them to a
machine readable format, and begin processing them into the
Environmental Quality Information Systems (EQuIS) Electronic Data
Deliverable (EDD) format.
The package is primarily geared towards data management in the Inventory & Monitoring Division (IMD) of the National Park Service, but may be useful for other users of CCAL data.
imdccal was developed by Sarah Wright and Liam Smith and is maintained by Sarah Wright. Please reach out via Issues with any ideas for improvement.
Disclaimer: This package does its best to parse human-entered data. However, lab deliverables are known to vary across projects and over time. This package will never be able to predict every variation in the input data, although we will do our best to update it as we learn of new edge cases. As is the case with any software tool, a knowledgeable human should ALWAYS verify that its outputs are correct and complete.
You can install the development version of imdccal from its GitHub repository with:
# install.packages("remotes")
remotes::install_github("nationalparkservice/imd-ccal")
To demonstrate the functionality of imdccal, we created two fictitious CCAL deliverables that are downloaded with the package. The story behind the data is that it is 2099 and the NPS now has an Outer Space Network with monitoring locations on the Moon, Mars, Mercury, and Saturn. Users can find the file paths to the example data by running the following code chunk. Since the example data is installed with the package, you can run any of the code in this ReadMe on your own computer.
use_example_data(file_names = use_example_data())
Read and tidy the data and work with it in R, without writing the data to any files.
# Load package
library(imdccal)
# Create tidied CCAL data from demo data stored in the imdccal package
tidy_ccal <- read_ccal(use_example_data(file_names = "SPAC_080199.xlsx"))
# Get the data for a single set of lab results
data <- tidy_ccal$`SPAC_080199.xlsx`$data
# Get the metadata for the same set of results
meta <- tidy_ccal$`SPAC_080199.xlsx`$metadata
Write the tidied data that is currently stored in the R environment to file. Note that destination_folder must already exist.
# Write data stored in environment to file
write_ccal(all_data = tidy_ccal,
format = "xlsx", # alternatively, "csv"
destination_folder = "ccal_tidy", # must already exist
overwrite = TRUE,
suffix = "_tidy",
num_tables = 4)
Instead of the two step process of creating the tidied R object and
writing it to file, users may use the read_write_ccal()
function to
create and write the tidied data to file in one step. Here is an example
of reading data from a single file of CCAL lab data and writing the
machine-readable version to an Excel file or set of CSV files.
# Write tidied CCAL data to file from demo data stored in imdccal package
# Write tidied data to a new .xlsx
read_write_ccal(use_example_data(file_names = "SPAC_080199.xlsx"),
destination_folder = "ccal_tidy")
# Write tidied data to a folder of CSV files
read_write_ccal(use_example_data(file_names = "SPAC_081599.xlsx"),
format = "csv", destination_folder = "ccal_tidy")
All of these functions also work when supplied with a vector of file paths to multiple CCAL deliverables. Here is an example of reading data from multiple files of CCAL lab data and writing the machine-readable version to an Excel files or sets of CSV files.
# Get file paths
all_files <- use_example_data(file_names = use_example_data())
# Write to xlsx
# Write one file of tidied data per input file
read_write_ccal(all_files, destination_folder = "ccal_tidy")
# Write to csv
# Write one folder of tidied CSV data per input file
read_write_ccal(all_files, format = "csv", destination_folder = "ccal_tidy")
By default, these functions create separate tables and files for each
CCAL deliverable supplied as input. To concatenate the results together,
set the concat
argument in read_ccal()
or read_write_ccal()
to
TRUE.
read_write_ccal(all_files, destination_folder = "ccal_tidy", concat = TRUE)
In addition to converting CCAL lab deliverables to a machine readable
format, imdccal also provides functionality to begin processing the
machine readable data into the EQuIS EDD format. Specifically, the
format_equis_results()
function begins to format the data into the
Results table for the EDD.
Uses:
- Process data into one of the tables in the format accepted by EQuIS
- Censor values less than the MDL and LQL and assign Result_Detection_Condition approriately
Limitations:
- Users still need to create Activities table for every deliverable and the Projects and Locations tables when they require edits.
- Within the Results table, users still need to define Activity_ID, modify Result_Status (set to Pre-Cert since additional QC is needed), add flags to Result_Qualifier, and conduct the rest of their own QC processes
For further information regarding the EQuIS EDD, please refer to the NPS EQuIS Resources website (internal DOI link), which provides comprehensive guidelines on formatting data for EQuIS.
Here we demonstrate the usage of the format_equis_results()
function.
Read the data, create the Results table of the EDD, and work with it in
R without writing the data to any files:
# Create results table from demo data stored in the imdccal package
results_incomplete <- format_equis_results(
use_example_data(file_names = "SPAC_080199.xlsx"))
If you inspect the table created above, you will notice that there are so many columns without data that the result is not useful. That’s because our code relies on a table containing detection limits for each analyte. To accommodate changes to detection limits over time, the table includes StartDate and EndDate columns. The issue we have above is that none of the limits are applicable after December 31, 2024, and our example data is from 2099.
By default, format_equis_results()
uses the version of the table that
is stored within the package. However, users may provide their own
version of the table as an argument in the function if that better suits
their needs. This may occur, for example, if detection limits change or
if a user needs to include an analyte that is not yet in the table. To
provide a different version of the table, start with the version stored
in the package, and add rows or edit columns as you see fit. It may be
helpful to review the documentation of the limits table (run
?imdccal::detection_limits
) as you make your edits. For example, we
edit the EndDate for all relevant rows to December 31, 2099 in the code
below.
# Edit limits table
limits <- imdccal::detection_limits |>
dplyr::mutate(EndDate = dplyr::if_else(EndDate == "2024-12-31",
lubridate::ymd("2099-12-31"),
EndDate))
Users may also save the table to a csv or xlsx file and make changes
directly to a spreadsheet. To use one’s updated table in
format_equis_results()
, simply supply the function with the modified
table (as an R object) as the limits argument. We do so with our example
data below.
# Create results table from demo data stored in the imdccal package
results_complete <- format_equis_results(
use_example_data(file_names = "SPAC_080199.xlsx"),
limits = limits)
Like with the machine readable data, we can write the results table from
our R environment to file with the write_ccal()
function.
# Write data stored in environment to file
write_ccal(all_data = results_complete,
format = "xlsx", # alternatively, "csv"
destination_folder = "ccal_tidy", # must already exist
overwrite = TRUE,
suffix = "_edd_results",
num_tables = 1)
Also in the same manner as the machine readable data, instead of a two
step process of creating the results table R object and writing it to
file, users may use the write_equis_results()
function to create and
write the results table to file in one step. Here is an example of
reading data from a single file of CCAL lab data and writing the results
table to an Excel or CSV file.
# Write results table to xlsx from demo data stored in imdccal package
write_equis_results(files = use_example_data(file_names = "SPAC_080199.xlsx"),
limits = limits,
format = "xlsx",
destination_folder = "ccal_tidy",
overwrite = TRUE)
# Write results table to csv from demo data stored in imdccal package
write_equis_results(files = use_example_data(file_names = "SPAC_081599.xlsx"),
limits = limits,
format = "csv",
destination_folder = "ccal_tidy",
overwrite = TRUE)
As before, all of these functions work when supplied with a vector of file paths to multiple CCAL deliverables. Here is an example of reading data from multiple files of CCAL lab data and writing the results tables to Excel or CSV files.
# Get file paths
all_files <- use_example_data(file_names = use_example_data())
# Write to xlsx
write_equis_results(files = all_files,
limits = limits,
destination_folder = "ccal_tidy",
overwrite = TRUE)
# Write to csv
write_equis_results(files = all_files,
limits = limits,
format = "csv",
destination_folder = "ccal_tidy",
overwrite = TRUE)
By default, these functions create separate tables and files for each
CCAL deliverable supplied as input. To concatenate the results together,
set the concat
argument in format_equis_results()
or
write_equis_results()
to TRUE.
write_equis_results(files = all_files,
limits = limits,
destination_folder = "ccal_tidy",
overwrite = TRUE,
concat = TRUE)