Description
Follow-up to #1325 and posit-dev/ark#870.
We now examine the `DESCRIPTION` and `NAMESPACE` files of a package to statically determine its exports, but datasets are not declared in these files.
While there are no top-level static declarations for exported datasets (https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Data-in-packages), Rd documentation has a specific markup for datasets, `\docType{data}`: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Documenting-data-sets (thanks for the pointer @DavisVaughan). Fortunately, `R CMD check` flags undocumented datasets, which means we can reliably use this markup as a data export declaration:
```
* checking for missing documentation entries ... WARNING
Undocumented data sets:
  'penguins'
```
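To make the `\docType{data}` idea concrete, here is a minimal Python sketch (not the actual ark implementation) that scans Rd sources with regular expressions and collects the `\name{}` of each dataset doc. The function names are hypothetical, and regexes are only an approximation: a real implementation would want a proper Rd parser to handle escaping, nested braces and `#ifdef` sections.

```python
import re
from typing import Dict, List, Optional

# Rd comments run from an unescaped % to the end of the line.
RD_COMMENT = re.compile(r"(?<!\\)%.*")
# \docType{data}, tolerating whitespace inside the braces.
DOCTYPE_DATA = re.compile(r"\\docType\{\s*data\s*\}")
# The topic name in \name{...}.
NAME = re.compile(r"\\name\{\s*([^}]*?)\s*\}")


def dataset_name(rd_text: str) -> Optional[str]:
    """Return the \\name{} of an Rd file marked \\docType{data}, else None."""
    text = RD_COMMENT.sub("", rd_text)  # ignore commented-out markup
    if not DOCTYPE_DATA.search(text):
        return None
    match = NAME.search(text)
    return match.group(1) if match else None


def documented_datasets(rd_files: Dict[str, str]) -> List[str]:
    """Collect dataset names from a mapping of Rd file name -> Rd source."""
    names = (dataset_name(text) for text in rd_files.values())
    return sorted(name for name in names if name is not None)
```

Files documenting functions simply yield `None` and are skipped, so the result is exactly the set of topics declared as data.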
The `data/` folder in a source repository may contain:
- txt and CSV files with raw data
- R files with R code defining the dataset
- `save()` images
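As an illustration of how these files map to symbols, the sketch below classifies file names in `data/` by extension and strips the extension to get the name that `data()` accepts. The extension table, including the rule that only tables may carry a `.gz`, `.bz2` or `.xz` suffix, is an assumption based on `?data` and "Writing R Extensions" and should be double-checked; `classify()` and `data_symbols()` are hypothetical helpers.

```python
import os
from typing import Iterable, List, Optional, Tuple

# Extensions data() understands, lower-cased (assumption: verify against ?data).
KINDS = {
    ".r": "code",       # R code defining the dataset
    ".rdata": "image",  # save() images
    ".rda": "image",
    ".tab": "table",    # plain-text tables
    ".txt": "table",
    ".csv": "table",
}
COMPRESSION = (".gz", ".bz2", ".xz")


def classify(filename: str) -> Optional[Tuple[str, str]]:
    """Return (symbol, kind) for a file in data/, or None if it is not a data file."""
    base = os.path.basename(filename)
    if base.startswith("."):
        return None  # hidden files are not treated as data files
    compressed = False
    for suffix in COMPRESSION:
        if base.lower().endswith(suffix):
            base, compressed = base[: -len(suffix)], True
            break
    stem, ext = os.path.splitext(base)
    kind = KINDS.get(ext.lower())
    if not stem or kind is None or (compressed and kind != "table"):
        return None
    return stem, kind


def data_symbols(filenames: Iterable[str]) -> List[str]:
    """Distinct symbols that data() would accept for these files."""
    return sorted({c[0] for c in map(classify, filenames) if c is not None})
```

Several files can share a stem (say `a.R` and `a.rda`), which is why `data_symbols()` deduplicates.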
What happens to these data sources depends on whether the `LazyData` field in `DESCRIPTION` is set to `true`:
- If `true`, the data is loaded, serialized with compression, and packaged with an index. Both the data and index files are RDS files that are lazy-loaded along with the namespace: https://github.com/r-devel/r-svn/blob/fb6baa45/src/library/base/R/namespace.R#L585-L590.

  When the package is attached, the data objects are directly exported to the search path: https://github.com/r-devel/r-svn/blob/fb6baa45069d41a798901c5d39667f9c1a8e1a87/src/library/base/R/namespace.R#L134-L136. This means they can be referenced after a `library()` call.

- If `false`, the data files are installed as is. Let's call these files "data exporter files". A `library(mypkg)` call doesn't automatically export the datasets to the search path; instead, it makes data exporter symbols available in `data()` calls. The user must pass exporters to `data()`, e.g. `data(mydata)`. The exporter then exports one or more datasets to the search path.

  If `library(mypkg)` has not been called, the exporters are not in scope and the user must explicitly qualify them: `data(mydata, package = "mypkg")`. I've not checked what happens when exporter symbols conflict, but I would guess they are masked depending on the order of the `library()` calls.

- A key thing to note is that a single exporter may export multiple dataset symbols. The dataset doc files must state which data exporter they belong to in the `\usage` section.
So we'll have to support these two ways of exporting data symbols, depending on whether the package uses lazy-loaded data.
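For the non-lazy case, the exporter-to-dataset link stated in `\usage` can be recovered from each dataset doc file. A minimal sketch, assuming the usage entry looks like `data(exporter)` (optionally quoted) and, as a hypothetical fallback, treating a dataset whose usage has no `data()` call as its own exporter. The names `exporter_of()`, `datasets_by_exporter()`, `mydata`, `sales` and `costs` are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, List

# First argument of a data() call at the start of \usage{...}, quoted or not.
USAGE_DATA = re.compile(r"\\usage\{\s*data\(\s*[\"']?([A-Za-z0-9._]+)[\"']?")


def exporter_of(dataset: str, rd_text: str) -> str:
    """Exporter named in \\usage{data(...)}; falls back to the dataset itself."""
    match = USAGE_DATA.search(rd_text)
    return match.group(1) if match else dataset


def datasets_by_exporter(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group dataset names (keys of docs; values are Rd sources) by exporter."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset, rd_text in docs.items():
        groups[exporter_of(dataset, rd_text)].append(dataset)
    return {exporter: sorted(names) for exporter, names in groups.items()}
```

With two doc files that both declare `\usage{data(mydata)}`, the result maps `mydata` to both dataset names, which is the one-exporter, many-datasets case described above.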
Plan:
- Detect all `\docType{data}` markups in `man/` and collect the `\name{}`s. These are the datasets exported by the package.
- If the `LazyData` DESCRIPTION field is `true`, export these objects at top level.
- If `false`, detect base file names in the installed `data` folder, and export these symbols to a special search path specific to the `data()` context.
- When `package = "foo"` is passed to `data()`, narrow the special data search path to the installed data exporters of package `foo`. Match the exporter to the relevant set of datasets and export those.
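The last three items of the plan can be sketched as a small resolver over per-package metadata. This is a hypothetical model, not ark's API: `PackageData`, `lazy_data_enabled()`, `top_level_datasets()` and `data_scope()` are illustrative names, `search` stands for attached package names in search-path order, the accepted `LazyData` spellings are an assumption, and the masking rule for conflicting exporters encodes the unverified guess above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def lazy_data_enabled(description: str) -> bool:
    """Read the LazyData field from DESCRIPTION text (DCF format)."""
    for line in description.splitlines():
        if not line or line[0].isspace():
            continue  # blank line or DCF continuation line
        key, sep, value = line.partition(":")
        if sep and key.strip() == "LazyData":
            # Assumption: treat these spellings as true; confirm against R's parsing.
            return value.strip().lower() in {"true", "yes", "1"}
    return False  # field absent: data is not lazy-loaded


@dataclass
class PackageData:
    lazy_data: bool
    datasets: List[str]  # \name{}s of \docType{data} Rd files
    exporters: Dict[str, List[str]] = field(default_factory=dict)  # exporter -> datasets


def top_level_datasets(pkgs: Dict[str, PackageData], search: List[str]) -> List[str]:
    """Datasets of attached LazyData packages, visible as ordinary symbols."""
    found = set()
    for name in search:
        pkg = pkgs.get(name)
        if pkg is not None and pkg.lazy_data:
            found.update(pkg.datasets)
    return sorted(found)


def data_scope(
    pkgs: Dict[str, PackageData], search: List[str], package: Optional[str] = None
) -> Dict[str, List[str]]:
    """Exporter symbols in scope inside a data() call, mapped to their datasets."""
    # With package = "foo", only foo's exporters are in scope, attached or not.
    names = [package] if package is not None else search
    scope: Dict[str, List[str]] = {}
    for name in names:
        pkg = pkgs.get(name)
        if pkg is None or pkg.lazy_data:
            continue  # per the plan, only non-LazyData packages contribute
        for exporter, datasets in pkg.exporters.items():
            # Assumption (unverified): earlier search entries mask later ones.
            scope.setdefault(exporter, datasets)
    return scope
```

Resolving `data(mydata, package = "foo")` then amounts to looking up `data_scope(pkgs, search, package="foo").get("mydata")` and exporting the returned datasets. Keeping the `data()` scope separate from `top_level_datasets()` mirrors the "special search path" in the plan: exporter names never leak into ordinary completions.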