See LICENSE_CONTENT for the license details of each page.
This repository provides the code and pipeline to:
- download the Google Docs for each author
- convert them to markdown
- extract the individual pages / tabs for each term
- clean the content
- upload the data to Google Sheets
Install the required packages with renv:
install.packages("renv")
renv::restore()
The following files are not checked into this repository but are needed to run the pipeline:
.slackr-slm-list (see the slackr package documentation)
token: token
incoming_webhook_url: webhook url
icon_emoji: can be empty
username: slackr
channel: #a-channel
.env
SLACK_TOKEN="same slack token as in slackr file"
SLACK_LIST_FILEID="id of Slack List (kanban board)"
USE_MARKDOWN_STRICT=1 # or 0
UPLOAD_SHEET="google sheets url"
GDRIVE_SHARED="shared drive name where google docs reside"
DOC_PREFIX="common prefix of google docs - for searching"
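Before running the pipeline, it can help to check that the .env file defines every variable listed above. The following is a minimal sketch in base R, not the way the pipeline itself loads the file (that is not documented here); `read_dotenv` and `missing_vars` are hypothetical helper names.

```r
# Variables the .env file is expected to define (taken from the list above).
required <- c("SLACK_TOKEN", "SLACK_LIST_FILEID", "USE_MARKDOWN_STRICT",
              "UPLOAD_SHEET", "GDRIVE_SHARED", "DOC_PREFIX")

# Hypothetical helper: parse KEY=VALUE lines of a .env-style file into a named list.
read_dotenv <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines, full-line comments, and lines without "="
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  lines <- lines[grepl("=", lines, fixed = TRUE)]
  keys <- trimws(sub("=.*$", "", lines))
  values <- trimws(sub("^[^=]*=", "", lines))
  quoted <- grepl('^"[^"]*"', values)
  # Quoted values: keep the text between the first pair of double quotes
  values[quoted] <- sub('^"([^"]*)".*$', "\\1", values[quoted])
  # Unquoted values: strip a trailing "# comment"
  values[!quoted] <- trimws(sub("[[:space:]]+#.*$", "", values[!quoted]))
  names(values) <- keys
  as.list(values)
}

# Hypothetical helper: names of required variables that the file does not define.
missing_vars <- function(env) {
  setdiff(required, names(env))
}

# Usage (assumes .env is in the working directory):
# env <- read_dotenv(".env")
# missing_vars(env)
```

An empty result from `missing_vars` means all six variables are present; it does not check that the values themselves are valid.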
data/meta/authors.csv: used to match entries from the Slack List to the author names used for the CC-BY license
email, name
max.musterperson@email.de, Max Musterperson
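As an illustration of how such a lookup table can be used, here is a minimal base R sketch that reads the file and maps an email address to the name for the license. It assumes the Slack List identifies authors by email address and that the file has exactly the two columns shown above; `read_authors` and `author_name` are hypothetical helper names, not functions from this repository.

```r
# Hypothetical helper: read the author lookup table.
# strip.white = TRUE removes the space after the comma in "email, name".
read_authors <- function(path = "data/meta/authors.csv") {
  read.csv(path, header = TRUE, col.names = c("email", "name"),
           strip.white = TRUE, stringsAsFactors = FALSE)
}

# Hypothetical helper: look up license names by email, ignoring case.
# Returns NA for addresses that are not in the table.
author_name <- function(emails, authors) {
  authors$name[match(tolower(emails), tolower(authors$email))]
}

# Usage:
# authors <- read_authors()
# author_name("max.musterperson@email.de", authors)
```

Returning NA for unknown addresses makes missing entries in authors.csv easy to spot before the license text is generated.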
The R files are meant to be run sequentially, in the order indicated by their number prefix.
You can run pipeline.R to execute them in the right order, or run them individually.
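Running the numbered scripts in order can be sketched as follows. This is an illustration only, not the contents of pipeline.R; it assumes the scripts sit in one directory and start with digits (for example a hypothetical `01_download.R`), and `numbered_scripts` and `run_pipeline` are hypothetical names.

```r
# Hypothetical helper: list R scripts with a numeric prefix, sorted by that number.
numbered_scripts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+.*\\.R$")
  prefix <- as.integer(sub("^([0-9]+).*$", "\\1", files))
  # Sort numerically so that "10_..." runs after "2_..."
  file.path(dir, files[order(prefix)])
}

# Hypothetical helper: source each numbered script in turn.
run_pipeline <- function(dir = ".") {
  for (script in numbered_scripts(dir)) {
    message("Running ", script)
    source(script)
  }
  invisible(NULL)
}

# Usage:
# run_pipeline(".")
```

Files without a number prefix, such as pipeline.R itself, are skipped by the pattern.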
Alternatively, use make, which resolves the dependencies between the steps more efficiently:
make upload
This does not automatically re-download the Google Docs or the Slack Kanban board. To do so, run the download targets first:
make download-gdocs # optional
make download-kanban # optional
make upload
Intermediate targets can also be built directly. Again, they do not automatically download their online dependencies; this has to be done using the targets download-gdocs and download-kanban, respectively:
make download-gdocs # optional
make data/md/
make download-gdocs # optional
make download-kanban # optional
make data/md_upload/