MechanicalRabbit/FunSQL-TestData

FunSQL-TestData.jl

FunSQL-TestData is a set of disparately licensed data sets used for testing FunSQL.

These data are used to demonstrate, document, and test the FunSQL ecosystem. They are packaged as tar.gz release archives that are consumed through Julia's Pkg Artifacts system. Each data set is licensed individually.

MIMIC-IV Clinical Database Demo

This is a 100-person demonstration sample from the Medical Information Mart for Intensive Care (MIMIC)-IV database, prepared from the electronic health records of Boston's Beth Israel Deaconess Medical Center. It is released through the PhysioNet collaborative repository under the copyleft Open Data Commons Open Database License v1.0.

The full MIMIC-IV dataset is available upon completing the CITI Program "Massachusetts Institute of Technology Affiliates" training and executing the PhysioNet Credentialed Health Data License. PhysioNet credentialing is available to independent researchers.

To use the 100 person demo in Julia, you'll first need this in your Artifacts.toml.

[mimic-iv-demo]
git-tree-sha1 = "e9227b6756a382f42ab91cfb5ea8fda781c7b95e"

    [[mimic-iv-demo.download]]
    url = "https://github.com/MechanicalRabbit/FunSQL-TestData/releases/download/20250504/mimic-iv-demo-2.2.duckdb.tgz"
    sha256 = "e534b45b0d5c48dbe17594b8b74f72a4f5f04cb65c1b283c19247f2792e98c94"

The following Julia program should then work.

using Pkg, Pkg.Artifacts
Pkg.instantiate() # fetch the artifacts declared in Artifacts.toml

using DuckDB
conn = DuckDB.DB()
mimic_dbfile = joinpath(artifact"mimic-iv-demo", "mimic-iv-demo-2.2.duckdb")
DuckDB.execute(conn, "ATTACH '$(mimic_dbfile)' AS mimic (READ_ONLY);")
DuckDB.execute(conn, "SELECT count(*) FROM mimic.patients")
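
The program above runs the count but does not show how to read the value back. A minimal sketch of one way to do that follows. It assumes DataFrames.jl is installed in the active project (the original program does not use it), and it relies on the DBInterface API that DuckDB.jl re-exports. The column alias `n_patients` is a name chosen here for illustration.

```julia
using Pkg.Artifacts   # provides the artifact"..." string macro
using DuckDB          # re-exports DBInterface
using DataFrames      # assumption: DataFrames.jl is available in the active project

# Open an in-memory DuckDB database and attach the demo file read-only.
conn = DBInterface.connect(DuckDB.DB, ":memory:")
mimic_dbfile = joinpath(artifact"mimic-iv-demo", "mimic-iv-demo-2.2.duckdb")
DBInterface.execute(conn, "ATTACH '$(mimic_dbfile)' AS mimic (READ_ONLY);")

# Alias the aggregate so the result column has a predictable name
# (n_patients is a hypothetical name, not part of the data set).
result = DBInterface.execute(conn, "SELECT count(*) AS n_patients FROM mimic.patients")

# Query results implement the Tables.jl interface, so they can be
# materialized as a DataFrame for inspection.
df = DataFrame(result)
println(df)

DBInterface.close!(conn)
```

Materializing into a DataFrame is only one option; any Tables.jl-compatible sink should work the same way.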

eICU Collaborative Research Database Demo

This is a demonstration sample of roughly 2,500 ICU stays from the eICU Collaborative Research Database (eICU-CRD), a multi-center database of deidentified health data for over 200,000 admissions to ICUs across the United States between 2014 and 2015. It is released through the PhysioNet collaborative repository under the copyleft Open Data Commons Open Database License v1.0.

The full eICU-CRD dataset is available upon completing the CITI Program "Massachusetts Institute of Technology Affiliates" training and executing the PhysioNet Credentialed Health Data License. PhysioNet credentialing is available to independent researchers.

To use this demo in Julia, you'll first need this in your Artifacts.toml.

[eicu-crd-demo]
git-tree-sha1 = "006da6f242eb440ffec3b706bd3edd06ea9831d0"

    [[eicu-crd-demo.download]]
    url = "https://github.com/MechanicalRabbit/FunSQL-TestData/releases/download/20250514/eicu-crd-demo-2.0.1.duckdb.tgz"
    sha256 = "0e8752f71ac6d802fee5d08809e824fa69c23057dad796c9aec05bcdc6108608"

The following Julia program should then work.

using Pkg, Pkg.Artifacts
Pkg.instantiate() # fetch the artifacts declared in Artifacts.toml

using DuckDB
conn = DuckDB.DB()
eicu_dbfile = joinpath(artifact"eicu-crd-demo", "eicu-crd-demo-2.0.1.duckdb")
DuckDB.execute(conn, "ATTACH '$(eicu_dbfile)' AS eicu (READ_ONLY);")
DuckDB.execute(conn, "SELECT count(), count(distinct uniquepid) FROM eicu.patient")
