Export Data Marts to Parquet #338

@coussens

Discussed in #335

Originally posted by coussens January 10, 2025
Thanks again for creating such a valuable open-source project!

I have a minor feature suggestion: an option to export Data Marts as Parquet files. This would benefit both SageRx users and those downloading the exported Data Marts from the CodeRx website:

  • Columns would carry explicit data types, so users would not risk parsing CSV values into the wrong type (e.g., the dreaded accidental conversion of NDCs to integers!)
  • File sizes would typically be much smaller (leading to faster downloads and less storage required)
  • Better read performance, integration with Apache Arrow, etc.
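
To illustrate the first point, here is a minimal pandas sketch of how type inference on a CSV can silently turn NDCs into integers. The column names and NDC values are made up for illustration and are not taken from any actual Data Mart.

```python
import io

import pandas as pd

# Hypothetical CSV export; the NDC values below are placeholders.
csv_text = (
    "ndc,product_name\n"
    "00012345678,Example Product A\n"
    "00098765432,Example Product B\n"
)

# Default parsing infers an integer column, so leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Every consumer has to remember to override the dtype to avoid that.
explicit = pd.read_csv(io.StringIO(csv_text), dtype={"ndc": "string"})

print(inferred["ndc"].tolist())
print(explicit["ndc"].tolist())
```

With a typed format such as Parquet, the column type travels with the file, so the second step would not be left to each downstream user.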

This could be readily incorporated into the export_marts DAG in Python, or handled on the Postgres side with the pg_parquet extension.

Let me know what you think!

Labels: optimization (Nice to have, but not critical)