Python library for working with OpenData #2

@AartGoossens

Continuing this discussion here.

I am working on some Python code to make working with OpenData easier. It's far from finished (it only sort of works for my use case right now), but I would like to share it, and putting it in this repository makes sense. Before I spend more time polishing it, I'd like some input on what the library should look like.

Features I would like to have in the library:

  1. View metadata of all athletes: currently the metadata lives in the blob for each athlete, so you need to download all the data to view it. I propose creating a metadata file in the root of this repo that is updated periodically to reflect new or changed files in the OSF directory.
  2. Tool to selectively download data: download only a specific athlete, or only athletes with specific data types, date ranges, amounts of data, etc., based on the metadata.
  3. Return the activities in a general-purpose data format. I propose using a pandas.DataFrame for this.
  4. Tool to make running computations on large numbers of activities easier: I'm not sure how to do this yet, but with the amount of data already in OpenData it's impossible to hold it all in memory, so some clever batch processing is needed. I think some tooling would help there and has its place in this library.
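
To make the discussion a bit more concrete, here is a rough sketch of how features 2 to 4 could fit together. Nothing in it exists yet: the function names (`select_athletes`, `iter_activities`, `summarize`), the metadata columns (`athlete_id`, `n_activities`, `data_types`) and the loader callback are all assumptions made up for illustration, and the loader here returns fake data instead of reading from OSF. The idea is to filter on the metadata first, then stream activities one pandas.DataFrame at a time through a generator so only one activity is in memory at once.

```python
from typing import Callable, Iterable, Iterator, Tuple

import pandas as pd


def select_athletes(metadata: pd.DataFrame, min_activities: int = 0,
                    required_types: Iterable[str] = ()) -> list:
    """Return the ids of athletes whose metadata row matches the filters.

    The column names used here are hypothetical.
    """
    mask = metadata["n_activities"] >= min_activities
    for data_type in required_types:
        mask &= metadata["data_types"].map(lambda types, t=data_type: t in types)
    return metadata.loc[mask, "athlete_id"].tolist()


def iter_activities(
    athlete_ids: Iterable[str],
    load_athlete: Callable[[str], Iterable[pd.DataFrame]],
) -> Iterator[Tuple[str, pd.DataFrame]]:
    """Lazily yield (athlete_id, activity) pairs, one activity at a time."""
    for athlete_id in athlete_ids:
        for activity in load_athlete(athlete_id):
            yield athlete_id, activity


def summarize(activities: Iterable[Tuple[str, pd.DataFrame]],
              column: str = "power") -> pd.DataFrame:
    """Reduce each activity to a small summary row so the raw data can be dropped."""
    rows = [
        {"athlete_id": athlete_id, "mean": df[column].mean(), "samples": len(df)}
        for athlete_id, df in activities
    ]
    return pd.DataFrame(rows, columns=["athlete_id", "mean", "samples"])


# Made-up metadata standing in for the proposed metadata file.
metadata = pd.DataFrame({
    "athlete_id": ["a1", "a2", "a3"],
    "n_activities": [2, 50, 10],
    "data_types": [["power"], ["power", "heartrate"], ["heartrate"]],
})


def fake_loader(athlete_id: str) -> Iterator[pd.DataFrame]:
    """Placeholder for a real download/parse step; yields two tiny activities."""
    yield pd.DataFrame({"power": [100, 200, 300]})
    yield pd.DataFrame({"power": [150, 250]})


selected = select_athletes(metadata, min_activities=5, required_types=["power"])
summary = summarize(iter_activities(selected, fake_loader))
print(summary)
```

Because `iter_activities` is a generator, a real loader could download and parse each athlete on demand, and `summarize` could be swapped for any other per-activity reduction without changing the selection step.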

Any input is welcome!
