Continuing this discussion here.
I am working on some Python code to make working with OpenData easier. It's far from finished (it only sort of works for my use case right now), but I would like to share it, and putting it in this repository makes sense. Before I spend more time polishing it, I'd like some input on what the library should look like.
Features I would like to have in the library:
- View metadata of all athletes: currently the metadata lives in the blob for each athlete, so you need to download all the data to view it. I propose to create a metadata file in the root of this repo that is updated periodically to reflect new/changed files in the OSF directory.
- Tool to selectively download data: Only download a specific athlete, or only athletes with specific data types, date ranges, amounts of data, etc. based on the metadata.
- Should return the activities in a general-purpose data format. I propose to use a `pandas.DataFrame` for this.
- Tool to make running computations on large amounts of activities easier: I'm not sure how to do this yet, but with the amount of data that's already in OpenData it's impossible to hold it all in memory, so some clever batch processing is needed. I think some tooling would help there and has its place in this library.
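To make the selective-download and batch-processing ideas a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `AthleteMeta`, its fields, `select_athletes` and `in_batches` are placeholder names, and the real metadata file may well look different. It uses only the standard library; the per-athlete download and the conversion to a `pandas.DataFrame` are left out on purpose.

```python
from dataclasses import dataclass
from datetime import date
from itertools import islice
from typing import FrozenSet, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class AthleteMeta:
    """Hypothetical entry in the proposed root-level metadata file."""
    athlete_id: str
    n_activities: int
    first_activity: date
    last_activity: date
    data_types: FrozenSet[str]


def select_athletes(
    index: Iterable[AthleteMeta],
    *,
    required_types: Iterable[str] = (),
    min_activities: int = 0,
    active_after: Optional[date] = None,
) -> List[AthleteMeta]:
    """Pick athletes from the metadata alone, before downloading anything."""
    required = frozenset(required_types)
    selected = []
    for meta in index:
        if not required <= meta.data_types:
            continue  # athlete lacks at least one required data type
        if meta.n_activities < min_activities:
            continue  # not enough activities
        if active_after is not None and meta.last_activity < active_after:
            continue  # no activity on or after the requested date
        selected.append(meta)
    return selected


def in_batches(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size items, so only one batch is in memory."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Made-up metadata, only to show the calling pattern.
example_index = [
    AthleteMeta("athlete-a", 120, date(2018, 1, 1), date(2019, 6, 1),
                frozenset({"power", "heartrate"})),
    AthleteMeta("athlete-b", 5, date(2019, 1, 1), date(2019, 2, 1),
                frozenset({"power"})),
]

for batch in in_batches(select_athletes(example_index, required_types=["power"]), 1):
    # A real implementation would download and process these athletes here.
    print([meta.athlete_id for meta in batch])
```

The idea is that filtering only touches the small metadata file, and the batch generator keeps memory bounded by processing a fixed number of athletes at a time.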
Any input is welcome!