Smart Geocubes #880
relativityhd started this conversation in Show and tell
Hello everybody!
I want to share a small helper package I created, which I am calling "Smart-Geocubes": a high-performance library for the intelligent loading and caching of remote geospatial raster data, built with xarray, zarr and icechunk.
Take a look:
I am very interested in what you think about this package: whether it could help you or somebody else in the future, or whether its use case is too specific.
I don't know yet how far I will extend the functionality or how many datasets I will add.
I have a small roadmap, but very limited time at the moment. However, I welcome any ideas and feature or dataset requests. 😄
What is the purpose of this package?
This package solves a specific problem that most people who work with Earth observation data don't need to worry about.
When you're creating new data from existing data (for example, doing image segmentation with machine learning on Sentinel-2 images), people usually:
This "batched-processing" works great if you have a big computer with lots of storage space, like a cluster.
But if you're working on a smaller computer (like a laptop with a few hundred GB of storage and 16GB of RAM), this approach creates problems.
It makes it really hard to test and improve your programs because you don't have enough space.
Using frameworks like Ray for processing is also tricky with this approach.
They work better with "concurrent-processing", where each step of your processing pipeline can be run for each element separately, instead of expecting a single step to run for all your data at once.
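To make the difference concrete, here is a minimal sketch of both styles. It assumes two hypothetical helpers, `load_tile` (standing in for a download) and `segment` (standing in for a per-tile model); neither is part of Smart-Geocubes, and a standard-library thread pool stands in for a framework like Ray.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(tile_id: int) -> list[int]:
    # Hypothetical stand-in for downloading one scene/tile.
    return [tile_id * 10 + i for i in range(4)]


def segment(data: list[int]) -> list[int]:
    # Hypothetical stand-in for a per-tile model, e.g. a segmentation network.
    return [1 if v % 2 == 0 else 0 for v in data]


def run_batched(tile_ids: list[int]) -> list[list[int]]:
    # Step 1 for ALL tiles: every downloaded tile is held at the same time.
    downloaded = [load_tile(t) for t in tile_ids]
    # Step 2 only starts once step 1 has finished for every tile.
    return [segment(d) for d in downloaded]


def process_one(tile_id: int) -> list[int]:
    # Each element goes through the whole pipeline on its own.
    return segment(load_tile(tile_id))


def run_concurrent(tile_ids: list[int], workers: int = 4) -> list[list[int]]:
    # Elements are independent, so a scheduler can run them in parallel,
    # and only as many tiles as there are active workers are in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, tile_ids))


tiles = [0, 1, 2, 3]
assert run_batched(tiles) == run_concurrent(tiles)
```

Both produce the same result; the concurrent version simply never needs the full dataset on disk or in memory at once.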
Plus, if you only need to look at certain areas but don't know which ones ahead of time, downloading everything is wasteful.
So instead, this package downloads the data only when you need it. But downloading the same data over and over is inefficient. That's why we save (or "cache") the data on your computer's hard drive in the form of zarr datacubes.
We call this way of working "procedural download" because you download pieces as you need them.
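The core idea of procedural download can be sketched in a few lines. This is not the package's actual implementation: the in-memory dict stands in for the on-disk zarr datacube, and `fake_fetch` is a hypothetical remote source keyed by hypothetical `(row, col)` tile indices.

```python
from typing import Callable, Dict, Hashable, List


class TileCache:
    """Sketch of "procedural download": fetch a tile only the first time it
    is requested, then serve it from a local cache.

    Assumption: the dict below stands in for an on-disk zarr store, and
    `fetch` stands in for a remote download.
    """

    def __init__(self, fetch: Callable[[Hashable], List[float]]):
        self._fetch = fetch
        self._store: Dict[Hashable, List[float]] = {}
        self.downloads = 0  # number of remote requests actually made

    def get(self, tile_id: Hashable) -> List[float]:
        # Download only on a cache miss; later requests hit the cache.
        if tile_id not in self._store:
            self._store[tile_id] = self._fetch(tile_id)
            self.downloads += 1
        return self._store[tile_id]

    def load_region(self, tile_ids: List[Hashable]) -> List[List[float]]:
        # Only the tiles covering the requested area are ever touched.
        return [self.get(t) for t in tile_ids]


def fake_fetch(tile_id):
    # Hypothetical remote source: returns a tiny constant "raster".
    row, col = tile_id
    return [float(row * 10 + col)] * 3


cache = TileCache(fake_fetch)
cache.load_region([(0, 0), (0, 1)])
cache.load_region([(0, 1), (1, 1)])  # (0, 1) is served from the cache
print(cache.downloads)  # 3
```

Overlapping requests only trigger downloads for the tiles that are not cached yet. A cache shared by parallel workers would additionally have to coordinate concurrent writes, which this sketch ignores.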
Therefore, this package handles: