Smart Geocubes #880

relativityhd · 2025-03-29T00:38:50Z

Hello everybody!

I want to share a small helper package I created, which I am calling "Smart-Geocubes": A high-performance library for intelligent loading and caching of remote geospatial raster data, built with xarray, zarr and icechunk.

Take a look:

I am very interested in what you think about this package, whether it could help you or somebody else in the future, or whether it serves too specific a use case.

I don't know yet how far I will extend the functionality, add more datasets, etc.
I have a small roadmap but very limited time at the moment. However, I welcome any ideas as well as feature and dataset requests. 😄

What is the purpose of this package?

The following is copied from the README:

This package solves a specific problem that most people who work with Earth observation data don't need to worry about.
When creating new data from existing data (for example, doing image segmentation with machine learning on Sentinel-2 images), people usually:

Download all the data
Run the algorithms and data science on it
Delete the data afterwards

This "batched-processing" works great if you have a big computer with lots of storage space, like a cluster.

But if you're working on a smaller computer (like a laptop with a few hundred GB of storage and 16GB of RAM), this approach creates problems.
It makes it really hard to test and improve your programs because you don't have enough space.
Using frameworks like Ray for processing is also tricky with this approach.
They work better with "concurrent-processing": when each step of your processing pipeline can be done for each element separately, instead of expecting to run a single step for all your data at once.
Plus, if you only need to look at certain areas but don't know which ones ahead of time, downloading everything is wasteful.

So instead, this package downloads the data only when you need it. But downloading the same thing over and over is inefficient. That's why we save (or "cache") the data on your computer's hard drive in the form of zarr datacubes.
We call this way of working "procedural download" because you download pieces as you need them.
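
To make the idea concrete, here is a minimal sketch of a download-on-demand cache. It is not the package's actual code: `fetch_tile` and `TileCache` are hypothetical names invented for this illustration, the "remote" data is generated locally, and plain JSON files stand in for the zarr datacubes.

```python
import json
import tempfile
from pathlib import Path


def fetch_tile(tile_id):
    """Hypothetical stand-in for a remote request (no network involved)."""
    row, col = tile_id
    return [[row * 10 + col] * 4 for _ in range(4)]  # a tiny 4x4 "raster"


class TileCache:
    """Download a tile on first use and serve it from disk afterwards."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.downloads = 0  # how often the remote source was actually hit

    def _path(self, tile_id):
        row, col = tile_id
        return self.cache_dir / f"tile_{row}_{col}.json"

    def load(self, tile_id):
        path = self._path(tile_id)
        if path.exists():
            # Cache hit: read the local copy, no download needed.
            return json.loads(path.read_text())
        # Cache miss: download once and persist for later calls.
        data = self.fetch(tile_id)
        self.downloads += 1
        path.write_text(json.dumps(data))
        return data


cache = TileCache(tempfile.mkdtemp(), fetch_tile)
first = cache.load((2, 3))   # downloaded and written to disk
second = cache.load((2, 3))  # served from the local cache
other = cache.load((0, 1))   # a new tile triggers one more download
print(cache.downloads)
```

Only the tiles that are actually requested ever get downloaded, and a repeated request for the same tile reads the local copy instead of going back to the source.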

Therefore, this package handles:

The download "on-demand" (or "procedural download") of the data
The caching of the data on your computer's hard drive
The loading of the data into memory for regions specified by the user
Making everything thread-safe, so you can run on any scaling framework you like.
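
As a rough illustration of the thread-safety point above, the sketch below shows one common way to let many workers request tiles at once without downloading the same tile twice: one lock per tile. This is an assumption about how such a cache could be built, not a description of the package's internals. `ThreadSafeTileStore` and `slow_fetch` are hypothetical names, and an in-memory dict stands in for the on-disk cache.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeTileStore:
    """Fetch each tile at most once, even under concurrent requests."""

    def __init__(self, fetch):
        self.fetch = fetch
        self._tiles = {}        # stands in for the on-disk cache
        self._tile_locks = {}   # one lock per tile id
        self._registry_lock = threading.Lock()  # guards the shared dicts and counter
        self.downloads = 0

    def _lock_for(self, tile_id):
        with self._registry_lock:
            return self._tile_locks.setdefault(tile_id, threading.Lock())

    def load(self, tile_id):
        # Requests for the same tile wait on each other here;
        # requests for different tiles proceed in parallel.
        with self._lock_for(tile_id):
            with self._registry_lock:
                cached = self._tiles.get(tile_id)
            if cached is None:
                cached = self.fetch(tile_id)  # slow part, only the tile lock is held
                with self._registry_lock:
                    self._tiles[tile_id] = cached
                    self.downloads += 1
            return cached


def slow_fetch(tile_id):
    """Hypothetical slow download, simulated with a short sleep."""
    time.sleep(0.05)
    return f"tile-{tile_id[0]}-{tile_id[1]}"


store = ThreadSafeTileStore(slow_fetch)
requests = [(0, 0), (0, 1)] * 8  # 16 requests, but only 2 distinct tiles
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(store.load, requests))
print(store.downloads)
```

The per-tile lock keeps the slow download from blocking unrelated tiles, while the short-lived registry lock only protects the shared bookkeeping.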
