Replies: 3 comments 1 reply
-
```python
# Upload content to block blob
with open(SOURCE_FILE, "rb") as data:
    blob_client.upload_blob(data)
```

See https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python for more.
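For larger files it helps that `upload_blob` can stream from an open file handle and split it into blocks rather than reading everything into memory at once. A minimal sketch, assuming azure-storage-blob v12; the account URL, names, block size, and concurrency below are only placeholders:

```python
from azure.storage.blob import BlobClient

# Placeholder URL, container, blob name and credential.
blob_client = BlobClient(
    account_url="https://<storage_account>.blob.core.windows.net",
    container_name="output",
    blob_name="large.tif",
    credential="<sas_token>",
    max_block_size=8 * 1024 * 1024,       # upload in 8 MiB blocks
    max_single_put_size=8 * 1024 * 1024,  # anything larger goes block by block
)

with open("large.tif", "rb") as data:
    # The file handle is streamed block by block, not buffered whole.
    blob_client.upload_blob(data, overwrite=True, max_concurrency=4)
```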
-
Thank you. It's more saving the file on the Dask server that I'm struggling with. It seems that when the file handle is closed, the file is gone. Here's what I'm trying:

```python
import os
from pathlib import Path

import azure.storage.blob
from dask.distributed import Client, Lock
import rasterio
from osgeo import gdal
import rioxarray as rx


def mosaic_tiles(
    prefix: str,
    client: Client,
    storage_account: str = os.environ["AZURE_STORAGE_ACCOUNT"],
    container_name: str = "output",
    credential: str = os.environ["AZURE_STORAGE_SAS_TOKEN"],
) -> None:
    container_client = azure.storage.blob.ContainerClient(
        f"https://{storage_account}.blob.core.windows.net",
        container_name=container_name,
        credential=credential,
    )
    blobs = [
        f"/vsiaz/{container_name}/{blob.name}"
        for blob in container_client.list_blobs()
        if blob.name.startswith(prefix + "_")
    ]
    with rasterio.open("data/aoi.tif") as t:
        bounds = list(t.bounds)
    local_prefix = Path(prefix).stem
    # This file is lost whether I use vsimem or not
    vrt_file = f"/vsimem/data/{local_prefix}.vrt"
    gdal.BuildVRT(vrt_file, blobs, outputBounds=bounds)
    mosaic_file = f"data/{local_prefix}.tif"
    # unable to open vrt_file here
    rx.open_rasterio(vrt_file, chunks=True).rio.to_raster(
        mosaic_file, compress="LZW", predictor=2, lock=Lock("rio", client=client)
    )
    blob_client = container_client.get_blob_client(f"{prefix}/test.tif")
    with open(mosaic_file, "rb") as data:
        blob_client.upload_blob(data)
```
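Two GDAL details may explain the disappearing VRT (these are assumptions, not something confirmed above): `/vsimem` is private to the process that writes it, so Dask workers can never open that path, and it's safest to hold on to the `Dataset` returned by `gdal.BuildVRT` and close it explicitly so the `.vrt` is flushed deterministically before anything else tries to read it. A minimal sketch that writes the VRT to local disk and closes it explicitly; this only helps when the workers share that filesystem, e.g. a local cluster:

```python
# Keep the Dataset returned by BuildVRT and close it so the .vrt is flushed;
# use an on-disk path so other processes can open it. Paths are illustrative.
vrt_file = f"data/{local_prefix}.vrt"
vrt_ds = gdal.BuildVRT(vrt_file, blobs, outputBounds=bounds)
vrt_ds.FlushCache()
vrt_ds = None  # closes the dataset and writes the VRT to disk

mosaic_file = f"data/{local_prefix}.tif"
rx.open_rasterio(vrt_file, chunks=True).rio.to_raster(
    mosaic_file, compress="LZW", predictor=2, lock=Lock("rio", client=client)
)
```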
-
Yeah, the client is just needed for the lock. And sorry, I was conflating some of the local vs. worker data sources in the earlier example. But I think what you're suggesting is close to what I had been doing, e.g.:

```python
def mosaic_tiles(
    prefix: str,
    client: Client,
    storage_account: str = os.environ["AZURE_STORAGE_ACCOUNT"],
    container_name: str = "output",
    credential: str = os.environ["AZURE_STORAGE_SAS_TOKEN"],
) -> None:
    container_client = azure.storage.blob.ContainerClient(
        f"https://{storage_account}.blob.core.windows.net",
        container_name=container_name,
        credential=credential,
    )
    blobs = [
        f"/vsiaz/{container_name}/{blob.name}"
        for blob in container_client.list_blobs()
        if blob.name.startswith(prefix)
    ]
    with rasterio.open("data/aoi.tif") as t:
        bounds = list(t.bounds)
    local_prefix = Path(prefix).stem
    vrt_file = f"data/{local_prefix}.vrt"
    gdal.BuildVRT(vrt_file, blobs, outputBounds=bounds)
    mosaic_file = f"data/{local_prefix}.tif"
    rx.open_rasterio(vrt_file, chunks=True).rio.to_raster(
        mosaic_file, compress="LZW", predictor=2, lock=Lock("rio", client=client)
    )
```

And then running "locally" rather than through the gateway:

```python
with Client() as local_client:
    mosaic_tiles(prefix=prefix, client=local_client)
```

This saves the mosaic on the hub instance. (I'm not just using gdalwarp, for instance, because this is a lot faster.) This is not without issues, though (limits of local storage, using the local cluster, etc.). However, I guess there's not really an alternative. GEE also has limits on mosaic output size (above a certain size it tiles the output), so maybe this is a common issue.
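To ease the local-storage pressure a little, the final upload at least doesn't need to buffer the whole mosaic. A rough sketch that continues from the variables in the function above (`container_client`, `prefix`, `mosaic_file`); the blob name and concurrency are illustrative:

```python
blob_client = container_client.get_blob_client(f"{prefix}/mosaic.tif")
with open(mosaic_file, "rb") as data:
    # upload_blob streams the handle in blocks rather than reading the whole
    # file into memory.
    blob_client.upload_blob(data, overwrite=True, max_concurrency=4)
os.remove(mosaic_file)  # free local disk once the blob is committed
```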
-
The documentation offers the following method to save data from the dask gateway to blob storage.
Is there an alternative method for when the data are too large to fit into a memory buffer? I need to do some large mosaics, and I've taken to processing them on the local Dask instance so it can save in chunks.
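One possible alternative, sketched here under the assumption of azure-storage-blob v12 (the chunk size, block-id scheme, and blob name are illustrative), is to stage blocks one at a time and commit them at the end, so no single buffer ever has to hold the whole dataset:

```python
import os
import uuid

from azure.storage.blob import BlobBlock, BlobClient

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per staged block

blob_client = BlobClient(
    account_url=f"https://{os.environ['AZURE_STORAGE_ACCOUNT']}.blob.core.windows.net",
    container_name="output",
    blob_name="mosaic.tif",
    credential=os.environ["AZURE_STORAGE_SAS_TOKEN"],
)

block_ids = []
with open("data/mosaic.tif", "rb") as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            break
        block_id = uuid.uuid4().hex  # fixed-length id for every block
        blob_client.stage_block(block_id=block_id, data=chunk)
        block_ids.append(block_id)

# Nothing becomes visible in the blob until the block list is committed.
blob_client.commit_block_list([BlobBlock(block_id=bid) for bid in block_ids])
```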