-
Posting the error I get when I run the large concat (at the moment only 1000 Datasets):

```
    273         if self.status == "error":
    274             typ, exc, tb = result
--> 275             raise exc.with_traceback(tb)
    276         elif self.status == "cancelled":
    277             raise result

KilledWorker: ('concat-some_guid', <WorkerState 'tcp://address', name: SGECluster-2-6, status: closed, memory: 0, processing: 1>)
```
-
Hmmm.... concat can be slow for large numbers of Datasets because it loops over them multiple times. Have you checked to make sure it isn't adding unnecessary dimensions (usually you want …)? Can you show what the chunk sizes of a single dataset are, and those of the concatenated result?
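A quick way to answer that question is to print the dimension sizes and chunk layout of one input Dataset and of the concatenated result. A minimal sketch, with toy dimensions and chunk sizes (none of these names come from the thread):

```python
import numpy as np
import xarray as xr

# One representative chunked Dataset; dims, shapes and chunk sizes are made up.
ds = xr.Dataset(
    {"var": (("time", "x"), np.random.rand(1, 100))},
    coords={"time": [0]},
).chunk({"time": 1, "x": 50})

print(ds.sizes)   # dimension lengths of a single dataset
print(ds.chunks)  # mapping of dimension name -> chunk sizes

# Concatenate a handful of copies and compare.
combined = xr.concat([ds] * 10, dim="time")
print(combined.sizes)   # check that no unexpected dimensions appeared
print(combined.chunks)  # many tiny chunks along 'time' is a warning sign
```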
-
Hi, I am trying to find the best way to `concat` thousands (>10_000) of `Dataset`s from a Dask point of view. Basically, I generate thousands of Datasets using a Dask Distributed cluster running on HPC hardware and would like to run a large `xr.concat` on my futures. At the moment, `xr.concat()` works for about 500 Datasets, but anything larger than that just gives up and kills my workers. I can show some outputs if you want, but I am curious: what would be the best way to merge all my Datasets in memory? Many thanks in advance!
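For illustration only, here is a minimal sketch of this kind of workflow: per-task Dataset generation on a distributed cluster, followed by one large `xr.concat` over the gathered results, plus a simple batched variant that concatenates in smaller groups first. The function `make_dataset`, the `time` dimension, and the batch size are all hypothetical and not taken from the original post:

```python
import numpy as np
import xarray as xr
from dask.distributed import Client


def make_dataset(i):
    # Hypothetical stand-in for the real per-task Dataset generation.
    return xr.Dataset(
        {"var": (("time", "x"), np.random.rand(1, 100))},
        coords={"time": [i]},
    )


if __name__ == "__main__":
    client = Client()  # the original setup used an SGECluster instead

    futures = [client.submit(make_dataset, i) for i in range(1000)]
    datasets = client.gather(futures)

    # Single large concat: simple, but loops over all inputs in one call.
    combined = xr.concat(datasets, dim="time")

    # Batched variant: concatenate in groups, then concatenate the partials.
    batch = 100  # arbitrary batch size
    partials = [
        xr.concat(datasets[i : i + batch], dim="time")
        for i in range(0, len(datasets), batch)
    ]
    combined_batched = xr.concat(partials, dim="time")
```

Whether a single call or a batched reduction works better depends on the sizes and chunking of the inputs, which is what the chunk-size question above is getting at.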