
getting batches with BatchHandler is slower with max_workers>1 than max_workers=1 #241

@bnb32

Description


Why this feature is necessary:
Resolving this would enable us to use larger batches and possibly train with data that hasn't been moved to locally mounted SSDs.

A possible solution is:
The worst-case time to get N batches should occur when max_workers=1. Adding more workers should let us parallelize batch fetching, but right now there appears to be some blocking going on.

I have considered the following alternatives:
The issue can likely be traced to the implementation in

def enqueue_batches(self) -> None:

I have experimented with a few different ways to use workers in this method and have not seen significant improvement. Attempts included using a ThreadPoolExecutor with an as_completed loop over the futures, and also submitting work and queueing the futures themselves without an as_completed loop.

Additional context
Reproduce this with the following:

bh = BatchHandler(..., sample_shape=(60, 60, 10),
                  queue_cap=50, batch_size=16,
                  n_batches=16, mode='lazy',
                  max_workers=1)
start = time.time()
batches = list(bh)
print(time.time() - start)

bh = BatchHandler(..., sample_shape=(60, 60, 10),
                  queue_cap=50, batch_size=16,
                  n_batches=16, mode='lazy',
                  max_workers=10)
start = time.time()
batches = list(bh)
print(time.time() - start)

I have profiled both of these code blocks with cProfile and see some strange differences in the timing and number of calls for the sample_batch function, but I don't know what to make of them. Without moving data to local SSD: with max_workers=1 I see a per-call time of ~40 seconds and 16 calls; with max_workers=10 I see a per-call time of ~400 seconds and only 2 calls. Per-call time for get_batch goes from ~20 seconds to ~60 seconds.

profile for max_workers=1: (screenshot: max_workers1_profile)

profile for max_workers=10: (screenshot: max_workers10)

Urgency / Timeframe
Not urgent. max_workers=1 works well currently with training data on local SSD.
