Description
When using FFHQBlindDataset with a multi-process DataLoader (num_workers > 0), a RuntimeError related to mmap occurs. The error message is:
RuntimeError: unable to mmap XXX bytes from file </torch_XXXX>: Cannot allocate memory (12)
This issue was traced to the following line in FFHQBlindDataset.__init__:
self.latent_gt_dict = torch.load(self.latent_gt_path)
In our case, latent_gt_path is a .pth file that is approximately 1.6 GB in size.
When the DataLoader uses multiple workers (num_workers > 0), PyTorch shares the Dataset instance with the child worker processes. Since self.latent_gt_dict is a large object, PyTorch attempts to use shared memory (via mmap) to avoid copying the data between processes. If the object is too large or this happens repeatedly, it can easily lead to shared-memory exhaustion, file-descriptor exhaustion, or mmap-related ENOMEM errors, even when system RAM and /dev/shm space are sufficient.
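A minimal sketch of the failing pattern (the class name, path, and dict layout below are illustrative, not the repository's actual code): loading the full dict in __init__ makes it part of the Dataset state that every worker must inherit or share.

    import torch
    from torch.utils.data import Dataset, DataLoader

    class EagerLatentDataset(Dataset):
        """Toy dataset that eagerly loads a large latent dict in __init__."""
        def __init__(self, latent_gt_path):
            # The entire ~1.6 GB dict becomes Dataset state that each
            # DataLoader worker process has to share or copy.
            self.latent_gt_dict = torch.load(latent_gt_path)
            self.keys = list(self.latent_gt_dict.keys())

        def __len__(self):
            return len(self.keys)

        def __getitem__(self, index):
            return self.latent_gt_dict[self.keys[index]]

    # With num_workers > 0, the large attribute is what can trigger the
    # mmap / shared-memory errors described above.
    # loader = DataLoader(EagerLatentDataset('latent_gt.pth'), num_workers=4)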
Solution: move the torch.load() call from the __init__ method to the __getitem__ method, so that each worker loads the latent data lazily on first access. For example:
    def __getitem__(self, index):
        if self.latent_gt_path is not None:
            self.load_latent_gt = True
            # Lazily load the latent dict once per worker, on first access.
            if self.latent_gt_dict is None:
                self.latent_gt_dict = torch.load(self.latent_gt_path)
        else:
            self.load_latent_gt = False
        ...
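For this lazy-loading fix to work, __init__ should only record the path and initialize the dict to None. A sketch of the corresponding change (assuming the dataset's options dict exposes latent_gt_path; the rest of __init__ is unchanged):

    def __init__(self, opt):
        ...
        # Only remember the path here; the dict itself is loaded on demand
        # in __getitem__, so the Dataset object stays small when it is
        # shared with DataLoader worker processes.
        self.latent_gt_path = opt.get('latent_gt_path', None)
        self.latent_gt_dict = None

Each worker then holds its own copy of the dict after its first __getitem__ call, instead of relying on a shared-memory mapping inherited from the parent process.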