I observed a possible memory leak (~1 GB/h) related to MultiprocessFileCache during training.
I defined the dataset class with the cache as in the tutorial:
from typing import Any

from pfio.cache import MultiprocessFileCache


class CachedDataset:
    def __init__(self, common_config) -> None:
        # One reader per dataset (File is the reader type used in my setup).
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a")
            for dataset in common_config.datasets
        }
        # __len__ (defined elsewhere) returns the number of samples.
        self._cache = MultiprocessFileCache(len(self), do_pickle=True)

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        return self._cache.get_and_cache(i, self._load_from_disk)
I then used this CachedDataset as the dataset for training, as shown below:
import torch
from torch.utils.data import DataLoader

# `dataset` is an instance of the CachedDataset defined above.
train_set, val_set = torch.utils.data.random_split(
    dataset,
    [int(len(dataset) * train_set_ratio),
     len(dataset) - int(len(dataset) * train_set_ratio)],
)
train_loader = DataLoader(
    train_set, batch_size=train_args.batch_size, shuffle=True, collate_fn=collate_fn
)
The leak disappeared when I stopped using MultiprocessFileCache.
It might be due to incorrect usage of MultiprocessFileCache on my side, but do you have any idea about this leak?
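For comparison, a minimal sketch of the cache-free variant, i.e. the same dataset with MultiprocessFileCache removed so that every access reads directly from disk (the class name UncachedDataset is only for illustration):

class UncachedDataset:
    def __init__(self, common_config) -> None:
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a")
            for dataset in common_config.datasets
        }

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        # No MultiprocessFileCache: every access goes straight to disk.
        return self._load_from_disk(i)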