
Memory leak related to MultiprocessFileCache? #339

@dshintani-pfn


I observed a possible memory leak (~1 GB/h) related to MultiprocessFileCache during training.

I defined a dataset class with a cache, following the tutorial.

from typing import Any

from h5py import File  # assumption: the per-dataset readers are HDF5 files
from pfio.cache import MultiprocessFileCache


class CachedDataset:
    def __init__(self, common_config) -> None:
        # One reader per dataset, keyed by dataset name.
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a") for dataset in common_config.datasets
        }
        self._cache = MultiprocessFileCache(len(self), do_pickle=True)

    def __len__(self) -> int:
        # Total number of samples (elided here).
        return ...

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        # Serve from the cache, loading from disk on a miss.
        return self._cache.get_and_cache(i, self._load_from_disk)

and used this CachedDataset as the dataset for training, as shown below.

# Split into training and validation sets by the configured ratio.
train_set, val_set = torch.utils.data.random_split(
    dataset,
    [int(len(dataset) * train_set_ratio), len(dataset) - int(len(dataset) * train_set_ratio)],
)

# DataLoader over the cached training split.
train_loader = DataLoader(
    train_set, batch_size=train_args.batch_size, shuffle=True, collate_fn=collate_fn
)

The leak disappeared when I stopped using MultiprocessFileCache.
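For reference, here is a minimal sketch of how I observe the growth, assuming psutil is installed; the inner loop just iterates the loader in place of the real training step, and num_epochs is a placeholder:

import psutil

process = psutil.Process()

for epoch in range(num_epochs):
    for batch in train_loader:
        pass  # stand-in for the actual training step
    # Resident set size in MiB, printed once per epoch.
    print(f"epoch {epoch}: rss = {process.memory_info().rss / 2**20:.1f} MiB")

With the cache enabled the reported RSS keeps increasing across epochs (roughly 1 GB/h); without it, it stays flat.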

It might be due to incorrect usage of MultiprocessFileCache on my side, but do you have any idea what could cause this leak?
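To narrow down whether the growth is visible in the Python heap at all (or lives outside it, e.g., in buffers), comparing tracemalloc snapshots across an epoch might help; this is only a sketch of that idea, not something I have confirmed:

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for batch in train_loader:
    pass  # stand-in for one epoch of training

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, "lineno")[:10]:
    print(stat)  # largest Python-level allocations since the baseline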
