I observed a possible memory leak (~1 GB/h) related to MultiprocessFileCache during training.
I defined the dataset class with the cache as in the tutorial:
from typing import Any

from pfio.cache import MultiprocessFileCache


class CachedDataset:
    def __init__(self, common_config) -> None:
        # One reader per dataset (File is the reader type used in my setup).
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a")
            for dataset in common_config.datasets
        }
        # __len__ (defined elsewhere) returns the number of samples.
        self._cache = MultiprocessFileCache(len(self), do_pickle=True)

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        return self._cache.get_and_cache(i, self._load_from_disk)
I then used this CachedDataset as the dataset for training, as shown below:
import torch
from torch.utils.data import DataLoader

# `dataset` is an instance of the CachedDataset defined above.
train_set, val_set = torch.utils.data.random_split(
    dataset,
    [int(len(dataset) * train_set_ratio),
     len(dataset) - int(len(dataset) * train_set_ratio)],
)
train_loader = DataLoader(
    train_set, batch_size=train_args.batch_size, shuffle=True, collate_fn=collate_fn
)
The leak disappeared when I stopped using MultiprocessFileCache.
It might be due to incorrect usage of MultiprocessFileCache on my side, but do you have any idea about this leak?
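For comparison, a minimal sketch of the cache-free variant, i.e. the same dataset with MultiprocessFileCache removed so that every access reads directly from disk (the class name UncachedDataset is only for illustration):

class UncachedDataset:
    def __init__(self, common_config) -> None:
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a")
            for dataset in common_config.datasets
        }

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        # No MultiprocessFileCache: every access goes straight to disk.
        return self._load_from_disk(i)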