
Memory leakage  #3

@Po0ria


Thank you for your great research.
I was trying to reproduce your results for the SNN model on the DVS dataset. After about 70 to 85 epochs, the code exhausts the CPU memory (200 GB).
After running a memory profiler, I noticed that the test dataloader is the bottleneck. Here is the profiler analysis of the test and training functions for the first and second epochs:
[Two memory-profiler screenshots; images not preserved]
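To check per-epoch growth without a full profiler, one option is to record the process's resident memory after each train and test phase. The sketch below is a minimal, standard-library-only version; `current_rss_mb` and `log_memory` are hypothetical helpers, and it assumes a Linux host where `/proc/self/status` is available (elsewhere it falls back to peak RSS, which only ever grows).

```python
import resource
import sys


def current_rss_mb():
    """Approximate resident memory of this process in MiB.

    Reads VmRSS from /proc/self/status on Linux; elsewhere falls back to the
    *peak* RSS from getrusage, which never decreases and so overstates current use.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024.0  # value is reported in kB
    except OSError:
        pass
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    return peak / (1024.0 * 1024.0) if sys.platform == "darwin" else peak / 1024.0


def log_memory(log, epoch, phase):
    """Append (epoch, phase, RSS in MiB) so growth per epoch can be compared."""
    log.append((epoch, phase, current_rss_mb()))


# Hypothetical usage inside the epoch loop (train/test stand for the repo's functions):
# mem_log = []
# for epoch in range(num_epochs):
#     train(...); log_memory(mem_log, epoch, "train")
#     test(...);  log_memory(mem_log, epoch, "test")
```

If the "test" entries climb by a roughly constant amount every epoch while the "train" entries stay flat, that matches the profiler observation above.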

From the look of it, the dataloader's memory is not being freed after each batch, which I don't understand.
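A batch's memory is released only once nothing references it any more, so a loop that keeps batches (or tensors derived from them) in a list, or accumulates tensors instead of plain numbers, will hold every batch alive; the PyTorch FAQ gives `total_loss += loss` versus `total_loss += float(loss)` as the classic example. The pure-Python sketch below illustrates this with a hypothetical `Batch` stand-in instead of real tensors and uses `weakref` to count how many batches survive one "epoch". It is an illustration of the general mechanism, not a diagnosis of this repository's code.

```python
import gc
import weakref


class Batch:
    """Hypothetical stand-in for a collated batch tensor."""

    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)


def loader(n_batches, nbytes):
    """Hypothetical stand-in for iterating over a DataLoader."""
    for _ in range(n_batches):
        yield Batch(nbytes)


def count_live_batches(keep_batches, n_batches=5, nbytes=1 << 20):
    """Run one 'epoch' and report how many batch objects are still reachable."""
    refs = []     # weak references do not keep the batches alive
    history = []  # strong references DO keep them alive
    batch = None
    for batch in loader(n_batches, nbytes):
        refs.append(weakref.ref(batch))
        if keep_batches:
            history.append(batch)  # leak pattern: every batch stays reachable
        # A non-leaking loop would keep only plain numbers here (cf. loss.item()).
    batch = None  # drop the loop variable's reference to the last batch
    gc.collect()
    return sum(r() is not None for r in refs)


print(count_live_batches(keep_batches=False))  # 0
print(count_live_batches(keep_batches=True))   # 5
```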
Looking at dataloader.py in the dvs directory:
trainset = spikedata.DVSGesture(data_dir, train=True, num_steps=100, dt=3000, ds=4)
testset = spikedata.DVSGesture(data_dir, train=False, num_steps=600, dt=3000, ds=4)
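With `dt` and `ds` identical, the only difference between the two datasets is `num_steps`, so each test sample spans six times as much time, and (all else equal) carries a time dimension six times longer, than a training sample. The arithmetic below assumes `dt` is in microseconds, the unit used in snnTorch's `spikedata` docstrings; treat that as an assumption, and note that `window_ms` is a hypothetical helper.

```python
def window_ms(num_steps, dt_us):
    """Time span covered by one sample, assuming dt is given in microseconds."""
    return num_steps * dt_us / 1000.0


train_ms = window_ms(num_steps=100, dt_us=3000)  # 300.0 ms per training sample
test_ms = window_ms(num_steps=600, dt_us=3000)   # 1800.0 ms per test sample
print(train_ms, test_ms, test_ms / train_ms)     # 300.0 1800.0 6.0
```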
The test set is configured with more time steps (600 vs. 100), which is understandable. Here is the traceback from the log:
Traceback (most recent call last):
  File "run.py", line 60, in <module>
    evaluate(Net, config, load_data, train, test, optim_func)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/evaluate.py", line 73, in evaluate
    test_accuracy = test(config, net, testloader, device)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/dvs/test.py", line 10, in test
    for data in testloader:
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 19660800 bytes. Error code 12 (Cannot allocate memory)
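The failing request is only about 19.7 MB, far below the 200 GB available, which suggests memory had already been consumed elsewhere rather than one oversized batch being the problem. As a plausibility check, and assuming 128×128 DVS128 frames downsampled by `ds=4` to 32×32, two polarity channels and float32 storage (all assumptions, with `sample_bytes` a hypothetical helper), the request is exactly four test samples stacked by `torch.stack` in `default_collate`. Other shape/dtype combinations can yield the same product, so this is a sanity check only.

```python
SENSOR_SIDE = 128  # DVS128 sensor resolution (assumed size before downsampling)


def sample_bytes(num_steps, ds, polarities=2, bytes_per_elem=4):
    """Bytes in one dense sample of shape (num_steps, polarities, side, side).

    Assumes ds shrinks each side to 128 // ds pixels and float32 storage
    (4 bytes per element); both are assumptions, not confirmed by the repo.
    """
    side = SENSOR_SIDE // ds
    return num_steps * polarities * side * side * bytes_per_elem


test_sample = sample_bytes(num_steps=600, ds=4)   # 4,915,200 bytes
train_sample = sample_bytes(num_steps=100, ds=4)  # 819,200 bytes
failed_alloc = 19660800                           # size reported in the RuntimeError
print(failed_alloc / test_sample)                 # 4.0
```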
Thank you for your help
