Description
Thank you for your great research.
I was trying to reproduce your results for the SNN model on the DVS dataset. After running for about 70 to 85 epochs, the code exhausts the CPU memory (200 GB).
Using a memory profiler, I noticed that the test dataloader is the bottleneck. Here is the profiler analysis of the test and training functions for the first and second epochs:
It looks like the memory used by the dataloader is not freed after each batch, which I don't understand.
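To make the growth easier to compare across epochs, something like the following could be wrapped around the loaders. This is only a minimal sketch: `peak_rss_mb` and `iterate_with_memory_log` are hypothetical helper names that are not part of the repository, and it relies on the standard-library `resource` module, so it only works on Unix-like systems.

```python
import gc
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def iterate_with_memory_log(loader, label, log_every=10):
    """Yield batches from `loader`, printing the peak RSS every `log_every` batches.

    Hypothetical helper: works with any iterable, including a DataLoader.
    """
    for i, batch in enumerate(loader):
        yield batch
        del batch  # drop this generator's reference before fetching the next batch
        if i % log_every == 0:
            gc.collect()  # rule out garbage that is merely waiting for collection
            print(f"{label} batch {i}: peak RSS {peak_rss_mb():.1f} MiB")
```

For example, `for data in iterate_with_memory_log(testloader, "test"):` in place of `for data in testloader:`. If the reported peak keeps rising from one epoch to the next, the memory is being retained rather than just spiking during a single large batch.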
Looking at dataloader.py in the dvs directory:
trainset = spikedata.DVSGesture(data_dir, train=True, num_steps=100, dt=3000, ds=4)
testset = spikedata.DVSGesture(data_dir, train=False, num_steps=600, dt=3000, ds=4)
The test set uses more time steps (600 versus 100), which is understandable. Here is the traceback from the log:
Traceback (most recent call last):
  File "run.py", line 60, in <module>
    evaluate(Net, config, load_data, train, test, optim_func)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/evaluate.py", line 73, in evaluate
    test_accuracy = test(config, net, testloader, device)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/dvs/test.py", line 10, in test
    for data in testloader:
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 19660800 bytes. Error code 12 (Cannot allocate memory)
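As a rough sanity check on the size of the failed allocation, the arithmetic below relates it to the dataset parameters. It is a sketch under assumptions I have not verified against the repository: a 128x128 sensor with 2 polarity channels, and `ds=4` meaning spatial downsampling by a factor of 4. `sample_elements` is a hypothetical helper, not a function from the code base.

```python
# Assumptions (not confirmed by the repository): 128x128 DVS sensor,
# 2 polarity channels, and ds acting as a spatial downsampling factor.
SENSOR_SIDE = 128
POLARITIES = 2
FAILED_ALLOCATION_BYTES = 19_660_800  # taken from the RuntimeError message


def sample_elements(num_steps, ds):
    """Number of tensor elements in one sample under the assumptions above."""
    side = SENSOR_SIDE // ds
    return num_steps * POLARITIES * side * side


train_elems = sample_elements(num_steps=100, ds=4)
test_elems = sample_elements(num_steps=600, ds=4)

# A test sample is 6x the size of a training sample.
size_ratio = test_elems // train_elems

# How many test samples fit in the failed allocation, per element width.
samples_if_1_byte = FAILED_ALLOCATION_BYTES / test_elems
samples_if_float32 = FAILED_ALLOCATION_BYTES / (test_elems * 4)

print(train_elems, test_elems, size_ratio)
print(samples_if_1_byte, samples_if_float32)
```

Under these assumptions the failed request corresponds to a whole number of test samples (16 one-byte samples or 4 float32 samples), so it looks like an ordinary collated test batch. At under 20 MB it is small compared with 200 GB, which suggests memory was already exhausted before this batch was stacked.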
Thank you for your help.