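Currently we use time.perf_counter for benchmarking. We want to switch to torch.cuda.Event, because CUDA kernels launch asynchronously and a host-side timer can stop before the GPU work has finished unless we synchronize explicitly. See https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964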
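
A minimal sketch of what the change could look like, assuming the benchmark wraps a callable. The helper name benchmark_ms, its warmup/iters parameters, and the matmul workload are hypothetical and not taken from the existing code. It falls back to perf_counter when no CUDA device is available, so the same helper still runs on CPU-only machines.

```python
import time

import torch


def benchmark_ms(fn, warmup=3, iters=10):
    """Return the mean time per call of fn, in milliseconds.

    Hypothetical helper: uses CUDA events when a GPU is available,
    otherwise falls back to time.perf_counter.
    """
    # Warm-up runs so one-time costs (lazy init, caching) are not timed.
    for _ in range(warmup):
        fn()

    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        # Drain previously queued work before starting the measurement.
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        # Block until the end event has actually been reached on the GPU.
        torch.cuda.synchronize()
        # elapsed_time reports milliseconds between the two events.
        return start.elapsed_time(end) / iters

    # CPU fallback: a host-side timer is sufficient here.
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1e3 / iters


# Illustrative workload (an assumption, not the project's real benchmark).
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(256, 256, device=device)
mean_ms = benchmark_ms(lambda: x @ x)
print(f"{mean_ms:.3f} ms per call on {device}")
```

Recording the events around the whole loop and dividing by iters keeps the synchronization overhead out of the per-call figure. If per-iteration timings are needed, one pair of events per iteration would be required instead.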