
Inference speed on Windows Server 2022 with NVIDIA RTX A4000 #973

Open
@adeelhasan19

Description


I am running Mistral 7B OpenOrca inference using llama-cpp-python, but it is taking a very long time. How can I fix that?

llama-cpp-python version: 0.2.11
Server configuration:
1) Windows Server 2022 Standard
2) Two NVIDIA RTX A4000 GPUs

(screenshot attached)
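For reference, slow generation with llama-cpp-python on NVIDIA hardware usually means the package was installed without CUDA (cuBLAS) support, or that no layers were offloaded to the GPU. Below is a minimal sketch of how GPU offload is typically enabled; the model path and layer count are placeholders, not taken from this issue:

```python
# Prerequisite (assumption: llama-cpp-python 0.2.x built with cuBLAS), e.g. on Windows:
#   set CMAKE_ARGS=-DLLAMA_CUBLAS=on
#   pip install llama-cpp-python --force-reinstall --no-cache-dir
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-openorca.Q4_K_M.gguf",  # placeholder path to the GGUF model
    n_gpu_layers=35,  # enough to offload every layer of a 7B model; the default 0 keeps everything on CPU
    n_ctx=4096,       # context window
    verbose=True,     # startup log should report "BLAS = 1" when the CUDA backend is active
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```

If the startup log shows `BLAS = 0`, the wheel was built without GPU support and needs to be reinstalled with the CUDA build flags above.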
