
Performance on CPUs with less than two cores #464

Good morning @giladgd.
I hope you are well.

My project runs on VMs using CPUs only (no GPU).
Yes, I tested reducing the number of threads (set via getLlama() ), and when the thread count exceeds the number of vCPUs, inference time increases considerably. I also observed that the thread count defaults to 4, so when I tested on a VM with 2 vCPUs this caused the delay, since the ideal number there is 2.

2 vCPUs / 2 GB:

2025-06-23T13:48:43.509Z Loading model: /var/projects/api-ai/models/hf_Qwen_Qwen3-0.6B.Q8_0.gguf
Number of threads used: 1
2025-06-23T13:48:46.160Z User: What is your name?
2025-06-23T13:48:57.658Z AI: I do…

Replies: 1 comment 1 reply

Answer selected by areumtecnologia
Category
Q&A