tests: use adaptive number of threads #12236


Open · wants to merge 1 commit into master

Conversation

JohannesGaessler
Collaborator

I noticed that on my machine with an Epyc 7742, test-backend-ops was disproportionately slow. The problem seems to be that for a lot of tests the tensors are so small that the dominant contribution to the runtime is just thread management. This PR scales the number of threads with the number of tensor elements, which makes the tests quite a bit faster. With my Ryzen 5950X the runtime decreases from 63s to 26s; on my Epyc 7742 it decreases from 519s to 55s.

Right now the minimum number of threads is 1; should we increase this to 2 for ggml graph evaluations to ensure that multithreading is also tested?
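The adaptive scaling could be sketched roughly like this; the chunk size and exact rounding below are illustrative assumptions, not the PR's actual heuristic:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Sketch: pick roughly one thread per fixed chunk of elements, capped at the
// hardware thread count, so tiny tensors don't pay full thread-management cost.
// The chunk size (16384 elements) is an assumed value for illustration.
static size_t get_n_threads(size_t nelements) {
    const size_t max_threads =
        std::max<size_t>(1, std::thread::hardware_concurrency());
    const size_t elements_per_thread = 16384; // assumed granularity
    const size_t wanted =
        (nelements + elements_per_thread - 1) / elements_per_thread;
    return std::min(std::max<size_t>(1, wanted), max_threads);
}
```

With this shape, a tensor of a few thousand elements gets a single thread, while large tensors still saturate the machine.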

@github-actions bot added the "testing" label (Everything test related) on Mar 6, 2025
@jeffbolznv
Collaborator

I did not see a perf increase from this change, maybe a slight slowdown (41s -> 43s). Tested on an i9-14900k using the Vulkan backend.

For me, about half the runtime in test-backend-ops is spent setting up the lookup tables for the IQ formats for the CPU backend. It would be nice to have these precomputed.
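One way to make such tables free at runtime is to build them at compile time with constexpr; this is a sketch with a placeholder table, not the actual IQ-format data:

```cpp
#include <array>

// Sketch: a lookup table computed entirely at compile time, so no setup cost
// is paid when the test binary starts. The squared-value contents are a
// placeholder standing in for the real IQ-format tables.
constexpr std::array<int, 256> make_table() {
    std::array<int, 256> t{};
    for (int i = 0; i < 256; ++i) {
        t[i] = i * i; // placeholder computation
    }
    return t;
}

constexpr auto kTable = make_table();
```

This requires C++17 (for constexpr `std::array::operator[]` in the builder loop), and only works when the table contents are computable at compile time; tables that depend on runtime state would instead need one-time lazy initialization.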

 static void init_tensor_uniform(ggml_tensor * tensor, float min = -1.0f, float max = 1.0f) {
     size_t nels = ggml_nelements(tensor);
     std::vector<float> data(nels);
     {
         // parallel initialization
-        static const size_t n_threads = std::thread::hardware_concurrency();
+        static const size_t n_threads = get_n_threads(ggml_nelements(tensor));
@slaren
Member
Mar 8, 2025

I don't think that it makes sense to initialize this from the current tensor, since this is a one-time RNG initialization. It should be done with the maximum number of threads that may be used.

To clarify: this variable is also used later to determine the number of threads for initializing each tensor, but it is initialized here as a static and used to initialize the static RNGs. It is necessary to initialize as many RNGs as threads may ever be used, but you can use a different number of threads for each tensor.
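The pattern described above could look roughly like this sketch (names and structure are illustrative, not the actual test-backend-ops code): an RNG pool sized once for the maximum possible thread count, while each tensor picks its own, possibly smaller, thread count per call.

```cpp
#include <algorithm>
#include <random>
#include <thread>
#include <vector>

// Sketch: the RNG pool is created once, sized for the maximum number of
// threads that may ever run. Names here are hypothetical.
static std::vector<std::mt19937> & get_rngs() {
    static const size_t max_threads =
        std::max<size_t>(1, std::thread::hardware_concurrency());
    static std::vector<std::mt19937> rngs = [] {
        std::vector<std::mt19937> v;
        for (size_t i = 0; i < max_threads; ++i) {
            v.emplace_back(std::random_device{}());
        }
        return v;
    }();
    return rngs;
}

// Per-tensor call: n_threads may be smaller than the pool, and each worker
// indexes a distinct RNG so there is no sharing between threads.
static void fill_uniform(std::vector<float> & data, size_t n_threads) {
    n_threads = std::min(std::max<size_t>(1, n_threads), get_rngs().size());
    std::vector<std::thread> workers;
    for (size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
            for (size_t i = t; i < data.size(); i += n_threads) {
                data[i] = dist(get_rngs()[t]);
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}
```

The key point is that the pool size and the per-call thread count are decoupled: the pool is fixed at its maximum, the per-call count varies with tensor size.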
