tests: use adaptive number of threads #12236

Status: Open. Wants to merge 1 commit into master.
10 changes: 8 additions & 2 deletions tests/test-backend-ops.cpp
@@ -36,12 +36,18 @@
 #include <thread>
 #include <vector>
 
+static size_t get_n_threads(const int64_t ne) {
+    const size_t max_threads_hw = std::max(std::thread::hardware_concurrency()/2, (unsigned int)1);
+    const size_t max_threads_ne = (ne + 1024 - 1) / 1024;
+    return std::min(max_threads_hw, max_threads_ne);
+}
+
 static void init_tensor_uniform(ggml_tensor * tensor, float min = -1.0f, float max = 1.0f) {
     size_t nels = ggml_nelements(tensor);
     std::vector<float> data(nels);
     {
         // parallel initialization
-        static const size_t n_threads = std::thread::hardware_concurrency();
+        static const size_t n_threads = get_n_threads(ggml_nelements(tensor));
slaren (Member) commented on Mar 8, 2025:
I don't think that it makes sense to initialize this from the current tensor, since this is a one-time RNG initialization. It should be done with the maximum number of threads that may be used.

To clarify: this variable is also used later to determine the number of threads to use to initialize the tensor, but it is initialized here as a static and used to initialize the static RNGs. It is necessary to initialize as many RNGs as threads may be used, but you can use a different number of threads for each tensor.

         // static RNG initialization (revisit if n_threads stops being constant)
         static std::vector<std::default_random_engine> generators = []() {
             std::random_device rd;
@@ -100,7 +106,7 @@ static void init_tensor_uniform(ggml_tensor * tensor, float min = -1.0f, float m
     };
 
     const size_t min_blocks_per_thread = 1;
-    const size_t n_threads = std::min<size_t>(std::thread::hardware_concurrency()/2,
+    const size_t n_threads = std::min<size_t>(get_n_threads(ggml_nelements(tensor)),
                                               std::max<size_t>(1, n_blocks / min_blocks_per_thread));
     std::vector<std::future<void>> tasks;
     tasks.reserve(n_threads);