Commit 3f60489

cfriedt authored and kartben committed
tests: benchmarks: move pthread_pressure to benchmarks/posix
The pthread_pressure test was not a typical test per se. It was a benchmark in search of a proper home. Let's move it to the correct place in the Zephyr tree, add a doc, and provide some reporting.

Currently, k_threads outperform pthreads by almost a factor of 2. The theoretical maximum performance of pthreads would be parity with k_threads, since pthreads are a wrapper around kernel threads. It would be great to reduce the gap.

Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com>
1 parent 7497b8b commit 3f60489

10 files changed: +327 -268 lines changed

tests/posix/pthread_pressure/Kconfig renamed to tests/benchmarks/posix/threads/Kconfig

Lines changed: 6 additions & 11 deletions
@@ -1,15 +1,11 @@
 # Copyright (c) 2023, Meta
+# Copyright (c) 2024, Tenstorrent AI ULC
 #
 # SPDX-License-Identifier: Apache-2.0
 
-source "Kconfig.zephyr"
+mainmenu "POSIX Threads Benchmark"
 
-config TEST_NUM_CPUS
-	int "Number of CPUs to use in parallel"
-	range 1 MP_MAX_NUM_CPUS
-	default MP_MAX_NUM_CPUS
-	help
-	  The number of parallel threads to run during the test.
+source "Kconfig.zephyr"
 
 config TEST_DURATION_S
 	int "Number of seconds to run the test"
@@ -44,8 +40,7 @@ config TEST_PTHREADS
 	help
 	  Run tests for pthreads
 
-config TEST_EXTRA_ASSERTIONS
-	bool "Add extra assertions into the hot path"
-	default y
+config TEST_PERIODIC_STATS
+	bool "Print statistics periodically"
 	help
-	  This can be disabled for benchmarking.
+	  Print statistics periodically throughout the benchmark.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+POSIX Thread Benchmark
+######################
+
+Overview
+********
+
+This benchmark creates and joins as many threads as possible within a configurable time window.
+It provides a rough comparison of Zephyr's POSIX threads (pthreads) with Zephyr's kernel threads
+(k_threads) API, highlighting the overhead of the POSIX layer. Ideally, this overhead would
+shrink over time.
+
+Sample output of the benchmark::
+
+    *** Booting Zephyr OS build v4.0.0-1410-gfca33facee37 ***
+    ASSERT: y
+    BOARD: qemu_riscv64
+    NUM_CPUS: 1
+    TEST_DELAY_US: 0
+    TEST_DURATION_S: 5
+    SMP: n
+    API, Thread ID, time(s), threads, cores, rate (threads/s/core)
+    k_thread, ALL, 5, 47663, 1, 9532
+    pthread, ALL, 5, 28180, 1, 5636
+    PROJECT EXECUTION SUCCESSFUL
+
+To observe periodic statistics on a per-thread basis, in addition to the summary statistics
+printed at the end of execution, enable CONFIG_TEST_PERIODIC_STATS.
+
+Several other options can be tuned on an as-needed basis:
+
+- CONFIG_MP_MAX_NUM_CPUS - Number of CPUs to use in parallel.
+- CONFIG_TEST_DURATION_S - Number of seconds to run the test.
+- CONFIG_TEST_DELAY_US - Microseconds to delay between pthread join and create.
+- CONFIG_TEST_KTHREADS - Exercise k_threads in the test app.
+- CONFIG_TEST_PTHREADS - Exercise pthreads in the test app.
+- CONFIG_TEST_STACK_SIZE - Size of each thread stack in this test.
+
+The following table summarizes the purposes of the different extra
+configuration files that are available to be used with this benchmark.
+A tester may mix and match them, allowing different scenarios to be
+easily compared against the default.
+
++-----------------------------+----------------------------------------+
+| prj-assert.conf             | Enable assertions for API verification |
++-----------------------------+----------------------------------------+
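The rate column in the sample output can be reproduced by hand. The sketch below is plain host-side C, not part of the commit; the function names `rate_per_core` and `ratio_percent` are hypothetical. It assumes the same integer arithmetic that `print_group_stats()` uses (thread count divided by duration, then by core count), applied to the figures from the README's sample run.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integer rate as shown in the "rate (threads/s/core)" column.
 * Mirrors the count / duration / cores arithmetic of the summary line;
 * integer division truncates, so fractions of a thread are dropped.
 */
uint64_t rate_per_core(uint64_t threads, uint32_t seconds, uint32_t cores)
{
	assert(seconds > 0 && cores > 0);
	return threads / seconds / cores;
}

/* Ratio of two rates in percent, e.g. 150 means the first is 1.5x the second. */
uint64_t ratio_percent(uint64_t a, uint64_t b)
{
	assert(b > 0);
	return a * 100 / b;
}
```

With the sample figures, `rate_per_core(47663, 5, 1)` gives 9532 and `rate_per_core(28180, 5, 1)` gives 5636, so in that particular run k_threads were about 1.69 times as fast as pthreads (`ratio_percent(9532, 5636)` is 169). Other boards and configurations will differ.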
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+CONFIG_FORCE_NO_ASSERT=n
+CONFIG_ASSERT=y
+
+# May be enabled for GitHub CI to reduce host scheduling noise while running
+# several concurrent Qemu processes each under stressful SMP load.
+# CONFIG_PTHREAD_CREATE_BARRIER=y
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+CONFIG_TEST=y
+CONFIG_FORCE_NO_ASSERT=y
+
+CONFIG_POSIX_API=y
+CONFIG_POSIX_AEP_CHOICE_BASE=y
+CONFIG_POSIX_PRIORITY_SCHEDULING=y
Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
+/*
+ * Copyright (c) 2023, Meta
+ * Copyright (c) 2024, Tenstorrent AI ULC
+ *
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include <pthread.h>
+#include <stdio.h>
+
+#include <zephyr/sys/__assert.h>
+#include <zephyr/sys/util.h>
+
+#define STACK_SIZE K_THREAD_STACK_LEN(CONFIG_TEST_STACK_SIZE)
+
+/* update interval for printing stats */
+#if CONFIG_TEST_DURATION_S >= 60
+#define UPDATE_INTERVAL_S 10
+#elif CONFIG_TEST_DURATION_S >= 30
+#define UPDATE_INTERVAL_S 5
+#else
+#define UPDATE_INTERVAL_S 1
+#endif
+
+/* 32 threads is mainly a limitation of find_lsb_set() */
+#define NUM_CPUS MIN(32, MIN(CONFIG_MP_MAX_NUM_CPUS, CONFIG_POSIX_THREAD_THREADS_MAX))
+
+typedef int (*create_fn)(int i);
+typedef int (*join_fn)(int i);
+
+static void before(void);
+
+/* per-thread flags: alive[i] is set once thread i has run and can be joined */
+static bool alive[NUM_CPUS];
+
+/* array of thread stacks */
+static K_THREAD_STACK_ARRAY_DEFINE(thread_stacks, NUM_CPUS, STACK_SIZE);
+
+static struct k_thread k_threads[NUM_CPUS];
+static uint64_t counters[NUM_CPUS];
+static uint64_t prev_counters[NUM_CPUS];
+
+static void print_stats(const char *tag, uint64_t now, uint64_t end)
+{
+	for (int i = 0; i < NUM_CPUS; ++i) {
+		printf("%s, %d, %u, %llu, 1, %llu\n", tag, i, UPDATE_INTERVAL_S, counters[i],
+		       (counters[i] - prev_counters[i]) / UPDATE_INTERVAL_S);
+		prev_counters[i] = counters[i];
+	}
+}
+
+static void print_group_stats(const char *tag)
+{
+	uint64_t count = 0;
+
+	for (int i = 0; i < NUM_CPUS; ++i) {
+		count += counters[i];
+	}
+
+	printf("%s, ALL, %u, %llu, %u, %llu\n", tag, CONFIG_TEST_DURATION_S, count, NUM_CPUS,
+	       count / CONFIG_TEST_DURATION_S / NUM_CPUS);
+}
+
+static void create_join_common(const char *tag, create_fn create, join_fn join)
+{
+	int i;
+	int __maybe_unused ret;
+	uint64_t now_ms = k_uptime_get();
+	const uint64_t end_ms = now_ms + MSEC_PER_SEC * CONFIG_TEST_DURATION_S;
+	uint64_t update_ms = now_ms + MSEC_PER_SEC * UPDATE_INTERVAL_S;
+
+	for (i = 0; i < NUM_CPUS; ++i) {
+		/* spawn thread i */
+		prev_counters[i] = 0;
+		ret = create(i);
+		__ASSERT(ret == 0, "%s_create(%d)[%llu] failed: %d", tag, i, counters[i], ret);
+	}
+
+	do {
+		if (!IS_ENABLED(CONFIG_SMP)) {
+			/* allow the test thread to be swapped-out */
+			k_yield();
+		}
+
+		for (i = 0; i < NUM_CPUS; ++i) {
+			if (alive[i]) {
+				ret = join(i);
+				__ASSERT(ret == 0, "%s_join(%d)[%llu] failed: %d", tag, i,
+					 counters[i], ret);
+				alive[i] = false;
+
+				/* update counter i after each (create,join) pair */
+				++counters[i];
+
+				if (CONFIG_TEST_DELAY_US > 0) {
+					/* success with 0 delay means we are ~raceless */
+					k_busy_wait(CONFIG_TEST_DELAY_US);
+				}
+
+				/* re-spawn thread i */
+				ret = create(i);
+				__ASSERT(ret == 0, "%s_create(%d)[%llu] failed: %d", tag, i,
+					 counters[i], ret);
+			}
+		}
+
+		/* are we there yet? */
+		now_ms = k_uptime_get();
+
+		/* dump some stats periodically */
+		if (now_ms > update_ms) {
+			update_ms += MSEC_PER_SEC * UPDATE_INTERVAL_S;
+
+			/* at this point, we should have seen many context switches */
+			for (i = 0; IS_ENABLED(CONFIG_ASSERT) && i < NUM_CPUS; ++i) {
+				__ASSERT(counters[i] > 0, "%s %d was never scheduled", tag, i);
+			}
+
+			if (IS_ENABLED(CONFIG_TEST_PERIODIC_STATS)) {
+				print_stats(tag, now_ms, end_ms);
+			}
+		}
+		Z_SPIN_DELAY(100);
+	} while (end_ms > now_ms);
+
+	print_group_stats(tag);
+}
+
+/*
+ * Wrappers for k_threads
+ */
+
+static void k_thread_fun(void *arg1, void *arg2, void *arg3)
+{
+	int i = POINTER_TO_INT(arg1);
+
+	alive[i] = true;
+}
+
+static int k_thread_create_wrapper(int i)
+{
+	k_thread_create(&k_threads[i], thread_stacks[i], STACK_SIZE, k_thread_fun,
+			INT_TO_POINTER(i), NULL, NULL, K_HIGHEST_APPLICATION_THREAD_PRIO, 0,
+			K_NO_WAIT);
+
+	return 0;
+}
+
+static int k_thread_join_wrapper(int i)
+{
+	return k_thread_join(&k_threads[i], K_FOREVER);
+}
+
+static void create_join_kthread(void)
+{
+	if (IS_ENABLED(CONFIG_TEST_KTHREADS)) {
+		before();
+		create_join_common("k_thread", k_thread_create_wrapper, k_thread_join_wrapper);
+	}
+}
+
+/*
+ * Wrappers for pthreads
+ */
+
+static pthread_t pthreads[NUM_CPUS];
+static pthread_attr_t pthread_attrs[NUM_CPUS];
+
+static void *pthread_fun(void *arg)
+{
+	k_thread_fun(arg, NULL, NULL);
+	return NULL;
+}
+
+static int pthread_create_wrapper(int i)
+{
+	return pthread_create(&pthreads[i], &pthread_attrs[i], pthread_fun, INT_TO_POINTER(i));
+}
+
+static int pthread_join_wrapper(int i)
+{
+	return pthread_join(pthreads[i], NULL);
+}
+
+static void create_join_pthread(void)
+{
+	if (IS_ENABLED(CONFIG_TEST_PTHREADS)) {
+		before();
+		create_join_common("pthread", pthread_create_wrapper, pthread_join_wrapper);
+	}
+}
+
+static void setup(void)
+{
+	printf("ASSERT: %c\n", IS_ENABLED(CONFIG_ASSERT) ? 'y' : 'n');
+	printf("BOARD: %s\n", CONFIG_BOARD);
+	printf("NUM_CPUS: %u\n", NUM_CPUS);
+	printf("TEST_DELAY_US: %u\n", CONFIG_TEST_DELAY_US);
+	printf("TEST_DURATION_S: %u\n", CONFIG_TEST_DURATION_S);
+	printf("SMP: %c\n", IS_ENABLED(CONFIG_SMP) ? 'y' : 'n');
+
+	printf("API, Thread ID, time(s), threads, cores, rate (threads/s/core)\n");
+
+	if (IS_ENABLED(CONFIG_TEST_PTHREADS)) {
+		int __maybe_unused ret;
+		const struct sched_param param = {
+			.sched_priority = sched_get_priority_max(SCHED_FIFO),
+		};
+
+		/* setup pthread stacks */
+		for (int i = 0; i < NUM_CPUS; ++i) {
+			ret = pthread_attr_init(&pthread_attrs[i]);
+			__ASSERT(ret == 0, "pthread_attr_init[%d] failed: %d", i, ret);
+
+			ret = pthread_attr_setstack(&pthread_attrs[i], thread_stacks[i],
+						    STACK_SIZE);
+			__ASSERT(ret == 0, "pthread_attr_setstack[%d] failed: %d", i, ret);
+
+			ret = pthread_attr_setschedpolicy(&pthread_attrs[i], SCHED_FIFO);
+			__ASSERT(ret == 0, "pthread_attr_setschedpolicy[%d] failed: %d", i, ret);
+
+			ret = pthread_attr_setschedparam(&pthread_attrs[i], &param);
+			__ASSERT(ret == 0, "pthread_attr_setschedparam[%d] failed: %d", i, ret);
+		}
+	}
+}
+
+static void before(void)
+{
+	for (int i = 0; i < NUM_CPUS; ++i) {
+		counters[i] = 0;
+	}
+}
+
+int main(void)
+{
+	setup();
+
+	create_join_kthread();
+	create_join_pthread();
+
+	printf("PROJECT EXECUTION SUCCESSFUL\n");
+}
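A note on guarding code with an integer Kconfig option such as CONFIG_TEST_DELAY_US, as `create_join_common()` does before calling `k_busy_wait()`. Zephyr's `IS_ENABLED()` is intended for boolean options: it evaluates to 1 only when the macro is defined as exactly 1, so an integer value such as 5 reads as "disabled". The sketch below is a host-side illustration of that token-pasting behaviour, not Zephyr's header; every `DEMO_` name is hypothetical and was chosen so the block compiles without Zephyr.

```c
#include <assert.h>

/*
 * Illustration only: a stand-in for the token-pasting trick behind
 * IS_ENABLED(). All DEMO_ names are hypothetical.
 *
 * If the option expands to 1, DEMO_XXXX1 inserts an extra comma, which
 * shifts "1" into the "val" position. Any other value (or no definition)
 * pastes into an undefined token, leaving "0" in the "val" position.
 */
#define DEMO_XXXX1 DEMO_YYYY,
#define DEMO_IS_ENABLED3(ignore_this, val, ...) val
#define DEMO_IS_ENABLED2(one_or_two_args) DEMO_IS_ENABLED3(one_or_two_args 1, 0, ~)
#define DEMO_IS_ENABLED1(config_macro) DEMO_IS_ENABLED2(DEMO_XXXX##config_macro)
#define DEMO_IS_ENABLED(config_macro) DEMO_IS_ENABLED1(config_macro)

#define DEMO_OPT_BOOL 1     /* stands in for a bool option set to y */
#define DEMO_OPT_DELAY_US 5 /* stands in for an int option set to 5 */
/* DEMO_OPT_UNSET is intentionally left undefined */

/* Runtime self-check of the three cases described above. */
void demo_check(void)
{
	assert(DEMO_IS_ENABLED(DEMO_OPT_BOOL) == 1);
	assert(DEMO_IS_ENABLED(DEMO_OPT_DELAY_US) == 0); /* 5 is not "enabled" */
	assert(DEMO_IS_ENABLED(DEMO_OPT_UNSET) == 0);
}

/* A guard that works for any non-negative integer option. */
int demo_delay_requested(void)
{
	return DEMO_OPT_DELAY_US > 0;
}
```

For integer options, a plain comparison such as `CONFIG_TEST_DELAY_US > 0` expresses the intent directly and is still folded away by the compiler when the value is 0.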
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+common:
+  tags:
+    - posix
+    - benchmark
+  min_ram: 64
+  arch_exclude:
+    - posix
+  integration_platforms:
+    - qemu_cortex_a53/qemu_cortex_a53/smp
+    - qemu_riscv64/qemu_virt_riscv64/smp
+    - qemu_riscv32/qemu_virt_riscv32/smp
+    - qemu_x86_64
+  harness: console
+  harness_config:
+    type: one_line
+    record:
+      regex: "(?P<api>.*), ALL, (?P<time>.*), (?P<threads>.*), (?P<cores>.*), (?P<rate>.*)"
+    regex:
+      - "PROJECT EXECUTION SUCCESSFUL"
+tests:
+  benchmark.posix.threads: {}
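The `record` regex above picks out the "ALL" summary lines and names their five fields. As a rough illustration of which lines match and what each field holds, here is a host-side C sketch that parses the same line shape with `sscanf`. It is not how the test harness works internally (the harness applies the regular expression itself), and the `summary_record` struct and `parse_summary_line` function are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields named in the record regex. */
struct summary_record {
	char api[32];               /* "k_thread" or "pthread" */
	unsigned int time_s;        /* test duration in seconds */
	unsigned long long threads; /* total (create, join) pairs */
	unsigned int cores;         /* number of cores used */
	unsigned long long rate;    /* threads per second per core */
};

/*
 * Parse a line such as "k_thread, ALL, 5, 47663, 1, 9532".
 * Returns 0 on success, -1 if the line is not an "ALL" summary line.
 */
int parse_summary_line(const char *line, struct summary_record *rec)
{
	int n;

	assert(line != NULL && rec != NULL);
	memset(rec, 0, sizeof(*rec));

	/* "%31[^,]" reads the API tag up to the first comma, bounded to fit api[] */
	n = sscanf(line, "%31[^,], ALL, %u, %llu, %u, %llu", rec->api, &rec->time_s,
		   &rec->threads, &rec->cores, &rec->rate);

	return (n == 5) ? 0 : -1;
}
```

Per-thread lines printed when CONFIG_TEST_PERIODIC_STATS is enabled carry a numeric thread ID instead of `ALL`, so this parser rejects them, just as the regex's literal `ALL` skips them.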

tests/posix/pthread_pressure/prj.conf

Lines changed: 0 additions & 11 deletions
This file was deleted.
