-
Notifications
You must be signed in to change notification settings - Fork 7.7k
kernel: cbs: adding support for the Constant Bandwidth Server (CBS) #83601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick notes after a quick and incomplete read. The big worry here is performance: this is adding two function calls and a bunch of tests to the context switch path for one somewhat obscure feature, and that's sort of a non-starter. At the very least, this should emit code that does an absolute minimum of work (e.g. just check a boolean on the thread struct) before returning into the main switch path. Calling out into a separate stack frame is spending a ton of cycles we can't spare.
Does the problem really require tracking budget across cooperative switches anyway? Can't you do this with a timer a-la timeslicing? Seems like this is taking a ton of overhead for what would normally seem like a fairly coarse decision. Thread deadlines are usually on the order of a tick, not a cycle.
@@ -630,6 +630,9 @@ char *z_setup_new_thread(struct k_thread *new_thread, | |||
#ifdef CONFIG_SCHED_DEADLINE | |||
new_thread->base.prio_deadline = 0; | |||
#endif /* CONFIG_SCHED_DEADLINE */ | |||
#ifdef CONFIG_CBS | |||
new_thread->cbs = NULL; | |||
#endif /* CONFIG_CBS */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really the fault of this PR, but we have to clean up the mess this function has become. Originally it had to initialize all fields manually because it was felt that doing a memset() was expensive, but at this point I doubt that's true, and the maintenance burden for sure isn't worth the benefit (it's been a while, but historically we had a ton of bugs with incompletely initialized data here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, however I just went with the flow here.
include/zephyr/sched_server/cbs.h
Outdated
} cbs_arg_t; | ||
|
||
typedef struct { | ||
struct k_timer *timer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k_timer is intended to be an intrusive data structure, just put it here by value. IIRC message queues are not, though.
kernel/sched.c
Outdated
@@ -835,8 +839,16 @@ struct k_thread *z_swap_next_thread(void) | |||
/* Just a wrapper around arch_current_thread_set(xxx) with tracing */ | |||
static inline void set_current(struct k_thread *new_thread) | |||
{ | |||
#ifdef CONFIG_CBS | |||
cbs_thread_switched_out(_current_cpu->current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API for this is arch_current_cpu()
now; some platforms have optimized variants that don't require fetching the CPU pointer (which requires running in a non-migratable context).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm this information? I'm applying the fixes but the latest main
branch seems to have it no more. It looks like _current
is being used instead.
kernel/thread.c
Outdated
@@ -942,6 +945,7 @@ void z_thread_mark_switched_out(void) | |||
} | |||
#endif /* CONFIG_INSTRUMENT_THREAD_SWITCHING */ | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needless noop whitespace change nitpick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry 🫠 removed.
subsys/sched_server/cbs/cbs.c
Outdated
cbs_arg_t *args = (cbs_arg_t *)cbs_args; | ||
|
||
cbs->thread = k_current_get(); | ||
cbs->thread->cbs = cbs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a lot of value to having the thread<->cbs pointer cycle? Seems like everywhere you are dealing with it is in the context of the thread being instrumented, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, cbs->thread
is needed for when if the CBS thread demands a deadline updated (when a job is pushed to it, see function k_cbs_push_job
and cbs_check_current_bandwidth
it calls). I can't think of a better way of fetching the CBS thread if not like this.
The thread->cbs
is needed for the context switches; the cbs timer should start when the CBS thread switches in and stop when it switches out. And again, I can't think of a better way of fetching the CBS thread if not like this.
I'm open to suggestions if you wish.
subsys/sched_server/cbs/cbs.c
Outdated
} | ||
cbs_cycle_t now = cbs_get_now(); | ||
cbs_t *cbs = (cbs_t *)k_timer_user_data_get(timer); | ||
k_sched_lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sched_lock facility isn't SMP-safe. You almost certainly want a spinlock here unless you are taking care to prevent contention from other CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok-dokey, I have changed to K_SPINLOCK
.
subsys/sched_server/cbs/cbs.c
Outdated
return; | ||
} | ||
cbs_cycle_t now = cbs_get_now(); | ||
cbs_t *cbs = (cbs_t *)k_timer_user_data_get(timer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW if you make the k_timer a cbs field, you can just use a CONTAINER_OF() here to get the timer pointer, which is a single addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestion! done.
include/zephyr/sched_server/cbs.h
Outdated
typedef struct { | ||
cbs_callback_t function; | ||
void *arg; | ||
} cbs_job_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No typedef struct names in zephyr please, absent special circumstances like by-value opaque types. Should just be "struct cbs_job"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I've removed typedef structs.
subsys/sched_server/cbs/cbs.c
Outdated
|
||
void cbs_thread_switched_in(struct k_thread *thread) | ||
{ | ||
if (!thread || !thread->cbs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why check thread for null here? That would be a bug in the context switch primitives you hooked. This is extremely performance-sensitive code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's just me used to include defensive code by default. I have removed that.
subsys/sched_server/cbs/cbs.c
Outdated
return; | ||
} | ||
cbs_t *cbs = (cbs_t *)thread->cbs; | ||
if (cbs->is_initialized && !cbs->is_idle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise no defensive coding in the context switch path please. If you want to be sure cbs is initialized then the architecture needs to be affirmatively sure it is before that pointer is set on the thread. It's OK to make this an assert if you're worried about design bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok, this code wasn't needed indeed. I have removed it.
3beedf7
to
6ce9413
Compare
quick note here, you should not be introducing a core scheduling/kernel feature as a new subsystem, this belongs into the already existing kernel "subsystem" |
64eec3b
to
6e1a25e
Compare
9bb4d30
to
b9b6a25
Compare
This commit introduces kernel support for the Constant Bandwidth Server (CBS), an extension of the Earliest Deadline First (EDF) scheduler that allows tasks to be executed virtually isolated from each other, in a way that if a task executes for longer than expected it doesn’t interfere on the execution of the others. In other words, the CBS prevents that a task misbehavior causes other tasks to miss their own deadlines. In a nutshell, the CBS is a work-conserving wrapper for the tasks that automatically recalculates their deadlines when they exceed their configured timing constraints, thus acting as a fail-safe for EDF overruns. Signed-off-by: Alexander pinheiro paschoaletto <1222703@isep.ipp.pt>
This commit simply adds the documentation for the CBS. Signed-off-by: Alexander pinheiro paschoaletto <1222703@isep.ipp.pt>
…CBS) This commit adds two sample codes for the the CBS, one for testing a very simple logging application and other for effectively implementing it alongside other hard real-time threads. Signed-off-by: Alexander pinheiro paschoaletto <1222703@isep.ipp.pt>
Thank you for your brief review, I have corrected most of the requests. FWIW, the functions called in the context switch are there to just start/stop the CBS timer if the thread belongs to a CBS. There is nothing else. By the way, the absolute worst measured overheads in the XIAO ESP32C3 (RISC-V, single-core, 160MHz) were of 128 clock cycles for switching in and 139 clock cycles for switching out. These translate to a period in the magnitude of nanoseconds. The averages were (obviously) lower: 110.92 and 90.94 clock cycles, respectively. Does that looks inadmissible?
Yes it does. If thread X preempts CBS Y, the CBS needs a way of re-starting its budget timer when thread X finishes and it is allowed to resume execution. I couldn't make that work without including these functions in k_swap. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
I'd like the label removed, please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces kernel support for the Constant Bandwidth Server (CBS) to improve scheduling guarantees and isolate task execution time effects. Key changes include initializing a new CBS field in thread structures, adding CBS logging and internal implementations, and updating scheduler and swap routines to integrate CBS functionality.
Reviewed Changes
Copilot reviewed 24 out of 35 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
kernel/thread.c | Initializes the new CBS field in thread structures. |
kernel/sched_server/cbs_log.c | Adds logging support for CBS events. |
kernel/sched_server/cbs.c | Implements CBS functions including budget management and job handling. |
kernel/sched.c & kernel/include/kswap.h | Updates scheduler routines to start/stop CBS timers during context switches. |
Header files in kernel/include/zephyr/sched_server/ | Provides public and internal API definitions for CBS. |
Files not reviewed (11)
- doc/introduction/index.rst: Language not supported
- doc/kernel/services/index.rst: Language not supported
- doc/kernel/services/sched_server/cbs.rst: Language not supported
- doc/zephyr.doxyfile.in: Language not supported
- kernel/CMakeLists.txt: Language not supported
- kernel/Kconfig: Language not supported
- kernel/Kconfig.sched_server: Language not supported
- samples/cbs/cbs.rst: Language not supported
- samples/cbs/edf/CMakeLists.txt: Language not supported
- samples/cbs/edf/README.rst: Language not supported
- samples/cbs/edf/prj.conf: Language not supported
Comments suppressed due to low confidence (1)
kernel/sched_server/cbs_log.c:21
- Typo in log message: 'SWT_AY' should likely be 'SWT_AWAY' for consistency with similar log messages.
LOG_INF("%s SWT_AY %u", (char *)cbs->name, (uint32_t)cbs->budget.current);
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
please
I won't lie, this is a bit frustrating. But yes, I'd like to have the stale label removed, please. I intend to include a little bit more functionality to this code in the near future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick fly-by. I can review the actual scheduler code soon.
@@ -372,6 +372,11 @@ struct k_thread { | |||
_wait_q_t halt_queue; | |||
#endif /* CONFIG_SMP */ | |||
|
|||
#ifdef CONFIG_CBS | |||
/** Constant Bandwidth Server (CBS) struct */ | |||
void *cbs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be preferable to forward declare a struct and specifically type this field rather than using a void pointer.
#ifndef ZEPHYR_CBS | ||
#define ZEPHYR_CBS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the full path of the function in header guards like in other header files? I realize it's a bit verbose, but it's conventional and unique.
typedef uint32_t cbs_cycle_t; | ||
#endif | ||
|
||
struct cbs_job { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefix all of these structs with k_
as well
*/ | ||
|
||
#ifdef CONFIG_TIMER_HAS_64BIT_CYCLE_COUNTER | ||
typedef uint64_t cbs_cycle_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised we don't already have a cycle counter type like this.
can you prefix this with a k_
please?
k_timeout_t period; | ||
}; | ||
|
||
struct k_cbs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a special thread structure for CBS?
The main reason I ask is because Zephyr already has some Kconfig options or defines for specifying maximum thread name length and whether or not there is a field for thread name.
I guess the other reason is to see if there is a convention for naming used by other schedulers for similar types.
The name of the struct leaves me wondering and could maybe be expanded a bit to be more descriptive
#endif | ||
}; | ||
|
||
extern void cbs_thread(void *job_queue, void *cbs_struct, void *unused); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should be prefixed with k_
.
Is it possible for the job queue use named struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do the prefix thing, sure. But I don't know if I understood the second request, though. Do you want type void *
to be changed to an explicit k_msgq *
struct? Aint't thread entry points supposed to be formatted as having all arguments as void *
?
Scheduling delay (in milliseconds) for the CBS thread. | ||
|
||
if CBS_LOG | ||
config CBS_THREAD_MAX_NAME_LEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This likely duplicates something that already exists in Zephyr, so I would suggest using the existing convention.
requires CONFIG_CBS to be included within the project. | ||
|
||
if CBS_LOG | ||
menu "CBS events to be logged" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The events being recorded here really seem like they would fit better with Zephyr's tracing subsystem. Can they use that mechanism instead?
Thank you so much for taking your time, cfriedt! I'm currently busy with other stuff at the moment, but I will get back to this eventually and address all your comments. |
This PR introduces kernel support for the Constant Bandwidth Server (CBS).
A CBS is an extension of the Earliest Deadline First (EDF) scheduler already implemented (although with some limitations) in Linux and RTEMS that allows tasks to be executed virtually isolated from each other (time-wise, not memory-wise), in a way that if a task executes for longer than expected it doesn't interfere on the execution of the others. In other words, the CBS prevents that a task
misbehavior causes other tasks to miss their own deadlines.
In a nutshell, the CBS is a work-conserving wrapper for the tasks that automatically recalculates their deadlines when they exceed their configured timing constraints. If a task takes longer than expected, it can be preempted by awaiting tasks instead of jeopardizing their deadlines.
You want to use the CBS when:
So, depending on what you need, the CBS can be seen as a safe way of serving tasks with no deadlines within a EDF taskset (like a workqueue, but better) or a fail-safe for existing EDF tasks that might overrun, and thus endanger other tasks's deadlines.
The most important implementation file to analyze is this one. The rest are docs, samples, and eventual/small kernel mods to enable the CBS to work accordingly.
For more details, feel free to read RFC #78000 and the docs wrote for this contribution. Or just ask, I'll be happy to provide an answer.