-
Notifications
You must be signed in to change notification settings - Fork 377
intelrdt: add support for EnableMonitoring #1866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideIntroduce optional monitoring group management in Intel RDT by extending the resctrl API, integrating an enable_monitoring flag into libcrun workflows, and removing legacy status fields. File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
c83b1ed
to
456f6e6
Compare
tested on a machine with Intel RDT support and it works as expected. Ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Refactor repeated enable_monitoring checks into a helper to reduce nesting and improve readability in apply/destroy logic.
- Removing the intelrdt field from the status JSON may break backward compatibility—consider versioning or migrating existing state.
- Current monitoring cleanup swallows errors; consider logging or aggregating errors instead of silently discarding them.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Refactor repeated enable_monitoring checks into a helper to reduce nesting and improve readability in apply/destroy logic.
- Removing the intelrdt field from the status JSON may break backward compatibility—consider versioning or migrating existing state.
- Current monitoring cleanup swallows errors; consider logging or aggregating errors instead of silently discarding them.
## Individual Comments
### Comment 1
<location> `src/libcrun/intelrdt.c:317` </location>
<code_context>
+ if (UNLIKELY (ret < 0))
+ return ret;
+
+ if (monitoring_name)
+ {
+ ret = append_paths (&monitoring_path, err, INTEL_RDT_MOUNT_POINT, name, MON_GROUPS, monitoring_name, TASKS_FILE, NULL);
+ if (UNLIKELY (ret < 0))
+ return ret;
+
+ ret = write_file (monitoring_path, pid_str, len, err);
+ if (UNLIKELY (ret < 0))
+ return ret;
</code_context>
<issue_to_address>
No error handling for partial success in moving task to monitoring group.
If writing to the monitoring group fails after success with the main group, the process state may be inconsistent. Please consider implementing rollback or enhanced error reporting for this scenario.
</issue_to_address>
### Comment 2
<location> `src/libcrun/linux.c:6045` </location>
<code_context>
return 0;
fail:
+ if (has_monitoring)
+ {
+ libcrun_error_t tmp_err = NULL;
+ int tmp_ret;
+
+ tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
+ if (tmp_ret < 0)
+ crun_error_release (&tmp_err);
</code_context>
<issue_to_address>
Error from resctl_destroy_monitoring_group is ignored.
If monitoring group cleanup fails, resources may not be released. Please log the error or propagate it to ensure failures are visible.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
if (tmp_ret < 0)
crun_error_release (&tmp_err);
=======
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
if (tmp_ret < 0)
{
crun_error_log (&tmp_err);
if (err && *err == NULL)
*err = tmp_err;
else
crun_error_release (&tmp_err);
}
>>>>>>> REPLACE
</suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if (monitoring_name) | ||
{ | ||
ret = append_paths (&monitoring_path, err, INTEL_RDT_MOUNT_POINT, name, MON_GROUPS, monitoring_name, TASKS_FILE, NULL); | ||
if (UNLIKELY (ret < 0)) | ||
return ret; | ||
|
||
ret = write_file (monitoring_path, pid_str, len, err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: No error handling for partial success in moving task to monitoring group.
If writing to the monitoring group fails after success with the main group, the process state may be inconsistent. Please consider implementing rollback or enhanced error reporting for this scenario.
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err); | ||
if (tmp_ret < 0) | ||
crun_error_release (&tmp_err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Error from resctl_destroy_monitoring_group is ignored.
If monitoring group cleanup fails, resources may not be released. Please log the error or propagate it to ensure failures are visible.
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err); | |
if (tmp_ret < 0) | |
crun_error_release (&tmp_err); | |
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err); | |
if (tmp_ret < 0) | |
{ | |
crun_error_log (&tmp_err); | |
if (err && *err == NULL) | |
*err = tmp_err; | |
else | |
crun_error_release (&tmp_err); | |
} |
6cd4a53
to
0b85287
Compare
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
0b85287
to
dfdcf77
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
completely untested, need access to the hardware, leaving as a Draft until then.
Summary by Sourcery
Enable tracking of monitoring tasks under Intel RDT by adding mon_groups support and wiring enable_monitoring through resctrl operations and container lifecycle
New Features:
Enhancements: