Skip to content

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Sep 1, 2025

completely untested, need access to the hardware, leaving as a Draft until then.

Summary by Sourcery

Enable tracking of monitoring tasks under Intel RDT by adding mon_groups support and wiring enable_monitoring through resctrl operations and container lifecycle

New Features:

  • Add support for creating and destroying Intel RDT monitoring sub-groups when enable_monitoring is set

Enhancements:

  • Extend resctrl APIs to accept schemata arrays and optional monitoring group names
  • Integrate monitoring group lifecycle into libcrun_apply_intelrdt and libcrun_destroy_intelrdt flows
  • Remove deprecated intelrdt field from container status serialization and state management

Copy link

sourcery-ai bot commented Sep 1, 2025

Reviewer's Guide

Introduce optional monitoring group management in Intel RDT by extending the resctrl API, integrating an enable_monitoring flag into libcrun workflows, and removing legacy status fields.

File-Level Changes

Change Details Files
Enhance resctrl API with optional schemata and monitoring subgroup support
  • Update resctl_create signature to accept a schemata list
  • Add resctl_get_monitoring_path helper
  • Implement resctl_create_monitoring_group and resctl_destroy_monitoring_group
  • Extend resctl_move_task_to to append tasks to monitoring subgroups
src/libcrun/intelrdt.c
src/libcrun/intelrdt.h
Integrate enable_monitoring flag into Intel RDT apply/destroy flows in libcrun
  • Pass schemata and enable_monitoring to resctl_create in libcrun_apply_intelrdt
  • Create/destroy monitoring groups based on enable_monitoring and cleanup on failure
  • Propagate monitoring_name into resctl_move_task_to
  • Adapt libcrun_destroy_intelrdt to remove monitoring group before destroying the main group
src/libcrun/linux.c
src/libcrun/linux.h
Remove legacy intelrdt status handling and update container deletion
  • Eliminate intelrdt field from libcrun_container_status_t and JSON serialization
  • Adjust container_delete_internal to call updated libcrun_destroy_intelrdt signature
  • Remove container_has_intelrdt utility and related overloads
src/libcrun/container.c
src/libcrun/status.c
src/libcrun/status.h

Possibly linked issues

  • #0: PR adds functions to create, destroy, and move tasks in Intel RDT monitoring groups.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the support-rdt-monitoring branch 3 times, most recently from c83b1ed to 456f6e6 Compare September 2, 2025 14:07
@giuseppe
Copy link
Member Author

giuseppe commented Sep 2, 2025

tested on a machine with Intel RDT support and it works as expected. Ready for review

@giuseppe giuseppe marked this pull request as ready for review September 2, 2025 14:37
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Refactor repeated enable_monitoring checks into a helper to reduce nesting and improve readability in apply/destroy logic.
  • Removing the intelrdt field from the status JSON may break backward compatibility—consider versioning or migrating existing state.
  • Current monitoring cleanup swallows errors; consider logging or aggregating errors instead of silently discarding them.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Refactor repeated enable_monitoring checks into a helper to reduce nesting and improve readability in apply/destroy logic.
- Removing the intelrdt field from the status JSON may break backward compatibility—consider versioning or migrating existing state.
- Current monitoring cleanup swallows errors; consider logging or aggregating errors instead of silently discarding them.

## Individual Comments

### Comment 1
<location> `src/libcrun/intelrdt.c:317` </location>
<code_context>
+  if (UNLIKELY (ret < 0))
+    return ret;
+
+  if (monitoring_name)
+    {
+      ret = append_paths (&monitoring_path, err, INTEL_RDT_MOUNT_POINT, name, MON_GROUPS, monitoring_name, TASKS_FILE, NULL);
+      if (UNLIKELY (ret < 0))
+        return ret;
+
+      ret = write_file (monitoring_path, pid_str, len, err);
+      if (UNLIKELY (ret < 0))
+        return ret;
</code_context>

<issue_to_address>
No error handling for partial success in moving task to monitoring group.

If writing to the monitoring group fails after success with the main group, the process state may be inconsistent. Please consider implementing rollback or enhanced error reporting for this scenario.
</issue_to_address>

### Comment 2
<location> `src/libcrun/linux.c:6045` </location>
<code_context>
   return 0;

 fail:
+  if (has_monitoring)
+    {
+      libcrun_error_t tmp_err = NULL;
+      int tmp_ret;
+
+      tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
+      if (tmp_ret < 0)
+        crun_error_release (&tmp_err);
</code_context>

<issue_to_address>
Error from resctl_destroy_monitoring_group is ignored.

If monitoring group cleanup fails, resources may not be released. Please log the error or propagate it to ensure failures are visible.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
      tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
      if (tmp_ret < 0)
        crun_error_release (&tmp_err);
=======
      tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
      if (tmp_ret < 0)
        {
          crun_error_log (&tmp_err);
          if (err && *err == NULL)
            *err = tmp_err;
          else
            crun_error_release (&tmp_err);
        }
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +317 to +323
if (monitoring_name)
{
ret = append_paths (&monitoring_path, err, INTEL_RDT_MOUNT_POINT, name, MON_GROUPS, monitoring_name, TASKS_FILE, NULL);
if (UNLIKELY (ret < 0))
return ret;

ret = write_file (monitoring_path, pid_str, len, err);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: No error handling for partial success in moving task to monitoring group.

If writing to the monitoring group fails after success with the main group, the process state may be inconsistent. Please consider implementing rollback or enhanced error reporting for this scenario.

Comment on lines +6050 to +6052
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
if (tmp_ret < 0)
crun_error_release (&tmp_err);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Error from resctl_destroy_monitoring_group is ignored.

If monitoring group cleanup fails, resources may not be released. Please log the error or propagate it to ensure failures are visible.

Suggested change
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
if (tmp_ret < 0)
crun_error_release (&tmp_err);
tmp_ret = resctl_destroy_monitoring_group (name, ctr_name, &tmp_err);
if (tmp_ret < 0)
{
crun_error_log (&tmp_err);
if (err && *err == NULL)
*err = tmp_err;
else
crun_error_release (&tmp_err);
}

@giuseppe giuseppe force-pushed the support-rdt-monitoring branch 2 times, most recently from 6cd4a53 to 0b85287 Compare September 2, 2025 15:42
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the support-rdt-monitoring branch from 0b85287 to dfdcf77 Compare September 2, 2025 15:43
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@giuseppe
Copy link
Member Author

giuseppe commented Sep 3, 2025

@flouthoc @sohankunkerkar @kolyshkin PTAL

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@sohankunkerkar sohankunkerkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM

Copy link
Collaborator

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@giuseppe giuseppe merged commit 130c391 into containers:main Sep 4, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants