Skip to content

Conversation

sohankunkerkar
Copy link
Member

@sohankunkerkar sohankunkerkar commented Aug 27, 2025

Replace individual tmpfs mounts with bind mounts to a shared empty directory for masked directory paths. This reduces kernel memory usage under high container density.

This is how the mountinfo will look like:

# 1812 1798 0:27 /crun-test/.empty-directory /sys/devices/system/cpu/cpu0/thermal_throttle ro - tmpfs tmpfs rw,seclabel,size=13107056k,nr_inodes=819200,mode=755,inode64
# 1813 1798 0:27 /crun-test/.empty-directory /sys/devices/system/cpu/cpu1/thermal_throttle ro - tmpfs tmpfs rw,seclabel,size=13107056k,nr_inodes=819200,mode=755,inode64
# 1814 1798 0:27 /crun-test/.empty-directory /sys/devices/system/cpu/cpu10/thermal_throttle ro - tmpfs tmpfs rw,seclabel,size=13107056k,nr_inodes=819200,mode=755,inode64

Summary by Sourcery

Optimize masked directory mounts by using a shared empty directory bind mount to reduce per-mount tmpfs memory overhead under high container density.

New Features:

  • Introduce shared empty directory management for masked paths via get_shared_empty_dir.
  • Add mount_masked_dir to bind mount the shared empty directory and fallback to tmpfs on failure.

Enhancements:

  • Extend container private_data to cache shared_empty_dir_fd and its proc path and clean them up properly.
  • Update do_masked_or_readonly_path to use the new mount_masked_dir logic instead of individual tmpfs mounts.

Copy link

sourcery-ai bot commented Aug 27, 2025

Reviewer's Guide

This PR optimizes masked directory mounts by introducing a single shared empty directory (using O_TMPFILE or a temporary mkdtemp directory) that is bind-mounted over each masked path, falling back to per-path tmpfs mounts on failure. It extends the container’s private data to cache the shared directory, implements helper functions to manage and mount it, and integrates the new logic into the existing masked-path workflow.

Sequence diagram for masked directory mount process with shared empty directory

sequenceDiagram
    participant Container
    participant MaskedDirManager
    participant Kernel
    Container->>MaskedDirManager: Request masked path mount
    MaskedDirManager->>MaskedDirManager: get_shared_empty_dir()
    alt Shared empty dir available
        MaskedDirManager->>Kernel: Bind mount shared empty dir
        Kernel-->>MaskedDirManager: Success or Failure
        alt Bind mount fails
            MaskedDirManager->>Kernel: Mount tmpfs over path
            Kernel-->>MaskedDirManager: Success
        end
    else Shared empty dir not available
        MaskedDirManager->>Kernel: Mount tmpfs over path
        Kernel-->>MaskedDirManager: Success
    end
    MaskedDirManager-->>Container: Mount complete
Loading

Class diagram for updated private_data_s structure and masked directory handling

classDiagram
    class private_data_s {
        char *external_descriptors
        int rootfsfd
        int shared_empty_dir_fd
        proc_fd_path_t shared_empty_dir_path
        ...
    }
    class libcrun_container_s {
        private_data_s *private_data
        cleanup_private_data
        ...
    }
    private_data_s <.. libcrun_container_s : contains
    class MaskedDirManager {
        +get_shared_empty_dir(container, err)
        +mount_masked_dir(container, pathfd, rel_path, err)
    }
    MaskedDirManager ..> private_data_s : uses
    MaskedDirManager ..> libcrun_container_s : uses
Loading

Flow diagram for masked directory mount optimization

flowchart TD
    A[Start masked path mount] --> B{Is shared empty dir available?}
    B -- Yes --> C[Bind mount shared empty dir over masked path]
    C --> D[Success]
    B -- No or bind mount fails --> E[Mount tmpfs over masked path]
    E --> D[Success]
Loading

File-Level Changes

Change Details Files
Extend private_data to store shared empty directory state
  • Add shared_empty_dir_fd and shared_empty_dir_path fields to private_data_s
  • Initialize shared_empty_dir_fd in get_private_data
  • Close shared_empty_dir_fd and zero out its path in cleanup_private_data
src/libcrun/linux.c
Implement get_shared_empty_dir helper
  • Attempt to open a shared empty directory via O_TMPFILE or fallback to mkdtemp
  • Cache the directory’s file descriptor and its /proc/self/fd/ path
  • Validate cached fd with fstat and reopen if invalid
  • Handle errors and propagate via libcrun_error_t
src/libcrun/linux.c
Add mount_masked_dir to bind-mount shared directory with tmpfs fallback
  • Try bind-mounting the shared empty directory over the masked path
  • Log warnings and fall back to a tmpfs mount on bind failure or creation errors
  • Propagate mount errors through libcrun_error_t
src/libcrun/linux.c
Integrate shared-dir logic into masked-path workflow
  • Replace direct tmpfs mount with mount_masked_dir call in do_masked_or_readonly_path
src/libcrun/linux.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sohankunkerkar sohankunkerkar changed the title linux: optimize masked paths with shared empty directory [WIP] linux: optimize masked paths with shared empty directory Aug 27, 2025
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/libcrun/linux.c:1099` </location>
<code_context>
+  if (prefix == NULL || *prefix != '/')
+    prefix = "/tmp";
+
+#ifdef O_TMPFILE
+  fd = open (prefix, O_TMPFILE | O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0700);
+  if (fd >= 0)
+    {
+      private_data->shared_empty_dir_fd = fd;
+      get_proc_self_fd_path (private_data->shared_empty_dir_path, fd);
+      fd = -1; /* Don't auto-close on cleanup */
+      return private_data->shared_empty_dir_fd;
+    }
+
+  /* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
+  if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
+    return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);
+#endif
</code_context>

<issue_to_address>
Error handling for open with O_TMPFILE may miss some edge cases.

Review whether additional errno values, such as ENOSPC or EACCES, should also trigger the fallback instead of being treated as fatal errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
  /* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
  if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
    return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);
=======
  /* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
  if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL && errno != ENOSPC && errno != EACCES)
    return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `src/libcrun/linux.c:1116` </location>
<code_context>
+
+  /* If O_TMPFILE not available, create directory using crun's standard pattern */
+  char template[PATH_MAX];
+  if (snprintf (template, sizeof (template), "%s/crun.XXXXXX", prefix) < 0)
+    return crun_make_error (err, errno, "snprintf");
+
</code_context>

<issue_to_address>
snprintf return value check may not be sufficient for buffer overflow.

If snprintf returns a value >= sizeof(template), truncation has occurred. Please add a check for this to prevent buffer overflow risks.
</issue_to_address>

### Comment 3
<location> `src/libcrun/linux.c:1144` </location>
<code_context>
+  int ret;
+
+  libcrun_error_t tmp_err = NULL;
+  int shared_empty_fd = get_shared_empty_dir (container, &tmp_err);
+  if (shared_empty_fd >= 0)
+    {
</code_context>

<issue_to_address>
Error handling for get_shared_empty_dir may leak file descriptors.

If get_shared_empty_dir fails after opening a directory, make sure any opened file descriptors are properly closed. Check the cleanup logic for edge cases.
</issue_to_address>

### Comment 4
<location> `src/libcrun/linux.c:1074` </location>
<code_context>
   return false;
 }

+/* Get or create a shared empty directory for masked paths optimization.
+ * This avoids creating individual tmpfs mounts for each masked directory.
+ */
</code_context>

<issue_to_address>
Consider merging the two new routines into a single helper function to simplify masked directory handling.

```suggestion
The two new routines (`get_shared_empty_dir` + `mount_masked_dir`) can be collapsed into one simpler helper. You only need to open the “anonymous” dir once (using O_TMPFILE or mkdtemp) and then bind-mount it via `/proc/self/fd/<fd>`. You can drop the extra proc-path cache and just `snprintf()` it on each mount.

Example:

```c
/* open anon directory once, fallback from O_TMPFILE -> mkdtemp */
static int open_anon_dir(const char *prefix, libcrun_error_t *err) {
    int fd;
#ifdef O_TMPFILE
    fd = open(prefix, O_TMPFILE|O_DIRECTORY|O_CLOEXEC, 0700);
    if (fd >= 0) return fd;
    if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
        return crun_make_error(err, errno, "open O_TMPFILE `%s`", prefix);
#endif
    char tmp[PATH_MAX];
    if (snprintf(tmp, sizeof(tmp), "%s/crun.XXXXXX", prefix) < 0)
        return crun_make_error(err, errno, "snprintf");
    if (!mkdtemp(tmp))
        return crun_make_error(err, errno, "mkdtemp `%s`", tmp);
    fd = open(tmp, O_DIRECTORY|O_CLOEXEC);
    rmdir(tmp);  /* cleanup on host fs */
    if (fd < 0)
        return crun_make_error(err, errno, "open `%s`", tmp);
    return fd;
}

/* single helper for masked dirs */
static int mount_masked_dir(libcrun_container_t *ctr,
                            int pathfd, const char *rel, libcrun_error_t *err)
{
    struct private_data_s *p = get_private_data(ctr);
    if (p->empty_fd < 0) {
        const char *dir = getenv("_LIBCONTAINER_STATEDIR");
        if (!dir || *dir!='/') dir = "/tmp";
        p->empty_fd = open_anon_dir(dir, err);
        if (p->empty_fd < 0) return -1;
    }

    /* mount via /proc/self/fd */
    char src[32];
    snprintf(src, sizeof(src), "/proc/self/fd/%d", p->empty_fd);
    if (do_mount(ctr, src, pathfd, rel,
                 NULL, MS_BIND|MS_RDONLY, NULL, LABEL_MOUNT, err) == 0)
        return 0;

    libcrun_warning("bind empty-dir `%s` failed, falling back to tmpfs: %s",
                    rel, err ? err->msg : "errno");
    crun_error_release(&err);

    /* fallback to tmpfs */
    return do_mount(ctr, "tmpfs", pathfd, rel,
                    "tmpfs", MS_RDONLY, "size=0k", LABEL_MOUNT, err);
}
```

- Drop `get_shared_empty_dir` and its proc-path cache.
- Store only one `empty_fd` in `private_data_s`.
- Inline the fallback logic in one function.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 1109 to 1111
/* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Error handling for open with O_TMPFILE may miss some edge cases.

Review whether additional errno values, such as ENOSPC or EACCES, should also trigger the fallback instead of being treated as fatal errors.

Suggested change
/* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);
/* Fallback if O_TMPFILE isn't supported or fails unexpectedly */
if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL && errno != ENOSPC && errno != EACCES)
return crun_make_error (err, errno, "open O_TMPFILE `%s`", prefix);


/* If O_TMPFILE not available, create directory using crun's standard pattern */
char template[PATH_MAX];
if (snprintf (template, sizeof (template), "%s/crun.XXXXXX", prefix) < 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): snprintf return value check may not be sufficient for buffer overflow.

If snprintf returns a value >= sizeof(template), truncation has occurred. Please add a check for this to prevent buffer overflow risks.

int ret;

libcrun_error_t tmp_err = NULL;
int shared_empty_fd = get_shared_empty_dir (container, &tmp_err);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Error handling for get_shared_empty_dir may leak file descriptors.

If get_shared_empty_dir fails after opening a directory, make sure any opened file descriptors are properly closed. Check the cleanup logic for edge cases.

return false;
}

/* Get or create a shared empty directory for masked paths optimization.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider merging the two new routines into a single helper function to simplify masked directory handling.

Suggested change
/* Get or create a shared empty directory for masked paths optimization.
The two new routines (`get_shared_empty_dir` + `mount_masked_dir`) can be collapsed into one simpler helper. You only need to open theanonymousdir once (using O_TMPFILE or mkdtemp) and then bind-mount it via `/proc/self/fd/<fd>`. You can drop the extra proc-path cache and just `snprintf()` it on each mount.
Example:
```c
/* open anon directory once, fallback from O_TMPFILE -> mkdtemp */
static int open_anon_dir(const char *prefix, libcrun_error_t *err) {
int fd;
#ifdef O_TMPFILE
fd = open(prefix, O_TMPFILE|O_DIRECTORY|O_CLOEXEC, 0700);
if (fd >= 0) return fd;
if (errno != EOPNOTSUPP && errno != EISDIR && errno != EINVAL)
return crun_make_error(err, errno, "open O_TMPFILE `%s`", prefix);
#endif
char tmp[PATH_MAX];
if (snprintf(tmp, sizeof(tmp), "%s/crun.XXXXXX", prefix) < 0)
return crun_make_error(err, errno, "snprintf");
if (!mkdtemp(tmp))
return crun_make_error(err, errno, "mkdtemp `%s`", tmp);
fd = open(tmp, O_DIRECTORY|O_CLOEXEC);
rmdir(tmp); /* cleanup on host fs */
if (fd < 0)
return crun_make_error(err, errno, "open `%s`", tmp);
return fd;
}
/* single helper for masked dirs */
static int mount_masked_dir(libcrun_container_t *ctr,
int pathfd, const char *rel, libcrun_error_t *err)
{
struct private_data_s *p = get_private_data(ctr);
if (p->empty_fd < 0) {
const char *dir = getenv("_LIBCONTAINER_STATEDIR");
if (!dir || *dir!='/') dir = "/tmp";
p->empty_fd = open_anon_dir(dir, err);
if (p->empty_fd < 0) return -1;
}
/* mount via /proc/self/fd */
char src[32];
snprintf(src, sizeof(src), "/proc/self/fd/%d", p->empty_fd);
if (do_mount(ctr, src, pathfd, rel,
NULL, MS_BIND|MS_RDONLY, NULL, LABEL_MOUNT, err) == 0)
return 0;
libcrun_warning("bind empty-dir `%s` failed, falling back to tmpfs: %s",
rel, err ? err->msg : "errno");
crun_error_release(&err);
/* fallback to tmpfs */
return do_mount(ctr, "tmpfs", pathfd, rel,
"tmpfs", MS_RDONLY, "size=0k", LABEL_MOUNT, err);
}
  • Drop get_shared_empty_dir and its proc-path cache.
  • Store only one empty_fd in private_data_s.
  • Inline the fallback logic in one function.

Copy link

TMT tests failed. @containers/packit-build please check.

@giuseppe
Copy link
Member

you can create a directory under $RUNDIR, e.g. $RUNDIR/.empty-directory and use always that for all the containers, easier to handle

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit, otherwise codewise LGTM

if (LIKELY (ret >= 0))
return ret;

if (tmp_err != NULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the if statement

  if (tmp_err != NULL)

is unnecessary?

I created an issue related to this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, @sohankunkerkar please drop the check as it is not necessary.

Copy link
Collaborator

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just reading tmpfs(5) and found out that size=0, which we're currently using, means "no limit". Perhaps if we change it to be > 0, this will fix the issue of high kernel memory use?

Something like nr_blocks=1,nr_inodes=1?

@giuseppe
Copy link
Member

I was just reading tmpfs(5) and found out that size=0, which we're currently using, means "no limit". Perhaps if we change it to be > 0, this will fix the issue of high kernel memory use?

Something like nr_blocks=1,nr_inodes=1?

wouldn't the bind mount be always cheaper than the tmpfs though?

If we leave the fallback though, I think we need to issue above.

@sohankunkerkar can you please add a new patch to this PR?

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments inline, waiting for @kolyshkin to weight in if he is happy with the change

if (LIKELY (ret >= 0))
return ret;

if (tmp_err != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, @sohankunkerkar please drop the check as it is not necessary.

Replace "size=0k" (unlimited growth) with explicit block and inode limits
for tmpfs mounts used in masked directory paths. This prevents excessive
kernel memory consumption under high container density.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Copy link
Collaborator

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I wonder if there are measurements re how much memory it saves?

@kolyshkin
Copy link
Collaborator

@sohankunkerkar guess you also need to remove [WIP].

@sohankunkerkar sohankunkerkar changed the title [WIP] linux: optimize masked paths with shared empty directory linux: optimize masked paths with shared empty directory Aug 29, 2025
return ret;

/* Apply SELinux label using container's mount label */
if (container->container_def->linux && container->container_def->linux->mount_label)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CNV team found an SELinux issue while testing the previous patch:

audit: type=1400 audit(1756323551.103:5179): avc: denied { read } for pid=142874 
comm="virt-launcher-m" name=".empty-directory" dev="tmpfs" ino=7201 
scontext=system_u:system_r:container_t:s0:c139,c767 
tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0

I added this code to address the issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this test to confirm the SELinux context for the directory.

@test "crun shared directory has correct SELinux context" {
	cat > "$CRIO_CONFIG_DIR/01-crun-test.conf" <<EOF
[crio.runtime.runtimes.crun-test]
runtime_path = "$CRUN_BINARY"
runtime_type = ""
runtime_root = "/run/crun-test"
monitor_path = "/usr/bin/conmon"
EOF

	start_crio
	
	# Create container config with thermal_throttle masked paths
	jq '.linux.security_context.masked_paths = ["/sys/devices/system/cpu/cpu0/thermal_throttle"] | del(.linux.security_context.namespace_options.pid)' \
		"$TESTDATA/container_redis.json" > "$TESTDIR/container_thermal.json"
	
	# Create sandbox config without PID namespace sharing (avoid conflict with drop_infra_ctr=true)
	jq 'del(.linux.security_context.namespace_options.pid)' \
		"$TESTDATA/sandbox_config.json" > "$TESTDIR/sandbox_no_pid.json"
	
	local ctr_id
	ctr_id=$(crictl run --runtime crun-test "$TESTDIR/container_thermal.json" "$TESTDIR/sandbox_no_pid.json")
	
	# Wait for container to start
	sleep 2
	
	sudo ls -ldZ /run/crun-test/.empty-directory 2>/dev/null || echo "Directory not found"
	
	local selinux_context
	selinux_context=$(sudo ls -ldZ /run/crun-test/.empty-directory 2>/dev/null | awk '{print $5}' | head -1)
	
	echo "DEBUG: SELinux context found: '$selinux_context'"
	
	# Should have container_file_t context (accessible to containers) not container_var_run_t
	[[ "$selinux_context" =~ container_file_t ]]
	[[ ! "$selinux_context" =~ container_var_run_t ]]
	
	# Test mountinfo to verify bind mounts instead of tmpfs  
	echo "DEBUG: Checking mountinfo for thermal_throttle paths..."
	crictl exec "$ctr_id" sh -c "cat /proc/self/mountinfo | grep thermal_throttle | head -3"
	
	crictl stop "$ctr_id"
	crictl rm "$ctr_id"
}

Copy link
Member Author

@sohankunkerkar sohankunkerkar Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be this?

const char *shared_context = "system_u:object_r:container_file_t:s0";
ret = setxattr (empty_dir_path, "security.selinux", shared_context, strlen (shared_context), 0);
  if (ret < 0 && errno != EOPNOTSUPP)
    {
      libcrun_warning ("Failed to set shared SELinux context on %s: %s", empty_dir_path, strerror (errno));
    }

I suspect each container gets unique MCS categories for isolation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not set the xattr at all. It will be changed for each mount. It must have a static value (the default it gets might be good) and the selinux policy should allow that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about containers/container-selinux#393?

@sohankunkerkar
Copy link
Member Author

@giuseppe could you PTAL?

sohankunkerkar added a commit to sohankunkerkar/container-selinux that referenced this pull request Aug 30, 2025
…timization

Problem: crun PR #1859 (containers/crun#1859) optimizes
masked paths by using a shared empty directory instead of individual tmpfs mounts.
However, containers cannot access this shared directory due to SELinux policy:

  avc: denied { read } for name=".empty-directory"
  scontext=container_t:s0:c139,c767 tcontext=container_var_run_t:s0
Without this policy, the optimization falls back to individual tmpfs
mounts, negating the performance benefits.
sohankunkerkar added a commit to sohankunkerkar/container-selinux that referenced this pull request Aug 30, 2025
…timization

Problem: crun PR #1859 (containers/crun#1859) optimizes
masked paths by using a shared empty directory instead of individual tmpfs mounts.
However, containers cannot access this shared directory due to SELinux policy:

  avc: denied { read } for name=".empty-directory"
  scontext=container_t:s0:c139,c767 tcontext=container_var_run_t:s0
Without this policy, the optimization falls back to individual tmpfs
mounts, negating the performance benefits.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Optimize masked path handling by bind-mounting a shared empty directory
(via cached /proc/self/fd) instead of creating per-path tmpfs mounts.
This reduces kernel memory and mount syscall overhead under high container
density.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lsm5
Copy link
Member

lsm5 commented Sep 1, 2025

/packit retest-failed

Copy link

Account lsm5 has no write access nor is author of PR!

@lsm5
Copy link
Member

lsm5 commented Sep 1, 2025

@giuseppe mind giving me write access ? Alternatively, please feel free to run that packit command.

@giuseppe
Copy link
Member

giuseppe commented Sep 1, 2025

@lsm5 can you try again?

@lsm5
Copy link
Member

lsm5 commented Sep 1, 2025

/packit retest-failed

@lsm5
Copy link
Member

lsm5 commented Sep 1, 2025

@lsm5 can you try again?

@giuseppe works, thanks. I promise not to abuse my privs 😄

@giuseppe giuseppe merged commit 99b6cad into containers:main Sep 3, 2025
47 checks passed
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/container-selinux that referenced this pull request Sep 3, 2025
…timization

Problem: crun PR #1859 (containers/crun#1859) optimizes
masked paths by using a shared empty directory instead of individual tmpfs mounts.
However, containers cannot access this shared directory due to SELinux policy:

  avc: denied { read } for name=".empty-directory"
  scontext=container_t:s0:c139,c767 tcontext=container_var_run_t:s0
Without this policy, the optimization falls back to individual tmpfs
mounts, negating the performance benefits.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants