Commit e0020ba

rust: add PidNamespace
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.

The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the calling `Task`'s pid namespace. It will only affect the pid namespace of children created by the calling `Task`. This invariant guarantees that after having acquired a reference to a `Task`'s pid namespace it will remain unchanged.

When a task has exited and been reaped `release_task()` will be called. This will set the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead will return `NULL`. Note that neither holding the RCU lock nor holding a reference count to the `Task` will prevent `release_task()` being called.

In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be used. There are two cases to consider:

(1) retrieving the `PidNamespace` of the `current` task
(2) retrieving the `PidNamespace` of a non-`current` task

From system call context retrieving the `PidNamespace` for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust.

Retrieving the `PidNamespace` from system call context for (2) requires RCU protection. Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can be returned as the non-`current` task could have already passed through `release_task()`.

To retrieve (1) the `current_pid_ns!()` macro should be used, which ensures that the returned `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function should not be called directly as it could be abused to create an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a reference count.

For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent operation in C. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. This is not important currently, but we do want it in the future.

Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes against putting the last reference of the associated `struct pid` of `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired from `task->thread_pid` to finish.

Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
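The scoping trick described for case (1) can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: `Namespace`, its `level` field, `CURRENT_NS`, `current_ns()`, `current_ns!()` and `current_level()` are hypothetical stand-ins, not kernel APIs. It shows how an `unsafe` function returning an opaque `Deref` guard, wrapped in a macro that immediately reborrows through the guard, yields a reference that is tied to a temporary in the caller's scope.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for the kernel's `PidNamespace`.
pub struct Namespace {
    pub level: u32,
}

// Hypothetical "current" namespace; the kernel derives it from `current`.
static CURRENT_NS: Namespace = Namespace { level: 0 };

/// Returns a guard that dereferences to the current namespace.
///
/// # Safety
///
/// Callers must ensure the returned guard does not outlive the calling scope.
pub unsafe fn current_ns() -> impl Deref<Target = Namespace> {
    struct NsRef<'a> {
        ns: &'a Namespace,
        // A raw pointer marker makes the guard neither `Send` nor `Sync`.
        _not_send: PhantomData<*mut ()>,
    }

    impl Deref for NsRef<'_> {
        type Target = Namespace;

        fn deref(&self) -> &Namespace {
            self.ns
        }
    }

    NsRef {
        ns: &CURRENT_NS,
        _not_send: PhantomData,
    }
}

/// Borrows the current namespace through a temporary guard.
macro_rules! current_ns {
    () => {
        // SAFETY: The reference borrows from a temporary guard owned by the
        // caller's scope, so it cannot be stored anywhere that outlives it.
        unsafe { &*current_ns() }
    };
}

/// Example use: the borrow ends when `ns` goes out of scope.
pub fn current_level() -> u32 {
    let ns = current_ns!();
    ns.level
}
```

Because `deref()` ties the returned reference to the borrow of the guard, the compiler rejects attempts to stash the result of the macro in a longer-lived location, which is the property the commit message relies on.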
1 parent 22018a5 commit e0020ba

5 files changed: +225 -6 lines changed

rust/helpers/helpers.c

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
 #include "kunit.c"
 #include "mutex.c"
 #include "page.c"
+#include "pid_namespace.c"
 #include "rbtree.c"
 #include "refcount.c"
 #include "security.c"

rust/helpers/pid_namespace.c

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pid_namespace.h>
+#include <linux/cleanup.h>
+
+struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
+{
+	return get_pid_ns(ns);
+}
+
+void rust_helper_put_pid_ns(struct pid_namespace *ns)
+{
+	put_pid_ns(ns);
+}
+
+/* Get a reference on a task's pid namespace. */
+struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
+{
+	struct pid_namespace *pid_ns;
+
+	guard(rcu)();
+	pid_ns = task_active_pid_ns(task);
+	if (pid_ns)
+		get_pid_ns(pid_ns);
+	return pid_ns;
+}
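The pattern in `rust_helper_task_get_pid_ns()` (enter a read-side critical section, look up a pointer that may already have been cleared, and take a reference before leaving) can be modelled in plain Rust. This is a rough analogy only: an `RwLock` stands in for RCU, which it is not equivalent to, and `Task`, `Namespace`, `release()` and `get_ns()` are hypothetical names introduced for illustration.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a pid namespace.
pub struct Namespace {
    pub level: u32,
}

/// Hypothetical task whose namespace slot is cleared on release.
pub struct Task {
    ns: RwLock<Option<Arc<Namespace>>>,
}

impl Task {
    pub fn new(level: u32) -> Self {
        Task {
            ns: RwLock::new(Some(Arc::new(Namespace { level }))),
        }
    }

    /// Models `release_task()`: the task no longer points at a namespace.
    pub fn release(&self) {
        *self.ns.write().unwrap() = None;
    }

    /// Models the helper: returns an owned reference, or `None` if the task
    /// has already been released.
    pub fn get_ns(&self) -> Option<Arc<Namespace>> {
        // The read guard plays the role of `guard(rcu)()` and is held until
        // the function returns.
        let slot = self.ns.read().unwrap();
        // Cloning the inner `Arc` plays the role of `get_pid_ns()`.
        slot.clone()
    }
}
```

A reference obtained before `release()` stays usable afterwards, while a later `get_ns()` returns `None`, mirroring why the Rust method returns an `Option`.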

rust/kernel/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ pub mod list;
 #[cfg(CONFIG_NET)]
 pub mod net;
 pub mod page;
+pub mod pid_namespace;
 pub mod prelude;
 pub mod print;
 pub mod rbtree;

rust/kernel/pid_namespace.rs

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
+
+//! Pid namespaces.
+//!
+//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
+//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
+
+use crate::{
+    bindings,
+    types::{AlwaysRefCounted, Opaque},
+};
+use core::ptr;
+
+/// Wraps the kernel's `struct pid_namespace`. Thread safe.
+///
+/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
+/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
+/// code that we get passed from the C side.
+#[repr(transparent)]
+pub struct PidNamespace {
+    inner: Opaque<bindings::pid_namespace>,
+}
+
+impl PidNamespace {
+    /// Returns a raw pointer to the inner C struct.
+    #[inline]
+    pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
+        self.inner.get()
+    }
+
+    /// Creates a reference to a [`PidNamespace`] from a valid pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+    /// returned [`PidNamespace`] reference.
+    pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
+        // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+        // `PidNamespace` type being transparent makes the cast ok.
+        unsafe { &*ptr.cast() }
+    }
+}
+
+// SAFETY: Instances of `PidNamespace` are always reference-counted.
+unsafe impl AlwaysRefCounted for PidNamespace {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::get_pid_ns(self.as_ptr()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
+        // SAFETY: The safety requirements guarantee that the refcount is non-zero.
+        unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
+    }
+}
+
+// SAFETY:
+// - `PidNamespace::dec_ref` can be called from any thread.
+// - It is okay to send ownership of `PidNamespace` across thread boundaries.
+unsafe impl Send for PidNamespace {}
+
+// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for PidNamespace {}
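The wrapper above combines two ideas: a `#[repr(transparent)]` newtype that can be produced from a raw pointer by a cast, and a reference count that lives inside the wrapped object. A userspace sketch of the same shape follows. It is an illustration under assumptions, not the kernel implementation: `RawNs` is a made-up stand-in for `bindings::pid_namespace`, `UnsafeCell` stands in for `Opaque`, and the atomic counter only imitates what `get_pid_ns()` and `put_pid_ns()` do on the C side.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the C struct with an embedded refcount.
#[repr(C)]
pub struct RawNs {
    refcount: AtomicUsize,
    level: u32,
}

impl RawNs {
    pub fn new(level: u32) -> Self {
        RawNs {
            refcount: AtomicUsize::new(1),
            level,
        }
    }
}

/// Transparent wrapper: same layout as `RawNs`, so pointer casts are valid.
#[repr(transparent)]
pub struct Namespace {
    inner: UnsafeCell<RawNs>,
}

impl Namespace {
    /// Returns a raw pointer to the wrapped struct.
    pub fn as_ptr(&self) -> *mut RawNs {
        self.inner.get()
    }

    /// # Safety
    ///
    /// `ptr` must be valid for the whole lifetime `'a`.
    pub unsafe fn from_ptr<'a>(ptr: *const RawNs) -> &'a Self {
        // SAFETY: The caller guarantees validity; `repr(transparent)` makes
        // the cast layout-compatible.
        unsafe { &*ptr.cast::<Self>() }
    }

    fn raw(&self) -> &RawNs {
        // SAFETY: `self` is a live reference, so the wrapped struct is valid.
        unsafe { &*self.as_ptr() }
    }

    /// Takes an extra reference (models `get_pid_ns()`).
    pub fn inc_ref(&self) {
        self.raw().refcount.fetch_add(1, Ordering::Relaxed);
    }

    /// Drops a reference and reports whether it was the last one.
    pub fn dec_ref(&self) -> bool {
        self.raw().refcount.fetch_sub(1, Ordering::Release) == 1
    }

    pub fn refcount(&self) -> usize {
        self.raw().refcount.load(Ordering::Relaxed)
    }

    pub fn level(&self) -> u32 {
        self.raw().level
    }
}
```

In the kernel the count is manipulated only through the C helpers, and `ARef<PidNamespace>` calls `inc_ref`/`dec_ref` automatically on clone and drop; the explicit methods here just make the bookkeeping visible.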

rust/kernel/task.rs

Lines changed: 129 additions & 6 deletions
@@ -6,7 +6,8 @@
 
 use crate::{
     bindings,
-    types::{NotThreadSafe, Opaque},
+    pid_namespace::PidNamespace,
+    types::{ARef, NotThreadSafe, Opaque},
 };
 use core::{
     cmp::{Eq, PartialEq},
@@ -36,6 +37,16 @@ macro_rules! current {
     };
 }
 
+/// Returns the currently running task's pid namespace.
+#[macro_export]
+macro_rules! current_pid_ns {
+    () => {
+        // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
+        // the caller.
+        unsafe { &*$crate::task::Task::current_pid_ns() }
+    };
+}
+
 /// Wraps the kernel's `struct task_struct`.
 ///
 /// # Invariants
@@ -145,6 +156,97 @@ impl Task {
         }
     }
 
+    /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
+    ///
+    /// This function can be used to create an unbounded lifetime by e.g., storing the returned
+    /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
+    /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
+    /// safe.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
+    pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
+        struct PidNamespaceRef<'a> {
+            task: &'a PidNamespace,
+            _not_send: NotThreadSafe,
+        }
+
+        impl Deref for PidNamespaceRef<'_> {
+            type Target = PidNamespace;
+
+            fn deref(&self) -> &Self::Target {
+                self.task
+            }
+        }
+
+        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+        //
+        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
+        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
+        // on the calling `Task`'s pid namespace. It will only affect the pid namespace of children
+        // created by the calling `Task`. This invariant guarantees that after having acquired a
+        // reference to a `Task`'s pid namespace it will remain unchanged.
+        //
+        // When a task has exited and been reaped `release_task()` will be called. This will set
+        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
+        // that is dead will return `NULL`. Note that neither holding the RCU lock nor holding a
+        // reference count to
+        // the `Task` will prevent `release_task()` being called.
+        //
+        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
+        // can be used. There are two cases to consider:
+        //
+        // (1) retrieving the `PidNamespace` of the `current` task
+        // (2) retrieving the `PidNamespace` of a non-`current` task
+        //
+        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
+        // requires neither RCU locking nor a reference count to be held. Retrieving the
+        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
+        // like that is exposed to Rust.
+        //
+        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
+        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
+        // task means `NULL` can be returned as the non-`current` task could have already passed
+        // through `release_task()`.
+        //
+        // To retrieve (1) the `current_pid_ns!()` macro should be used which ensures that the
+        // returned `PidNamespace` cannot outlive the calling scope. The associated
+        // `current_pid_ns()` function should not be called directly as it could be abused to
+        // create an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
+        // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
+        // protection and without having to acquire a reference count.
+        //
+        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
+        // reference on `PidNamespace` and will return an `Option` to force the caller to
+        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
+        // forgotten when doing the equivalent operation in C. Missing RCU primitives make it
+        // difficult to perform operations that are otherwise safe without holding a reference
+        // count as long as RCU protection is guaranteed. This is not important currently, but we
+        // do want it in the future.
+        //
+        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
+        // synchronizes against putting the last reference of the associated `struct pid` of
+        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
+        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
+        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
+        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
+        // from `task->thread_pid` to finish.
+        //
+        // SAFETY: The current task's pid namespace is valid as long as the current task is running.
+        let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
+        PidNamespaceRef {
+            // SAFETY: If the current thread is still running, the current task and its associated
+            // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
+            // transferred to another thread (where it could potentially outlive the current
+            // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
+            // current task/thread.
+            task: unsafe { PidNamespace::from_ptr(pidns) },
+            _not_send: NotThreadSafe,
+        }
+    }
+
     /// Returns the group leader of the given task.
     pub fn group_leader(&self) -> &Task {
         // SAFETY: By the type invariant, we know that `self.0` is a valid task. Valid tasks always
@@ -182,11 +284,32 @@ impl Task {
         unsafe { bindings::signal_pending(self.0.get()) != 0 }
     }
 
-    /// Returns the given task's pid in the current pid namespace.
-    pub fn pid_in_current_ns(&self) -> Pid {
-        // SAFETY: We know that `self.0.get()` is valid by the type invariant, and passing a null
-        // pointer as the namespace is correct for using the current namespace.
-        unsafe { bindings::task_tgid_nr_ns(self.0.get(), ptr::null_mut()) }
+    /// Returns the task's pid namespace with an elevated reference count.
+    pub fn get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
+        // SAFETY: By the type invariant, we know that `self.0` is valid.
+        let ptr = unsafe { bindings::task_get_pid_ns(self.0.get()) };
+        if ptr.is_null() {
+            None
+        } else {
+            // SAFETY: `ptr` is non-null (checked above) and valid. And we own a
+            // reference count via `task_get_pid_ns()`.
+            // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
+            Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
+        }
+    }
+
+    /// Returns the given task's pid in the provided pid namespace.
+    #[doc(alias = "task_tgid_nr_ns")]
+    pub fn tgid_nr_ns(&self, pidns: Option<&PidNamespace>) -> Pid {
+        let pidns = match pidns {
+            Some(pidns) => pidns.as_ptr(),
+            None => core::ptr::null_mut(),
+        };
+        // SAFETY: By the type invariant, we know that `self.0` is valid. We received a valid
+        // PidNamespace that we can use as a pointer or we received an empty PidNamespace and
+        // thus pass a null pointer. The underlying C function is safe to be used with NULL
+        // pointers.
+        unsafe { bindings::task_tgid_nr_ns(self.0.get(), pidns) }
     }
 
     /// Wakes up the task.
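The new `tgid_nr_ns()` signature maps `Option<&PidNamespace>` onto the C convention where a `NULL` namespace means "use the caller's namespace". A self-contained sketch of that mapping is shown below. Everything in it is a hypothetical model: `Namespace`, `Task`, `pids_by_level`, `CURRENT_LEVEL` and `raw_tgid_nr_ns()` are invented for illustration, and the lookup logic only imitates the general behaviour (a pid per namespace level, `0` when the task is not visible).

```rust
use std::ptr;

/// Hypothetical namespace identified only by its nesting level.
pub struct Namespace {
    pub level: usize,
}

/// Hypothetical task holding one pid number per namespace level.
pub struct Task {
    pub pids_by_level: Vec<i32>,
}

/// Assumed level of the caller's own namespace in this model.
const CURRENT_LEVEL: usize = 0;

/// Models a C-style function: a null `ns` selects the caller's namespace.
///
/// # Safety
///
/// `ns` must be null or point to a valid `Namespace`.
unsafe fn raw_tgid_nr_ns(task: &Task, ns: *const Namespace) -> i32 {
    let level = if ns.is_null() {
        CURRENT_LEVEL
    } else {
        // SAFETY: The caller guarantees a non-null `ns` is valid.
        unsafe { (*ns).level }
    };
    // A task that is not visible at that level is reported as pid 0.
    task.pids_by_level.get(level).copied().unwrap_or(0)
}

impl Task {
    /// Safe wrapper: `None` becomes a null pointer, `Some` a valid one.
    pub fn tgid_nr_ns(&self, ns: Option<&Namespace>) -> i32 {
        let raw = match ns {
            Some(ns) => ns as *const Namespace,
            None => ptr::null(),
        };
        // SAFETY: `raw` is either null or derived from a live reference.
        unsafe { raw_tgid_nr_ns(self, raw) }
    }
}
```

Callers of the removed `pid_in_current_ns()` would pass `None` to keep the previous behaviour, and `Some(&ns)` to ask for the pid as seen from a specific namespace.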
