Commit 9ad8d22

Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pid_namespace rust bindings from Christian Brauner:

"This contains my Rust bindings for pid namespaces needed for various rust drivers. Here's a description of the basic C semantics and how they are mapped to Rust.

The pid namespace of a task doesn't ever change once the task is alive. An unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not have an effect on the calling task's pid namespace. It will only affect the pid namespace of children created by the calling task. This invariant guarantees that after having acquired a reference to a task's pid namespace it will remain unchanged.

When a task has exited and been reaped release_task() will be called. This will set the pid namespace of the task to NULL. So retrieving the pid namespace of a task that is dead will return NULL. Note that neither holding the RCU lock nor holding a reference count to the task will prevent release_task() from being called.

In order to retrieve the pid namespace of a task the task_active_pid_ns() function can be used. There are two cases to consider:

(1) retrieving the pid namespace of the current task
(2) retrieving the pid namespace of a non-current task

From system call context retrieving the pid namespace for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the pid namespace after release_task() for current will return NULL but no codepath like that is exposed to Rust.

Retrieving the pid namespace from system call context for (2) requires RCU protection. Accessing a pid namespace outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-current task means NULL can be returned as the non-current task could have already passed through release_task().

To retrieve (1) the current_pid_ns!() macro should be used. It ensures that the returned pid namespace cannot outlive the calling scope. The associated current_pid_ns() function should not be called directly as it could be abused to create an unbounded lifetime for the pid namespace. The current_pid_ns!() macro allows Rust to handle the common case of accessing current's pid namespace without RCU protection and without having to acquire a reference count.

For (2) the task_get_pid_ns() method must be used. This will always acquire a reference on the pid namespace and will return an Option to force the caller to explicitly handle the case where pid namespace is None, something that tends to be forgotten when doing the equivalent operation in C.

Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. This is not important currently, but we do want it in the future.

Note that for (2) the required RCU protection around calling task_active_pid_ns() synchronizes against putting the last reference of the associated struct pid of task->thread_pid. The struct pid stored in that field is used to retrieve the pid namespace of the caller. When release_task() is called task->thread_pid will be NULLed and put_pid() on said struct pid will be delayed in free_pid() via call_rcu() allowing everyone with an RCU protected access to the struct pid acquired from task->thread_pid to finish"

* tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  rust: add PidNamespace
2 parents 445d9f0 + e0020ba commit 9ad8d22

5 files changed

+225
-6
lines changed

rust/helpers/helpers.c

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 #include "kunit.c"
 #include "mutex.c"
 #include "page.c"
+#include "pid_namespace.c"
 #include "rbtree.c"
 #include "refcount.c"
 #include "security.c"

rust/helpers/pid_namespace.c

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pid_namespace.h>
+#include <linux/cleanup.h>
+
+struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
+{
+	return get_pid_ns(ns);
+}
+
+void rust_helper_put_pid_ns(struct pid_namespace *ns)
+{
+	put_pid_ns(ns);
+}
+
+/* Get a reference on a task's pid namespace. */
+struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
+{
+	struct pid_namespace *pid_ns;
+
+	guard(rcu)();
+	pid_ns = task_active_pid_ns(task);
+	if (pid_ns)
+		get_pid_ns(pid_ns);
+	return pid_ns;
+}
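
The lookup-then-pin pattern in rust_helper_task_get_pid_ns() can be modelled in plain userspace Rust. This is only a rough sketch under loud assumptions: `PidNs`, `Task`, `release()` and the `RwLock` are hypothetical stand-ins, with the read lock playing the role of the RCU read-side section and `Arc` playing the role of the pid namespace reference count. It is not kernel code and does not reflect real RCU semantics.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `struct pid_namespace`.
#[derive(Debug)]
pub struct PidNs {
    pub level: u32,
}

/// Hypothetical stand-in for `struct task_struct`. The lock models the
/// RCU read-side critical section; it is an assumption of this sketch.
pub struct Task {
    active_ns: RwLock<Option<Arc<PidNs>>>,
}

impl Task {
    pub fn new(ns: Arc<PidNs>) -> Self {
        Task {
            active_ns: RwLock::new(Some(ns)),
        }
    }

    /// Models `release_task()`: afterwards the task has no pid namespace.
    pub fn release(&self) {
        *self.active_ns.write().unwrap() = None;
    }

    /// Models the helper: look the namespace up and take a reference
    /// before the read-side section ends.
    pub fn get_pid_ns(&self) -> Option<Arc<PidNs>> {
        // Comparable to `guard(rcu)()`: held until the end of the function.
        let guard = self.active_ns.read().unwrap();
        match &*guard {
            // Comparable to `get_pid_ns(pid_ns)`: pin the namespace.
            Some(ns) => Some(Arc::clone(ns)),
            // The task was already released: nothing to pin.
            None => None,
        }
    }
}
```

In this model a reference taken before `release()` stays usable afterwards, while a lookup made after `release()` yields `None`, which mirrors the two outcomes the commit message describes for non-current tasks.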

rust/kernel/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ pub mod list;
 #[cfg(CONFIG_NET)]
 pub mod net;
 pub mod page;
+pub mod pid_namespace;
 pub mod prelude;
 pub mod print;
 pub mod rbtree;

rust/kernel/pid_namespace.rs

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
+
+//! Pid namespaces.
+//!
+//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
+//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
+
+use crate::{
+    bindings,
+    types::{AlwaysRefCounted, Opaque},
+};
+use core::ptr;
+
+/// Wraps the kernel's `struct pid_namespace`. Thread safe.
+///
+/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
+/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
+/// code that we get passed from the C side.
+#[repr(transparent)]
+pub struct PidNamespace {
+    inner: Opaque<bindings::pid_namespace>,
+}
+
+impl PidNamespace {
+    /// Returns a raw pointer to the inner C struct.
+    #[inline]
+    pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
+        self.inner.get()
+    }
+
+    /// Creates a reference to a [`PidNamespace`] from a valid pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+    /// returned [`PidNamespace`] reference.
+    pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
+        // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+        // `PidNamespace` type being transparent makes the cast ok.
+        unsafe { &*ptr.cast() }
+    }
+}
+
+// SAFETY: Instances of `PidNamespace` are always reference-counted.
+unsafe impl AlwaysRefCounted for PidNamespace {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::get_pid_ns(self.as_ptr()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
+        // SAFETY: The safety requirements guarantee that the refcount is non-zero.
+        unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
+    }
+}
+
+// SAFETY:
+// - `PidNamespace::dec_ref` can be called from any thread.
+// - It is okay to send ownership of `PidNamespace` across thread boundaries.
+unsafe impl Send for PidNamespace {}
+
+// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for PidNamespace {}
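
How an `inc_ref`/`dec_ref` pair such as the one above can back an owning smart pointer may be easier to see in a standalone model. The sketch below is an illustration under assumptions, not the kernel's implementation: `RefCounted`, `OwnedRef` and `Counted` are hypothetical names that only imitate the shape of `AlwaysRefCounted` and `ARef`, and the count is a non-atomic `Cell` that is only suitable for a single thread.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;

/// Simplified, hypothetical model of an "always refcounted" trait.
///
/// # Safety
///
/// Implementers must keep the object alive while the count is non-zero.
pub unsafe trait RefCounted {
    fn inc_ref(&self);

    /// # Safety
    ///
    /// The caller must own one reference to `obj` and gives it up here.
    unsafe fn dec_ref(obj: NonNull<Self>);
}

/// Simplified, hypothetical model of an owned reference.
pub struct OwnedRef<T: RefCounted> {
    ptr: NonNull<T>,
}

impl<T: RefCounted> OwnedRef<T> {
    /// # Safety
    ///
    /// `ptr` must be valid and the caller must transfer one reference.
    pub unsafe fn from_raw(ptr: NonNull<T>) -> Self {
        OwnedRef { ptr }
    }
}

impl<T: RefCounted> From<&T> for OwnedRef<T> {
    fn from(obj: &T) -> Self {
        // A shared reference proves the count is non-zero, so bumping is fine.
        obj.inc_ref();
        OwnedRef { ptr: NonNull::from(obj) }
    }
}

impl<T: RefCounted> Deref for OwnedRef<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: We own a reference, so the object is still alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: RefCounted> Drop for OwnedRef<T> {
    fn drop(&mut self) {
        // SAFETY: We own exactly one reference and give it up here.
        unsafe { T::dec_ref(self.ptr) }
    }
}

/// Toy heap object that can only be created behind an `OwnedRef`.
pub struct Counted {
    refs: Cell<usize>,
}

impl Counted {
    pub fn new() -> OwnedRef<Counted> {
        let raw = Box::into_raw(Box::new(Counted { refs: Cell::new(1) }));
        // SAFETY: `raw` comes from `Box::into_raw`, so it is non-null, and
        // the initial count of 1 is handed to the returned `OwnedRef`.
        unsafe { OwnedRef::from_raw(NonNull::new_unchecked(raw)) }
    }

    pub fn count(&self) -> usize {
        self.refs.get()
    }
}

// SAFETY: The object is freed only when the count drops to zero.
unsafe impl RefCounted for Counted {
    fn inc_ref(&self) {
        self.refs.set(self.refs.get() + 1);
    }

    unsafe fn dec_ref(obj: NonNull<Self>) {
        let remaining = {
            // SAFETY: The caller owns a reference, so the object is alive.
            let this = unsafe { obj.as_ref() };
            this.refs.set(this.refs.get() - 1);
            this.refs.get()
        };
        if remaining == 0 {
            // SAFETY: That was the last reference; `new` allocated via `Box`.
            drop(unsafe { Box::from_raw(obj.as_ptr()) });
        }
    }
}
```

Converting a borrowed `&Counted` into an `OwnedRef` raises the count by one, and dropping any `OwnedRef` lowers it again, which is the behaviour the diff relies on when `get_pid_ns()` hands out an `ARef<PidNamespace>`.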

rust/kernel/task.rs

Lines changed: 129 additions & 6 deletions
@@ -6,7 +6,8 @@
 
 use crate::{
     bindings,
-    types::{NotThreadSafe, Opaque},
+    pid_namespace::PidNamespace,
+    types::{ARef, NotThreadSafe, Opaque},
 };
 use core::{
     cmp::{Eq, PartialEq},
@@ -36,6 +37,16 @@ macro_rules! current {
     };
 }
 
+/// Returns the currently running task's pid namespace.
+#[macro_export]
+macro_rules! current_pid_ns {
+    () => {
+        // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
+        // the caller.
+        unsafe { &*$crate::task::Task::current_pid_ns() }
+    };
+}
+
 /// Wraps the kernel's `struct task_struct`.
 ///
 /// # Invariants
@@ -145,6 +156,97 @@ impl Task {
         }
     }
 
+    /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
+    ///
+    /// This function can be used to create an unbounded lifetime by e.g., storing the returned
+    /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
+    /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
+    /// safe.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
+    pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
+        struct PidNamespaceRef<'a> {
+            task: &'a PidNamespace,
+            _not_send: NotThreadSafe,
+        }
+
+        impl Deref for PidNamespaceRef<'_> {
+            type Target = PidNamespace;
+
+            fn deref(&self) -> &Self::Target {
+                self.task
+            }
+        }
+
+        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+        //
+        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
+        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
+        // on the calling `Task`'s pid namespace. It will only affect the pid namespace of children
+        // created by the calling `Task`. This invariant guarantees that after having acquired a
+        // reference to a `Task`'s pid namespace it will remain unchanged.
+        //
+        // When a task has exited and been reaped `release_task()` will be called. This will set
+        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
+        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
+        // reference count to
+        // the `Task` will prevent `release_task()` being called.
+        //
+        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
+        // can be used. There are two cases to consider:
+        //
+        // (1) retrieving the `PidNamespace` of the `current` task
+        // (2) retrieving the `PidNamespace` of a non-`current` task
+        //
+        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
+        // requires neither RCU locking nor a reference count to be held. Retrieving the
+        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
+        // like that is exposed to Rust.
+        //
+        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
+        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
+        // task means `NULL` can be returned as the non-`current` task could have already passed
+        // through `release_task()`.
+        //
+        // To retrieve (1) the `current_pid_ns!()` macro should be used which ensures that the
+        // returned `PidNamespace` cannot outlive the calling scope. The associated
+        // `current_pid_ns()` function should not be called directly as it could be abused to
+        // create an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
+        // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
+        // protection and without having to acquire a reference count.
+        //
+        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
+        // reference on `PidNamespace` and will return an `Option` to force the caller to
+        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
+        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
+        // difficult to perform operations that are otherwise safe without holding a reference
+        // count as long as RCU protection is guaranteed. But it is not important currently. But we
+        // do want it in the future.
+        //
+        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
+        // synchronizes against putting the last reference of the associated `struct pid` of
+        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
+        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
+        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
+        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
+        // from `task->thread_pid` to finish.
+        //
+        // SAFETY: The current task's pid namespace is valid as long as the current task is running.
+        let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
+        PidNamespaceRef {
+            // SAFETY: If the current thread is still running, the current task and its associated
+            // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
+            // transferred to another thread (where it could potentially outlive the current
+            // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
+            // current task/thread.
+            task: unsafe { PidNamespace::from_ptr(pidns) },
+            _not_send: NotThreadSafe,
+        }
+    }
+
     /// Returns a raw pointer to the task.
     #[inline]
     pub fn as_ptr(&self) -> *mut bindings::task_struct {
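
Why wrapping `current_pid_ns()` in the `current_pid_ns!()` macro bounds the lifetime can be shown with a small userspace model. The sketch below rests on assumptions: `Namespace`, `INIT_NS`, `current_ns()` and `current_level()` are hypothetical names, and `PhantomData<*mut ()>` merely stands in for the role `NotThreadSafe` plays in the diff. The point it illustrates is that `&*temporary` borrows from a temporary guard, so the resulting reference cannot outlive the scope in which the macro was expanded.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for `PidNamespace`.
pub struct Namespace {
    pub level: u32,
}

/// Hypothetical "current" namespace for this model.
static INIT_NS: Namespace = Namespace { level: 0 };

/// Guard type that is deliberately neither `Send` nor `Sync`.
pub struct NamespaceRef<'a> {
    ns: &'a Namespace,
    _not_send: PhantomData<*mut ()>,
}

impl Deref for NamespaceRef<'_> {
    type Target = Namespace;

    fn deref(&self) -> &Namespace {
        self.ns
    }
}

/// Returns an opaque guard for the current namespace.
///
/// # Safety
///
/// Mirrors the contract in the diff: the caller must not let the returned
/// guard outlive the current thread. In this toy model nothing can dangle.
pub unsafe fn current_ns() -> impl Deref<Target = Namespace> {
    NamespaceRef {
        ns: &INIT_NS,
        _not_send: PhantomData,
    }
}

/// Borrows from a temporary guard, so the reference is scope-bound.
macro_rules! current_ns {
    () => {
        // SAFETY: The guard is a temporary that lives only as long as the
        // enclosing scope, so the borrow cannot escape the caller.
        unsafe { &*current_ns() }
    };
}

/// Typical use: the borrow lives only inside this function.
pub fn current_level() -> u32 {
    let ns = current_ns!();
    ns.level
}
```

Because `ns` borrows from a temporary owned by `current_level()`, trying to return `ns` itself or store it in a `static` would be rejected by the borrow checker, which is the property the commit message attributes to the macro.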
@@ -188,11 +290,32 @@ impl Task {
         unsafe { bindings::signal_pending(self.as_ptr()) != 0 }
     }
 
-    /// Returns the given task's pid in the current pid namespace.
-    pub fn pid_in_current_ns(&self) -> Pid {
-        // SAFETY: It's valid to pass a null pointer as the namespace (defaults to current
-        // namespace). The task pointer is also valid.
-        unsafe { bindings::task_tgid_nr_ns(self.as_ptr(), ptr::null_mut()) }
+    /// Returns the task's pid namespace with an elevated reference count.
+    pub fn get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
+        // SAFETY: By the type invariant, we know that `self.0` is valid.
+        let ptr = unsafe { bindings::task_get_pid_ns(self.as_ptr()) };
+        if ptr.is_null() {
+            None
+        } else {
+            // SAFETY: `ptr` is non-null (checked above) and valid because we own the reference
+            // count acquired via `task_get_pid_ns()`.
+            // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
+            Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
+        }
+    }
+
+    /// Returns the given task's pid in the provided pid namespace.
+    #[doc(alias = "task_tgid_nr_ns")]
+    pub fn tgid_nr_ns(&self, pidns: Option<&PidNamespace>) -> Pid {
+        let pidns = match pidns {
+            Some(pidns) => pidns.as_ptr(),
+            None => core::ptr::null_mut(),
+        };
+        // SAFETY: By the type invariant, we know that `self.0` is valid. We received a valid
+        // `PidNamespace` that we can use as a pointer, or we received `None` and
+        // thus pass a null pointer. The underlying C function is safe to be used with NULL
+        // pointers.
+        unsafe { bindings::task_tgid_nr_ns(self.as_ptr(), pidns) }
     }
 
     /// Wakes up the task.
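
The way `tgid_nr_ns()` turns an `Option<&PidNamespace>` into a possibly-null pointer can be tried outside the kernel. The following is a minimal sketch under assumptions: `RawNs`, `Ns`, `level_or_default()` and the `-1` default are all hypothetical and only imitate a C-style function that accepts NULL to mean "use a default".

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Hypothetical stand-in for the C struct.
#[repr(C)]
pub struct RawNs {
    pub level: i32,
}

/// Transparent Rust wrapper, similar in spirit to `PidNamespace`.
#[repr(transparent)]
pub struct Ns {
    inner: UnsafeCell<RawNs>,
}

impl Ns {
    pub fn new(level: i32) -> Self {
        Ns {
            inner: UnsafeCell::new(RawNs { level }),
        }
    }

    /// Returns a raw pointer to the wrapped struct.
    pub fn as_ptr(&self) -> *mut RawNs {
        self.inner.get()
    }
}

/// Hypothetical C-style function: a null pointer selects a default (-1).
///
/// # Safety
///
/// `ns` must be null or point to a valid `RawNs`.
unsafe fn level_or_default(ns: *mut RawNs) -> i32 {
    if ns.is_null() {
        -1
    } else {
        // SAFETY: Non-null pointers are valid per this function's contract.
        unsafe { (*ns).level }
    }
}

/// Safe wrapper: `None` becomes a null pointer, `Some` a valid one.
pub fn level(ns: Option<&Ns>) -> i32 {
    let raw = ns.map_or(ptr::null_mut(), Ns::as_ptr);
    // SAFETY: `raw` is either null or derived from a live `&Ns`.
    unsafe { level_or_default(raw) }
}
```

Taking `Option<&Ns>` instead of a raw pointer keeps the null case visible in the type signature, so the only unsafe step left is the call into the C-style function.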
