Commit fb67199

Guarantee slice representation
1 parent b0e56db commit fb67199

1 file changed

+178
-0
lines changed

text/0000-guaranteed-slice-repr.md

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
1+
- Feature Name: guaranteed_slice_repr
2+
- Start Date: 2025-02-18
3+
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
4+
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
This RFC guarantees the in-memory representation of slice and str references.
10+
Specifically, `&[T]` is guaranteed to have the same layout as:
11+
12+
```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```
19+
20+
The layout of `&str` is the same as that of `&[u8]`, and the layout of
21+
`&mut str` is the same as that of `&mut [u8]`.
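
As a rough illustration of what this guarantee means in practice, the sketch below compares the size and alignment of `&[T]` and `&str` against a mirror struct. The helper names `slice_ref_matches_mirror` and `str_ref_matches_bytes` are hypothetical and exist only for this example; the checks observe the behavior of the compiler used to build them and do not by themselves establish a guarantee.

```rust
use std::mem::{align_of, size_of};

/// Mirror of the layout described above. The name `Slice` follows the RFC
/// text; it is an illustration, not a standard library type.
#[allow(dead_code)]
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}

/// Returns true if `&[T]` and `Slice<T>` agree on size and alignment.
/// (Hypothetical helper for this sketch.)
fn slice_ref_matches_mirror<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}

/// Returns true if `&str` and `&[u8]` agree on size and alignment.
/// (Hypothetical helper for this sketch.)
fn str_ref_matches_bytes() -> bool {
    size_of::<&str>() == size_of::<&[u8]>()
        && align_of::<&str>() == align_of::<&[u8]>()
}

fn main() {
    assert!(slice_ref_matches_mirror::<u8>());
    assert!(slice_ref_matches_mirror::<u64>());
    // Zero-sized element types keep the same two-word representation.
    assert!(slice_ref_matches_mirror::<()>());
    assert!(str_ref_matches_bytes());
}
```

Size and alignment checks cannot confirm field order; that part is exactly what the RFC text would pin down.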
22+
23+
# Motivation
24+
[motivation]: #motivation
25+
26+
This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing
27+
slices and to declare slice fields or locals.
28+
29+
For example, guaranteeing the representation of slices allows non-Rust code to
30+
read from the `data` or `len` fields of `string` in the type below without
31+
intermediate FFI calls into Rust:
32+
33+
```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}
```
39+
40+
Note: prior to this RFC, the type above is not even properly `repr(C)` since the
41+
size and alignment of slices were not guaranteed. However, the Rust compiler
42+
accepts the `repr(C)` declaration above without warning.
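
To make the motivation concrete, here is a minimal sketch that reads the pointer and length out of `HasString` the way foreign code would, by reinterpreting its memory. `RawHasString` and `raw_parts` are hypothetical names invented for this example, standing in for what a C header might declare. The cast assumes the layout this RFC proposes (`data` followed by `len`).

```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}

/// Hypothetical mirror of what a C header might declare for `HasString`.
#[repr(C)]
struct RawHasString {
    data: *const u8,
    len: usize,
}

/// Reads the pointer and length by reinterpreting memory, as foreign code
/// would. (Hypothetical helper for this sketch.)
fn raw_parts(value: &HasString) -> (*const u8, usize) {
    let raw = value as *const HasString as *const RawHasString;
    // SAFETY (assumed): relies on `&str` being laid out as `data` then `len`,
    // which is the layout this RFC proposes to guarantee.
    unsafe { ((*raw).data, (*raw).len) }
}

fn main() {
    let value = HasString { string: "hello" };
    let (data, len) = raw_parts(&value);
    assert_eq!(len, value.string.len());
    assert_eq!(data, value.string.as_ptr());
}
```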
43+
44+
# Guide-level explanation
45+
[guide-level-explanation]: #guide-level-explanation
46+
47+
Slices are represented with a pointer and length pair. Their in-memory layout is
48+
the same as a `#[repr(C)]` struct like the following:
49+
50+
```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```
57+
58+
The precise ABI of slices is not guaranteed, so `&[T]` must not be passed by value
59+
to, or returned by value from, an `extern "C" fn`.
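
Because only the in-memory layout is covered, a function at the FFI boundary would typically still take the pointer and length as separate scalar arguments. The sketch below shows one way to do that; `sum_u32` is a hypothetical function used only for illustration.

```rust
/// Hypothetical FFI entry point. It takes the pointer and length as separate
/// scalars instead of a `&[u32]`, since the by-value ABI of slices is not
/// guaranteed.
///
/// # Safety
/// `data` and `len` must satisfy the requirements of
/// `std::slice::from_raw_parts`.
pub unsafe extern "C" fn sum_u32(data: *const u32, len: usize) -> u64 {
    // SAFETY: forwarded to the caller by this function's contract.
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    values.iter().map(|&v| u64::from(v)).sum()
}

fn main() {
    let values = [1u32, 2, 3];
    // SAFETY: pointer and length come from a live array.
    let total = unsafe { sum_u32(values.as_ptr(), values.len()) };
    assert_eq!(total, 6);
}
```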
60+
61+
The validity requirements for the in-memory slice representation are the same
62+
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).
63+
Namely:
64+
65+
* `data` must be non-null, [valid] for reads for `len * mem::size_of::<T>()` many bytes,
66+
and it must be properly aligned. This means in particular:
67+
68+
* The entire memory range of this slice must be contained within a single allocated object!
69+
Slices can never span across multiple allocated objects. See [below](#incorrect-usage)
70+
for an example incorrectly not taking this into account.
71+
* `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One
72+
reason for this is that enum layout optimizations may rely on references
73+
(including slices of any length) being aligned and non-null to distinguish
74+
them from other data. You can obtain a pointer that is usable as `data`
75+
for zero-length slices using [`NonNull::dangling()`].
76+
77+
* `data` must point to `len` consecutive properly initialized values of type `T`.
78+
79+
* The memory referenced by the slice must not be mutated for the duration
80+
of the slice reference's lifetime, except inside an `UnsafeCell`.
81+
82+
* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,
83+
and adding that size to `data` must not "wrap around" the address space.
84+
See the safety documentation of [`pointer::offset`].
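
Some of the requirements above can be checked at run time before a slice is built from foreign data. The following sketch checks the non-null, alignment, and `isize::MAX` size requirements. It cannot check initialization, allocation bounds, or aliasing, which remain the caller's responsibility. `plausible_slice_parts` and `checked_from_raw_parts` are hypothetical helpers, not standard library functions.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Checks the requirements that are testable at run time: `data` is non-null
/// and aligned, and `len * size_of::<T>()` does not exceed `isize::MAX`.
fn plausible_slice_parts<T>(data: *const T, len: usize) -> bool {
    let aligned = (data as usize) % align_of::<T>() == 0;
    let size_ok = match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        None => false, // the multiplication overflowed `usize`
    };
    !data.is_null() && aligned && size_ok
}

/// Builds a slice from raw parts after the cheap checks above.
///
/// # Safety
/// The caller must still uphold the requirements that cannot be checked here:
/// the memory is initialized, lies within one allocated object, and is not
/// mutated while the slice is in use.
unsafe fn checked_from_raw_parts<'a, T>(data: *const T, len: usize) -> Option<&'a [T]> {
    if plausible_slice_parts(data, len) {
        // SAFETY: checked above, plus the caller's contract.
        Some(unsafe { std::slice::from_raw_parts(data, len) })
    } else {
        None
    }
}

fn main() {
    let values = [10u16, 20, 30];
    let slice = unsafe { checked_from_raw_parts(values.as_ptr(), values.len()) };
    assert_eq!(slice, Some(&values[..]));

    // A zero-length slice still needs a non-null, aligned pointer.
    let dangling: *const u16 = NonNull::<u16>::dangling().as_ptr();
    let empty = unsafe { checked_from_raw_parts(dangling, 0) };
    assert_eq!(empty.map(|s| s.len()), Some(0));

    // A null pointer is rejected even when the length is zero.
    let null = unsafe { checked_from_raw_parts(std::ptr::null::<u16>(), 0) };
    assert!(null.is_none());
}
```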
85+
86+
# Drawbacks
87+
[drawbacks]: #drawbacks
88+
89+
## Zero-sized types
90+
91+
One could imagine representing `&[T]` as only `len` for zero-sized `T`.
92+
This proposal would preclude that choice in favor of a standard representation
93+
for slices regardless of the underlying type.
94+
95+
Alternatively, we could choose to guarantee that the data pointer is present if
96+
and only if `size_of::<T>() != 0`. This has the possibility of breaking existing
97+
code which smuggles pointers through the `data` value in `from_raw_parts` /
98+
`into_raw_parts`.
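
The pointer-smuggling pattern mentioned above can be sketched as follows, using the stable `std::slice::from_raw_parts` and `as_ptr`. The helper `smuggle_through_zst_slice` is a hypothetical name for this example, and the round trip shown reflects how current compilers behave; it is not a statement of what is guaranteed today.

```rust
use std::mem::size_of;

/// Round-trips an arbitrary non-null address through the data pointer of a
/// `&[()]`. (Hypothetical helper illustrating the smuggling pattern.)
fn smuggle_through_zst_slice(addr: *const u8, len: usize) -> (*const u8, usize) {
    assert!(!addr.is_null(), "slice data pointers must be non-null");
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null pointer is
    // aligned and valid for the zero bytes this slice covers.
    let units: &[()] = unsafe { std::slice::from_raw_parts(addr as *const (), len) };
    (units.as_ptr() as *const u8, units.len())
}

fn main() {
    // Slices of zero-sized types use the same two-word representation.
    assert_eq!(size_of::<&[()]>(), size_of::<&[u8]>());

    let byte = 7u8;
    let addr = &byte as *const u8;
    assert_eq!(smuggle_through_zst_slice(addr, 3), (addr, 3));
}
```

Dropping the data pointer for zero-sized `T` would make the address returned here unrecoverable, which is the breakage the paragraph above describes.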
99+
100+
## Uninhabited types
101+
102+
Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]`
103+
types into a ZST since the slice can only ever be length zero. This may offer
104+
modest performance benefits for highly generic code which happens to create
105+
empty slices of uninhabited types, but this is unlikely to be worth the
106+
cost of maintaining a special case.
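
For comparison, the sketch below observes the current representation of a slice of an uninhabited type. It uses `std::convert::Infallible` as a stand-in for `!`, which is an assumption made so the example compiles on stable Rust; `empty_uninhabited_slice` is a hypothetical helper.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Returns the only possible value of `&[Infallible]`: the empty slice.
/// (Hypothetical helper for this sketch.)
fn empty_uninhabited_slice() -> &'static [Infallible] {
    &[]
}

fn main() {
    // The reference is a full pointer + length pair, not a zero-sized type.
    assert_eq!(size_of::<&[Infallible]>(), size_of::<&[u8]>());

    let slice = empty_uninhabited_slice();
    assert!(slice.is_empty());
    // The data pointer is still non-null, as for any other slice.
    assert!(!slice.as_ptr().is_null());
}
```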
107+
108+
## Compatibility with C++ `std::span`
109+
110+
The largest drawback of this layout and set of validity requirements is that it
111+
may preclude `&[T]` from being representationally equivalent to C++'s
112+
`std::span<T, std::dynamic_extent>`.
113+
114+
* `std::span` does not currently guarantee its layout. In practice, pointer + length
115+
is the common representation. This is even observable using `is_layout_compatible`
116+
[on MSVC](https://godbolt.org/z/Y8ardrshY), though not
117+
[on GCC](https://godbolt.org/z/s4v4xehnG) nor
118+
[on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a
119+
different layout in the C++ standard (unlikely due to MSVC ABI stability
120+
requirements) could preclude matching the layout with `&[T]`.
121+
122+
* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One
123+
possible workaround for this would be to guarantee that `Option<&[T]>` uses
124+
`data: std::ptr::null()` to represent the `None` case, making `std::span<T>`
125+
equivalent to `Option<&[T]>` for non-zero-sized types.
126+
127+
* Rust uses a dangling pointer in the representation of zero-length slices.
128+
It's unclear whether C++ guarantees that a dangling pointer will remain
129+
unchanged when passed through `std::span`. However, it does support
130+
dangling pointers during regular construction via the use of
131+
[`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span)
132+
in the iterator constructors.
133+
134+
Note that C++ also does not support zero-sized types, so there is no naive way
135+
to represent types like `std::span<SomeZeroSizedRustType>`.
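
One way to bridge the `nullptr` difference without relying on any particular representation of `None` is an explicit conversion at the boundary. The sketch below assumes a pointer + length span layout, which the C++ standard does not guarantee. `RawSpan`, `span_to_slice`, and `slice_to_span` are hypothetical names for this example.

```rust
use std::ptr;

/// Hypothetical Rust-side mirror of a pointer + length `std::span`.
/// Unlike `&[T]`, its `data` field is allowed to be null.
#[repr(C)]
struct RawSpan<T> {
    data: *const T,
    len: usize,
}

/// Converts a span-like value into `Option<&[T]>`, mapping a null `data`
/// pointer to `None`.
///
/// # Safety
/// If `data` is non-null, the pair must satisfy the requirements of
/// `std::slice::from_raw_parts` for the chosen lifetime.
unsafe fn span_to_slice<'a, T>(span: RawSpan<T>) -> Option<&'a [T]> {
    if span.data.is_null() {
        None
    } else {
        // SAFETY: non-null was checked; the rest is the caller's contract.
        Some(unsafe { std::slice::from_raw_parts(span.data, span.len) })
    }
}

/// Converts back, using a null pointer and zero length for `None`.
fn slice_to_span<T>(slice: Option<&[T]>) -> RawSpan<T> {
    match slice {
        Some(s) => RawSpan { data: s.as_ptr(), len: s.len() },
        None => RawSpan { data: ptr::null(), len: 0 },
    }
}

fn main() {
    let values = [1i32, 2, 3];
    let span = slice_to_span(Some(&values[..]));
    let back = unsafe { span_to_slice(span) };
    assert_eq!(back, Some(&values[..]));

    let none = slice_to_span::<i32>(None);
    let back_none = unsafe { span_to_slice(none) };
    assert!(back_none.is_none());
}
```

This conversion treats a null, zero-length span as `None` and never as an empty slice, matching the `Option<&[T]>` correspondence suggested above.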
136+
137+
## Flexibility
138+
139+
Additionally, guaranteeing layout of Rust-native types limits the compiler's and
140+
standard library's ability to change and take advantage of new optimization
141+
opportunities.
142+
143+
# Rationale and alternatives
144+
[rationale-and-alternatives]: #rationale-and-alternatives
145+
146+
* We could avoid committing to a particular representation for slices.
147+
148+
* We could try to guarantee layout compatibility with a particular target's
149+
`std::span` representation, though without standardization this may be
150+
impossible. Multiple different C++ stdlib implementations may be used on
151+
the same platform and could potentially have different span representations.
152+
In practice, current span representations also use ptr+len pairs.
153+
154+
* We could avoid storing a data pointer for zero-sized types. This would result
155+
in a more compact representation but would mean that the representation of
156+
`&[T]` is dependent on the type of `T`.
157+
158+
# Prior art
159+
[prior-art]: #prior-art
160+
161+
The layout in this RFC is already documented in
162+
[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html).
163+
164+
# Unresolved questions
165+
[unresolved-questions]: #unresolved-questions
166+
167+
* Should `&[T]` include a pointer when `T` is zero-sized?
168+
169+
# Future possibilities
170+
[future-possibilities]: #future-possibilities
171+
172+
* Consider defining a separate Rust type which is repr-equivalent to the platform's
173+
native `std::span<T, std::dynamic_extent>` to allow for easier
174+
interoperability with C++ APIs. Unfortunately, the C++ standard does not
175+
guarantee the layout of `std::span` (though the representation may be known
176+
and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).
177+
Zero-sized types would also not be supported with a naive implementation of
178+
such a type.
