Commit 5cd1c6c

Note design constraints on hypothetical DynSized
1 parent 35fc085 commit 5cd1c6c

1 file changed: +135 -0 lines changed

@@ -0,0 +1,135 @@
# Exotically sized types (`DynSized` and `extern type`)

## Overview

In current Rust, there are two kinds of types with respect to sizing:
if a type is `Sized`, its layout (size and alignment) is known statically,
and if a type is `?Sized`, its layout may not be known until runtime (e.g. via a vtable).

However, more exotically sized types exist; the most common example is the opaque `extern type`.
An `extern type` has a layout that is *unknown* to Rust, and as such can only be used behind a pointer type.
Since the most unsized a type can currently be is `?Sized`, though,
the compiler has to make up a size and alignment to return from `mem::size_of_val`/`align_of_val`.
Currently the compiler returns a size of 0 and an alignment of 1.
Lying in this fashion is considered undesirable \[2].

Additionally, some C-header-interface libraries expose an opaque (incomplete) type
but also provide a function returning the size of the type and expect the caller to allocate space.
This is useful to allow the library to change the size of the type,
but still allow the caller to control allocation (e.g. using a custom arena allocator).
When bridging to Rust, these types should ideally have access to dynamic size/align.

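As a rough illustration of that pattern, the sketch below allocates caller-owned storage for an opaque type whose layout is only known at runtime. The names `Widget`, `WidgetBox`, `widget_size`, and `widget_align` are hypothetical; the two functions are plain Rust stand-ins so that the example is self-contained, whereas a real binding would declare them in an `extern "C"` block.

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical stand-ins for functions a C library would export.
// A real binding would declare these in an `extern "C"` block instead.
fn widget_size() -> usize {
    24
}
fn widget_align() -> usize {
    8
}

/// Opaque handle: Rust never learns the real layout of the pointee.
#[repr(C)]
pub struct Widget {
    _private: [u8; 0],
}

/// Caller-owned storage for one `Widget`, sized by asking the library at runtime.
pub struct WidgetBox {
    ptr: *mut Widget,
    layout: Layout,
}

impl WidgetBox {
    pub fn new() -> Option<Self> {
        // The layout is a runtime value; it cannot come from `size_of::<Widget>()`.
        let layout = Layout::from_size_align(widget_size(), widget_align()).ok()?;
        if layout.size() == 0 {
            return None;
        }
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc(layout) } as *mut Widget;
        if ptr.is_null() {
            return None;
        }
        Some(WidgetBox { ptr, layout })
    }

    pub fn layout(&self) -> Layout {
        self.layout
    }

    pub fn as_ptr(&self) -> *mut Widget {
        self.ptr
    }
}

impl Drop for WidgetBox {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated in `new` with exactly `self.layout`.
        unsafe { dealloc(self.ptr as *mut u8, self.layout) }
    }
}
```

Note that the wrapper has to carry the `Layout` itself, because nothing in the type system can recover it from a `*mut Widget`; this is the gap a dynamic size/align query would fill.
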
## Proposed Solution

The most obvious and independently reinvented solution is a "`DynSized`" trait that provides dynamic size/align information.
`extern type` would not implement `DynSized`, and generic code could opt into `?DynSized` types to support them.

At the time of writing, there is weak approval from T-lang to proceed with an internal-only version of `DynSized`
which is used to prohibit the use of `extern type` in standard `<T: ?Sized>` generic arguments \[2].

This design document is about the restrictions on what `T: ?Sized + DynSized` actually needs to imply.

## Design Constraints

### `Arc` and `Weak`

`Arc` supports "zombie" references, where all strong `Arc` handles and the pointee have been dropped,
but `Weak` handles still exist and so the allocation still exists.
This means that `Weak` needs to be able to determine the layout of the allocation from a dropped pointee.

In addition, `Weak` handles are pointers to the *reference count* part of the `ArcInner` allocation,
and thus need to *statically* know the alignment of the pointee type to determine the offset of the pointee
(they cannot call `align_of_val_raw` without first knowing the offset).

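To see why the offset depends on the alignment, here is a minimal sketch using `std::alloc::Layout`. It assumes a header of two `AtomicUsize` counters followed by the payload, which mirrors the shape described above but is not the standard library's actual code; `data_offset` is a hypothetical helper.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicUsize;

/// Byte offset of the payload within a hypothetical `{ strong, weak, data }`
/// allocation, given only the payload's alignment.
fn data_offset(data_align: usize) -> usize {
    // Header: the two reference counts.
    let header = Layout::new::<[AtomicUsize; 2]>();
    // Only the alignment of the payload matters for the offset, so size 0 is enough.
    let data = Layout::from_size_align(0, data_align).expect("valid alignment");
    // `extend` inserts whatever padding the payload's alignment requires.
    let (_combined, offset) = header.extend(data).expect("layout overflow");
    offset
}
```

A payload with alignment 1 starts right after the counters, while a payload with alignment 64 starts at byte 64. A pointer to the counters alone therefore cannot locate the payload unless the alignment is already known.
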
For the alignment, there are three potential resolutions:

- Store layout information in the `ArcInner` header,
- Require that alignment be determined solely from pointee metadata, or
- Change the pointer of `Arc<T>` to point directly at `T` and use a fixed negative offset for the header.

For both (size and alignment), there are three potential resolutions:

- Store layout information in the `ArcInner` header,
- Require that layout be determined solely from pointee metadata, or
- Require that layout be determinable from a dropped pointee.

T-lang commented on this in \[3] (w.r.t. const `Weak<T>::[into|from]_raw` and `Weak::new`):

> Consensus from meeting:
> - We approve the option to make `align_of_val_raw` require a once-valid-but-dropped value, in order to better support thin objects
> - we believe the sentinel design (of `Weak::new`) means that `align_of_val_raw` is only ever invoked on once-valid-but-dropped values
> - We do not want `align_of_val_raw` to be forced to work for metadata + thin pointer
> - Implement `Weak::from_raw` to check for sentinel and take some special action if it is observed
> - potential cost: for unsized types (only), there is an extra branch (but if custom dst doesn’t require \[dynamic] alignment, we can change this later)
> - It is not really lang team’s call, but we are -1 on adding more fields to `Rc`/`Arc`
> - For custom dst, the design will have to accommodate getting the size and alignment from “once-valid-but-dropped” values (values that were once valid but have been dropped); this is a non-issue for known use cases like c-string and thin-objects (which store a vtable)
> - (but could be relevant for dynamically allocated vtables)

### `Mutex` (and more generally, `UnsafeCell`)

The problem statement here is the combination of `&Mutex<T>` and `&mut T` both being usable concurrently,
plus the following presumably sound function:

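```rust
fn noop_write<T: ?Sized>(it: &mut T) {
    // Size of the pointee in bytes, as reported for this particular value.
    let len = std::mem::size_of_val(it);
    let ptr = it as *mut T as *mut u8;
    // Copy the value onto itself: every byte is read and written back unchanged.
    unsafe { std::ptr::copy(ptr, ptr, len); }
}
```
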
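To make the conflict abundantly clear, consider the following:

```rust
use std::sync::Mutex;

// `ThinCStr` is the hypothetical thin-pointer C string type discussed below.
let mutex: &Mutex<ThinCStr> = /* elided */;

// `join` runs both closures, potentially in parallel.
join(
    || {
        // `lock()` on `std::sync::Mutex` returns a `LockResult`, hence the `unwrap`.
        let mut lock = mutex.lock().unwrap();
        let it: &mut ThinCStr = &mut *lock;
        noop_write(it);
    },
    || {
        // No lock is taken here, yet the size depends on the protected bytes.
        std::mem::size_of_val(mutex);
    },
);
```
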
In order to determine the size of `Mutex<ThinCStr>`, you have to know the size of `ThinCStr`, which is stored inline in the `Mutex`.
To determine the size of `ThinCStr`, you have to read every byte to find the terminating nul byte (equivalent to calling `strlen`).
However, in the other fork, we lock the mutex and use the `&mut ThinCStr` to read and write back every byte of the `ThinCStr`.
Because the `&mut` side of the operation is surely nonatomic (and `strlen` is likely nonatomic as well), this is a data race, and thus UB.

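For concreteness, a size query for such a type could look like the sketch below. `thin_cstr_size` is a hypothetical helper, not an existing API; it only illustrates that the size cannot be computed without plain (non-atomic) reads of the pointee.

```rust
/// Size in bytes of a nul-terminated string, including the terminator.
///
/// # Safety
/// `ptr` must point to a readable, nul-terminated byte sequence.
unsafe fn thin_cstr_size(ptr: *const u8) -> usize {
    let mut len = 0;
    // Every byte up to the terminator is read with an ordinary load.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    // Count the terminating nul byte as part of the value.
    len + 1
}
```

If another thread writes to the same bytes through a `&mut` reference while this loop runs, the reads and writes are unsynchronized, which is exactly the race described above.
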
This constraint is more difficult to resolve than the previous one coming from `Arc`/`Weak`.
Fundamentally, if `std::mem::size_of_val` is to work without locking, types like `ThinCStr`, which require reading the pointee to determine layout information,
break a core property of `UnsafeCell`: that `&UnsafeCell<T>` cannot (safely) read (or write) any of `T`'s bytes.

Thus (at the time of writing) there are three known potential resolutions to this constraint:

- Require layout to be calculated solely from thin pointer and pointee metadata,
- Require `size_of_val` to acquire a read lock (for `Mutex`-like types), or
- Prohibit the use of pointee-determined-layout types in `Mutex`-like types.

## Potential Conclusions

This section is the notes' author's (@CAD97's) opinion only:

The above results in *four* classes of sizedness that Rust *could* care about \[1]:

- "`T: Sized + MetaSized + DynSized`", where the size and alignment are known statically;
- "`T: ?Sized + MetaSized + DynSized`", where the size and alignment are known from the data pointer and metadata;
- "`T: ?Sized + ?MetaSized + DynSized`", where the size and alignment require reading the pointee; and
- "`T: ?Sized + ?MetaSized + ?DynSized`", where the size and alignment cannot be determined by (generic) code.

Examples of these are respectively `u8`, `dyn Trait`, `ThinCStr`, and `extern type`.

@CAD97 posits that in the majority of cases,
`OwningPointer<T>`-like types want "`?Sized + ?MetaSized + DynSized`",
`Ref<T>`-like types want "`?Sized + ?MetaSized + ?DynSized`", and
`UnsafeCell<T>`-like types want "`?Sized + MetaSized + DynSized`".

## References

- \[1] https://internals.rust-lang.org/t/erfc-minimal-custom-dsts-via-extern-type-dynsized/16591?u=cad97
- \[2] https://github.com/rust-lang/rust/issues/49708
- \[3] https://hackmd.io/7r3_is6uTz-163fsOV8Vfg
