| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: 0.18 — lazy UTF-8 and no-alloc |
| 4 | +author: dwrensha |
| 5 | +--- |
| 6 | + |
| 7 | +New release alert! |
| 8 | +Version 0.18 of [capnproto-rust](https://github.com/capnproto/capnproto-rust) |
| 9 | +is now [available on crates.io](https://crates.io/crates/capnp). |
| 10 | + |
| 11 | +If you use capnproto-rust on data with |
| 12 | +the [`Text` built-in type](https://capnproto.org/language.html#built-in-types), |
| 13 | +then it's likely that this release will require some |
| 14 | +updates to your code. |
| 15 | +But don't worry — the changes are straightforward and they bring some |
| 16 | +important benefits. |
| 17 | + |
| 18 | +## lazy UTF-8 validation |
| 19 | + |
| 20 | +Suppose we have the following struct defined in a Cap'n Proto schema: |
| 21 | + |
| 22 | +``` |
| 23 | +struct Foo { |
| 24 | + oneText @0 :Text; |
| 25 | + anotherText @1 :Text; |
| 26 | +} |
| 27 | +``` |
| 28 | + |
| 29 | +Then, in Rust, these `Text` fields can be accessed through the `text::Reader` type: |
| 30 | + |
| 31 | +```rust |
| 32 | +let my_foo: foo::Reader = ...; |
| 33 | +let one_text: capnp::text::Reader<'_> = my_foo.get_one_text()?; |
| 34 | +let another_text: capnp::text::Reader<'_> = my_foo.get_another_text()?; |
| 35 | +``` |
| 36 | + |
| 37 | +But what exactly is a `text::Reader`? |
| 38 | + |
| 39 | + |
| 40 | +### the old definition |
| 41 | + |
| 42 | +In previous versions of capnproto-rust, the `text::Reader` type |
| 43 | +was an alias to Rust's `&str` type: |
| 44 | + |
| 45 | + |
| 46 | +```rust |
| 47 | +pub mod text { |
| 48 | + pub type Reader<'a> = &'a str; |
| 49 | +} |
| 50 | +``` |
| 51 | + |
| 52 | +At first glance, this seems like a perfect fit. |
| 53 | +A Cap'n Proto `Text` value is required to |
| 54 | +contain valid UTF-8 data, just like a Rust `&str`, |
| 55 | +and a `text::Reader` is meant to represent |
| 56 | +a reference to that data. |
| 57 | + |
| 58 | +However, in practice, there are some ways in which this representation |
| 59 | +falls short. |
| 60 | + |
| 61 | +* **performance**: Validating UTF-8 data has a cost, |
| 62 | + and ideally we would like to avoid paying it multiple |
| 63 | + times on the same data. If `text::Reader` is just |
| 64 | + `&str`, then we need to validate every time that we: |
| 65 | + - copy a text field from one message to another, |
| 66 | + - write a text field to a file, |
| 67 | + - write a file to a text field, or |
| 68 | + - access some sub-range of a text field. |
| 69 | + |
| 70 | + This goes against the general Cap'n Proto philosophy |
| 71 | + of doing validation as lazily as possible. |
| 72 | + |
| 73 | +* **robustness**: If a text field holds corrupted data, then |
| 74 | + you still might want to be able to access that data, even |
| 75 | + if it is not valid UTF-8. For example, imagine that a text |
| 76 | + field holds log messages from a web server. We should |
| 77 | + still be able to read the messages, even if they are garbled. |
| 78 | + Indeed, garbled messages are probably the most interesting ones, |
| 79 | + as they indicate unexpected behavior. |
| 80 | + See [this issue](https://github.com/capnproto/capnproto-rust/issues/314) |
| 81 | + for more discussion. |
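The robustness point can be illustrated with the standard library alone. The following is a minimal sketch, not capnproto-rust code: `display_log_line` is a hypothetical helper, and the log line is made-up data. It shows that strict validation rejects a garbled value outright, while access to the raw bytes still lets us recover the readable part.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode a log line for display, replacing invalid
/// UTF-8 sequences with U+FFFD instead of rejecting the whole value.
fn display_log_line(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

fn main() {
    // A made-up log line whose last two bytes are not valid UTF-8.
    let garbled: &[u8] = b"GET /index.html 200 \xff\xfe";

    // Strict validation fails, so an API that only hands out `&str`
    // cannot expose this value at all.
    assert!(std::str::from_utf8(garbled).is_err());

    // With the raw bytes available, the readable prefix survives.
    let shown = display_log_line(garbled);
    assert!(shown.starts_with("GET /index.html 200 "));
    println!("{shown}");
}
```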
| 82 | + |
| 83 | + |
| 84 | +### the new definition |
| 85 | + |
| 86 | +To address the above-noted shortcomings, |
| 87 | +version 0.18 of capnproto-rust defines `text::Reader` |
| 88 | +like this: |
| 89 | + |
| 90 | +```rust |
| 91 | +pub mod text { |
| 92 | + /// Wrapper around utf-8 encoded text. |
| 93 | + /// This is defined as a tuple struct to allow pattern matching |
| 94 | + /// on it via byte literals (for example `text::Reader(b"hello")`). |
| 95 | + #[derive(Copy, Clone, PartialEq)] |
| 96 | + pub struct Reader<'a>(pub &'a [u8]); |
| 97 | + |
| 98 | + impl<'a> Reader<'a> { |
| 99 | + pub fn as_bytes(self) -> &'a [u8] { ... } |
| 100 | + pub fn to_str(self) -> Result<&'a str, Utf8Error> { ... } |
| 101 | + pub fn to_string(self) -> Result<String, Utf8Error> { ... } |
| 102 | + } |
| 103 | + |
| 104 | + impl<'a> From<&'a str> for Reader<'a> { ... } |
| 105 | + impl<'a> From<&'a [u8]> for Reader<'a> { ... } |
| 106 | +} |
| 107 | +``` |
| 108 | +Now consumers can access the underlying bytes directly, via `as_bytes()`, |
| 109 | +with no validation. UTF-8 validation happens only when they ask for a `&str` |
| 110 | +or `String` through `to_str()` or `to_string()`, which both return a `Result`. |
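To make the lazy behavior concrete, here is a small self-contained sketch. The `Reader` below is a local stand-in that mirrors the definition quoted above, not the real `capnp` type, and `is_greeting` is a hypothetical helper. Only `to_str()` validates; `as_bytes()` and byte-literal pattern matching work even on invalid data.

```rust
use std::str::Utf8Error;

/// Local stand-in mirroring the `text::Reader` definition quoted above.
/// This is NOT the real `capnp` type.
#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Reader<'a>(pub &'a [u8]);

impl<'a> Reader<'a> {
    /// Returns the raw bytes without any validation.
    pub fn as_bytes(self) -> &'a [u8] {
        self.0
    }

    /// Validates the bytes as UTF-8 only when a `&str` is requested.
    pub fn to_str(self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.0)
    }
}

/// Hypothetical helper: matches on byte literals, with no UTF-8 validation.
fn is_greeting(text: Reader<'_>) -> bool {
    matches!(text, Reader(b"hello") | Reader(b"hi"))
}

fn main() {
    let good = Reader(b"hello");
    let bad = Reader(b"hel\xfflo"); // contains an invalid byte

    assert_eq!(good.to_str(), Ok("hello"));
    assert!(bad.to_str().is_err()); // validation fails only here
    assert_eq!(bad.as_bytes().len(), 6); // raw access still works
    assert!(is_greeting(good));
}
```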
| 111 | + |
| 112 | +When setting text fields in a message, you will now need to |
| 113 | +insert some `.into()` calls to convert from a `&str` (or a borrowed `String`) |
| 114 | +into a `text::Reader`, like this: |
| 115 | + |
| 116 | +```rust |
| 117 | +let name: &str = "alice"; |
| 118 | +let mut my_foo: foo::Builder = ...; |
| 119 | +my_foo.set_one_text("hello world".into())?; |
| 120 | +my_foo.set_another_text(format!("hello {name}")[..].into())?; |
| 121 | +``` |
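The conversions themselves are simple. The sketch below shows how the two `From` impls might look and how `.into()` resolves at a call site. Everything in it is a stand-in for illustration: `Reader` mirrors the definition above, and `FooBuilder` is a hypothetical substitute for generated builder code, which in reality writes into a Cap'n Proto message rather than a `Vec<u8>`.

```rust
/// Local stand-in for `text::Reader` (not the real `capnp` type).
#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Reader<'a>(pub &'a [u8]);

impl<'a> From<&'a str> for Reader<'a> {
    fn from(value: &'a str) -> Self {
        Reader(value.as_bytes())
    }
}

impl<'a> From<&'a [u8]> for Reader<'a> {
    fn from(value: &'a [u8]) -> Self {
        Reader(value)
    }
}

/// Hypothetical stand-in for a generated builder with one text field.
#[derive(Default)]
struct FooBuilder {
    one_text: Vec<u8>,
}

impl FooBuilder {
    /// Copies the bytes as-is; storing them requires no UTF-8 validation.
    fn set_one_text(&mut self, value: Reader<'_>) {
        self.one_text = value.0.to_vec();
    }
}

fn main() {
    let name = "alice";
    let greeting: String = format!("hello {name}");
    let mut foo = FooBuilder::default();

    // A string literal converts through `From<&str>`.
    foo.set_one_text("hello world".into());
    assert_eq!(foo.one_text, b"hello world");

    // An owned `String` is borrowed first, then converted.
    foo.set_one_text(greeting.as_str().into());
    assert_eq!(foo.one_text, b"hello alice");

    // Raw bytes convert through `From<&[u8]>`, valid UTF-8 or not.
    let raw: &[u8] = b"\xff\xfe";
    foo.set_one_text(raw.into());
    assert_eq!(foo.one_text, raw);
}
```

Borrowing with `as_str()` is equivalent to the `[..]` slicing shown above; which one reads better is a matter of taste.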
| 122 | + |
| 123 | +All this is admittedly more verbose than it was before, |
| 124 | +but it's in keeping with the general spirit of capnproto-rust: |
| 125 | +we are willing to introduce some verbosity |
| 126 | +if that's what it takes to model Cap'n Proto data |
| 127 | +in a satisfactory way. |
| 128 | + |
| 129 | + |
| 130 | +## no-alloc mode |
| 131 | + |
| 132 | +Another new feature is no-alloc mode. |
| 133 | + |
| 134 | +In version 0.13, capnproto-rust |
| 135 | +[gained support for no_std environments]({{site.baseurl}}/2020/06/06/no-std-support.html). |
| 136 | +However, it still depended on the [`alloc`](https://doc.rust-lang.org/alloc/) crate, |
| 137 | +which can sometimes be a problem for microcontroller targets and kernel programming. |
| 138 | +(See [this issue](https://github.com/capnproto/capnproto-rust/issues/221) |
| 139 | +for some discussion.) |
| 140 | + |
| 141 | +Starting with version 0.18, the `capnp` crate has an `alloc` Cargo feature, |
| 142 | +which can be disabled to remove the `alloc` dependency. |
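As a rough sketch, a `Cargo.toml` entry that opts out might look like the following. It assumes that `alloc` is switched on through the crate's default features, which this post does not state, so check the crate's feature list before relying on it.

```toml
[dependencies]
# Assumption: `alloc` is part of the default feature set, so turning off
# default features drops it (along with any other defaults).
capnp = { version = "0.18", default-features = false }
```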
| 143 | + |
| 144 | +A side benefit of this change is that now error handling in capnproto-rust |
| 145 | +is much less dependent on heap allocation, and so should have better |
| 146 | +performance and be more reliable. |
| 147 | + |