Multiaddr: Replace Arc<Vec<u8>> with inline storage #1510
Force-pushed from 3dc7801 to c84042f.
Note: this is basically identical to the optimization that has been done in rust-multihash in multiformats/rust-multihash#47
For what my review is worth this looks good to me.
I would prefer every performance optimization to come with a benchmark, preventing future-us from introducing performance regressions. Something along the lines of https://gist.github.com/vmx/0dc0d981b522e2834084559faa0dcb56. What do you all think?
@mxinden I can add a clone benchmark that compares cloning multiaddrs that are slightly below the limit to multiaddrs that are slightly above the limit. But it will be hard to integrate this into CI, so it would just be something you run on demand when you want to mess with this part of the code...
Added a small bench and a test that shows where the limits of inline storage are. Here are the results:
Note that the main point of the optimization is to avoid touching the heap and to reduce total memory usage, but a factor 3 speedup in cloning is a nice benefit. Since cloning involves just increasing the `strong_count` of the `Arc`, it is independent of size for the heap case.
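The inline/heap split described here can be sketched roughly as follows. The names (`Storage`, `MAX_INLINE`) mirror the discussion in this PR, but this is an illustrative stand-in, not the PR's actual code:

```rust
use std::sync::Arc;

/// Inline threshold, mirroring the `MAX_INLINE` discussed in this PR.
const MAX_INLINE: usize = 30;

/// Small byte strings live inline; larger ones spill to a shared heap allocation.
#[derive(Clone)]
enum Storage {
    /// Length in the first field, payload in the fixed buffer.
    Inline(u8, [u8; MAX_INLINE]),
    Heap(Arc<[u8]>),
}

impl Storage {
    fn from_slice(bytes: &[u8]) -> Self {
        if bytes.len() <= MAX_INLINE {
            let mut buf = [0u8; MAX_INLINE];
            buf[..bytes.len()].copy_from_slice(bytes);
            Storage::Inline(bytes.len() as u8, buf)
        } else {
            Storage::Heap(bytes.into())
        }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            Storage::Inline(len, buf) => &buf[..*len as usize],
            Storage::Heap(arc) => arc,
        }
    }
}

fn main() {
    // 30 bytes fit inline; 31 bytes spill to the heap.
    assert!(matches!(Storage::from_slice(&[1u8; 30]), Storage::Inline(..)));
    assert!(matches!(Storage::from_slice(&[2u8; 31]), Storage::Heap(..)));
    assert_eq!(Storage::from_slice(b"abc").as_slice(), &b"abc"[..]);
}
```

Cloning the inline variant is a plain memcpy of the struct; cloning the heap variant bumps the `Arc` refcount, which is why the heap case is size-independent.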
Force-pushed from 73cad48 to cdbf554.
/// a multiaddr containing an ipv4 or ipv6 address and port.
///
/// More complex multiaddrs like those containing peer ids will be stored on the heap.
const MAX_INLINE: usize = 30;
Why not 62? Most peer IDs would fit in there as well, and peer IDs are quite a common case.
We could make it 62. I have no idea about the size distribution of the typical multiaddr. From what I have seen in rust-libp2p, and for what I intend to use this for, you often retain addresses for other nodes, which are just host/port/protocol. If you say that the ones containing a peer id are sufficiently common, I will change this to 62.
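To put rough numbers on the size distribution, here is a back-of-the-envelope calculation of encoded multiaddr sizes. It assumes the standard multicodec protocol codes (ip4 = 4, tcp = 6, ip6 = 41, p2p = 421) and a typical 34-byte peer-id multihash; treat the exact figures as an estimate, not as output of the multiaddr crate itself:

```rust
/// Number of bytes in the unsigned-varint encoding of `n`.
fn varint_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn main() {
    // /ip4/../tcp/..: code varint + 4 address bytes + code varint + 2 port bytes.
    let ip4_tcp = varint_len(4) + 4 + varint_len(6) + 2;
    // /ip6/../tcp/..: same shape with a 16-byte address.
    let ip6_tcp = varint_len(41) + 16 + varint_len(6) + 2;
    // /p2p/<peer-id>: code varint + length prefix + a typical 34-byte multihash.
    let p2p = varint_len(421) + varint_len(34) + 34;

    assert_eq!(ip4_tcp, 8);
    assert_eq!(ip6_tcp, 20);
    assert_eq!(p2p, 37);

    // Plain host/port fits comfortably in 30 inline bytes...
    assert!(ip6_tcp <= 30);
    // ...but host/port plus a peer id needs the larger limit.
    assert!(ip6_tcp + p2p > 30 && ip6_tcp + p2p <= 62);
}
```

So under these assumptions 30 covers host/port addresses, while host/port plus a peer id lands around 45-57 bytes and would only fit with the 62-byte limit.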
Unless large multiaddrs are way more popular than I think, 64 would be too big. If you fall back to the arced version for large multiaddrs, it is not the end of the world. And there are situations where having lots of 64 byte objects would be worse than having a bunch of pointers to arcs.
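The "lots of 64 byte objects" concern can be made concrete with `mem::size_of` on mock types standing in for the two candidate limits (these are stand-ins, not the PR's actual types; exact sizes depend on the target):

```rust
use std::mem::size_of;
use std::sync::Arc;

// Mock enum with a 62-byte inline buffer (the "why not 62" proposal).
enum Inline62 {
    Inline(u8, [u8; 62]),
    Heap(Arc<[u8]>),
}

// Mock enum with the 30-byte inline buffer from the PR.
enum Inline30 {
    Inline(u8, [u8; 30]),
    Heap(Arc<[u8]>),
}

fn main() {
    // An Arc<[u8]> handle is a fat pointer: two words.
    println!("Arc<[u8]>:        {} bytes", size_of::<Arc<[u8]>>());
    // The inline variants pay for their buffer in every instance,
    // whether or not it is used.
    println!("inline up to 30:  {} bytes", size_of::<Inline30>());
    println!("inline up to 62:  {} bytes", size_of::<Inline62>());
    assert!(size_of::<Inline62>() > size_of::<Inline30>());
    assert!(size_of::<Inline30>() > size_of::<Arc<[u8]>>());
}
```

Every instance pays the inline buffer's footprint up front, so doubling the limit roughly doubles the per-object size even for addresses that end up on the heap.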
misc/multiaddr/src/lib.rs (outdated)
     Some(protocol)
 }

 /// Like [`Multiaddr::push`] but consumes `self`.
 pub fn with(mut self, p: Protocol<'_>) -> Self {
-    let mut w = io::Cursor::<&mut Vec<u8>>::new(Arc::make_mut(&mut self.bytes));
+    let mut w = io::Cursor::new(self.to_vec());
Can we not avoid this allocation if `p` fits into `Storage::Inline`?
Same as above.
Sorry, what I had in mind was to use:

diff --git a/misc/multiaddr/Cargo.toml b/misc/multiaddr/Cargo.toml
index c30b4504..a6ce960d 100644
--- a/misc/multiaddr/Cargo.toml
+++ b/misc/multiaddr/Cargo.toml
@@ -16,6 +16,7 @@ data-encoding = "2.1"
multihash = "0.10"
percent-encoding = "2.1.0"
serde = "1.0.70"
+smallvec = { version = "1.0", features = ["write"] }
static_assertions = "1.1"
unsigned-varint = "0.3"
url = { version = "2.1.0", default-features = false }
diff --git a/misc/multiaddr/src/lib.rs b/misc/multiaddr/src/lib.rs
index f81b8ada..58906a48 100644
--- a/misc/multiaddr/src/lib.rs
+++ b/misc/multiaddr/src/lib.rs
@@ -14,10 +14,10 @@ use serde::{
Serializer,
de::{self, Error as DeserializerError}
};
+use smallvec::SmallVec;
use std::{
convert::TryFrom,
fmt,
- io,
iter::FromIterator,
net::{IpAddr, Ipv4Addr, Ipv6Addr},
result::Result as StdResult,
@@ -35,19 +35,22 @@ static_assertions::const_assert! {
std::mem::size_of::<usize>() <= std::mem::size_of::<u64>()
}
+/// Size to use with `SmallVec`.
+const INLINE_SIZE: usize = 32;
+
/// Representation of a Multiaddr.
#[derive(PartialEq, Eq, Clone, Hash)]
-pub struct Multiaddr { bytes: Arc<Vec<u8>> }
+pub struct Multiaddr { bytes: Arc<SmallVec<[u8; INLINE_SIZE]>> }
impl Multiaddr {
/// Create a new, empty multiaddress.
pub fn empty() -> Self {
- Self { bytes: Arc::new(Vec::new()) }
+ Self { bytes: Arc::new(SmallVec::new()) }
}
/// Create a new, empty multiaddress with the given capacity.
pub fn with_capacity(n: usize) -> Self {
- Self { bytes: Arc::new(Vec::with_capacity(n)) }
+ Self { bytes: Arc::new(SmallVec::with_capacity(n)) }
}
/// Return the length in bytes of this multiaddress.
@@ -73,9 +76,8 @@ impl Multiaddr {
/// ```
///
pub fn push(&mut self, p: Protocol<'_>) {
- let mut w = io::Cursor::<&mut Vec<u8>>::new(Arc::make_mut(&mut self.bytes));
- w.set_position(w.get_ref().len() as u64);
- p.write_bytes(&mut w).expect("Writing to a `io::Cursor<&mut Vec<u8>>` never fails.")
+ let mut w = Arc::make_mut(&mut self.bytes);
+ p.write_bytes(&mut w).expect("Writing to a `SmallVec` never fails.")
}
/// Pops the last `Protocol` of this multiaddr, or `None` if the multiaddr is empty.
@@ -107,9 +109,8 @@ impl Multiaddr {
/// Like [`Multiaddr::push`] but consumes `self`.
pub fn with(mut self, p: Protocol<'_>) -> Self {
- let mut w = io::Cursor::<&mut Vec<u8>>::new(Arc::make_mut(&mut self.bytes));
- w.set_position(w.get_ref().len() as u64);
- p.write_bytes(&mut w).expect("Writing to a `io::Cursor<&mut Vec<u8>>` never fails.");
+ let mut w = Arc::make_mut(&mut self.bytes);
+ p.write_bytes(&mut w).expect("Writing to a `SmallVec` never fails.");
self
}
@@ -212,7 +213,7 @@ impl<'a> FromIterator<Protocol<'a>> for Multiaddr {
where
T: IntoIterator<Item = Protocol<'a>>,
{
- let mut writer = Vec::new();
+ let mut writer = SmallVec::new();
for cmp in iter {
cmp.write_bytes(&mut writer).expect("Writing to a `Vec` never fails.");
}
@@ -224,7 +225,7 @@ impl FromStr for Multiaddr {
type Err = Error;
fn from_str(input: &str) -> Result<Self> {
- let mut writer = Vec::new();
+ let mut writer = SmallVec::new();
let mut parts = input.split('/').peekable();
if Some("") != parts.next() {
@@ -262,7 +263,7 @@ impl<'a> Iterator for Iter<'a> {
impl<'a> From<Protocol<'a>> for Multiaddr {
fn from(p: Protocol<'a>) -> Multiaddr {
- let mut w = Vec::new();
+ let mut w = SmallVec::new();
p.write_bytes(&mut w).expect("Writing to a `Vec` never fails.");
Multiaddr { bytes: Arc::new(w) }
}
@@ -299,7 +300,7 @@ impl TryFrom<Vec<u8>> for Multiaddr {
let (_, s) = Protocol::from_bytes(slice)?;
slice = s
}
- Ok(Multiaddr { bytes: Arc::new(v) })
+ Ok(Multiaddr { bytes: Arc::new(SmallVec::from(v)) })
}
}
Ah sorry, that makes no sense. We still do the Arc.
 }

 #[test]
 fn roundtrip() {
Is this test not redundant given the property test below?
Well, this one is guaranteed to hit all the interesting cases, whereas with the property based test you depend on luck. But this was copied from multihash, which is an independent repository. So if you want to reduce the number of tests for rust-libp2p, which is a bit bigger, I can remove it.
If you are concerned about the test runtime, it would probably be best to keep this one and drop the property based test.
Force-pushed from 48def4c to 7e96076.
Storage stores small bytes inline, and spills to the heap only once that is no longer possible.
Co-Authored-By: Max Inden <mail@max-inden.de>
Co-Authored-By: Toralf Wittner <tw@dtex.org>
Force-pushed from 7e96076 to 0c0485f.
This comment is kind of irrelevant here since we're also using `Arc`. To me, instead of optimizing the insides of the `Multiaddr`, […]. I also strongly think that we're micro-optimizing here. Benchmarking how much time it takes to clone a `Multiaddr` […].

That being said, I don't care that much and I'm not opposed to merging this.
Just to be clear, the purpose of this change is not to make clone faster. That is just a welcome but not very important side effect. I am perfectly happy with the speed of cloning an `Arc`.

The purpose of the optimization is to reduce the memory usage of a multiaddr, make memory usage more deterministic, and to avoid touching the heap at all in the average case. (The worst-case memory usage of an `Arc<Vec<u8>>` can be arbitrarily large, and even in the average case it will be significantly more than just the raw bytes, because it is two objects on the heap.)

I agree that this is an optimization that does not matter at all unless you are running on very low-powered hardware and/or have a very large number of multiaddrs sticking around. But if multiaddr is supposed to be a general-purpose crate and not just a support library for Parity, then these situations can happen. E.g. some of the IPFS nodes used by pinning services like IPFS Pinata are keeping a quite large number of connections open to work around deficiencies of the current content routing.

Besides, I think as far as micro-optimizations go, this one is pretty harmless and self-contained. It does not add unsafe code or additional dependencies (except for SmallVec, which I could also remove again). The optimization was originally done for multihash, for which it definitely matters, because an IPFS node will keep a very large number of multihashes around. I thought it would be nice to carry it over to multiaddr, where admittedly it matters a bit less.
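The "two objects on the heap" point: the `Arc<Vec<u8>>` handle points at an `ArcInner` (refcounts plus the `Vec` header), which in turn points at a separate byte buffer, whereas `Arc<[u8]>` keeps the counts and the bytes in one allocation. A sketch of the word counts, assuming a 64-bit target and ignoring allocator rounding:

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    let word = size_of::<usize>();

    // Vec<u8> header: pointer + capacity + length = 3 words.
    assert_eq!(size_of::<Vec<u8>>(), 3 * word);
    // Arc<Vec<u8>>: the handle is a thin pointer (1 word); the pointee is a
    // separate allocation with strong + weak counts plus the Vec header, and
    // the Vec owns a *second* allocation for the bytes.
    assert_eq!(size_of::<Arc<Vec<u8>>>(), word);
    // Arc<[u8]>: the handle is a fat pointer (data + length); counts and
    // bytes share a single allocation.
    assert_eq!(size_of::<Arc<[u8]>>(), 2 * word);

    // Rough heap bytes for an n-byte multiaddr (excluding allocator overhead):
    let n = 20;
    let arc_vec_heap = (2 + 3) * word + n; // counts + Vec header, plus the buffer
    let arc_slice_heap = 2 * word + n;     // counts + bytes, one block
    assert!(arc_vec_heap > arc_slice_heap);
    println!("Arc<Vec<u8>>: {arc_vec_heap} heap bytes, Arc<[u8]>: {arc_slice_heap}");
}
```

So even when the inline case does not apply, the single-allocation `Arc<[u8]>` saves a few words per value and one allocation per multiaddr.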
If anything, this change ties the multiaddr more to a specific usage.
I don't understand. If you're saying that you are keeping a lot of multihashes alive, then this PR will worsen your memory usage, and cloning an inline multiaddr/multihash is presumably more expensive than cloning an `Arc`.
If you have large multiaddrs and do not benefit from the inline case, a Multiaddr is essentially an `Arc<[u8]>`.

If you have a lot of distinct multiaddrs that fit the inline case, this will make memory usage better since you don't touch the heap at all. If you have a lot of distinct multiaddrs that do not fit the inline case, it will still make things better over an `Arc<Vec<u8>>`.

Unless you go to very large inline multiaddrs, it is significantly cheaper than cloning an `Arc`.

But this whole thing is not worth having a heated discussion about. I think this is a small but in some cases significant improvement. If you think this is a bad tradeoff and don't want to merge this, I am fine with that as well.
Just out of curiosity, I ran a small example that just allocates a bunch of multiaddrs, on this branch and on master, for the inline and non-inline case, and ran heaptrack on it.

use multiaddr::Multiaddr;

fn main() {
    let inline = "/dns4/01234567890123456789123/tcp/80/ws";
    let heap = "/dns4/0123456789012345678901234/tcp/80/ws";
    let mut t: Vec<Multiaddr> = Vec::new();
    for _ in 0..1000000 {
        // t.push(heap.parse().unwrap());
        t.push(inline.parse().unwrap());
    }
}
} This branch / inline
This branch / heap
master / same size as inline case above
master / same size as heap case above
So in the inline case, this branch consumes quite a bit less memory than master. But also in the heap case, this branch consumes a bit less memory than master, presumably because an `Arc<[u8]>` is one allocation instead of the two needed for an `Arc<Vec<u8>>`.

Not sure if there is a better way to measure the actual heap consumption including overhead. I was looking for a version of https://docs.rs/heapsize/0.4.2/heapsize/ that works with current rust, but could not find anything quickly.
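A std-only alternative to heaptrack is a counting wrapper around the system allocator (a sketch, not what was used above): every allocation bumps two atomic counters, which you can read before and after the code under test.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts allocations and allocated bytes.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOCATOR: Counting = Counting;

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    let v: Vec<u8> = vec![0; 64]; // one heap allocation
    let after = ALLOCS.load(Ordering::Relaxed);
    assert!(after - before >= 1);
    drop(v);
    println!(
        "allocations: {}, bytes: {}",
        ALLOCS.load(Ordering::Relaxed),
        BYTES.load(Ordering::Relaxed)
    );
}
```

This counts requested sizes, not the allocator's real overhead, so it understates true consumption; but it is deterministic and easy to wire into a test that asserts "the inline case performs zero allocations".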
@tomaka I am going to close this. I think doing this in a way that is more agreeable to you is beyond my current rust capabilities. Ideally you would want to separate the notion of a multiaddr (which is just a refinement of […]). That would be a bit more work, which might not be worth it...
Storage stores small bytes inline, and spills to the heap only once that is no longer possible.
A typical small multiaddr consisting of e.g. ip4 host/port, ip6 host/port etc. will be stored inline. Only some complex multiaddrs containing multiple peer ids will be stored on the heap as an Arc<[u8]>.
The `push` and `with` fns allocate temporary vectors, but I think this is an acceptable tradeoff for making the in-memory representation of multiaddrs much smaller. Many of the multiaddrs you have you don't ever build manually, but get from somewhere else.
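The temporary-vector pattern mentioned above can be sketched as a free function (a hypothetical helper, not the PR's actual code): copy the current bytes into a `Vec`, append the new protocol's bytes, and let the caller re-store the result, re-deciding inline vs heap.

```rust
// Sketch of what `push`/`with` pay for: one temporary allocation per call,
// in exchange for a compact steady-state representation.
fn push_bytes(current: &[u8], protocol_bytes: &[u8]) -> Vec<u8> {
    let mut tmp = Vec::with_capacity(current.len() + protocol_bytes.len());
    tmp.extend_from_slice(current);        // copy existing bytes out of storage
    tmp.extend_from_slice(protocol_bytes); // append the encoded protocol
    tmp // caller converts this back into inline or heap storage
}

fn main() {
    assert_eq!(push_bytes(&[1, 2], &[3]), vec![1, 2, 3]);
}
```

Since addresses are usually parsed once and then cloned or compared many times, paying an extra allocation on the rare mutation path is the cheap side of the tradeoff.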