-
Notifications
You must be signed in to change notification settings - Fork 154
Open
Description
I am the author of the markov_str library and I trying out the SmallVec
to speed up the creation of my Markov Chains. But I have encountered this issue while benchmarking with cargo flamegraph: equvalance checks between SmallVec
s take too long and almost all the time is spent on spilled()
.
Example:
MarkovChain type:
pub struct MarkovChain {
items: HashMap<SmallVec<[Spur; 4]>, ChainItem>,
state_size: usize,
regex: Regex,
cache: Rodeo,
}
benchmarked function:
/// Adds text as training data. The tokens will be created with the regex of the MarkovChain.
pub fn add_text(&mut self, text: &str) {
let tokens: Vec<Spur> = self
.regex
.find_iter(text)
.map(|t| self.cache.get_or_intern(t.as_str()))
.collect();
// vec.windows(0) panics for some reason.
if tokens.is_empty() {
return;
}
// Creating a preallocated buffer and filling and cleaning it instead of creating a new one every loop is way more efficient.
let mut prevbuf: SmallVec<[Spur; 4]> = SmallVec::with_capacity(self.state_size);
for win in tokens.windows(tokens.len().min(self.state_size + 1)) {
let rel = win.last().unwrap();
for i in 1..win.len() {
prevbuf.clear();
for t in win.iter().rev().skip(1).take(i).rev() {
prevbuf.push(*t);
}
match self.items.raw_entry_mut().from_key(&prevbuf) {
RawEntryMut::Occupied(mut view) => {
view.get_mut().add(*rel);
}
RawEntryMut::Vacant(view) => {
view.insert(prevbuf.clone(), ChainItem::new(*rel));
}
}
}
Crates used:
hashbrown = "0.15.0"
lasso = {version = "0.7.3", features = ["ahasher", "inline-more"]}
rand = "0.8.5"
regex = "1.11.0"
smallvec = "1.13.2"
The code and sample data I used for this benchmark can be found at Brogolem35/markov_str/.
Metadata
Metadata
Assignees
Labels
No labels