Skip to content

spilled() is very slow #361

@Brogolem35

Description

@Brogolem35

I am the author of the markov_str library and I trying out the SmallVec to speed up the creation of my Markov Chains. But I have encountered this issue while benchmarking with cargo flamegraph: equvalance checks between SmallVecs take too long and almost all the time is spent on spilled().

Example:

MarkovChain type:

pub struct MarkovChain {
	items: HashMap<SmallVec<[Spur; 4]>, ChainItem>,
	state_size: usize,
	regex: Regex,
	cache: Rodeo,
}

benchmarked function:

	/// Adds text as training data. The tokens will be created with the regex of the MarkovChain.
	pub fn add_text(&mut self, text: &str) {
		let tokens: Vec<Spur> = self
			.regex
			.find_iter(text)
			.map(|t| self.cache.get_or_intern(t.as_str()))
			.collect();

		// vec.windows(0) panics for some reason.
		if tokens.is_empty() {
			return;
		}

		// Creating a preallocated buffer and filling and cleaning it instead of creating a new one every loop is way more efficient.
		let mut prevbuf: SmallVec<[Spur; 4]> = SmallVec::with_capacity(self.state_size);
		for win in tokens.windows(tokens.len().min(self.state_size + 1)) {
			let rel = win.last().unwrap();

			for i in 1..win.len() {
				prevbuf.clear();
				for t in win.iter().rev().skip(1).take(i).rev() {
					prevbuf.push(*t);
				}

				match self.items.raw_entry_mut().from_key(&prevbuf) {
					RawEntryMut::Occupied(mut view) => {
						view.get_mut().add(*rel);
					}
					RawEntryMut::Vacant(view) => {
						view.insert(prevbuf.clone(), ChainItem::new(*rel));
					}
				}
			}

Crates used:

hashbrown = "0.15.0"
lasso = {version = "0.7.3", features = ["ahasher", "inline-more"]}
rand = "0.8.5"
regex = "1.11.0"
smallvec = "1.13.2"

Flamegraph output:
image

The code and sample data I used for this benchmark can be found at Brogolem35/markov_str/.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions