
feat(cache): a variant of sieve, with lazy op #13904


Draft
wants to merge 7 commits into
base: main

Conversation

PsiACE
Contributor

@PsiACE PsiACE commented Dec 2, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This optimization is inspired by SIEVE, which reduces unnecessary element movements and has a potential filtering effect.

Simply put, we maintain the visited status of each key and try to evict those elements that were inserted a long time ago but have not been accessed yet.
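
For readers skimming the thread, the idea above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `VisitedCache`, its fields, and the fallback used when every key has been visited are all assumptions made for the example.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical illustration type; not the struct used in this PR.
pub struct VisitedCache<K, V> {
    capacity: usize,
    /// Each value is stored together with its "visited" flag.
    entries: HashMap<K, (V, bool)>,
    /// Keys in insertion order, oldest at the front. Never reordered on a hit.
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V> VisitedCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// A hit only flips the flag; no element is moved.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        match self.entries.get_mut(key) {
            Some(slot) => {
                slot.1 = true;
                Some(&slot.0)
            }
            None => None,
        }
    }

    /// Inserts a value and returns the evicted key, if any.
    pub fn insert(&mut self, key: K, value: V) -> Option<K> {
        if let Some(slot) = self.entries.get_mut(&key) {
            slot.0 = value;
            return None;
        }
        let evicted = if self.entries.len() >= self.capacity {
            self.evict()
        } else {
            None
        };
        self.order.push_back(key.clone());
        self.entries.insert(key, (value, false));
        evicted
    }

    /// O(n) scan for the oldest key that has not been visited.
    fn evict(&mut self) -> Option<K> {
        let found = self.order.iter().position(|k| !self.entries[k].1);
        let pos = match found {
            Some(p) => p,
            None => {
                // Assumption: if everything was visited, clear all flags
                // and fall back to the oldest key.
                for slot in self.entries.values_mut() {
                    slot.1 = false;
                }
                0
            }
        };
        let key = self.order.remove(pos)?;
        self.entries.remove(&key);
        Some(key)
    }
}
```

With capacity 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`: it is the oldest key that was never accessed.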

  • Closes #issue


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Dec 2, 2023
@PsiACE
Contributor Author

PsiACE commented Dec 2, 2023

It is no longer Lru and theoretically has better properties. For the sake of evaluation, I did not change the name. You can refer to the unit tests for a quick understanding.

The impact on Databend performance and cache hit rate needs further evaluation.

@PsiACE PsiACE marked this pull request as draft December 2, 2023 20:37
@PsiACE PsiACE marked this pull request as ready for review December 3, 2023 04:18
@PsiACE PsiACE requested review from BohuTANG and dantengsky December 3, 2023 04:20
@PsiACE
Contributor Author

PsiACE commented Dec 3, 2023

This algorithm is storage-backend-friendly. In fact, we only need to maintain a "visited" linked hash map as the state. Since the state and the storage are decoupled, we can consider refactoring our disk cache and implementing an S3 cache in the future.

@PsiACE PsiACE added the ci-cloud Build docker image for cloud test label Dec 4, 2023
Contributor

github-actions bot commented Dec 4, 2023

Docker Image for PR

  • tag: pr-13904-a2c9e86

note: this image tag is only available for internal use,
please check the internal doc for more details.

@JackTan25 JackTan25 added the ci-benchmark Benchmark: run all test label Dec 4, 2023
@JackTan25
Contributor

JackTan25 commented Dec 4, 2023

image

Well, this is the core idea of the SIEVE cache. The 'hand' pointer p is initially null; it is then initialized to the oldest object, which is the tail of the queue. This algorithm is also easy to extend to existing cache eviction algorithms.
image
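
Since the images do not survive outside GitHub, here is a rough sketch of SIEVE with a hand pointer as described above. It is an illustration under assumptions: the `Sieve` type and its fields are hypothetical, and a `VecDeque` index stands in for the linked-list pointer. The front of the deque is the oldest object (the queue's tail) and the back is the newest (the head).

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical sketch of SIEVE; names are not taken from this PR.
pub struct Sieve<K, V> {
    capacity: usize,
    entries: HashMap<K, (V, bool)>, // value + visited bit
    queue: VecDeque<K>,             // front = oldest, back = newest
    hand: Option<usize>,            // None plays the role of the null hand
}

impl<K: Hash + Eq + Clone, V> Sieve<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: HashMap::new(), queue: VecDeque::new(), hand: None }
    }

    /// A hit sets the visited bit and moves nothing.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        match self.entries.get_mut(key) {
            Some(slot) => {
                slot.1 = true;
                Some(&slot.0)
            }
            None => None,
        }
    }

    /// Inserts at the head and returns the evicted key, if any.
    pub fn insert(&mut self, key: K, value: V) -> Option<K> {
        if let Some(slot) = self.entries.get_mut(&key) {
            slot.0 = value;
            return None;
        }
        let evicted = if self.entries.len() >= self.capacity {
            self.evict()
        } else {
            None
        };
        self.queue.push_back(key.clone());
        self.entries.insert(key, (value, false));
        evicted
    }

    fn evict(&mut self) -> Option<K> {
        if self.queue.is_empty() {
            return None;
        }
        // A null hand starts at the oldest object.
        let mut i = self.hand.unwrap_or(0);
        if i >= self.queue.len() {
            i = 0;
        }
        loop {
            let slot = self.entries.get_mut(&self.queue[i])?;
            if slot.1 {
                // Visited: clear the bit, keep the object in place, and
                // move the hand toward the head, wrapping to the tail.
                slot.1 = false;
                i = (i + 1) % self.queue.len();
            } else {
                break;
            }
        }
        let key = self.queue.remove(i)?;
        self.entries.remove(&key);
        // The hand now rests on the evicted object's newer neighbour,
        // or becomes null again if the evicted object was the head.
        self.hand = if i < self.queue.len() { Some(i) } else { None };
        Some(key)
    }
}
```

Retained objects are never moved here; only the hand advances. Removal from the middle of a `VecDeque` is O(n), so a real implementation would more likely use a linked list.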

Contributor

github-actions bot commented Dec 4, 2023

Docker Image for PR

  • tag: pr-13904-c5909e1

note: this image tag is only available for internal use,
please check the internal doc for more details.

@JackTan25
Contributor

JackTan25 commented Dec 4, 2023

img_v3_025q_5c62167d-bd32-472c-8db9-486ff37fdefg

The core idea of SIEVE is to reduce duplicated object insertions.

Contributor

github-actions bot commented Dec 4, 2023

@PsiACE
Contributor Author

PsiACE commented Dec 4, 2023

This version lacks a hand pointer, so frequently accessed entries get no protection. Personally, I think it is acceptable to evict and reload them. One possible optimization is to turn "visited" into a counter with an upper bound, which would provide some protection. That approach would sit somewhere between S3-FIFO and SIEVE, but it needs further evaluation, and the actual impact depends on the workload.
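
To make the bounded-counter idea concrete, here is a hedged sketch. It is not code from this PR: `CounterCache`, the `max_count` parameter, and the decrement-and-requeue policy are assumptions chosen for illustration only.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical sketch: "visited" becomes a small saturating counter.
pub struct CounterCache<K, V> {
    capacity: usize,
    max_count: u8, // upper bound of the counter (an assumed parameter)
    entries: HashMap<K, (V, u8)>,
    queue: VecDeque<K>, // front = next eviction candidate
}

impl<K: Hash + Eq + Clone, V> CounterCache<K, V> {
    pub fn new(capacity: usize, max_count: u8) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, max_count, entries: HashMap::new(), queue: VecDeque::new() }
    }

    /// A hit bumps the counter until it reaches the upper bound.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        let max = self.max_count;
        match self.entries.get_mut(key) {
            Some(slot) => {
                if slot.1 < max {
                    slot.1 += 1;
                }
                Some(&slot.0)
            }
            None => None,
        }
    }

    /// Inserts a value and returns the evicted key, if any.
    pub fn insert(&mut self, key: K, value: V) -> Option<K> {
        if let Some(slot) = self.entries.get_mut(&key) {
            slot.0 = value;
            return None;
        }
        let evicted = if self.entries.len() >= self.capacity {
            self.evict()
        } else {
            None
        };
        self.queue.push_back(key.clone());
        self.entries.insert(key, (value, 0));
        evicted
    }

    /// Each pass over a key costs it one count; a key at zero is evicted.
    fn evict(&mut self) -> Option<K> {
        while let Some(key) = self.queue.pop_front() {
            let count = self.entries.get(&key).map_or(0, |slot| slot.1);
            if count > 0 {
                if let Some(slot) = self.entries.get_mut(&key) {
                    slot.1 = count - 1;
                }
                self.queue.push_back(key);
            } else {
                self.entries.remove(&key);
                return Some(key);
            }
        }
        None
    }
}
```

A key that was hit twice survives two eviction passes instead of one, which is the extra protection the counter buys. The upper bound keeps one burst of hits from pinning a key indefinitely.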

@dantengsky
Member

👍

Besides the missing hand pointer, are there any other tweaks or improvements worth mentioning?

And about the 'SIEVE is not scan-resistant' thing mentioned in the Sieve paper - any idea how that may affect us?

Also, does find_evict_candidate need to iterate through all elements in visited every time?

https://github.com/datafuselabs/databend/blob/c8b06a5c86cdba178bbc23ac589db2cb5e32e42f/src/common/cache/src/cache/lru.rs#L163-L174

@PsiACE
Contributor Author

PsiACE commented Dec 4, 2023

Besides the missing hand pointer, are there any other tweaks or improvements worth mentioning?

No. But hand does have a significant meaning, so we will try to compare only this PR and Lru.

And about the 'SIEVE is not scan-resistant' thing mentioned in the Sieve paper - any idea how that may affect us?

If we frequently encounter large scans, then scan resistance will be a very important feature. A large scan means that the elements we insert will soon no longer be accessed. However, the Lru we previously used is not scan-resistant either. We can try using probability models or other methods to further improve it.

Also, does find_evict_candidate need to iterate through all elements in visited every time?

Currently, yes. This means it is an O(n) operation. Perhaps we can use other techniques to accomplish this since we only need to find an element that has not been accessed before.

One simple solution is to allow the elements in "visited" to be moved. Then a deque is enough to maintain the order, and the key to be removed always sits at a known position; which position depends on when we perform the move.
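
One way to read the deque idea, as a sketch under assumptions: perform the move lazily at eviction time, so the candidate is always at the front. The function name `evict_front` and the `(key, visited)` tuple layout are hypothetical and not taken from this PR.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: pops the eviction candidate from the front.
/// Visited entries are given a second chance by moving them to the back
/// with the flag cleared, so each move is paid for by an earlier hit.
pub fn evict_front<K>(queue: &mut VecDeque<(K, bool)>) -> Option<K> {
    let len = queue.len();
    for _ in 0..len {
        let (key, visited) = queue.pop_front()?;
        if visited {
            queue.push_back((key, false));
        } else {
            return Some(key);
        }
    }
    // Every entry was visited. After a full rotation the original order is
    // restored with all flags cleared, so the oldest entry is evicted.
    queue.pop_front().map(|(key, _)| key)
}
```

A single call can still touch every entry in the worst case, but each entry it moves had a hit since it was last examined, so the work is bounded by the number of hits rather than being a full scan on every eviction.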

@PsiACE PsiACE marked this pull request as draft December 4, 2023 16:34
@PsiACE
Contributor Author

PsiACE commented Dec 4, 2023

Although there seems to be improvement on the hits dataset, it performs similarly to Lru on some public traces and causes a decrease in throughput due to the O(n) traversal. I will try to make further modifications.

@1a1a11a

1a1a11a commented Dec 10, 2023

@PragmaTwice Cool work!

One possible way to avoid the O(N) operations is to track the visited bit using a HashSet. Add to the HashSet upon get, and at eviction time, we iterate through the map; if an object has been visited, we put it back. Otherwise, we evict the object. In the worst case, we may have to check N objects, but we can cap this to some value, e.g., 20. There are more optimized solutions, but they would need more engineering work.
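
A hedged sketch of this suggestion follows. The function name `pick_victim`, the separate insertion-ordered queue, and what happens once the cap is reached are assumptions; the comment above only specifies the `HashSet`, the put-back step, and a cap such as 20.

```rust
use std::collections::{HashSet, VecDeque};
use std::hash::Hash;

/// Hypothetical helper. `queue` holds keys oldest-first; `visited` holds
/// the keys that were read since they were last examined (added on `get`).
pub fn pick_victim<K: Hash + Eq>(
    queue: &mut VecDeque<K>,
    visited: &mut HashSet<K>,
    max_scan: usize, // e.g. 20, as suggested above
) -> Option<K> {
    for _ in 0..max_scan.min(queue.len()) {
        let key = queue.pop_front()?;
        if visited.remove(&key) {
            // Visited: clear the mark and put the key back.
            queue.push_back(key);
        } else {
            return Some(key);
        }
    }
    // Assumption: once the cap is reached, evict the current front key
    // even if it has been visited.
    let key = queue.pop_front()?;
    visited.remove(&key);
    Some(key)
}
```

On a cache hit the caller would insert the key into `visited`. The cap bounds the work per eviction at the cost of occasionally evicting a recently read key.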

Signed-off-by: Chojan Shang <psiace@apache.org>
@PsiACE PsiACE removed the ci-benchmark Benchmark: run all test label Dec 10, 2023
@PsiACE PsiACE added the ci-benchmark Benchmark: run all test label Dec 10, 2023
Contributor

Docker Image for PR

  • tag: pr-13904-2090508

note: this image tag is only available for internal use,
please check the internal doc for more details.

Labels
ci-benchmark Benchmark: run all test ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase

4 participants