In very broad terms probabilistic data structures (PDS) allow us to get to a "close enough" result in a much shorter time and by using significantly less memory.
1
+
In addition to a comprehensive set of basic data structures, Redis supports the following probabilistic data structures:
2
2
3
-
Redis Stack supports 4 of the most famous PDS:
4
-
- Bloom filters
5
-
- Cuckoo filters
6
-
- Count-Min Sketch
3
+
- Bloom filter
4
+
- Cuckoo filter
5
+
- Count-min sketch
7
6
- Top-K
7
+
- t-digest
8
+
- HyperLogLog
8
9
9
-
In the rest of this tutorial we'll introduce how you can use a Bloom filter to save many heavy calls to the relational database, or a lot of memory, compared to using sets or hashes.
10
-
A Bloom filter is a probabilistic data structure that enables you to check if an element is present in a set using a very small memory space of a fixed size. **It can guarantee the absence of an element from a set, but it can only give an estimation about its presence**. So when it responds that an element is not present in a set (a negative answer), you can be sure that indeed is the case. However, one out of every N positive answers will be wrong.
11
-
Even though it looks unusual at a first glance, this kind of uncertainty still has its place in computer science. There are many cases out there where a negative answer will prevent very costly operations;
10
+
In this tutorial you will learn how to use a Bloom filter, a resource-efficient data structure, in a bike shop use case.
12
11
13
-
How can a Bloom filter be useful to our bike shop? For starters, we could keep a Bloom filter that stores all usernames of people who've already registered with our service. That way, when someone is creating a new account we can very quickly check if that username is free. If the answer is yes, we'd still have to go and check the main database for the precise result, but if the answer is no, we can skip that call and continue with the registration.
12
+
A Bloom filter allows you to check if an element is present in a set in a memory-efficient and scalable way. A query will return one of two possible answers:
14
13
15
-
Another, perhaps more interesting example is for showing better and more relevant ads to users. We could keep a bloom filter per user with all the products they've bought from the shop, and when we get a list of products from our suggestion engine we could check it against this filter.
14
+
1. the element *might* be in the set
15
+
2. the element is definitely not in the set
16
16
17
+
In other words, a Bloom filter will guarantee the absence of an element in a set, but it can only give an estimation about its presence. False positives are entirely possible. See [this Wikipedia article](https://en.wikipedia.org/wiki/Bloom_filter) for more detailed information about false positives and their frequency.
17
18
18
-
```redis Add all bought product ids in the Bloom filter
19
+
Despite the uncertainty involved when using Bloom filters, they are still valuable for many applications.
20
+
21
+
How can a Bloom filter be helpful to an online bike shop service? For starters, you can use a Bloom filter to store the usernames of people who've already registered with the shop. When someone creates a new account, the system can very quickly check whether the proposed username might already be taken. If the filter answers that the username might exist, you can confirm using your primary database. But if the filter answers that the username definitely does not exist, further checks are not required and registration can proceed.
22
+
23
+
Another use case is targeting ads to users. A per-user Bloom filter can be created and populated with all the products each user has purchased from the shop. When the shop's ad suggestion engine provides a list of possible ads to show a user, it can check each item against the user's Bloom filter. Each item that is not in the filter is a good target. For each item that might already be in the filter, a second query can be made to the primary database to confirm. If that query shows the user has not bought the item, then that ad can be added to the target list.
24
+
25
+
First, create the Bloom filter.
26
+
27
+
```redis Add all bought product IDs to a Bloom filter
Just before we try to show an ad to a user, we can first check if that product id is already in their "bought products" Bloom filter. If the answer is yes - we might choose to check the main database, or we might skip to the next recommendation from our list. But if the answer is no, then we know for sure that our user hasn't bought that product:
31
+
Next, run a couple of queries.
23
32
24
33
```redis Has a user bought this product?
25
34
BF.EXISTS user:778:bought_products 1234567 // No, the user has not bought this product
26
35
BF.EXISTS user:778:bought_products 3178945 // The user might have bought this product
27
36
```
37
+
38
+
You can read more about Bloom filters and their use cases [here](https://redis.io/docs/data-types/probabilistic/bloom-filter/). See [here](https://redis.io/commands/?group=bf) for the complete list of Bloom filter commands.
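
To make the "might be in the set" versus "definitely not in the set" behavior concrete, here is a minimal pure-Python sketch of a Bloom filter together with the ad-targeting flow described in this tutorial. It is an illustration only and is not how Redis implements `BF.ADD` or `BF.EXISTS`. The names `TinyBloomFilter`, `might_contain`, `ads_to_show`, and `purchased_in_db` are hypothetical, and the bit-array size and hash count are arbitrary assumptions.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter (hypothetical; not Redis's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def ads_to_show(bought: TinyBloomFilter, suggestions, purchased_in_db):
    """Return the suggested product IDs the user has not bought.

    purchased_in_db is a plain set standing in for the primary database.
    """
    targets = []
    for product_id in suggestions:
        if not bought.might_contain(product_id):
            targets.append(product_id)      # definite negative: no DB call
        elif product_id not in purchased_in_db:
            targets.append(product_id)      # false positive ruled out by DB
    return targets


bought = TinyBloomFilter()
bought.add("3178945")
print(ads_to_show(bought, ["1234567", "3178945"], {"3178945"}))
```

Only items the filter reports as possibly present trigger the more expensive database lookup, which is where the savings come from.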
src/redis_stack/redis_for_time_series.md
+10 -15 (10 additions & 15 deletions)
@@ -1,10 +1,8 @@
1
-
Among other things, Redis Stack provides you with a native time series data structure. Let's see how a time series might
2
-
be useful in our bike shop.
1
+
This tutorial will demonstrate Redis Stack's ability to store time series data using the bike shop use case.
3
2
4
-
As we have multiple physical shops too, alongside our online shop, it could be helpful to have an overview of the sales
5
-
volume. We will create one time series per shop tracking the total amount of all sales. In addition, we will mark the
6
-
time series with the appropriate region label, `east` or `west`. This kind of representation will allow us to easily
7
-
query bike sales performance per certain time periods, per shop, per region or across all shops.
3
+
The bike shop company consists of multiple physical stores and an online presence. It would be helpful to have an aggregate view of sales volume across the physical and online stores.
4
+
5
+
The following example creates one time series key for each of five bike shops to track that shop's total sales. Each key has an appropriate region label, `east` or `west`. This kind of representation will allow you to easily query bike sales performance over specific periods on a per-shop or per-region basis.
8
6
9
7
```redis Create time series per shop
10
8
TS.CREATE bike_sales_1 DUPLICATE_POLICY SUM LABELS region east compacted no
@@ -14,12 +12,10 @@ TS.CREATE bike_sales_4 DUPLICATE_POLICY SUM LABELS region west compacted no
14
12
TS.CREATE bike_sales_5 DUPLICATE_POLICY SUM LABELS region west compacted no
15
13
```
16
14
17
-
In the above query, we make the shop id (1,2,3,4,5) a part of the time series name. You might also notice
18
-
the `DUPLICATE_POLICY SUM` argument; this describes what should be done when two events in the same time series share
19
-
the same timestamp: In this case, it would mean that two sales happened at exactly the same time, so the resulting value
20
-
should be a sum of the two sales amounts.
15
+
Notice the `DUPLICATE_POLICY SUM` argument; it describes how Redis should behave when two samples in the same time series have
16
+
the same timestamp. In this case, two sales that happen at exactly the same time in a particular store are added together.
21
17
22
-
Since the metrics are collected with a millisecond timestamp, we can compact our time series into sales per hour:
18
+
Time series data is collected at millisecond resolution. You can compact it into aggregations of various sizes. Here's an example of aggregating data by day:
23
19
24
20
```redis Time series compaction
25
21
TS.CREATE bike_sales_1_per_day LABELS region east compacted yes
You can read more about time series and use cases [here](https://redis.io/docs/data-types/timeseries/). See [here](https://redis.io/commands/?group=timeseries) for the complete list of time series commands.
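
The two ideas above, summing samples that share a timestamp and compacting millisecond samples into per-day buckets, can be sketched in plain Python. This is an assumption-laden illustration rather than Redis's implementation: `add_sample` and `compact_per_day` are hypothetical names, a dict stands in for a time series key, and the compaction is assumed to sum values into buckets aligned to the Unix epoch.

```python
from collections import defaultdict

MS_PER_DAY = 24 * 60 * 60 * 1000  # bucket width for per-day compaction


def add_sample(series: dict, timestamp_ms: int, value: float) -> None:
    # Mimics DUPLICATE_POLICY SUM: a sample whose timestamp already
    # exists in the series is added to the stored value.
    series[timestamp_ms] = series.get(timestamp_ms, 0.0) + value


def compact_per_day(series: dict) -> dict:
    # Assumed sum aggregation: group samples into day-wide buckets
    # keyed by the bucket's start timestamp.
    buckets = defaultdict(float)
    for timestamp_ms, value in series.items():
        bucket_start = timestamp_ms - timestamp_ms % MS_PER_DAY
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))


bike_sales_1 = {}
add_sample(bike_sales_1, 1000, 250.0)
add_sample(bike_sales_1, 1000, 100.0)           # same timestamp: summed
add_sample(bike_sales_1, MS_PER_DAY + 5, 80.0)  # falls in the next day
print(compact_per_day(bike_sales_1))
```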