Commit b789981

Merge remote-tracking branch 'origin/doc-3110-dwdougherty' into doc-3179-dwdougherty
Start branch doc-3179-dwdougherty with both doc-3100* and doc-3110* branch content in place.
2 parents 96cc936 + 679455d commit b789981

4 files changed: +58 -50 lines changed
Lines changed: 23 additions & 12 deletions
@@ -1,27 +1,38 @@
1-
In very broad terms probabilistic data structures (PDS) allow us to get to a "close enough" result in a much shorter time and by using significantly less memory.
1+
In addition to a comprehensive set of basic data structures, Redis supports the following probabilistic data structures:
22

3-
Redis Stack supports 4 of the most famous PDS:
4-
- Bloom filters
5-
- Cuckoo filters
6-
- Count-Min Sketch
3+
- Bloom filter
4+
- Cuckoo filter
5+
- Count-min sketch
76
- Top-K
7+
- t-digest
8+
- HyperLogLog
89

9-
In the rest of this tutorial we'll introduce how you can use a Bloom filter to save many heavy calls to the relational database, or a lot of memory, compared to using sets or hashes.
10-
A Bloom filter is a probabilistic data structure that enables you to check if an element is present in a set using a very small memory space of a fixed size. **It can guarantee the absence of an element from a set, but it can only give an estimation about its presence**. So when it responds that an element is not present in a set (a negative answer), you can be sure that indeed is the case. However, one out of every N positive answers will be wrong.
11-
Even though it looks unusual at a first glance, this kind of uncertainty still has its place in computer science. There are many cases out there where a negative answer will prevent very costly operations;
10+
In this tutorial you will learn how to use a Bloom filter, a resource-efficient data structure, in a bike shop use case.
1211

13-
How can a Bloom filter be useful to our bike shop? For starters, we could keep a Bloom filter that stores all usernames of people who've already registered with our service. That way, when someone is creating a new account we can very quickly check if that username is free. If the answer is yes, we'd still have to go and check the main database for the precise result, but if the answer is no, we can skip that call and continue with the registration.
12+
A Bloom filter allows you to check if an element is present in a set in a memory-efficient and scalable way. A query will return one of two possible answers:
1413

15-
Another, perhaps more interesting example is for showing better and more relevant ads to users. We could keep a bloom filter per user with all the products they've bought from the shop, and when we get a list of products from our suggestion engine we could check it against this filter.
14+
1. the element *might* be in the set
15+
2. the element is definitely not in the set
1616

17+
In other words, a Bloom filter will guarantee the absence of an element in a set, but it can only give an estimation about its presence. False positives are entirely possible. See [this Wikipedia article](https://en.wikipedia.org/wiki/Bloom_filter) for more detailed information about false positives and their frequency.
1718

18-
```redis Add all bought product ids in the Bloom filter
19+
Despite the uncertainty involved when using Bloom filters, they are still valuable for many applications.
20+
21+
How can a Bloom filter be helpful to an online bike shop service? For starters, you can use a Bloom filter to store the usernames of people who've already registered with the shop. When someone creates a new account, the system can very quickly check whether the proposed username might already be taken. If the filter answers yes (the username might exist), you can confirm using your primary database. But if the filter answers no, the username is definitely not registered, so further checks are not required and registration can proceed.
22+
23+
Another use case is targeting ads to users. A per-user Bloom filter can be created and populated with all the products each user has purchased from the shop. When the shop's ad suggestion engine provides a list of possible ads to show a user, it can check each item against the user's Bloom filter. Each item that is not in the filter is a good target. For each item that might already be in the filter, a second query can be made to the primary database to confirm. If that second check shows the user has not purchased the product, the ad can be added to the target list.
24+
25+
First, create the Bloom filter.
26+
27+
```redis Add all bought product IDs to a Bloom filter
1928
BF.MADD user:778:bought_products 4545667 9026875 3178945 4848754 1242449
2029
```
2130

22-
Just before we try to show an ad to a user, we can first check if that product id is already in their "bought products" Bloom filter. If the answer is yes - we might choose to check the main database, or we might skip to the next recommendation from our list. But if the answer is no, then we know for sure that our user hasn't bought that product:
31+
Next, run a couple of queries.
2332

2433
```redis Has a user bought this product?
2534
BF.EXISTS user:778:bought_products 1234567 // No, the user has not bought this product
2635
BF.EXISTS user:778:bought_products 3178945 // The user might have bought this product
2736
```
37+
38+
You can read more about Bloom filters and their use cases [here](https://redis.io/docs/data-types/probabilistic/bloom-filter/). See [here](https://redis.io/commands/?group=bf) for the complete list of Bloom filter commands.
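
To make the "might be in the set / definitely not in the set" behavior concrete, here is a minimal pure-Python sketch of a Bloom filter and of the username check described in the tutorial text. It is an illustration only and not how Redis implements `BF.ADD` or `BF.EXISTS`; the class `BloomFilter`, the helper `username_is_free`, the sizing parameters, and the in-memory `primary_db` set are all hypothetical names chosen for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative assumption, not the Redis implementation)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def username_is_free(name, registered_filter, primary_db):
    """Consult the filter first and fall back to the database only on a 'maybe'."""
    if not registered_filter.might_contain(name):
        return True  # definitely not registered, no database call needed
    return name not in primary_db  # possible false positive, so confirm


# Hypothetical data standing in for the shop's registered usernames.
primary_db = {"alice"}
registered = BloomFilter()
for user in primary_db:
    registered.add(user)

print(username_is_free("alice", registered, primary_db))  # taken, confirmed by the database
print(username_is_free("bob", registered, primary_db))    # free
```

An item that was added always reports "might contain", which is why a negative answer can be trusted without a second lookup. Only positive answers need confirmation.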

src/redis_stack/redis_for_time_series.md

Lines changed: 10 additions & 15 deletions
@@ -1,10 +1,8 @@
1-
Among other things, Redis Stack provides you with a native time series data structure. Let's see how a time series might
2-
be useful in our bike shop.
1+
This tutorial will demonstrate Redis Stack's ability to store time series data using the bike shop use case.
32

4-
As we have multiple physical shops too, alongside our online shop, it could be helpful to have an overview of the sales
5-
volume. We will create one time series per shop tracking the total amount of all sales. In addition, we will mark the
6-
time series with the appropriate region label, `east` or `west`. This kind of representation will allow us to easily
7-
query bike sales performance per certain time periods, per shop, per region or across all shops.
3+
The bike shop company consists of multiple physical stores and an online presence. It would be helpful to have an aggregate view of sales volume across the physical and online stores.
4+
5+
The following example creates one time series key for each of five bike shops to track the total sales for each. Each key has an appropriate region label, `east` or `west`. This kind of representation will allow you to easily query bike sales performance over specific periods on a per-shop or per-region basis.
86

97
```redis Create time series per shop
108
TS.CREATE bike_sales_1 DUPLICATE_POLICY SUM LABELS region east compacted no
@@ -14,12 +12,10 @@ TS.CREATE bike_sales_4 DUPLICATE_POLICY SUM LABELS region west compacted no
1412
TS.CREATE bike_sales_5 DUPLICATE_POLICY SUM LABELS region west compacted no
1513
```
1614

17-
In the above query, we make the shop id (1,2,3,4,5) a part of the time series name. You might also notice
18-
the `DUPLICATE_POLICY SUM` argument; this describes what should be done when two events in the same time series share
19-
the same timestamp: In this case, it would mean that two sales happened at exactly the same time, so the resulting value
20-
should be a sum of the two sales amounts.
15+
Notice the `DUPLICATE_POLICY SUM` arguments; these describe how Redis should behave when two samples in the same time series have
16+
the same timestamp. In this case, two sales that happen at exactly the same time in a particular shop are added together.
2117

22-
Since the metrics are collected with a millisecond timestamp, we can compact our time series into sales per hour:
18+
Time series data is collected at millisecond resolution. You can compact time series data and make it available in aggregations of various sizes. Here's an example of aggregating data by day:
2319

2420
```redis Time series compaction
2521
TS.CREATE bike_sales_1_per_day LABELS region east compacted yes
@@ -714,7 +710,7 @@ TS.RANGE bike_sales_1 - + AGGREGATION avg 3600000
714710
TS.RANGE bike_sales_2 - + AGGREGATION avg 86400000
715711
```
716712

717-
```redis Get sales per day, across west region
713+
```redis Get sales per day for the west region
718714
// Get sales per day, across all shops in the west region
719715
TS.MRANGE - + AGGREGATION sum 86400000 FILTER region=west compacted=no
720716
```
@@ -727,9 +723,7 @@ TS.MRANGE - + FILTER region=(east,west) compacted=no GROUPBY region REDUCE sum
727723
TS.MRANGE - + AGGREGATION avg 86400000 FILTER region=(east,west) compacted=no GROUPBY region REDUCE sum
728724
```
729725

730-
Remember that compacted version of the time series we created at the beginning of this section? This last one is exactly the kind
731-
of queries we might need it for. Instead of doing the aggregation on all the data points, we can simply run this query
732-
on the compacted time series:
726+
The next two queries show how useful compacted time series can be to reduce the data set size needed for aggregated queries.
733727

734728
```redis All sales per region, compacted
735729
TS.MRANGE - + FILTER region=(east,west) compacted=yes GROUPBY region REDUCE sum
@@ -739,3 +733,4 @@ TS.MRANGE - + FILTER region=(east,west) compacted=yes GROUPBY region REDUCE sum
739733
TS.RANGE bike_sales_3_per_day - + FILTER_BY_VALUE 3000 5000
740734
```
741735

736+
You can read more about time series and use cases [here](https://redis.io/docs/data-types/timeseries/). See [here](https://redis.io/commands/?group=timeseries) for the complete list of time series commands.
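
The two ideas the tutorial relies on, summing samples that share a timestamp and rolling millisecond samples up into daily buckets, can be sketched in plain Python. This is a simplified model for intuition only, not the Redis time series implementation; the helpers `add_sample` and `sum_per_day` and the sample values are hypothetical, and the sketch assumes buckets aligned to timestamp 0.

```python
from collections import defaultdict

DAY_MS = 86_400_000  # bucket width used in the tutorial's per-day queries


def add_sample(series, timestamp_ms, value):
    """Mimic a SUM duplicate policy: samples with the same timestamp are added."""
    series[timestamp_ms] = series.get(timestamp_ms, 0) + value


def sum_per_day(series):
    """Aggregate samples into day-sized buckets keyed by the bucket start time."""
    buckets = defaultdict(int)
    for timestamp_ms, value in series.items():
        bucket_start = timestamp_ms - timestamp_ms % DAY_MS
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))


# Hypothetical sales for one shop: two sales collide at timestamp 1000.
sales = {}
add_sample(sales, 1000, 250)
add_sample(sales, 1000, 100)
add_sample(sales, DAY_MS + 5, 400)

print(sales[1000])         # 350
print(sum_per_day(sales))  # {0: 350, 86400000: 400}
```

Querying a precomputed per-day series touches one value per day instead of every raw sample, which is the saving the compacted series in the tutorial is meant to provide.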
