
Commit 61283be

committed
DOC-3110: copy-edit RI tutorials
1 parent 584afe7 commit 61283be


4 files changed (+62 -50 lines changed)

Lines changed: 24 additions & 12 deletions
@@ -1,27 +1,39 @@
1-
In very broad terms probabilistic data structures (PDS) allow us to get to a "close enough" result in a much shorter time and by using significantly less memory.
1+
This tutorial will demonstrate Redis Stack's probabilistic data structure capabilities using the bike shop use case.
22

3-
Redis Stack supports 4 of the most famous PDS:
4-
- Bloom filters
5-
- Cuckoo filters
6-
- Count-Min Sketch
3+
Redis Stack supports the following probabilistic data structures:
4+
5+
- Bloom filter
6+
- Cuckoo filter
7+
- Count-min sketch
78
- Top-K
9+
- t-digest
10+
11+
Probabilistic data structures, in general, provide results that are "close enough" in a much shorter time and by using significantly less memory than other data types such as sets or hashes. Here, you'll learn how to use a Bloom filter.
12+
13+
A Bloom filter allows you to check if an element is present in a set using a very small, fixed-size amount of memory. A query will return one of two possible answers:
814

9-
In the rest of this tutorial we'll introduce how you can use a Bloom filter to save many heavy calls to the relational database, or a lot of memory, compared to using sets or hashes.
10-
A Bloom filter is a probabilistic data structure that enables you to check if an element is present in a set using a very small memory space of a fixed size. **It can guarantee the absence of an element from a set, but it can only give an estimation about its presence**. So when it responds that an element is not present in a set (a negative answer), you can be sure that indeed is the case. However, one out of every N positive answers will be wrong.
11-
Even though it looks unusual at a first glance, this kind of uncertainty still has its place in computer science. There are many cases out there where a negative answer will prevent very costly operations;
15+
1. the element *might* be in the set
16+
2. the element is definitely not in the set
1217

13-
How can a Bloom filter be useful to our bike shop? For starters, we could keep a Bloom filter that stores all usernames of people who've already registered with our service. That way, when someone is creating a new account we can very quickly check if that username is free. If the answer is yes, we'd still have to go and check the main database for the precise result, but if the answer is no, we can skip that call and continue with the registration.
18+
In other words, a Bloom filter will guarantee the absence of an element in a set, but it can only give an estimation about its presence. False positives are entirely possible. See [this Wikipedia article](https://en.wikipedia.org/wiki/Bloom_filter) for more detailed information about false positives and their frequency.
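To make the two possible answers concrete, here is a minimal, self-contained Bloom filter in plain Python. It is a conceptual sketch only, not how Redis Stack implements the `BF.*` commands; the class name `ToyBloomFilter`, the bit-array size, the number of hash functions, and the use of salted SHA-256 are arbitrary choices made for this illustration.

```python
import hashlib


class ToyBloomFilter:
    """Hypothetical toy Bloom filter: a fixed-size bit array plus k hash positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Adding an item only ever sets bits; it never clears them.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False -> definitely not in the set; True -> the item *might* be in the set.
        return all(self.bits[pos] for pos in self._positions(item))


usernames = ToyBloomFilter()
for name in ["alice", "bob", "carol"]:
    usernames.add(name)

print(usernames.might_contain("alice"))  # True: every added item is reported as present
print(usernames.might_contain("zoe"))    # usually False; True would be a false positive
```

Because bits are only ever set, an item that was added can never be reported as absent, which is why a negative answer is reliable while a positive one is not.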
1419

15-
Another, perhaps more interesting example is for showing better and more relevant ads to users. We could keep a bloom filter per user with all the products they've bought from the shop, and when we get a list of products from our suggestion engine we could check it against this filter.
20+
Despite the uncertainty involved when using Bloom filters, they are still valuable for many applications.
1621

22+
How can a Bloom filter be helpful to an online bike shop service? For starters, you can use a Bloom filter to store the usernames of people who've already registered with the shop. When someone creates a new account, the system can very quickly check the proposed username against the filter. If the filter reports that the username might be present, confirm using your primary database. But if the filter reports that the username is definitely not present, the username is available, no further checks are required, and registration can proceed.
1723

18-
```redis Add all bought product ids in the Bloom filter
24+
Another use case is targeting ads to users. A per-user Bloom filter can be created and populated with all the products each user has purchased from the shop. When the shop's ad suggestion engine provides a list of possible ads to show a user, it can check each item against the user's Bloom filter. Each item that is definitely not in the filter is a good target. For each item that might already be in the filter, a second query can be made to the primary database to confirm. If that query shows the user has not bought the item, the ad can be added to the target list.
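The ad-targeting decision flow can be sketched in plain Python. This is a hypothetical illustration: `select_ad_targets`, its two callback parameters, and the sample IDs (one of which simulates a false positive) are made up for this example. In a real application, the first callback would wrap a Bloom filter query such as `BF.EXISTS` and the second would query the primary database.

```python
from typing import Callable, Iterable, List


def select_ad_targets(
    candidates: Iterable[int],
    might_have_bought: Callable[[int], bool],
    confirmed_bought: Callable[[int], bool],
) -> List[int]:
    """Return the candidate product IDs that the user has not bought.

    might_have_bought: stands in for a Bloom filter query; False means "definitely not bought".
    confirmed_bought: stands in for an exact lookup in the primary database.
    """
    targets = []
    for product_id in candidates:
        if not might_have_bought(product_id):
            # Definite negative from the filter: no database call needed.
            targets.append(product_id)
        elif not confirmed_bought(product_id):
            # The filter said "maybe", but the database shows no purchase (a false positive).
            targets.append(product_id)
    return targets


# Hypothetical data: the product IDs from the BF.MADD example below, plus one simulated false positive.
bought = {4545667, 9026875, 3178945, 4848754, 1242449}
false_positives = {7777777}


def simulated_filter(product_id: int) -> bool:
    return product_id in bought or product_id in false_positives


print(select_ad_targets([1234567, 3178945, 7777777], simulated_filter, lambda pid: pid in bought))
# [1234567, 7777777]
```

Only candidates that the filter flags as possibly bought cost a database query; in this run, that is two of the three IDs.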
25+
26+
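First, create the Bloom filter and add the user's purchased product IDs to it. If the key does not exist yet, `BF.MADD` creates a Bloom filter with default settings before adding the items.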
27+
28+
```redis Add all bought product IDs to a Bloom filter
1929
BF.MADD user:778:bought_products 4545667 9026875 3178945 4848754 1242449
2030
```
2131

22-
Just before we try to show an ad to a user, we can first check if that product id is already in their "bought products" Bloom filter. If the answer is yes - we might choose to check the main database, or we might skip to the next recommendation from our list. But if the answer is no, then we know for sure that our user hasn't bought that product:
32+
Next, run a couple of queries.
2333

2434
```redis Has a user bought this product?
2535
BF.EXISTS user:778:bought_products 1234567 // No, the user has not bought this product
2636
BF.EXISTS user:778:bought_products 3178945 // The user might have bought this product
2737
```
38+
39+
You can read more about Bloom filters and their use cases [here](https://redis.io/docs/data-types/probabilistic/bloom-filter/). See [here](https://redis.io/commands/?group=bf) for the complete list of Bloom filter commands.

src/redis_stack/redis_for_time_series.md

Lines changed: 13 additions & 15 deletions
@@ -1,10 +1,11 @@
1-
Among other things, Redis Stack provides you with a native time series data structure. Let's see how a time series might
2-
be useful in our bike shop.
1+
This tutorial will demonstrate Redis Stack's ability to store time series data using the bike shop use case.
32

4-
As we have multiple physical shops too, alongside our online shop, it could be helpful to have an overview of the sales
5-
volume. We will create one time series per shop tracking the total amount of all sales. In addition, we will mark the
6-
time series with the appropriate region label, `east` or `west`. This kind of representation will allow us to easily
7-
query bike sales performance per certain time periods, per shop, per region or across all shops.
3+
The bike shop company consists of multiple physical stores and an online presence. It would be helpful to have an aggregate view of sales volume across the physical and online stores.
4+
5+
In the following example, a time series key is created for each of the five shops to track total sales. Each key is marked with an appropriate region label, `east` or `west`. This kind of representation allows you to easily
6+
query bike sales performance over certain time periods per shop, per region, or across all shops.
7+
8+
Each command below sets a duplicate policy with `DUPLICATE_POLICY SUM`.
89

910
```redis Create time series per shop
1011
TS.CREATE bike_sales_1 DUPLICATE_POLICY SUM LABELS region east compacted no
@@ -14,12 +15,10 @@ TS.CREATE bike_sales_4 DUPLICATE_POLICY SUM LABELS region west compacted no
1415
TS.CREATE bike_sales_5 DUPLICATE_POLICY SUM LABELS region west compacted no
1516
```
1617

17-
In the above query, we make the shop id (1,2,3,4,5) a part of the time series name. You might also notice
18-
the `DUPLICATE_POLICY SUM` argument; this describes what should be done when two events in the same time series share
19-
the same timestamp: In this case, it would mean that two sales happened at exactly the same time, so the resulting value
20-
should be a sum of the two sales amounts.
18+
Notice the `DUPLICATE_POLICY SUM` arguments; these describe how Redis should behave when two samples in the same time series (that is, the same shop) have
19+
the same timestamp. In this case, two sales that happen at exactly the same time in a particular shop are added together.
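As a conceptual model only (not how RedisTimeSeries actually stores samples), the effect of `DUPLICATE_POLICY SUM` on a single key can be sketched in Python. The class name `SumOnDuplicateSeries` and the sample values are hypothetical.

```python
from collections import defaultdict


class SumOnDuplicateSeries:
    """Hypothetical model of one time series key created with DUPLICATE_POLICY SUM."""

    def __init__(self) -> None:
        self.samples = defaultdict(float)  # timestamp in ms -> value

    def add(self, timestamp_ms: int, value: float) -> None:
        # Under SUM, a sample whose timestamp already exists is added to the stored value.
        self.samples[timestamp_ms] += value

    def to_list(self):
        # Samples in timestamp order, similar to what a full-range query would return.
        return sorted(self.samples.items())


bike_sales_1 = SumOnDuplicateSeries()
bike_sales_1.add(1_000, 20.0)
bike_sales_1.add(1_000, 30.0)  # same shop, same millisecond
bike_sales_1.add(2_000, 15.0)
print(bike_sales_1.to_list())  # [(1000, 50.0), (2000, 15.0)]
```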
2120

22-
Since the metrics are collected with a millisecond timestamp, we can compact our time series into sales per hour:
21+
Time series data is collected using millisecond timestamps. You can compact time series data and make it available as aggregations over time buckets of various sizes. Here's an example of aggregating data by day:
2322

2423
```redis Time series compaction
2524
TS.CREATE bike_sales_1_per_day LABELS region east compacted yes
@@ -714,7 +713,7 @@ TS.RANGE bike_sales_1 - + AGGREGATION avg 3600000
714713
TS.RANGE bike_sales_2 - + AGGREGATION avg 86400000
715714
```
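The `AGGREGATION avg 3600000` and `AGGREGATION avg 86400000` arguments group samples into hourly and daily buckets (3600000 ms is one hour, 86400000 ms is one day) and reduce each bucket to a single value. A compaction rule applies the same idea ahead of time, writing one value per bucket into a separate key. The Python sketch below models only the bucketing arithmetic; the helper name `aggregate` and the sample data are hypothetical, and it assumes buckets aligned to timestamp 0, which is the default alignment.

```python
from collections import defaultdict
from statistics import mean

HOUR_MS = 60 * 60 * 1000  # 3600000, the bucket size in the hourly query
DAY_MS = 24 * HOUR_MS     # 86400000, the bucket size in the daily query


def aggregate(samples, bucket_ms, reducer):
    """Group (timestamp_ms, value) samples into fixed-size buckets and reduce each bucket.

    Assumes buckets start at multiples of bucket_ms (alignment to timestamp 0).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_ms)
        buckets[bucket_start].append(value)
    return [(start, reducer(values)) for start, values in sorted(buckets.items())]


# Hypothetical sales samples: two on the first day, one on the second.
sales = [(1_000, 100.0), (HOUR_MS + 5, 300.0), (DAY_MS + 42, 50.0)]

print(aggregate(sales, DAY_MS, sum))   # [(0, 400.0), (86400000, 50.0)]
print(aggregate(sales, DAY_MS, mean))  # [(0, 200.0), (86400000, 50.0)]
print(aggregate(sales, HOUR_MS, sum))  # [(0, 100.0), (3600000, 300.0), (86400000, 50.0)]
```

Querying a pre-compacted daily key returns roughly one value per day, instead of aggregating every raw sample at query time.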
716715

717-
```redis Get sales per day, across west region
716+
```redis Get sales per day for the west region
718717
// Get sales per day, across all shops in the west region
719718
TS.MRANGE - + AGGREGATION sum 86400000 FILTER region=west compacted=no
720719
```
@@ -727,9 +726,7 @@ TS.MRANGE - + FILTER region=(east,west) compacted=no GROUPBY region REDUCE sum
727726
TS.MRANGE - + AGGREGATION avg 86400000 FILTER region=(east,west) compacted=no GROUPBY region REDUCE sum
728727
```
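`GROUPBY region REDUCE sum` groups the matching series by the value of their `region` label and, within each group, combines samples that share a timestamp. The plain-Python sketch below models that reduction for a handful of hypothetical daily values; the function name and the sample values are made up, and only the `region` labels of `bike_sales_1`, `bike_sales_4`, and `bike_sales_5` come from the `TS.CREATE` commands.

```python
from collections import defaultdict


def group_by_reduce_sum(series_by_key, labels_by_key, label):
    """Sum samples that share a timestamp across all series with the same label value."""
    grouped = defaultdict(lambda: defaultdict(float))
    for key, samples in series_by_key.items():
        group = labels_by_key[key][label]
        for ts, value in samples:
            grouped[group][ts] += value
    return {group: sorted(points.items()) for group, points in sorted(grouped.items())}


# Hypothetical per-day totals for three of the shop keys.
series = {
    "bike_sales_1": [(0, 400.0), (86400000, 50.0)],
    "bike_sales_4": [(0, 70.0)],
    "bike_sales_5": [(0, 30.0), (86400000, 20.0)],
}
labels = {
    "bike_sales_1": {"region": "east"},
    "bike_sales_4": {"region": "west"},
    "bike_sales_5": {"region": "west"},
}
print(group_by_reduce_sum(series, labels, "region"))
# {'east': [(0, 400.0), (86400000, 50.0)], 'west': [(0, 100.0), (86400000, 20.0)]}
```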
729728

730-
Remember that compacted version of the time series we created at the beginning of this section? This last one is exactly the kind
731-
of queries we might need it for. Instead of doing the aggregation on all the data points, we can simply run this query
732-
on the compacted time series:
729+
The next two queries show how compacted time series reduce the amount of data that aggregated queries need to process.
733730

734731
```redis All sales per region, compacted
735732
TS.MRANGE - + FILTER region=(east,west) compacted=yes GROUPBY region REDUCE sum
@@ -739,3 +736,4 @@ TS.MRANGE - + FILTER region=(east,west) compacted=yes GROUPBY region REDUCE sum
739736
TS.RANGE bike_sales_3_per_day - + FILTER_BY_VALUE 3000 5000
740737
```
741738

739+
You can read more about time series and use cases [here](https://redis.io/docs/data-types/timeseries/). See [here](https://redis.io/commands/?group=timeseries) for the complete list of time series commands.
