Commit 2c9c6cf

DOC-5225 started probabilistic examples page for Python
1 parent 86d214f commit 2c9c6cf

1 file changed: +60 -0 lines in content/develop/clients/redis-py

@@ -0,0 +1,60 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Learn how to use approximate statistics with Redis.
linkTitle: Probabilistic data types
title: Probabilistic data types
weight: 45
---

Redis supports several
[probabilistic data types]({{< relref "/develop/data-types/probabilistic" >}})
that let you gather approximate statistical information. To see why
this would be useful, consider the task of counting the number
of distinct IP addresses that access a website in one day.

Assuming that you already have code that supplies you with each IP
address as a string, you could record the addresses in Redis using
a [set]({{< relref "/develop/data-types/sets" >}}):

```py
r.sadd("ip_tracker", new_ip_address)
```

A set can only contain each member once, so if the same address
appears again during the day, adding it again does not change
the set. At the end of the day, you can get the exact number of
distinct addresses using the `scard()` method:

```py
num_distinct_ips = r.scard("ip_tracker")
```
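
Putting the two calls together, a minimal sketch of the exact approach
might look like the following. It assumes `r` is a redis-py client such
as `redis.Redis(decode_responses=True)` (any object with the same `sadd()`
and `scard()` methods would work), and the helper name
`count_distinct_ips_exact` is hypothetical rather than part of redis-py.

```py
def count_distinct_ips_exact(r, ip_addresses, key="ip_tracker"):
    """Return the exact number of distinct addresses stored under `key`.

    Hypothetical helper: `r` is assumed to be a redis-py client.
    """
    for ip in ip_addresses:
        # SADD has no effect if `ip` is already a member of the set.
        r.sadd(key, ip)
    # SCARD returns the number of members, i.e. the exact distinct count.
    return r.scard(key)
```

Because the set persists in Redis, the count also includes addresses
recorded by earlier calls with the same key. For example, calling the
helper with `["10.0.0.1", "10.0.0.2", "10.0.0.1"]` on an empty
`ip_tracker` key returns `2`.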

This approach is simple, effective, and precise, but if your website
is very busy, the `ip_tracker` set could become very large and consume
a lot of memory, because it stores every distinct address string in full.

The count of distinct IP addresses would probably be rounded to the
nearest thousand or more when the usage statistics are delivered, so
getting it exactly right is not important. It would be useful
if you could trade some precision for lower memory consumption,
ideally without storing the IP address strings themselves. The
probabilistic data types provide exactly this kind of trade-off for
statistics. Specifically, you can count the approximate number of
items in a set using the
[HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
data type, as described below.
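
To put rough numbers on this trade-off: the Redis documentation describes
its HyperLogLog as using at most 12 KB of memory with a standard error of
0.81%. The back-of-the-envelope calculation below reproduces those figures,
assuming Redis's 16384 registers of 6 bits each and the textbook
HyperLogLog error formula 1.04/√m. The figure of 15 bytes per address is
an assumption (the longest dotted IPv4 string) and ignores all per-member
overhead in a real Redis set.

```py
import math

# Assumed HyperLogLog parameters: 2**14 registers of 6 bits each.
REGISTERS = 2**14
BITS_PER_REGISTER = 6

# Dense HyperLogLog size, ignoring the small fixed header Redis adds.
hll_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Textbook HyperLogLog standard error: 1.04 / sqrt(number of registers).
std_error = 1.04 / math.sqrt(REGISTERS)

# Upper bound on raw string data for one million distinct IPv4 addresses
# of up to 15 characters ("255.255.255.255"), before any set overhead.
set_string_bytes = 1_000_000 * len("255.255.255.255")

print(hll_bytes)           # 12288
print(f"{std_error:.2%}")  # 0.81%
print(set_string_bytes)    # 15000000
```

The HyperLogLog size stays bounded however many addresses you add, while
the set keeps growing with every new distinct address.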

## HyperLogLog

A HyperLogLog object works in a similar way to a [set]({{< relref "/develop/data-types/sets" >}}),
except that it only lets you add items and get an approximate count of the
distinct items it contains (also known as the *cardinality* of the set).
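
In redis-py, the HyperLogLog commands `PFADD` and `PFCOUNT` are available
as the `pfadd()` and `pfcount()` methods. The sketch below assumes `r` is a
redis-py client such as `redis.Redis()`; the helper name
`count_distinct_ips_approx` and the key name `ip_tracker_hll` are
hypothetical. The key differs from `ip_tracker` because a Redis key holds
only one data type, so a set and a HyperLogLog cannot share it.

```py
def count_distinct_ips_approx(r, ip_addresses, key="ip_tracker_hll"):
    """Return an approximate count of the distinct addresses added under `key`.

    Hypothetical helper: `r` is assumed to be a redis-py client.
    """
    for ip in ip_addresses:
        # PFADD updates the HyperLogLog's internal registers; the address
        # string itself is not stored.
        r.pfadd(key, ip)
    # PFCOUNT returns the estimated cardinality of the HyperLogLog.
    return r.pfcount(key)
```

Unlike `scard()`, the value returned by `pfcount()` is an estimate, so it
can differ slightly from the exact number of distinct addresses.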
