---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Learn how to use approximate statistics with Redis.
linkTitle: Probabilistic data types
title: Probabilistic data types
weight: 45
---

Redis supports several
[probabilistic data types]({{< relref "/develop/data-types/probabilistic" >}})
that let you gather approximate statistical information. To see why
this would be useful, consider the task of counting the number
of distinct IP addresses that access a website in one day.

Assuming that you already have code that supplies you with each IP
address as a string, you could record the addresses in Redis using
a [set]({{< relref "/develop/data-types/sets" >}}):

```py
r.sadd("ip_tracker", new_ip_address)
```

A set can only contain each member once, so if the same address
appears again during the day, adding it again does not change
the set. At the end of the day, you could get the exact number of
distinct addresses using the `scard()` method:

```py
num_distinct_ips = r.scard("ip_tracker")
```

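To try the set-based approach without a running Redis server, the sketch below uses a plain Python `set` as a local stand-in for the `ip_tracker` key. The helper `sadd_local()` is hypothetical; it only mimics the fact that `SADD` returns the number of members it actually added, so a repeated address returns 0. The sample addresses are made up.

```python
def sadd_local(members: set[str], item: str) -> int:
    """Hypothetical stand-in for SADD: returns 1 if item is new, 0 if already present."""
    if item in members:
        return 0
    members.add(item)
    return 1


def count_distinct(ip_stream: list[str]) -> int:
    ip_tracker: set[str] = set()  # plays the role of the Redis set "ip_tracker"
    for new_ip_address in ip_stream:
        sadd_local(ip_tracker, new_ip_address)
    return len(ip_tracker)  # plays the role of SCARD


# Made-up sample traffic: 198.51.100.7 visits three times.
visits = ["198.51.100.7", "203.0.113.42", "198.51.100.7", "192.0.2.1", "198.51.100.7"]
print(count_distinct(visits))  # prints 3
```

With a real client, the loop body would be the `r.sadd("ip_tracker", new_ip_address)` call and the final count would come from `r.scard("ip_tracker")`.
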
This approach is simple, effective, and precise, but if your website
is very busy, the `ip_tracker` set could become very large and consume
a lot of memory.

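To get a feel for how the set grows, the rough sketch below adds up the length of each dotted-quad IPv4 string for a given number of distinct addresses. This is only a lower bound on the memory a Redis set would use, since it ignores Redis's per-member bookkeeping overhead. The helper name `ip_string_bytes()` and the consecutive addresses it generates are assumptions for illustration.

```python
import ipaddress


def ip_string_bytes(num_ips: int, start: str = "10.0.0.0") -> int:
    """Sum of string lengths for num_ips consecutive IPv4 addresses (lower bound on set memory)."""
    first = int(ipaddress.IPv4Address(start))
    return sum(len(str(ipaddress.IPv4Address(first + i))) for i in range(num_ips))


for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} distinct IPs -> at least {ip_string_bytes(n):>10,} bytes of raw strings")
```

The total grows linearly with the number of distinct addresses, which is exactly the cost that the probabilistic data types avoid.
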
The count of distinct IP addresses would probably be rounded to the
nearest thousand or more when the usage statistics are delivered, so
getting it exactly right is not important. It would be useful to
trade some precision for lower memory consumption and for obfuscation
of the IP address strings. The probabilistic data types provide
exactly this kind of trade-off for statistics. Specifically, you can
estimate the number of distinct items in a set using the
[HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
data type, as described below.

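As a worked example of the trade-off, the snippet below does the arithmetic for the HyperLogLog that Redis provides, which uses 16,384 (2**14) registers of 6 bits each in its dense encoding and has a relative standard error of about 1.04/√m, where m is the number of registers. The names `typical_abs_error()`, `REGISTERS`, and `BITS_PER_REGISTER` are illustrative. Memory stays fixed at about 12 KB no matter how many addresses you add, while the typical absolute error grows in proportion to the count.

```python
import math

REGISTERS = 2 ** 14  # number of HyperLogLog registers in Redis
BITS_PER_REGISTER = 6  # width of each register in the dense encoding

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
std_error = 1.04 / math.sqrt(REGISTERS)  # relative standard error of the estimate


def typical_abs_error(true_count: int) -> float:
    """Size of a one-standard-error miss for a set with true_count distinct members."""
    return true_count * std_error


print(dense_bytes)  # prints 12288
print(f"{std_error:.2%}")  # prints 0.81%
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} distinct IPs -> typical error about ±{typical_abs_error(n):,.0f}")
```

At around 100,000 distinct addresses, one standard error is just over 800, which is below the rounding to the nearest thousand that the usage statistics would likely apply anyway.
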
## HyperLogLog

A HyperLogLog object works in a similar way to a [set]({{< relref "/develop/data-types/sets" >}}),
except that it only lets you add items and approximate the number of distinct items
it has seen (also known as the *cardinality* of the set).
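
To make the idea concrete, here is a minimal, self-contained HyperLogLog sketch in pure Python. It is not Redis's implementation (Redis exposes the data type through commands such as `PFADD` and `PFCOUNT`); it follows the textbook algorithm, with the first 8 bytes of SHA-1 used as a convenient stand-in 64-bit hash and the class name `HyperLogLogSketch` chosen for illustration. Each item's hash picks one of 2**p registers, and that register keeps the largest "rank" (leading zero bits plus one) seen so far; a bias-corrected harmonic mean of the registers gives the estimate, with linear counting used for small sets.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Illustrative HyperLogLog with 2**p registers (not Redis's implementation)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # First 8 bytes of SHA-1 as a 64-bit hash (a stand-in choice).
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        index = x >> width  # top p bits choose a register
        rest = x & ((1 << width) - 1)  # remaining bits of the hash
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias correction, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # linear counting for small sets
        return round(raw)


hll = HyperLogLogSketch()
for i in range(10_000):
    ip = f"10.0.{i // 256}.{i % 256}"  # made-up addresses
    hll.add(ip)
    hll.add(ip)  # duplicates never change the registers
print(hll.count())  # an estimate close to 10,000
```

Each register holds a small integer no matter how many items arrive, so memory is fixed by `p`. This version keeps the registers in a Python list for readability, whereas a compact implementation packs them into a few bits each. Raising `p` lowers the error at the cost of more registers.
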