Commit 6fbe661

DOC-5225 started stats section
1 parent e33fdb7 commit 6fbe661

1 file changed

+50
-8
lines changed

content/develop/clients/redis-py/prob.md

Lines changed: 50 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -23,7 +23,7 @@ The types fall into two basic categories:
2323
- [Set operations](#set-operations): These types let you calculate (approximately)
2424
the number of items in a set of distinct values, and whether or not a given value is
2525
a member of a set.
26-
- [Numeric data calculations](#numeric-data): These types give you an approximation of
26+
- [Statistics](#statistics): These types give you an approximation of
2727
statistics such as the percentile, rank, and frequency of numeric data points in a list.
2828

2929
To see why these approximate calculations would be useful, consider the task of
@@ -97,6 +97,7 @@ add. The following example adds some names to a Bloom filter representing
9797
a list of users and checks for the presence or absence of users in the list.
9898
Note that you must use the `bf()` method to access the Bloom filter commands.
9999

100+
<!--
100101
```py
101102
res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
102103
print(res1) # >>> [1, 1, 1, 1]
@@ -107,12 +108,16 @@ print(res2) # >>> 1
107108
res3 = r.bf().exists("recorded_users", "kaitlyn")
108109
print(res3) # >>> 0
109110
```
111+
-->
112+
{{< clients-example home_prob_dts bloom Python >}}
113+
{{< /clients-example >}}
110114

111115
A Cuckoo filter has similar features to a Bloom filter, but also supports
112116
a deletion operation to remove hashes from a set, as shown in the example
113117
below. Note that you must use the `cf()` method to access the Cuckoo filter
114118
commands.
115119

120+
<!--
116121
```py
117122
res4 = r.cf().add("other_users", "paolo")
118123
print(res4) # >>> 1
@@ -132,6 +137,9 @@ print(res8)
132137
res9 = r.cf().exists("other_users", "paolo")
133138
print(res9) # >>> 0
134139
```
140+
-->
141+
{{< clients-example home_prob_dts cuckoo Python >}}
142+
{{< /clients-example >}}
135143

136144
Which of these two data types you choose depends on your use case.
137145
Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -143,11 +151,14 @@ reference pages for more information and comparison between the two types.
143151

144152
### Set cardinality
145153

146-
A HyperLogLog object doesn't support the set membership operation but
147-
instead is specialized to calculate the cardinality of the set. You can
148-
also merge two or more HyperLogLogs to find the cardinality of the
154+
A [HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
155+
object calculates the cardinality of a set. As you add
156+
items, the HyperLogLog tracks the number of distinct set members but
157+
doesn't let you retrieve them or query which items have been added.
158+
You can also merge two or more HyperLogLogs to find the cardinality of the
149159
union of the sets they represent.
150160

161+
<!--
151162
```py
152163
res10 = r.pfadd("group:1", "andy", "cameron", "david")
153164
print(res10) # >>> 1
@@ -167,19 +178,50 @@ print(res14) # >>> True
167178
res15 = r.pfcount("both_groups")
168179
print(res15) # >>> 7
169180
```
181+
-->
182+
{{< clients-example home_prob_dts hyperloglog Python >}}
183+
{{< /clients-example >}}
170184

171185
The main benefit that HyperLogLogs offer is their very low
172186
memory usage. They can count up to 2^64 items with less than
173-
1% standard error using a maximum 12KB of memory.
187+
1% standard error using a maximum of 12KB of memory. This makes
188+
them very useful for counting things like the number of distinct
189+
IP addresses that access a website or the number of distinct
190+
bank card numbers that make purchases within a day.
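
As a rough check on the memory and error figures above, the following sketch assumes the commonly documented Redis dense HyperLogLog layout (16384 registers of 6 bits each) and the standard HyperLogLog error estimate of 1.04/sqrt(m). The constant and function names are illustrative and are not part of redis-py.

```python
import math

# Assumed parameters: the dense HyperLogLog layout described in the Redis docs.
REGISTERS = 16384        # m, the number of registers
BITS_PER_REGISTER = 6    # each register stores a small counter


def hll_standard_error(registers: int) -> float:
    """Standard HyperLogLog relative error estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


def hll_dense_bytes(registers: int, bits_per_register: int) -> int:
    """Memory used by the register array alone (headers not included)."""
    return registers * bits_per_register // 8


std_error = hll_standard_error(REGISTERS)
size_bytes = hll_dense_bytes(REGISTERS, BITS_PER_REGISTER)

print(f"{std_error:.4%}")  # >>> 0.8125%
print(size_bytes)          # >>> 12288
```

Under these assumptions, the register array takes 12288 bytes (12KB) and the expected error is about 0.81%, which is consistent with the "less than 1%" figure quoted above.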
174191

175-
## Numeric data
192+
## Statistics
176193

177194
Redis supports several approximate statistical calculations
178195
on numeric data sets:
179196

180-
- Frequency: The Count-min sketch data type lets you find the
181-
approximate frequency of a labeled item in a data stream.
197+
- [Frequency](#frequency): The Count-min sketch data type lets you
198+
find the approximate frequency of a labeled item in a data stream.
182199
- Percentiles: The t-digest data type estimates the percentile
183200
of a supplied value in a data stream.
184201
- Ranking: The Top-K data type estimates the ranking of items
185202
by frequency in a data stream.
203+
204+
### Frequency
205+
206+
A [Count-min sketch]({{< relref "/develop/data-types/probabilistic/count-min-sketch" >}})
207+
(CMS) object keeps count of a set of related items represented by
208+
string labels. The count is approximate, but you can specify
209+
how close you want to keep the count to the true value (as a fraction)
210+
and the acceptable probability of failing to keep it in this
211+
desired range. For example, you can request that the count should
212+
stay within 0.1% of the true value and have a 0.05% probability
213+
of going outside this limit.
214+
215+
{{< clients-example home_prob_dts cms Python >}}
216+
{{< /clients-example >}}
217+
218+
The advantage of using a CMS over keeping an exact count with a
219+
[sorted set]({{< relref "/develop/data-types/sorted-sets" >}})
220+
is that a CMS has very low and fixed memory usage, even for
221+
large numbers of items. Use CMS objects to keep daily counts of
222+
items sold, accesses to individual web pages on your site, and
223+
other similar statistics.
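
To illustrate why the memory usage is fixed, here is a minimal pure-Python sketch of the Count-min technique. It is not the Redis implementation: the `CountMinSketch` class, its method names, and the SHA-256 hashing are hypothetical choices for illustration, and the sizing uses the textbook formulas (width = ceil(e / error), depth = ceil(ln(1 / probability))), which may differ from what Redis uses internally. In the textbook analysis, the overestimate is bounded by `error` times the total of all increments, with probability at least `1 - probability`.

```python
import hashlib
import math


class CountMinSketch:
    """Toy Count-min sketch for illustration only (not the Redis implementation)."""

    def __init__(self, error: float, probability: float):
        # Textbook sizing (an assumption; Redis may size its sketch differently).
        self.width = math.ceil(math.e / error)
        self.depth = math.ceil(math.log(1 / probability))
        # The table size is fixed up front and never grows with the number of items.
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, item: str) -> int:
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item: str, amount: int = 1) -> None:
        # Add the increment to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += amount

    def query(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum is the best estimate.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch(error=0.01, probability=0.005)
sketch.incrby("bike:chain", 100)
sketch.incrby("bike:helmet", 100)
sketch.incrby("bike:chain", 50)

chain = sketch.query("bike:chain")
helmet = sketch.query("bike:helmet")
print(sketch.width, sketch.depth)  # >>> 272 6
print(chain >= 150, helmet >= 100)  # >>> True True
```

The estimates can never be lower than the true counts, only higher, and the table stays at `width * depth` counters however many distinct labels you add.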
224+
225+
### Percentiles
226+
227+
