
Commit 6aa5c19

DOC-52225 added temporary inline code examples
1 parent fe5b8ad commit 6aa5c19

File tree

1 file changed: +157 −35 lines
  • content/develop/clients/redis-py


content/develop/clients/redis-py/prob.md

Lines changed: 157 additions & 35 deletions
@@ -52,28 +52,29 @@ is very busy, the `ip_tracker` set could become very large and consume
 a lot of memory.

 You would probably round the count of distinct IP addresses to the
-nearest thousand or more when you deliver the usage statistics, so
+nearest thousand or more to deliver the usage statistics, so
 getting it exactly right is not important. It would be useful
-if you could trade off some precision in exchange for lower memory
+if you could trade off some accuracy in exchange for lower memory
 consumption. The probabilistic data types provide exactly this kind of
 trade-off. Specifically, you can count the approximate number of items in a
-set using the
-[HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
-data type, as described below.
+set using the [HyperLogLog](#set-cardinality) data type, as described below.

 In general, the probabilistic data types let you perform approximations with a
-bounded degree of error that have much lower memory or execution time than
-the equivalent precise calculations.
+bounded degree of error that have much lower memory consumption or execution
+time than the equivalent precise calculations.

 ## Set operations

 Redis supports the following approximate set operations:

-- [Membership](#set-membership): The Bloom filter and Cuckoo filter data types
-  let you track whether or not a given item is a member of a set.
-- [Cardinality](#set-cardinality): The HyperLogLog data type gives you an approximate
-  value for the number of items in a set, also known as the *cardinality* of
-  the set.
+- [Membership](#set-membership): The
+  [Bloom filter]({{< relref "/develop/data-types/probabilistic/bloom-filter" >}}) and
+  [Cuckoo filter]({{< relref "/develop/data-types/probabilistic/cuckoo-filter" >}})
+  data types let you track whether or not a given item is a member of a set.
+- [Cardinality](#set-cardinality): The
+  [HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
+  data type gives you an approximate value for the number of items in a set, also
+  known as the *cardinality* of the set.

 The sections below describe these operations in more detail.
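Not part of this commit, but as a concrete illustration of the trade-off described in the hunk above: the sketch below counts distinct IP addresses exactly with a set and approximately with a HyperLogLog. It assumes an existing redis-py client `r`; the `ip_tracker:set` and `ip_tracker:hll` key names are illustrative only.

```py
# Illustrative sketch; not part of this commit. Assumes `r` is an
# existing redis-py client connected to a Redis server.

# Exact count: the set stores every distinct IP address, so its
# memory use grows with the number of distinct addresses.
r.sadd("ip_tracker:set", "203.0.113.5", "203.0.113.7", "198.51.100.2")
print(r.scard("ip_tracker:set"))    # >>> 3

# Approximate count: the HyperLogLog uses a small, roughly fixed
# amount of memory however many addresses you add, in exchange for
# a bounded counting error.
r.pfadd("ip_tracker:hll", "203.0.113.5", "203.0.113.7", "198.51.100.2")
print(r.pfcount("ip_tracker:hll"))  # >>> 3
```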

@@ -98,7 +99,6 @@ add. The following example adds some names to a Bloom filter representing
 a list of users and checks for the presence or absence of users in the list.
 Note that you must use the `bf()` method to access the Bloom filter commands.

-<!--
 ```py
 res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
 print(res1) # >>> [1, 1, 1, 1]
@@ -109,16 +109,15 @@ print(res2) # >>> 1
 res3 = r.bf().exists("recorded_users", "kaitlyn")
 print(res3) # >>> 0
 ```
--->
-{{< clients-example home_prob_dts bloom Python >}}
-{{< /clients-example >}}
+
+<!-- < clients-example home_prob_dts bloom Python >}}
+< /clients-example >}} -->

 A Cuckoo filter has similar features to a Bloom filter, but also supports
 a deletion operation to remove hashes from a set, as shown in the example
 below. Note that you must use the `cf()` method to access the Cuckoo filter
 commands.

-<!--
 ```py
 res4 = r.cf().add("other_users", "paolo")
 print(res4) # >>> 1
@@ -138,9 +137,9 @@ print(res8)
 res9 = r.cf().exists("other_users", "paolo")
 print(res9) # >>> 0
 ```
--->
-{{< clients-example home_prob_dts cuckoo Python >}}
-{{< /clients-example >}}
+
+<!-- < clients-example home_prob_dts cuckoo Python >}}
+< /clients-example >}} -->

 Which of these two data types you choose depends on your use case.
 Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -157,9 +156,9 @@ object calculates the cardinality of a set. As you add
 items, the HyperLogLog tracks the number of distinct set members but
 doesn't let you retrieve them or query which items have been added.
 You can also merge two or more HyperLogLogs to find the cardinality of the
-union of the sets they represent.
+[union](https://en.wikipedia.org/wiki/Union_(set_theory)) of the sets they
+represent.

-<!--
 ```py
 res10 = r.pfadd("group:1", "andy", "cameron", "david")
 print(res10) # >>> 1
@@ -179,9 +178,9 @@ print(res14) # >>> True
 res15 = r.pfcount("both_groups")
 print(res15) # >>> 7
 ```
--->
-{{< clients-example home_prob_dts hyperloglog Python >}}
-{{< /clients-example >}}
+
+<!-- < clients-example home_prob_dts hyperloglog Python >}}
+< /clients-example >}} -->

 The main benefit that HyperLogLogs offer is their very low
 memory usage. They can count up to 2^64 items with less than
@@ -205,6 +204,8 @@ on numeric data sets:
 [Top-K]({{< relref "/develop/data-types/probabilistic/top-k" >}}) data type
 estimates the ranking of labeled items by frequency in a data stream.

+The sections below describe these operations in more detail.
+
 ### Frequency

 A [Count-min sketch]({{< relref "/develop/data-types/probabilistic/count-min-sketch" >}})
@@ -214,10 +215,41 @@ how close you want to keep the count to the true value (as a fraction)
 and the acceptable probability of failing to keep it in this
 desired range. For example, you can request that the count should
 stay within 0.1% of the true value and have a 0.05% probability
-of going outside this limit.
+of going outside this limit. The example below shows how to create
+a Count-min sketch object, add data to it, and then query it.
+Note that you must use the `cms()` method to access the Count-min
+sketch commands.
+
+```py
+# Specify that you want to keep the counts within 0.01
+# (0.1%) of the true value with a 0.005 (0.05%) chance
+# of going outside this limit.
+res16 = r.cms().initbyprob("items_sold", 0.01, 0.005)
+print(res16) # >>> True
+
+# The parameters for `incrby()` are two lists. The count
+# for each item in the first list is incremented by the
+# value at the same index in the second list.
+res17 = r.cms().incrby(
+    "items_sold",
+    ["bread", "tea", "coffee", "beer"], # Items sold
+    [300, 200, 200, 100]
+)
+print(res17) # >>> [300, 200, 200, 100]
+
+res18 = r.cms().incrby(
+    "items_sold",
+    ["bread", "coffee"],
+    [100, 150]
+)
+print(res18) # >>> [400, 350]
+
+res19 = r.cms().query("items_sold", "bread", "tea", "coffee", "beer")
+print(res19) # >>> [400, 200, 350, 100]
+```

-{{< clients-example home_prob_dts cms Python >}}
-{{< /clients-example >}}
+<!-- < clients-example home_prob_dts cms Python >}}
+< /clients-example >}} -->

 The advantage of using a CMS over keeping an exact count with a
 [sorted set]({{< relref "/develop/data-types/sorted-sets" >}})
@@ -231,7 +263,7 @@ other similar statistics.
 A [quantile](https://en.wikipedia.org/wiki/Quantile) is the value
 below which a certain fraction of samples lie. For example, with
 a set of measurements of people's heights, the quantile of 0.75 is
-the value of height below which 75% of people's heights lie.
+the value of height below which 75% of all people's heights lie.
 [Percentiles](https://en.wikipedia.org/wiki/Percentile) are equivalent
 to quantiles, except that the fraction is expressed as a percentage.

@@ -246,10 +278,56 @@ maximum values, the quantile of 0.75, and the
 [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)
 (CDF), which is effectively the inverse of the quantile function. It also
 shows how to merge two or more t-digest objects to query the combined
-data set.
+data set. Note that you must use the `tdigest()` method to access the
+t-digest commands.
+
+```py
+res20 = r.tdigest().create("male_heights")
+print(res20) # >>> True
+
+res21 = r.tdigest().add(
+    "male_heights",
+    [175.5, 181, 160.8, 152, 177, 196, 164]
+)
+print(res21) # >>> OK
+
+res22 = r.tdigest().min("male_heights")
+print(res22) # >>> 152.0

-{{< clients-example home_prob_dts tdigest Python >}}
-{{< /clients-example >}}
+res23 = r.tdigest().max("male_heights")
+print(res23) # >>> 196.0
+
+res24 = r.tdigest().quantile("male_heights", 0.75)
+print(res24) # >>> 181
+
+# Note that the CDF value for 181 is not exactly
+# 0.75. Both values are estimates.
+res25 = r.tdigest().cdf("male_heights", 181)
+print(res25) # >>> [0.7857142857142857]
+
+res26 = r.tdigest().create("female_heights")
+print(res26) # >>> True
+
+res27 = r.tdigest().add(
+    "female_heights",
+    [155.5, 161, 168.5, 170, 157.5, 163, 171]
+)
+print(res27) # >>> OK
+
+res28 = r.tdigest().quantile("female_heights", 0.75)
+print(res28) # >>> [170]
+
+res29 = r.tdigest().merge(
+    "all_heights", 2, "male_heights", "female_heights"
+)
+print(res29) # >>> OK
+
+res30 = r.tdigest().quantile("all_heights", 0.75)
+print(res30) # >>> [175.5]
+```
+
+<!-- < clients-example home_prob_dts tdigest Python >}}
+< /clients-example >}} -->

 A t-digest object also supports several other related commands, such
 as querying by rank. See the
@@ -268,7 +346,51 @@ The example below adds several different items to a Top-K object
 that tracks the top three items (this is the second parameter to
 the `topk().reserve()` method). It also shows how to list the
 top *k* items and query whether or not a given item is in the
-list.
+list. Note that you must use the `topk()` method to access the
+Top-K commands.
+
+```py
+# The `reserve()` method creates the Top-K object with
+# the given key. The parameters are the number of items
+# in the ranking and values for `width`, `depth`, and
+# `decay`, described in the Top-K reference page.
+res31 = r.topk().reserve("top_3_songs", 3, 7, 8, 0.9)
+print(res31) # >>> True
+
+# The parameters for `incrby()` are two lists. The count
+# for each item in the first list is incremented by the
+# value at the same index in the second list.
+res32 = r.topk().incrby(
+    "top_3_songs",
+    [
+        "Starfish Trooper",
+        "Only one more time",
+        "Rock me, Handel",
+        "How will anyone know?",
+        "Average lover",
+        "Road to everywhere"
+    ],
+    [
+        3000,
+        1850,
+        1325,
+        3890,
+        4098,
+        770
+    ]
+)
+print(res32)
+# >>> [None, None, None, 'Rock me, Handel', 'Only one more time', None]
+
+res33 = r.topk().list("top_3_songs")
+print(res33)
+# >>> ['Average lover', 'How will anyone know?', 'Starfish Trooper']
+
+res34 = r.topk().query(
+    "top_3_songs", "Starfish Trooper", "Road to everywhere"
+)
+print(res34) # >>> [1, 0]
+```

-{{< clients-example home_prob_dts topk Python >}}
-{{< /clients-example >}}
+<!-- < clients-example home_prob_dts topk Python >}}
+< /clients-example >}} -->
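A note on all of the inline examples added in this commit: they assume a redis-py client object named `r`, whose creation is not part of this diff. A minimal sketch of that setup, assuming a local Redis Stack server on the default port, would be:

```py
import redis

# Illustrative only; the real connection code lives elsewhere on the
# page and is not part of this commit. Assumes Redis Stack is running
# locally on the default port 6379.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
```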
