@@ -23,7 +23,7 @@ The types fall into two basic categories:
  - [Set operations](#set-operations): These types let you calculate (approximately)
  the number of items in a set of distinct values, and whether or not a given value is
  a member of a set.
- - [Numeric data calculations](#numeric-data): These types give you an approximation of
+ - [Statistics](#statistics): These types give you an approximation of
  statistics such as the percentile, rank, and frequency of numeric data points in a list.

  To see why these approximate calculations would be useful, consider the task of
@@ -97,6 +97,7 @@ add. The following example adds some names to a Bloom filter representing
  a list of users and checks for the presence or absence of users in the list.
  Note that you must use the `bf()` method to access the Bloom filter commands.

+ <!--
  ```py
  res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
  print(res1) # >>> [1, 1, 1, 1]
@@ -107,12 +108,16 @@ print(res2) # >>> 1
  res3 = r.bf().exists("recorded_users", "kaitlyn")
  print(res3) # >>> 0
  ```
+ -->
+ {{< clients-example home_prob_dts bloom Python >}}
+ {{< /clients-example >}}

  A Cuckoo filter has similar features to a Bloom filter, but also supports
  a deletion operation to remove hashes from a set, as shown in the example
  below. Note that you must use the `cf()` method to access the Cuckoo filter
  commands.

+ <!--
  ```py
  res4 = r.cf().add("other_users", "paolo")
  print(res4) # >>> 1
@@ -132,6 +137,9 @@ print(res8)
  res9 = r.cf().exists("other_users", "paolo")
  print(res9) # >>> 0
  ```
+ -->
+ {{< clients-example home_prob_dts cuckoo Python >}}
+ {{< /clients-example >}}

  Which of these two data types you choose depends on your use case.
  Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -143,11 +151,14 @@ reference pages for more information and comparison between the two types.

  ### Set cardinality

- A HyperLogLog object doesn't support the set membership operation but
- instead is specialized to calculate the cardinality of the set. You can
- also merge two or more HyperLogLogs to find the cardinality of the
+ A [HyperLogLog]({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
+ object calculates the cardinality of a set. As you add
+ items, the HyperLogLog tracks the number of distinct set members but
+ doesn't let you retrieve them or query which items have been added.
+ You can also merge two or more HyperLogLogs to find the cardinality of the
  union of the sets they represent.

+ <!--
  ```py
  res10 = r.pfadd("group:1", "andy", "cameron", "david")
  print(res10) # >>> 1
@@ -167,19 +178,50 @@ print(res14) # >>> True
  res15 = r.pfcount("both_groups")
  print(res15) # >>> 7
  ```
+ -->
+ {{< clients-example home_prob_dts hyperloglog Python >}}
+ {{< /clients-example >}}

  The main benefit that HyperLogLogs offer is their very low
  memory usage. They can count up to 2^64 items with less than
- 1% standard error using a maximum 12KB of memory.
+ 1% standard error using a maximum of 12KB of memory. This makes
+ them very useful for counting things like the number of distinct
+ IP addresses that access a website or the number of distinct
+ bank card numbers used for purchases within a day.

- ## Numeric data
+ ## Statistics

  Redis supports several approximate statistical calculations
  on numeric data sets:

- - Frequency: The Count-min sketch data type lets you find the
- approximate frequency of a labeled item in a data stream.
+ - [Frequency](#frequency): The Count-min sketch data type lets you
+ find the approximate frequency of a labeled item in a data stream.
  - Percentiles: The t-digest data type estimates the percentile
  of a supplied value in a data stream.
  - Ranking: The Top-K data type estimates the ranking of items
  by frequency in a data stream.
+
+ ### Frequency
+
+ A [Count-min sketch]({{< relref "/develop/data-types/probabilistic/count-min-sketch" >}})
+ (CMS) object keeps count of a set of related items represented by
+ string labels. The count is approximate, but you can specify
+ how close you want to keep the count to the true value (as a fraction
+ of the total of all counts) and the acceptable probability of failing
+ to keep it in this range. For example, you can request that the error
+ should be at most 0.1% of the total of all counts, with a 0.05%
+ probability of exceeding this limit.
+
+ {{< clients-example home_prob_dts cms Python >}}
+ {{< /clients-example >}}
+
+ The advantage of using a CMS over keeping an exact count with a
+ [sorted set]({{< relref "/develop/data-types/sorted-sets" >}})
+ is that a CMS has very low and fixed memory usage, even for
+ large numbers of items. Use CMS objects to keep daily counts of
+ items sold, accesses to individual web pages on your site, and
+ other similar statistics.
+
+ ### Percentiles
+
+
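The 12KB and sub-1% figures quoted for HyperLogLogs in this diff follow from simple arithmetic. The sketch below assumes that Redis's dense HyperLogLog encoding uses 2^14 = 16384 registers of 6 bits each, and it applies the standard HyperLogLog error estimate of roughly 1.04/√m for m registers. The variable names are illustrative only.

```py
import math

# Assumption: Redis's dense HyperLogLog encoding uses 2**14 registers,
# each 6 bits wide.
PRECISION = 14
REGISTERS = 2 ** PRECISION  # 16384 registers
BITS_PER_REGISTER = 6

# Memory taken by the registers alone (Redis also stores a small header).
register_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard error of the HyperLogLog estimator is about 1.04 / sqrt(m)
# for m registers.
standard_error = 1.04 / math.sqrt(REGISTERS)

print(register_bytes)           # 12288 (= 12 KiB)
print(f"{standard_error:.4%}")  # 0.8125%
```

The result, about 0.81%, is consistent with the standard error quoted in the Redis `PFCOUNT` documentation. The 12288 bytes cover the registers only, not the small header Redis keeps alongside them.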
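To make the two Count-min sketch parameters concrete, here is a minimal pure-Python sketch of how an error margin ε and a failure probability δ can size a CMS. It assumes the classic bounds from Cormode and Muthukrishnan (⌈e/ε⌉ counters per row, ⌈ln(1/δ)⌉ rows), which may differ from how Redis sizes its sketches internally. The `CountMinSketch` class and its `add` and `estimate` methods are hypothetical and are not part of redis-py. Under these bounds, an estimate never falls below the true count. With probability at least 1 − δ, it exceeds the true count by no more than ε times the total of all counts.

```py
import hashlib
import math


class CountMinSketch:
    """Hypothetical toy Count-min sketch, not the Redis implementation.

    Sized with the classic bounds: width = ceil(e / error) counters per row
    and depth = ceil(ln(1 / probability)) rows.
    """

    def __init__(self, error: float, probability: float) -> None:
        self.width = math.ceil(math.e / error)
        self.depth = math.ceil(math.log(1 / probability))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row: int, label: str) -> int:
        # Salt the hash with the row number so each row behaves like a
        # different hash function.
        digest = hashlib.sha256(f"{row}:{label}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, label: str, amount: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, label)] += amount

    def estimate(self, label: str) -> int:
        # Collisions can only inflate a counter, so the smallest value
        # across the rows is the tightest estimate.
        return min(
            self.table[row][self._column(row, label)] for row in range(self.depth)
        )


# error = 0.1% and probability = 0.05%, matching the example in the text.
sketch = CountMinSketch(error=0.001, probability=0.0005)
print(sketch.width, sketch.depth)  # 2719 8

items_sold = {"bread": 300, "tea": 200, "coffee": 200, "beer": 100}
for label, count in items_sold.items():
    sketch.add(label, count)

for label, count in items_sold.items():
    # Each estimate is at least the true count.
    print(label, count, sketch.estimate(label))
```

Taking the minimum across rows works because collisions can only add to a counter. Adding rows makes it less likely that every row collides, which is why the depth grows with ln(1/δ). The width alone controls the size of the error.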