@@ -24,7 +24,8 @@ The types fall into two basic categories:
  the number of items in a set of distinct values, and whether or not a given value is
  a member of a set.
  - [Statistics](#statistics): These types give you an approximation of
- statistics such as the percentile, rank, and frequency of numeric data points in a list.
+ statistics such as the quantiles, ranks, and frequencies of numeric data points in
+ a list.

  To see why these approximate calculations would be useful, consider the task of
  counting the number of distinct IP addresses that access a website in one day.
@@ -50,8 +51,8 @@ This approach is simple, effective, and precise but if your website
  is very busy, the `ip_tracker` set could become very large and consume
  a lot of memory.

- The count of distinct IP addresses would probably be rounded to the
- nearest thousand or more when the usage statistics are delivered, so
+ You would probably round the count of distinct IP addresses to the
+ nearest thousand or more when you deliver the usage statistics, so
  getting it exactly right is not important. It would be useful
  if you could trade off some precision in exchange for lower memory
  consumption. The probabilistic data types provide exactly this kind of
@@ -194,12 +195,15 @@ bank card numbers that make purchases within a day.
  Redis supports several approximate statistical calculations
  on numeric data sets:

- - [Frequency](#frequency): The Count-min sketch data type lets you
- find the approximate frequency of a labeled item in a data stream.
- - Percentiles: The t-digest data type estimates the percentile
- of a supplied value in a data stream.
- - Ranking: The Top-K data type estimates the ranking of items
- by frequency in a data stream.
+ - [Frequency](#frequency): The
+ [Count-min sketch]({{< relref "/develop/data-types/probabilistic/count-min-sketch" >}})
+ data type lets you find the approximate frequency of a labeled item in a data stream.
+ - [Quantiles](#quantiles): The
+ [t-digest]({{< relref "/develop/data-types/probabilistic/t-digest" >}})
+ data type estimates the quantile of a query value in a data stream.
+ - [Ranking](#ranking): The
+ [Top-K]({{< relref "/develop/data-types/probabilistic/top-k" >}}) data type
+ estimates the ranking of labeled items by frequency in a data stream.

  ### Frequency

@@ -222,6 +226,49 @@ large numbers of items. Use CMS objects to keep daily counts of
  items sold, accesses to individual web pages on your site, and
  other similar statistics.

- ### Percentiles
+ ### Quantiles
+
+ A [quantile](https://en.wikipedia.org/wiki/Quantile) is the value
+ below which a certain fraction of samples lie. For example, with
+ a set of measurements of people's heights, the quantile of 0.75 is
+ the value of height below which 75% of people's heights lie.
+ [Percentiles](https://en.wikipedia.org/wiki/Percentile) are equivalent
+ to quantiles, except that the fraction is expressed as a percentage.
+
+ A [t-digest]({{< relref "/develop/data-types/probabilistic/t-digest" >}})
+ object can estimate quantiles from a set of values added to it
+ without having to store each value in the set explicitly. This can
+ save a lot of memory when you have a large number of samples.
+
+ The example below shows how to add data samples to a t-digest
+ object and obtain some basic statistics, such as the minimum and
+ maximum values, the quantile of 0.75, and the
+ [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)
+ (CDF), which is effectively the inverse of the quantile function. It also
+ shows how to merge two or more t-digest objects to query the combined
+ data set.
+
+ {{< clients-example home_prob_dts tdigest Python >}}
+ {{< /clients-example >}}
+
+ A t-digest object also supports several other related commands, such
+ as querying by rank. See the
+ [t-digest]({{< relref "/develop/data-types/probabilistic/t-digest" >}})
+ reference for more information.
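
To make the relationship between the quantile and the CDF concrete, here is a minimal sketch in plain Python. It is not part of the change shown in this diff and it is not the t-digest algorithm: it stores and sorts every sample, which is exactly the memory cost that t-digest avoids. The helper names `exact_quantile` and `exact_cdf`, the sample heights, and the nearest-rank definition of a quantile (counting samples at or below the value) are all assumptions made for illustration.

```python
import math


def exact_quantile(samples, fraction):
    """Nearest-rank quantile: the smallest sample such that at least
    `fraction` of all samples are less than or equal to it."""
    ordered = sorted(samples)
    # Rank of the sample to return, counting from 1.
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def exact_cdf(samples, value):
    """Fraction of samples that are less than or equal to `value`."""
    return sum(1 for s in samples if s <= value) / len(samples)


# Hypothetical height measurements in centimetres.
heights_cm = [150, 155, 160, 162, 165, 168, 170, 172, 175, 178, 180, 185]

q75 = exact_quantile(heights_cm, 0.75)
fraction_at_q75 = exact_cdf(heights_cm, q75)

print(q75)              # the 0.75 quantile (the 75th percentile)
print(fraction_at_q75)  # the CDF maps that value back to the fraction
```

Feeding the result of the quantile function into the CDF recovers the original fraction for this data set, which is the sense in which the two functions are inverses. A t-digest answers the same two questions approximately, without keeping the full sorted list.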
+
+ ### Ranking

+ A [Top-K]({{< relref "/develop/data-types/probabilistic/top-k" >}})
+ object estimates the rankings of different labeled items in a data
+ stream according to frequency. For example, you could use this to
+ track the top ten most frequently-accessed pages on a website, or the
+ top five most popular items sold.

+ The example below adds several different items to a Top-K object
+ that tracks the top three items (this is the second parameter to
+ the `topk().reserve()` method). It also shows how to list the
+ top *k* items and query whether or not a given item is in the
+ list.
+
+ {{< clients-example home_prob_dts topk Python >}}
+ {{< /clients-example >}}
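
For comparison, the exact version of the ranking query can be sketched in plain Python as shown below. This is an illustration only, not part of the change shown in this diff and not how the Top-K data type works internally: it keeps a counter for every distinct item, whereas a Top-K object bounds its memory and returns an estimate. The helper name `exact_top_k` and the page paths are hypothetical.

```python
from collections import Counter


def exact_top_k(stream, k):
    """Return the k most frequent items in `stream`, most frequent first.

    Keeps an exact count for every distinct item, so memory grows with
    the number of distinct items in the stream.
    """
    counts = Counter(stream)
    return [item for item, _ in counts.most_common(k)]


# Hypothetical stream of page accesses.
page_hits = [
    "/home", "/docs", "/home", "/blog", "/docs",
    "/home", "/pricing", "/docs", "/blog", "/home",
]

top_three = exact_top_k(page_hits, 3)

print(top_three)                # the three most frequently accessed pages
print("/pricing" in top_three)  # membership query on the top-k list
```

The two queries at the end mirror the operations described above: listing the top *k* items and checking whether a given item is in that list.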