Commit 9ea2295

Update better-than-average-calculating-geometric-means-using-SQL.md
1 parent 0015329 commit 9ea2295

1 file changed

+38
-26
lines changed

better-than-average-calculating-geometric-means-using-SQL.md

Lines changed: 38 additions & 26 deletions
@@ -3,49 +3,61 @@
33

44
# Abstract
55

6-
Geometric means are a robust and precise way to visualize the central tendency of a data set, particularly when examining skewed data or comparing ratios. Measures of central tendency are predominantly presented as arithmetic means or medians that are relatively simple to calculate and interpret, but may be inaccurate in representing data that are not _strictly normal_. Geometric means represent the best of both worlds, providing estimates that take into account all the observations in a data set without being influenced by the extremes. They can be employed by data analysts working in multiple industries including business, finance, health care, and research. Examples are varied and include examining
7-
compounded interest rates or returns on investments, assessing population changes in longitudinal data, or investigating lognormal data such as lab assay results, biological concentrations, or decay rates.
6+
Geometric means are a robust and precise way to visualize the central tendency of a data set, particularly when examining skewed data or comparing ratios. Measures of central tendency are predominantly presented as arithmetic means or medians that are relatively simple to calculate and interpret, but may be inaccurate in representing data that are not _strictly normal_. Geometric means represent the best of both worlds, providing estimates that take into account all the observations in a data set without being influenced by the extremes.
87

98
While most databases provide a function to calculate the arithmetic mean, none of them provide a function to calculate the geometric mean. We will look at how to calculate the geometric mean using SQL.
109

1110
# Introduction
1211

1312
## What is a geometric mean?
14-
Geometric means are a type of _average_, or _measure of central tendency_ in a distribution of data points, in the same group as the median, mode, or arithmetic mean. Whereas the arithmetic mean is calculated by summing a series of data points and then dividing that sum by the number of data points, the geometric mean multiplies a series of data points, and then uses the n number of data points to find the nth root of that product. Mathematically, the geometric mean adds depth and stability to
15-
the mean.
16-
17-
We can easily visualize the geometric mean when applying it to its counterpart, the geometric series of numbers, where each number increases from the previous number according to the same proportion. The geometric mean will lie in the direct center of the values, whereas the arithmetic mean would have been _pulled_ towards the higher values, and thus not truly represent the center of the data.
18-
19-
Arithmetic mean $= \frac{3 + 9 + 27 + 81 + 243}{5} = 72.6$ (See Equation 1)
20-
Geometric mean $= \sqrt[5]{3 \times 9 \times 27 \times 81 \times 243} = 27$ (See Equation 2)
13+
Geometric means are a type of _average_, or _measure of central tendency_ in a distribution of data points, in the same group as the median, mode, or arithmetic mean. Whereas the arithmetic mean is calculated by summing a series of data points and then dividing that sum by the number of data points, the geometric mean multiplies the data points together and then takes the nth root of that product, where n is the number of data points. Because it works multiplicatively, the geometric mean is less sensitive to very large values than the arithmetic mean.
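
As a compact restatement of the two definitions above, the means can be written as follows. The symbols $x_1, \dots, x_n$ are notation assumed here for the data points; they do not appear in the original text.

$$
\text{arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\text{geometric mean} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}
$$

For example, for the two values 2 and 8, the arithmetic mean is $(2 + 8)/2 = 5$, while the geometric mean is $\sqrt{2 \times 8} = 4$.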
2314

2415
## When should I use the geometric mean instead of the arithmetic mean?
25-
There are no hard rules for which mean you should use. Different types of averages can be used to express slightly different concepts: the center of the data, the values most often seen, and/or the typical "expected" values may or may not all be conveyed by the same measure. Data is rarely perfect, and you may need to look at several different types of averages to decide what works best for what you are trying to communicate with your data. But in general, geometric means are preferable when looking at skewed
26-
data, scaled data, or when averaging ratios. Some common applications include:
27-
28-
* Population growth
29-
* Compounding interest
30-
* Bioassays
31-
* Radioactive decay
32-
* Dose-response relationships
33-
* Count data
34-
* Time Series data
35-
* Longitudinal data
36-
* Repeated measures data
37-
* Bioequivalence trials
16+
There are no hard rules for which mean you should use. Different types of averages can be used to express slightly different concepts: the center of the data, the values most often seen, and/or the typical _expected_ values may or may not all be conveyed by the same measure. Data is rarely perfect, and you may need to look at several different types of averages to decide what works best for what you are trying to communicate with your data. But in general, geometric means are preferable when looking at skewed data, scaled data, or when averaging ratios.
3817

39-
If your data involve rate changes or changes over time, your data may be skewed. Often these data have a lognormal distribution, and the geometric mean describes the center of lognormal data perfectly.
18+
### Illustrative Example
4019

41-
In the following example, the arithmetic mean is _pulled_ towards the higher pay rates. Notice that the average (arithmetic mean) is 266, whereas most employees earn less than 200. The geometric mean might be a better central measure for this dataset, as it considers all of the data points without being subject to the same _pull_ that can distort the interpretation of the arithmetic mean (Figure 1):
20+
Consider the pay rates of employees in an organization. Most of the individual contributors earn less than 200K, while the CFO, VP, and Director skew the dataset.
4221

4322
|![Arithmetic Mean vs Geometric Mean vs Harmonic Mean(1)](https://github.com/structured-query-language/structured-query-language.github.io/assets/121721444/ca7bffab-208f-4085-a9f6-93ec4266df00)|
4423
|:--:|
4524
|Figure 1: Comparison of Means|
4625

26+
In the above example, the arithmetic mean is _pulled_ towards the higher pay rates. Notice that the average (arithmetic mean) is 266, whereas _most employees earn less than 200_. The geometric mean might be a better central measure for this dataset, as it considers all of the data points without being subject to the same _pull_ that can distort the interpretation of the arithmetic mean (Figure 1).
27+
4728
Geometric means are also appropriate when summarizing ratios or percentages. In the financial industry, this concept is applied when constructing stock indexes and rates of return. The geometric mean is also employed in the art world, to choose aspect ratios for film and video. The idea of comparing ratios extends to scaled data: if you have data with different attributes or scales, and you have normalized the results to be presented as ratios to reference values, the geometric mean is the correct mean to use.
4829

30+
### Geometric Mean Calculation using SQL
31+
32+
```sql
33+
select EXP(SUM(LN(pay))/COUNT(pay)) from employee;
34+
```
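
The query above never multiplies the values directly. It relies on the standard identity that the logarithm turns a product into a sum, which holds when every value is strictly positive. Using $x_1, \dots, x_n$ as assumed notation for the `pay` values:

$$
\left(\prod_{i=1}^{n} x_i\right)^{1/n}
= \exp\!\left(\ln\left(\prod_{i=1}^{n} x_i\right)^{1/n}\right)
= \exp\!\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right)
$$

Here `SUM(LN(pay))` corresponds to $\sum \ln x_i$, `COUNT(pay)` to $n$, and `EXP(...)` to the outer exponential. Working with the sum of logarithms also avoids the overflow that a direct product of many large values could cause.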
35+
36+
Alternatively, if you are using Google BigQuery, you can create a user-defined aggregate function for the geometric mean as follows:
37+
38+
```sql
39+
CREATE TEMP AGGREGATE FUNCTION geometric_mean(
40+
column_values float64
41+
)
42+
RETURNS float64
43+
AS
44+
(
45+
EXP(SUM(LN(column_values))/COUNT(column_values))
46+
);
47+
48+
with test_data as (
49+
SELECT 1 AS col1
50+
UNION ALL
51+
SELECT 3
52+
UNION ALL
53+
SELECT 5
54+
)
55+
select geometric_mean(col1) from test_data;
56+
```
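
For the test data above (1, 3, 5), the result should be the cube root of 15, roughly 2.466.

The same expression can also be combined with `GROUP BY` to compare the two means per group. The following is only a sketch under stated assumptions: the `department` column and the sample pay values are hypothetical and are not taken from the article's dataset. It uses `AVG(LN(pay))`, which should be equivalent to `SUM(LN(pay))/COUNT(pay)` because both forms ignore `NULL` values.

```sql
-- Hypothetical sample data: the department column and pay values are
-- illustrative assumptions, not the article's dataset.
WITH employee AS (
    SELECT 'engineering' AS department, 100.0 AS pay
    UNION ALL SELECT 'engineering', 400.0
    UNION ALL SELECT 'finance', 90.0
    UNION ALL SELECT 'finance', 160.0
)
SELECT
    department,
    AVG(pay)          AS arithmetic_mean,  -- sum of values divided by count
    EXP(AVG(LN(pay))) AS geometric_mean    -- exp of the mean of the logs
FROM employee
GROUP BY department
ORDER BY department;
```

With these made-up values, the arithmetic means are 250 and 125, while the geometric means come out at about 200 and 120 (up to floating-point rounding), showing the smaller pull from the larger value in each group.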
57+
58+
59+
60+
4961
## Considerations
5062

5163
The calculation of the geometric mean requires that all values be strictly positive. So what should you do if your data do not meet this requirement? If you have values that equal zero, you have a few options:
