Commit 023bf71

committed
clear up some docs and talk about plotting positions [ci skip]
1 parent e8dca99 commit 023bf71

File tree

6 files changed

+328
-34
lines changed

docs/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,7 @@ Tutorials
7070

7171
tutorial/getting_started.rst
7272
tutorial/closer_look_at_viz.rst
73+
tutorial/closer_look_at_plot_pos.rst
7374

7475
Testing
7576
=======

docs/tutorial/Makefile

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
notebooks:
22

33
tools/nb_to_doc.py getting_started
4-
tools/nb_to_doc.py closer_look_at_viz
4+
tools/nb_to_doc.py closer_look_at_viz
5+
tools/nb_to_doc.py closer_look_at_plot_pos
Lines changed: 286 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,286 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Using different formulations of plotting positions\n",
8+
"### Looking at normal vs Weibull scales + Cunnane vs Weibull plotting positions\n",
9+
"\n",
10+
"When drawing a percentile, quantile, or probability plot, the plotting positions of ordered data must be computed.\n",
11+
"\n",
12+
"For a sample $X$ of size $n$, the plotting position of the $j^\\mathrm{th}$ element of the sorted data is defined as:\n",
13+
"\n",
14+
"$$ \\frac{j - \\alpha}{n + 1 - \\alpha - \\beta } $$\n",
15+
"\n",
16+
"In this equation, α and β can take on several values. Common values are described below:"
17+
]
18+
},
19+
{
20+
"cell_type": "raw",
21+
"metadata": {},
22+
"source": [
23+
"    \"type 4\" (α=0, β=1)\n",
24+
"        Linear interpolation of the empirical CDF.\n",
25+
"    \"type 5\" or \"hazen\" (α=0.5, β=0.5)\n",
26+
"        Piecewise linear interpolation.\n",
27+
"    \"type 6\" or \"weibull\" (α=0, β=0)\n",
28+
"        Weibull plotting positions. Unbiased exceedance probability for all distributions.\n",
29+
"        Recommended for hydrologic applications.\n",
30+
"    \"type 7\" (α=1, β=1)\n",
31+
"        The default values in R.\n",
32+
"        Not recommended with probability scales as the min and max data points get plotting positions of 0 and 1, respectively, and therefore cannot be shown.\n",
33+
"    \"type 8\" (α=1/3, β=1/3)\n",
34+
"        Approximately median-unbiased.\n",
35+
"    \"type 9\" or \"blom\" (α=0.375, β=0.375)\n",
36+
"        Approximately unbiased positions if the data are normally distributed.\n",
37+
"    \"median\" (α=0.3175, β=0.3175)\n",
38+
"        Median exceedance probabilities for all distributions (used in ``scipy.stats.probplot``).\n",
39+
"    \"apl\" or \"pwm\" (α=0.35, β=0.35)\n",
40+
"        Used with probability-weighted moments.\n",
41+
"    \"cunnane\" (α=0.4, β=0.4)\n",
42+
"        Nearly unbiased quantiles for normally distributed data.\n",
43+
"        This is the default value.\n",
44+
"    \"gringorten\" (α=0.44, β=0.44)\n",
45+
"        Used for Gumbel distributions."
46+
]
47+
},
48+
{
49+
"cell_type": "markdown",
50+
"metadata": {},
51+
"source": [
52+
"The purpose of this tutorial is to show how the selected α and β can alter the shape of a probability plot.\n",
53+
"\n",
54+
"First let's get some analytical setup out of the way..."
55+
]
56+
},
57+
{
58+
"cell_type": "code",
59+
"execution_count": null,
60+
"metadata": {
61+
"collapsed": true
62+
},
63+
"outputs": [],
64+
"source": [
65+
"%matplotlib inline"
66+
]
67+
},
68+
{
69+
"cell_type": "code",
70+
"execution_count": null,
71+
"metadata": {
72+
"collapsed": true
73+
},
74+
"outputs": [],
75+
"source": [
76+
"import numpy\n",
77+
"from matplotlib import pyplot\n",
78+
"from scipy import stats\n",
79+
"import seaborn\n",
80+
"\n",
81+
"clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}\n",
82+
"seaborn.set(style='ticks', context='talk', color_codes=True, rc=clear_bkgd)\n",
83+
"\n",
84+
"import probscale\n",
85+
"\n",
86+
"\n",
87+
"def format_axes(ax1, ax2):\n",
88+
"    \"\"\" Sets axes labels and grids \"\"\"\n",
89+
"    for ax in (ax1, ax2):\n",
90+
"        if ax is not None:\n",
91+
"            ax.set_ylim(bottom=1, top=99)\n",
92+
"            ax.set_xlabel('Values of Data')\n",
93+
"            seaborn.despine(ax=ax)\n",
94+
"            ax.yaxis.grid(True)\n",
95+
"\n",
96+
"    ax1.legend(loc='upper left', numpoints=1, frameon=False)\n",
97+
"    ax1.set_ylabel('Normal Probability Scale')\n",
98+
"    if ax2 is not None:\n",
99+
"        ax2.set_ylabel('Weibull Probability Scale')"
100+
]
101+
},
102+
{
103+
"cell_type": "markdown",
104+
"metadata": {},
105+
"source": [
106+
"Here we'll generate some fake, normally distributed data and define a Weibull distribution from scipy to use for a probability scale."
107+
]
108+
},
109+
{
110+
"cell_type": "code",
111+
"execution_count": null,
112+
"metadata": {
113+
"collapsed": false
114+
},
115+
"outputs": [],
116+
"source": [
117+
"numpy.random.seed(0) # reproducible\n",
118+
"data = numpy.random.normal(loc=5, scale=1.25, size=37)\n",
119+
"\n",
120+
"# simple weibull distribution\n",
121+
"weibull = stats.weibull_min(2)"
122+
]
123+
},
124+
{
125+
"cell_type": "markdown",
126+
"metadata": {},
127+
"source": [
128+
"Now let's create probability plots on both Weibull and normal probability scales.\n",
129+
"Additionally, we'll compute the plotting positions in two different but common ways for each plot.\n",
130+
"\n",
131+
"First, in blue circles, we'll show the data with Weibull (α=0, β=0) plotting positions.\n",
132+
"Weibull plotting positions are commonly used in my field, water resources engineering.\n",
133+
"\n",
134+
"In green squares, we'll use Cunnane (α=0.4, β=0.4) plotting positions.\n",
135+
"Cunnane plotting positions yield nearly unbiased quantiles for normally distributed data and are the default values."
136+
]
137+
},
138+
{
139+
"cell_type": "code",
140+
"execution_count": null,
141+
"metadata": {
142+
"collapsed": false
143+
},
144+
"outputs": [],
145+
"source": [
146+
"w_opts = {'label': 'Weibull (α=0, β=0)', 'marker': 'o', 'markeredgecolor': 'b'}\n",
147+
"c_opts = {'label': 'Cunnane (α=0.4, β=0.4)', 'marker': 's', 'markeredgecolor': 'g'}\n",
148+
"\n",
149+
"common_opts = {\n",
150+
"    'markerfacecolor': 'none',\n",
151+
"    'markeredgewidth': 1.25,\n",
152+
"    'linestyle': 'none'\n",
153+
"}\n",
154+
"\n",
155+
"fig, (ax1, ax2) = pyplot.subplots(figsize=(10, 8), ncols=2, sharex=True, sharey=False)\n",
156+
"\n",
157+
"for dist, ax in zip([None, weibull], [ax1, ax2]):\n",
158+
"    for opts, postype in zip([w_opts, c_opts], ['weibull', 'cunnane']):\n",
159+
"        probscale.probplot(data, ax=ax, dist=dist, probax='y',\n",
160+
"                           scatter_kws={**opts, **common_opts},\n",
161+
"                           pp_kws={'postype': postype})\n",
162+
"\n",
163+
"format_axes(ax1, ax2)\n",
164+
"fig.tight_layout()"
165+
]
166+
},
167+
{
168+
"cell_type": "markdown",
169+
"metadata": {},
170+
"source": [
171+
"This demonstrates that the different formulations of the plotting positions vary most at the extreme values of the dataset.\n",
172+
"\n",
173+
"Next, let's compare the Hazen/Type 5 (α=0.5, β=0.5) formulation to Cunnane.\n",
174+
"Hazen plotting positions (shown as red triangles) represent a piecewise linear interpolation of the empirical cumulative distribution function of the dataset.\n",
175+
"\n",
176+
"Because the Hazen values of α=0.5 and β=0.5 differ only slightly from the Cunnane values, the plotting positions are predictably similar."
177+
]
178+
},
179+
{
180+
"cell_type": "code",
181+
"execution_count": null,
182+
"metadata": {
183+
"collapsed": false,
184+
"scrolled": false
185+
},
186+
"outputs": [],
187+
"source": [
188+
"h_opts = {'label': 'Hazen (α=0.5, β=0.5)', 'marker': '^', 'markeredgecolor': 'r'}\n",
189+
"fig, (ax1, ax2) = pyplot.subplots(figsize=(10, 8), ncols=2, sharex=True, sharey=False)\n",
190+
"\n",
191+
"for dist, ax in zip([None, weibull], [ax1, ax2]):\n",
192+
"    for opts, postype in zip([c_opts, h_opts], ['cunnane', 'hazen']):\n",
193+
"        probscale.probplot(data, ax=ax, dist=dist, probax='y',\n",
194+
"                           scatter_kws={**opts, **common_opts},\n",
195+
"                           pp_kws={'postype': postype})\n",
196+
"\n",
197+
"format_axes(ax1, ax2)\n",
198+
"fig.tight_layout()"
199+
]
200+
},
201+
{
202+
"cell_type": "markdown",
203+
"metadata": {},
204+
"source": [
205+
"At the risk of showing a very cluttered and hard-to-read figure, let's throw all three on the same normal probability scale:"
206+
]
207+
},
208+
{
209+
"cell_type": "code",
210+
"execution_count": null,
211+
"metadata": {
212+
"collapsed": false
213+
},
214+
"outputs": [],
215+
"source": [
216+
"fig, ax1 = pyplot.subplots(figsize=(6, 8))\n",
217+
"\n",
218+
"for opts, postype in zip([w_opts, c_opts, h_opts,], ['weibull', 'cunnane', 'hazen']):\n",
219+
"    probscale.probplot(data, ax=ax1, dist=None, probax='y',\n",
220+
"                       scatter_kws={**opts, **common_opts},\n",
221+
"                       pp_kws={'postype': postype})\n",
222+
"\n",
223+
"format_axes(ax1, None)\n",
224+
"fig.tight_layout()"
225+
]
226+
},
227+
{
228+
"cell_type": "markdown",
229+
"metadata": {},
230+
"source": [
231+
"Again, the different values of α and β don't significantly alter the shape of the probability plot between, say, the lower and upper quartiles.\n",
232+
"Beyond the quartiles, however, the difference is more obvious.\n",
233+
"\n",
234+
"The cell below computes the plotting positions with the three sets of α and β values that we've investigated and prints the first ten values for easy comparison."
235+
]
236+
},
237+
{
238+
"cell_type": "code",
239+
"execution_count": null,
240+
"metadata": {
241+
"collapsed": false
242+
},
243+
"outputs": [],
244+
"source": [
245+
"# Weibull plotting positions; the second return value is the sorted data\n",
246+
"w_probs, _ = probscale.plot_pos(data, postype='weibull')\n",
247+
"\n",
248+
"# Cunnane plotting positions; the returned sorted data is identical to above\n",
249+
"c_probs, _ = probscale.plot_pos(data, postype='cunnane')\n",
250+
"\n",
251+
"# Hazen (type 5) plotting positions\n",
252+
"h_probs, _ = probscale.plot_pos(data, postype='hazen')\n",
253+
"\n",
254+
"# convert to percentages\n",
255+
"w_probs *= 100\n",
256+
"c_probs *= 100\n",
257+
"h_probs *= 100\n",
258+
"\n",
259+
"print('Weibull: ', numpy.round(w_probs[:10], 2))\n",
260+
"print('Cunnane: ', numpy.round(c_probs[:10], 2))\n",
261+
"print('Hazen: ', numpy.round(h_probs[:10], 2))"
262+
]
263+
}
264+
],
265+
"metadata": {
266+
"kernelspec": {
267+
"display_name": "Python 3",
268+
"language": "python",
269+
"name": "python3"
270+
},
271+
"language_info": {
272+
"codemirror_mode": {
273+
"name": "ipython",
274+
"version": 3
275+
},
276+
"file_extension": ".py",
277+
"mimetype": "text/x-python",
278+
"name": "python",
279+
"nbconvert_exporter": "python",
280+
"pygments_lexer": "ipython3",
281+
"version": "3.5.1"
282+
}
283+
},
284+
"nbformat": 4,
285+
"nbformat_minor": 0
286+
}
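
A note on the formula in the notebook's first markdown cell: the plotting positions depend only on the rank of each observation, the sample size, and the chosen α and β. The sketch below computes them directly with NumPy for the three formulations the notebook compares. The helper name `plotting_positions` is hypothetical and is not part of probscale, and the sketch assumes no special handling of ties or missing values.

```python
import numpy


def plotting_positions(data, alpha=0.4, beta=0.4):
    """Hypothetical helper (not probscale's API).

    Returns (positions, sorted_data), where the position of the j-th
    sorted value is (j - alpha) / (n + 1 - alpha - beta).
    """
    data = numpy.asarray(data, dtype=float)
    n = data.size
    ranks = numpy.arange(1, n + 1)  # j = 1, 2, ..., n
    positions = (ranks - alpha) / (n + 1 - alpha - beta)
    return positions, numpy.sort(data)


# Same synthetic sample as the notebook.
numpy.random.seed(0)
data = numpy.random.normal(loc=5, scale=1.25, size=37)

params = {
    'weibull': (0.0, 0.0),
    'cunnane': (0.4, 0.4),
    'hazen': (0.5, 0.5),
}

for name, (alpha, beta) in params.items():
    positions, _ = plotting_positions(data, alpha, beta)
    # Show the lowest, middle, and highest positions as percentages.
    print(name, numpy.round(100 * positions[[0, 18, -1]], 2))
```

With n = 37, the lowest position is 1/38 (about 2.63%) for Weibull, 0.6/37.2 (about 1.61%) for Cunnane, and 0.5/37 (about 1.35%) for Hazen, while the middle observation (rank 19) sits at exactly 50% in all three. That is consistent with the notebook's observation that the formulations differ mostly in the tails.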

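
The notebook's raw cell warns that "type 7" (α=1, β=1) is not recommended with probability scales because the minimum and maximum get plotting positions of 0 and 1. A minimal sketch of why, assuming a normal probability scale places each point at the inverse normal CDF of its plotting position (computed here with `scipy.stats.norm.ppf`); the variable names are illustrative only.

```python
import numpy
from scipy import stats

n = 5
ranks = numpy.arange(1, n + 1)

# "type 7": alpha = 1, beta = 1 -> positions run from exactly 0 to exactly 1
type7 = (ranks - 1.0) / (n + 1 - 1.0 - 1.0)

# Hazen: alpha = 0.5, beta = 0.5 -> positions stay strictly inside (0, 1)
hazen = (ranks - 0.5) / (n + 1 - 0.5 - 0.5)

# Map the probabilities to standard normal quantiles, as a normal
# probability axis would (an assumption about how the scale works).
z7 = stats.norm.ppf(type7)
zh = stats.norm.ppf(hazen)

print('type 7 positions:', type7)
print('type 7 quantiles:', z7)
print('hazen quantiles: ', numpy.round(zh, 3))
```

The first and last "type 7" quantiles are infinite, so those two points have no finite location on the axis, whereas every Hazen quantile is finite.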