Commit 2b5008c

Clean up the API around bootstrapping the best-fit
1 parent 1157929 commit 2b5008c

7 files changed, 81 additions and 37 deletions

docs/tutorial/closer_look_at_plot_pos.ipynb

Lines changed: 15 additions & 3 deletions
@@ -5,7 +5,8 @@
    "metadata": {},
    "source": [
     "# Using different formulations of plotting positions\n",
-    "### Looking at normal vs Weibull scales + Cunnane vs Weibull plotting positions\n",
+    "\n",
+    "## Computing plotting positions\n",
     "\n",
     "When drawing a percentile, quantile, or probability plot, the potting positions of ordered data must be computed.\n",
     "\n",
@@ -102,6 +103,13 @@
     " ax2.set_ylabel('Weibull Probability Scale')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Normal vs Weibull scales and Cunnane vs Weibull plotting positions"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -173,6 +181,8 @@
    "source": [
     "This demostrates that the different formulations of the plotting positions vary most at the extreme values of the dataset. \n",
     "\n",
+    "### Hazen plotting positions\n",
+    "\n",
     "Next, let's compare the Hazen/Type 5 (α=0.5, β=0.5) formulation to Cunnane.\n",
     "Hazen plotting positions (shown as red triangles) represet a piece-wise linear interpolation of the emperical cumulative distribution function of the dataset.\n",
     "\n",
@@ -205,6 +215,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Summary\n",
+    "\n",
     "At the risk of showing a very cluttered and hard to read figure, let's throw all three on the same normal probability scale:"
    ]
   },
@@ -267,7 +279,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python [default]",
    "language": "python",
    "name": "python3"
   },
@@ -281,7 +293,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.5.1"
+   "version": "3.5.2"
   }
  },
  "nbformat": 4,
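
Note on the notebook above: the Weibull, Cunnane, and Hazen formulations it compares are usually written as one two-parameter formula, (i - α) / (n + 1 - α - β) for ranks i = 1..n. This is the form documented for `scipy.stats.mstats.plotting_positions`. The sketch below is not probscale's implementation. The helper name `plotting_positions` and the sample data are hypothetical and only illustrate why the formulations differ most at the extremes.

```python
import numpy


def plotting_positions(data, alpha=0.4, beta=0.4):
    """Hypothetical helper: (i - alpha) / (n + 1 - alpha - beta), i = 1..n."""
    data_sorted = numpy.sort(numpy.asarray(data, dtype=float))
    n = data_sorted.size
    ranks = numpy.arange(1, n + 1)
    positions = (ranks - alpha) / (n + 1.0 - alpha - beta)
    return positions, data_sorted


# Made-up sample data, for illustration only
data = [3.1, 0.5, 2.2, 1.4, 4.8]

weibull, _ = plotting_positions(data, alpha=0.0, beta=0.0)
cunnane, _ = plotting_positions(data, alpha=0.4, beta=0.4)
hazen, _ = plotting_positions(data, alpha=0.5, beta=0.5)

# Absolute gap between two formulations at each rank
gap = numpy.abs(weibull - cunnane)
```

For this five-point sample, `gap` is zero at the median rank and largest at the first and last ranks, which matches the notebook's remark about the extreme values.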

docs/tutorial/closer_look_at_viz.ipynb

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -436,7 +436,7 @@
436436
"cell_type": "markdown",
437437
"metadata": {},
438438
"source": [
439-
"## Adding best-fit lines\n",
439+
"## Best-fit lines\n",
440440
"\n",
441441
"Adding a best-fit line to a probability plot can provide insight as to whether or not a dataset can be characterized by a distribution.\n",
442442
"\n",
@@ -446,6 +446,8 @@
     "Visual attributes of the line can be controled with the `line_kws` parameter.\n",
     "If you want label the best-fit line, that is where you specify its label.\n",
     "\n",
+    "### Simple examples\n",
+    "\n",
     "The most trivial case is a P-P plot with a linear data axis"
    ]
   },
@@ -705,7 +707,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python [default]",
    "language": "python",
    "name": "python3"
   },

docs/tutorial/getting_started.ipynb

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,8 @@
7474
"source": [
7575
"## Background\n",
7676
"\n",
77+
"### Built-in matplotlib scales\n",
78+
"\n",
7779
"To the casual user, you can set matplotlib scales to either \"linear\" or \"log\" (logarithmic). There are others (e.g., logit, symlog), but I haven't seen them too much in the wild.\n",
7880
"\n",
7981
"Linear scales are the default:"
@@ -374,8 +376,9 @@
   }
  ],
  "metadata": {
+  "anaconda-cloud": {},
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python [default]",
    "language": "python",
    "name": "python3"
   },

probscale/algo.py

Lines changed: 0 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -61,17 +61,6 @@ def _fit_simple(x, y, xhat, fitlogs=None):
6161
return yhat, results
6262

6363

64-
def _bs_resid(x, y, xhat, fitlogs=None, niter=10000, alpha=0.05):
65-
index = _make_boot_index(len(x), niter)
66-
yhat, results = _fit_simple(x, y, xhat, fitlogs=fitlogs)
67-
resid = y - yhat
68-
bs_y = y + resid[index]
69-
70-
percentiles = 100 * numpy.array([alpha*0.5, 1 - alpha*0.5])
71-
yhat_lo, yhat_hi = numpy.percentile(bs_y, percentiles, axis=0)
72-
return yhat_lo, yhat_hi
73-
74-
7564
def _bs_fit(x, y, xhat, fitlogs=None, niter=10000, alpha=0.05):
7665
"""
7766
Percentile method bootstrapping of linear fit of x and y data using
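
Note on the hunk above: the body of `_bs_fit` is not shown in this diff, so the following is only a rough sketch of what a percentile bootstrap of a linear fit generally looks like, not the project's implementation. The function name `bootstrap_fit_ci`, the `seed` argument, and the synthetic data are assumptions for illustration. It refits a line with `numpy.polyfit` on each resample and takes the `alpha/2` and `1 - alpha/2` percentiles of the predictions, the same percentile expression used by the removed `_bs_resid`.

```python
import numpy


def bootstrap_fit_ci(x, y, xhat, niter=2000, alpha=0.05, seed=0):
    """Hypothetical sketch of a percentile bootstrap around a linear fit."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    xhat = numpy.asarray(xhat, dtype=float)

    rng = numpy.random.default_rng(seed)
    # Each row holds the indices of one resample (drawn with replacement)
    index = rng.integers(0, x.size, size=(niter, x.size))

    yhat_boot = numpy.empty((niter, xhat.size))
    for i, idx in enumerate(index):
        # Refit the line on the resampled (x, y) pairs
        slope, intercept = numpy.polyfit(x[idx], y[idx], 1)
        yhat_boot[i] = slope * xhat + intercept

    # Two-sided interval: alpha/2 and 1 - alpha/2, expressed as percentiles
    percentiles = 100 * numpy.array([alpha * 0.5, 1 - alpha * 0.5])
    yhat_lo, yhat_hi = numpy.percentile(yhat_boot, percentiles, axis=0)
    return yhat_lo, yhat_hi


# Synthetic data, for illustration only
x = numpy.linspace(0, 10, 25)
noise = numpy.random.default_rng(42).normal(scale=0.5, size=x.size)
y = 2.0 * x + 1.0 + noise
xhat = numpy.array([0.0, 5.0, 10.0])

yhat_lo, yhat_hi = bootstrap_fit_ci(x, y, xhat, niter=500)
```

The removed `_bs_resid` took a different route: it added resampled residuals to `y` and took percentiles of the result rather than refitting the line on each resample.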

probscale/tests/test_validate.py

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -90,14 +90,14 @@ def test_axis_label(value, expected):
9090
assert result == expected
9191

9292

93-
@pytest.mark.parametrize(('value', 'expected'), [
94-
('fit', algo._bs_fit),
95-
('resids', algo._bs_resid),
96-
('junk', None)
93+
@pytest.mark.parametrize(('value', 'expected', 'error'), [
94+
('fit', algo._bs_fit, None),
95+
('resids', None, NotImplementedError),
96+
('junk', None, ValueError)
9797
])
98-
def test_estimator(value, expected):
99-
if expected is None:
100-
with pytest.raises(ValueError):
98+
def test_estimator(value, expected, error):
99+
if error is not None:
100+
with pytest.raises(error):
101101
validate.estimator(value)
102102
else:
103103
est = validate.estimator(value)

probscale/validate.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
from matplotlib import pyplot
22

3+
from .algo import _bs_fit
4+
35

46
def axes_object(ax):
57
""" Checks if a value if an Axes. If None, a new one is created.
@@ -85,12 +87,12 @@ def other_options(options):
     return dict() if options is None else options.copy()
 
 def estimator(value):
-    from .algo import _bs_fit, _bs_resid
     if value.lower() in ['res', 'resid', 'resids', 'residual', 'residuals']:
-        est = _bs_resid
+        msg = 'Bootstrapping the residuals is not ready yet'
+        raise NotImplementedError(msg)
     elif value.lower() in ['fit', 'values']:
         est = _bs_fit
     else:
         raise ValueError('estimator must be either "resid" or "fit".')
 
-    return est
+    return est
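
Note on the hunk above: after this change, residual-style names raise `NotImplementedError` instead of returning a function, while unknown names still raise `ValueError`. The standalone sketch below copies the new branch logic from the diff so it runs without probscale installed. The `_bs_fit` placeholder and the `describe` helper are hypothetical stand-ins, not part of the library.

```python
def _bs_fit(*args, **kwargs):
    """Hypothetical placeholder standing in for probscale.algo._bs_fit."""
    raise RuntimeError("placeholder only; not the real bootstrap")


def estimator(value):
    # Branch logic copied from the new version of validate.estimator
    if value.lower() in ['res', 'resid', 'resids', 'residual', 'residuals']:
        msg = 'Bootstrapping the residuals is not ready yet'
        raise NotImplementedError(msg)
    elif value.lower() in ['fit', 'values']:
        est = _bs_fit
    else:
        raise ValueError('estimator must be either "resid" or "fit".')

    return est


def describe(value):
    """Hypothetical helper: report what estimator() does for a given name."""
    try:
        return estimator(value).__name__
    except (NotImplementedError, ValueError) as err:
        return type(err).__name__


outcomes = {name: describe(name) for name in ['fit', 'Values', 'resids', 'junk']}
```

Because the names are lowercased before comparison, `'Values'` resolves to the same function as `'fit'`. Callers that used to pass `'resids'` now need to handle `NotImplementedError`.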

probscale/viz.py

Lines changed: 46 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -10,50 +10,67 @@
1010

1111
def probplot(data, ax=None, plottype='prob', dist=None, probax='x',
1212
problabel=None, datascale='linear', datalabel=None,
13-
bestfit=False, estimate_ci=False,
14-
return_best_fit_results=False,
15-
scatter_kws=None, line_kws=None, pp_kws=None,
16-
**fgkwargs):
13+
bestfit=False, return_best_fit_results=False,
14+
estimate_ci=False, ci_kws=None, pp_kws=None,
15+
scatter_kws=None, line_kws=None, **fgkwargs):
1716
"""
1817
Probability, percentile, and quantile plots.
1918
2019
Parameters
2120
----------
2221
data : array-like
2322
1-dimensional data to be plotted
23+
2424
ax : matplotlib axes, optional
2525
The Axes on which to plot. If one is not provided, a new Axes
2626
will be created.
27+
2728
plottype : string (default = 'prob')
2829
Type of plot to be created. Options are:
2930
3031
- 'prob': probabilty plot
3132
- 'pp': percentile plot
3233
- 'qq': quantile plot
3334
35+
3436
dist : scipy distribution, optional
3537
A distribtion to compute the scale's tick positions. If not
3638
specified, a standard normal distribution will be used.
39+
3740
probax : string, optional (default = 'x')
3841
The axis ('x' or 'y') that will serve as the probability (or
3942
quantile) axis.
43+
4044
problabel, datalabel : string, optional
4145
Axis labels for the probability/quantile and data axes
4246
respectively.
47+
4348
datascale : string, optional (default = 'log')
4449
Scale for the other axis that is not
50+
4551
bestfit : bool, optional (default is False)
4652
Specifies whether a best-fit line should be added to the plot.
53+
4754
return_best_fit_results : bool (default is False)
4855
If True a dictionary of results of is returned along with the
4956
figure.
50-
scatter_kws, line_kws : dictionary, optional
51-
Dictionary of keyword arguments passed directly to ``ax.plot``
52-
when drawing the scatter points and best-fit line, respectively.
53-
pp_kws : dictionary, optional
57+
58+
estimate_ci : bool, optional (False)
59+
Estimate and draw a confidence band around the best-fit line
60+
using a percentile bootstrap.
61+
62+
ci_kws : dict, optional
63+
Dictionary of keyword arguments passed directly to
64+
``viz.fit_line`` when computing the best-fit line.
65+
66+
pp_kws : dict, optional
5467
Dictionary of keyword arguments passed directly to
5568
``viz.plot_pos`` when computing the plotting positions.
5669
70+
scatter_kws, line_kws : dict, optional
71+
Dictionary of keyword arguments passed directly to ``ax.plot``
72+
when drawing the scatter points and best-fit line, respectively.
73+
5774
Other Parameters
5875
----------------
5976
color : string, optional
@@ -82,7 +99,8 @@ def probplot(data, ax=None, plottype='prob', dist=None, probax='x',
     -------
     fig : matplotlib.Figure
         The figure on which the plot was drawn.
-    result : dictionary of linear fit results, optional
+
+    result : dict of linear fit results, optional
         Keys are:
 
         - q : array of quantiles
@@ -93,6 +111,7 @@ def probplot(data, ax=None, plottype='prob', dist=None, probax='x',
     See also
     --------
     viz.plot_pos
+    viz.fit_line
     numpy.polyfit
     scipy.stats.probplot
     scipy.stats.mstats.plotting_positions
@@ -287,7 +306,9 @@ def plot_pos(data, postype=None, alpha=None, beta=None):
     ----------
     data : array-like
         The values whose plotting positions need to be computed.
+
     postype : string, optional (default: "cunnane")
+
     alpha, beta : float, optional
         Custom plotting position parameters is the options available
         through the `postype` parameter are insufficient.
@@ -296,6 +317,7 @@ def plot_pos(data, postype=None, alpha=None, beta=None):
     -------
     plot_pos : numpy.array
         The computed plotting positions, sorted.
+
     data_sorted : numpy.array
         The original data values, sorted.
 
@@ -384,9 +406,11 @@ def fit_line(x, y, xhat=None, fitprobs=None, fitlogs=None, dist=None,
     ----------
     x, y : array-like
         Independent and dependent data, respectively.
+
     xhat : array-like, optional
         The values at which ``yhat`` should should be estimated. If
         not provided, falls back to the sorted values of ``x``.
+
     fitprobs, fitlogs : str, optional.
         Defines how data should be transformed. Valid values are
         'x', 'y', or 'both'. If using ``fitprobs``, variables should
@@ -395,12 +419,23 @@ def fit_line(x, y, xhat=None, fitprobs=None, fitlogs=None, dist=None,
         Log transform = lambda x: numpy.log(x).
         Take care to not pass the same value to both ``fitlogs`` and
         ``figprobs`` as both transforms will be applied.
+
     dist : distribution, optional
         A fully-spec'd scipy.stats distribution-like object
         such that ``dist.ppf`` and ``dist.cdf`` can be called. If not
         provided, defaults to a minimal implementation of
         scipt.stats.norm.
 
+    estimate_ci : bool, optional (False)
+        Estimate and draw a confidence band around the best-fit line
+        using a percentile bootstrap.
+
+    niter : int, optional (default = 10000)
+        Number of bootstrap iterations if ``estimate_ci`` is provided.
+
+    alpha : float, optional (default = 0.05)
+        The confidence level of the bootstrap estimate.
+
     Returns
     -------
     xhat, yhat : numpy arrays
@@ -414,6 +449,7 @@ def fit_line(x, y, xhat=None, fitprobs=None, fitlogs=None, dist=None,
         - yhat_hi (upper confidence interval of the estimated y-vals)
 
     """
+
     fitprobs = validate.fit_argument(fitprobs, "fitprobs")
     fitlogs = validate.fit_argument(fitlogs, "fitlogs")
 
@@ -445,7 +481,7 @@ def fit_line(x, y, xhat=None, fitprobs=None, fitlogs=None, dist=None,
     yhat, results = algo._fit_simple(x, y, xhat, fitlogs=fitlogs)
 
     if estimate_ci:
-        yhat_lo, yhat_hi = algo._fit_ci(x, y, xhat, fitlogs=fitlogs,
+        yhat_lo, yhat_hi = algo._bs_fit(x, y, xhat, fitlogs=fitlogs,
                                         niter=niter, alpha=alpha)
     else:
         yhat_lo, yhat_hi = None, None
