|
2 | 2 | \alias{densityBC}
|
3 | 3 | \title{Kernel Density Estimation with Optional Boundary Correction}
|
4 | 4 | \description{
|
5 |
| - A simple implementation of fixed-bandwidth kernel density |
| 5 | + Fixed-bandwidth kernel density |
6 | 6 | estimation on the real line, or the positive real half-line,
|
7 |
| - including optional corrections for a boundary at zero. |
| 7 | + including optional corrections for a boundary at zero. |
8 | 8 | }
|
9 | 9 | \usage{
|
10 | 10 | densityBC(x, kernel = "epanechnikov", bw=NULL,
|
|
18 | 18 | internal=list())
|
19 | 19 | }
|
20 | 20 | \arguments{
|
21 |
| - \item{x}{Numeric vector.} |
| 21 | + \item{x}{Numeric vector of observed values.} |
22 | 22 | \item{kernel}{String specifying kernel.
|
23 | 23 | Options are
|
24 | 24 | \code{"gaussian"}, \code{"rectangular"},
|
|
27 | 27 | \code{"biweight"},
|
28 | 28 | \code{"cosine"} and \code{"optcosine"}.
|
29 | 29 | (Partial matching is used).
|
| 30 | + Options are described in the help for \code{\link[stats]{density.default}}. |
30 | 31 | }
|
31 | 32 | \item{bw,h}{
|
32 | 33 | Alternative specifications of the scale factor for the kernel.
|
33 | 34 | The bandwidth \code{bw} is the standard deviation of the
|
34 | 35 | kernel (this agrees with the argument \code{bw} in
|
35 |
| - \code{\link[stats]{density.default}}. |
| 36 | + \code{\link[stats]{density.default}}). |
36 | 37 | The rescale factor \code{h} is the factor by which
|
37 | 38 | the `standard form' of the kernel is rescaled. For the Epanechnikov
|
38 | 39 | kernel, \code{h = bw * sqrt(5)} is the
|
|
66 | 67 | String (partially matched) specifying a correction for the boundary effect
|
67 | 68 | bias at \eqn{r=0} when estimating a density on the positive
|
68 | 69 | half line. Possible values are
|
69 |
| - \code{"none"}, \code{"weighted"}, \code{"convolution"}, \code{"reflection"} |
70 |
| - and \code{"bdrykern"}. |
| 70 | + \code{"none"}, \code{"weighted"}, \code{"convolution"}, |
| 71 | + \code{"reflection"}, \code{"bdrykern"} and \code{"JonesFoster"}. |
71 | 72 | }
|
72 | 73 | \item{fast}{
|
73 | 74 | Logical value specifying whether to perform the calculation rapidly
|
74 | 75 | using the Fast Fourier Transform (\code{fast=TRUE})
|
75 | 76 | or to use slower, exact code (\code{fast=FALSE}, the default).
|
76 |
| - Option \code{zerocor="bdrykern"} is not available when \code{fast=TRUE}. |
77 | 77 | }
|
78 | 78 | \item{internal}{
|
79 | 79 | Internal use only.
|
|
83 | 83 | }
|
84 | 84 | }
|
85 | 85 | \details{
|
| 86 | + This function computes a fixed-bandwidth kernel estimate of |
| 87 | + a probability density on the real line, or the positive half-line, |
| 88 | + including optional boundary corrections |
| 89 | + for truncation of the density onto the positive half line. |
| 90 | +
|
| 91 | + Weighted estimates are supported, including negative weights. |
| 92 | + Weights are not renormalised to sum to 1. The resulting |
| 93 | + probability density estimate is not renormalised to integrate to 1. |
| 94 | +
|
| 95 | + Options for the smoothing kernel |
| 96 | + are described in the help for \code{\link[stats]{density.default}}. |
| 97 | + The default is the Epanechnikov (truncated quadratic) kernel. |
| 98 | + |
86 | 99 | If \code{zerocor} is missing, or given as \code{"none"},
|
87 |
| - this function computes the fixed bandwidth kernel estimator of the |
88 |
| - probability density on the real line. |
| 100 | + this function computes the fixed-bandwidth kernel estimator of the |
| 101 | + probability density on the real line, |
| 102 | + using \code{\link[stats]{density.default}}. |
| 103 | + The estimated probability density (unnormalised) is |
| 104 | + \deqn{ |
| 105 | + \widehat f(x) = \sum_{i=1}^n w_i \; \kappa(x - x_i) |
| 106 | + }{ |
| 107 | + f(x) = sum[i=1,...,n] w[i] * kappa(x - x[i]) |
| 108 | + } |
| 109 | + where \eqn{x_1,\ldots,x_n}{x[1], ..., x[n]} are the data values, |
| 110 | + \eqn{w_1,\ldots,w_n}{w[1], ..., w[n]} are the weights (defaulting |
| 111 | + to \eqn{w_i = 1/n}{w[i] = 1/n}), |
| 112 | + and \eqn{\kappa}{kappa} is the kernel, a probability density on the |
| 113 | + real line. |
89 | 114 |
|
90 |
| - If \code{zerocor} is given, it is assumed that the density |
91 |
| - is confined to the positive half-line, and a boundary correction is |
| 115 | + If \code{zerocor} is given, the probability density is assumed to be |
| 116 | + confined to the positive half-line; the numerical values in \code{x} |
| 117 | + must all be non-negative; and a boundary correction is |
92 | 118 | applied to compensate for bias arising due to truncation at the origin:
|
93 | 119 | \describe{
|
94 | 120 | \item{\code{zerocor="weighted"}:}{
|
|
97 | 123 | where \eqn{m(x) = 1 - F(-x)} is the total mass of the kernel centred on
|
98 | 124 | \eqn{x} that lies in the positive half-line, and \eqn{F(x)} is the
|
99 | 125 | cumulative distribution function of the kernel.
|
| 126 | + The corrected estimate is |
| 127 | + \deqn{ |
| 128 | + \widehat f_W(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x_i)} |
| 129 | + }{ |
| 130 | + fW(x) = sum[i=1,...,n] w[i] * kappa(x - x[i])/(1-F(-x[i])) |
| 131 | + } |
100 | 132 | This is the \dQuote{cut-and-normalization} method of
|
101 | 133 | Gasser and \ifelse{latex}{\out{M\"{u}ller}}{Mueller} (1979).
|
102 | 134 | Effectively the kernel is renormalized so that it integrates to 1,
|
103 | 135 | and the adjusted kernel conserves mass.
|
104 | 136 | }
|
105 | 137 | \item{\code{zerocor="convolution"}:}{
|
106 |
| - The estimate of the density \eqn{f(r)} is |
107 |
| - weighted by the factor \eqn{1/m(r)} where \eqn{m(r) = 1 - F(-r)} |
| 138 | + The estimate of the density \eqn{f(x)} is |
| 139 | + weighted by the factor \eqn{1/m(x)} where \eqn{m(r) = 1 - F(-x)} |
108 | 140 | is given above.
|
| 141 | + The corrected estimate is |
| 142 | + \deqn{ |
| 143 | + \widehat f_C(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x)} |
| 144 | + }{ |
| 145 | + fC(x) = sum[i=1,...,n] w[i] * kappa(x - x[i])/(1-F(-x)) |
| 146 | + } |
109 | 147 | This is the \dQuote{convolution}, \dQuote{uniform}
|
110 | 148 | or \dQuote{zero-order} boundary correction method
|
111 | 149 | often attributed to Diggle (1985).
|
112 | 150 | This correction does not conserve mass.
|
| 151 | + It is faster to compute than the weighted correction. |
113 | 152 | }
|
114 | 153 | \item{\code{zerocor="reflection"}:}{
|
115 | 154 | if the kernel centred at data point \eqn{x_i}{x[i]}
|
116 | 155 | has a tail that lies on the negative half-line, this tail is
|
117 | 156 | reflected onto the positive half-line.
|
118 |
| - This is the reflection method first proposed by |
119 |
| - Boneva et al (1971). |
120 |
| - This correction conserves mass. |
| 157 | + The corrected estimate is |
| 158 | + \deqn{ |
| 159 | + \widehat f_R(x) = \sum_{i=1}^n w_i \; |
| 160 | + [ \kappa(x - x_i) + \kappa(-x - x_i) ] |
| 161 | + }{ |
| 162 | + fR(x) = sum[i=1,...,n] w[i] * ( kappa(x - x[i]) + kappa(-x - x[i]) ) |
| 163 | + } |
| 164 | + This is the \dQuote{reflection} method first proposed by |
| 165 | + Boneva et al (1971). This correction conserves mass. |
| 166 | + The estimated density always has zero derivative at the origin, |
| 167 | + \eqn{\widehat f_R^\prime(0) = 0}{fR'(0) = 0}, which may or |
| 168 | + may not be desirable. |
121 | 169 | }
|
122 | 170 | \item{\code{zerocor="bdrykern"}:}{
|
123 | 171 | The density estimate is computed using the
|
124 | 172 | Linear Boundary Kernel associated with the chosen kernel
|
125 | 173 | (Wand and Jones, 1995, page 47).
|
126 |
| - That is, when estimating the density \eqn{f(r)} for values of |
127 |
| - \eqn{r} close to zero (defined as \eqn{r < h} for all kernels |
| 174 | + The estimated (unnormalised) probability density is |
| 175 | + \deqn{ |
| 176 | + \widehat f_B(x) = \sum_{i=1}^n w_i \; |
| 177 | + [ A(x) + (x-x_i) B(x)] \kappa(x - x_i) |
| 178 | + }{ |
| 179 | + fB(x) = sum[i=1,...,n] w[i] * |
| 180 | + ( A(x) + (x-x[i]) B(x)) * kappa(x - x[i]) |
| 181 | + } |
| 182 | + where \eqn{A(x) = a_2(x)/D(x)}{A(x) = a[2](x)/D(x)} and |
| 183 | + \eqn{B(x) = -a_1(x)/D(x)}{B(x) = -a[1](x)/D(x)} |
| 184 | + with |
| 185 | + \eqn{D(x) = a_0(x) a_2(x) - a_1(x)^2}{D(x) = a[0](x) a[2](x) - a[1](x)^2} |
| 186 | + where |
| 187 | + \eqn{ |
| 188 | + a_k(x) = \int_{-\infty}^x t^k \kappa(t) dt. |
| 189 | + }{ |
| 190 | + a[k](x) = integral[-Inf,x] ( t^k * kappa(t) dt). |
| 191 | + } |
| 192 | + That is, when estimating the density \eqn{f(x)} for values of |
| 193 | + \eqn{x} close to zero (defined as \eqn{x < h} for all kernels |
128 | 194 | except the Gaussian), the kernel contribution
|
129 |
| - \eqn{k_h(r - x_i)}{k[h](r - x[i])} is multiplied by a |
130 |
| - term that is a linear function of \eqn{r - x_i}{r - x[i]}, |
131 |
| - with coefficients depending on \eqn{r}. |
| 195 | + \eqn{k_h(x - x_i)}{k[h](x - x[i])} is multiplied by a |
| 196 | + term that is a linear function of \eqn{x - x_i}{x - x[i]}, |
| 197 | + with coefficients depending on \eqn{x}. |
132 | 198 | This correction does not conserve mass and may result in
|
133 | 199 | negative values, but is asymptotically optimal.
|
| 200 | + Computation time for this estimate is greater than for |
| 201 | + the options above. |
134 | 202 | }
|
135 | 203 | \item{\code{zerocor="JonesFoster"}:}{
|
136 | 204 | The modification of the Boundary Kernel estimate
|
137 |
| - proposed by Jones and Foster (1996) is computed. This is equal to |
138 |
| - \eqn{ |
139 |
| - \overline f(r) \exp( \hat f(r)/\overline f(r) - 1) |
| 205 | + proposed by Jones and Foster (1996) is computed: |
| 206 | + \deqn{ |
| 207 | + \widehat f_{JF}(x) = |
| 208 | + \widehat f_C(x) |
| 209 | + \exp\left( \frac{\widehat f_B(x)}{\widehat f_C(r)} - 1 \right) |
140 | 210 | }{
|
141 |
| - f#(r) exp(f*(r)/f#(r) - 1) |
| 211 | + fJF(x) = fC(x) exp(fB(x)/fC(x) - 1) |
142 | 212 | }
|
143 |
| - where \eqn{\overline f(r)}{f#(r)} is the convolution estimator |
144 |
| - and \eqn{\hat f(r)}{f*(r)} is the linear boundary kernel estimator. |
| 213 | + where \eqn{\widehat f_C(r)}{fC(r)} is the convolution estimator |
| 214 | + and \eqn{\widehat f_B(r)}{fB(r)} is the linear boundary kernel estimator. |
145 | 215 | This ensures that the estimate is always nonnegative
|
146 |
| - and retains the asymptotic optimality of the linear boundary kernel. |
| 216 | + and retains the asymptotic optimality of the linear boundary |
| 217 | + kernel. |
| 218 | + Computation time for this estimate |
| 219 | + is greater than for all the options above. |
147 | 220 | }
|
148 | 221 | }
|
149 | 222 | If \code{fast=TRUE}, the calculations are performed rapidly using
|
|
167 | 240 | legend(2, 0.8,
|
168 | 241 | legend=c("fixed bandwidth", "boundary kernel", "true density"),
|
169 | 242 | col=1:3, lty=rep(1,3))
|
| 243 | +} |
| 244 | +\seealso{ |
| 245 | + \code{\link[stats]{density.default}}. |
| 246 | + |
| 247 | + \code{\link{dkernel}} for the kernel itself. |
170 | 248 |
|
| 249 | + \code{\link{densityAdaptiveKernel.default}} for adaptive |
| 250 | + (variable-bandwidth) estimation. |
171 | 251 | }
|
172 | 252 | \references{
|
173 | 253 | \ournewpaper
|
|
176 | 256 | Spline transformations: three new diagnostic aids for the
|
177 | 257 | statistical data-analyst (with discussion).
|
178 | 258 | \emph{Journal of the Royal Statistical Society, Series B},
|
179 |
| - \bold{33}, 1–-70. |
| 259 | + \bold{33}, 1--70. |
180 | 260 |
|
181 | 261 | Diggle, P.J. (1985)
|
182 | 262 | A kernel method for smoothing point process data.
|
183 | 263 | \emph{Journal of the Royal Statistical Society, Series C (Applied Statistics)},
|
184 |
| - \bold{34} 138–-147. |
| 264 | + \bold{34} 138--147. |
185 | 265 |
|
186 | 266 | Gasser, Th. and \ifelse{latex}{\out{M\"{u}ller}}{Mueller}, H.-G. (1979).
|
187 | 267 | Kernel estimation of regression functions.
|
188 | 268 | In Th. Gasser and M. Rosenblatt (editors)
|
189 | 269 | \emph{Smoothing Techniques for Curve Estimation}, pages
|
190 |
| - 23-–68. Springer, Berlin. |
| 270 | + 23--68. Springer, Berlin. |
191 | 271 |
|
192 | 272 | Jones, M.C. and Foster, P.J. (1996)
|
193 | 273 | A simple nonnegative boundary correction method for kernel density
|
|
0 commit comments