

   density {base}                               R Documentation

   KKeerrnneell DDeennssiittyy EEssttiimmaattiioonn

   DDeessccrriippttiioonn::

        The function `density' computes kernel density esti-
        mates with the given kernel and bandwidth.

        The generic functions `plot' and `print' have methods
        for density objects.

   UUssaaggee::

        density(x, bw, adjust = 1,
                kernel=c("gaussian", "epanechnikov", "rectangular", "triangular",
                         "biweight", "cosine", "optcosine"),
                window = kernel, width,
                give.Rkern = FALSE,
                n = 512, from, to, cut = 3, na.rm = FALSE)
        print(dobj)
        plot(dobj, main = NULL, xlab = NULL, ylab = "Density", type = "l",
             zero.line = TRUE, ...)

   AArrgguummeennttss::

          x: the data from which the estimate is to be com-
             puted.

         bw: the smoothing bandwidth to be used.  The kernels
             are scaled such that this is the standard devia-
             tion of the smoothing kernel.  It defaults to 0.9
             times the minimum of the standard deviation and
             the interquartile range divided by 1.34 times the
             sample size to the negative one-fifth power (=
             Silverman's ``rule of thumb'') unless the quar-
             tiles coincide where `bw > 0' will be guaranteed.
             The specified (or default) value of `bw' is multi-
             plied by `adjust'.

     adjust: the bandwidth used is actually `adjust*bw'.  This
             makes it easy to specify values like ``half the
             default'' bandwidth.

   kernel,window: a character string giving the smoothing ker-
             nel to be used.  This must be one of `"gaussian"',
             `"rectangular"', `"triangular"', `"epanechnikov"',
             `"biweight"', `"cosine"' or `"optcosine"', with
             default `"gaussian"', and may be abbreviated to a
             unique prefix (single letter).

             `"cosine"' is smoother than `"optcosine"', which
             is the usual ``cosine'' kernel in the literature
             and almost MSE-efficient.

      width: this exists for compatibility with S; if given,
             and `bw' is not, will set `bw = width/4'.

   give.Rkern: logical; if true, no density is estimated, and
             the ``canonical bandwidth'' of the chosen `kernel'
             is returned instead.

          n: the number of equally spaced points at which the
             density is to be estimated.  When `n > 512', it is
             rounded up to the next power of 2 for efficieny
             reasons (`fft').

    from,to: the left and right-most points of the grid at
             which the density is to be estimated.

        cut: by default, the values of `left' and `right' are
             `cut' bandwidths beyond the extremes of the data.
             This allows the estimated density to drop to
             approximately zero at the extremes.

      na.rm: logical; if `TRUE', missing values are removed
             from `x'. If `FALSE' any missing values cause an
             error.

       dobj: a ``density'' object.

   main, xlab, ylab, type: plotting parameters with useful
             defaults.

        ...: further plotting parameters.

   zero.line: logical; if `TRUE', add a base line at y = 0

   DDeettaaiillss::

        The algorithm used in `density' disperses the mass of
        the empirical distribution function over a regular grid
        of at least 512 points and then uses the fast Fourier
        transform to convolve this approximation with a dis-
        cretized version of the kernel and then uses linear
        approximation to evaluate the density at the specified
        points.

        The statistical properties of a kernel are determined
        by sig^2 (K) = int(t^2 K(t) dt) which is always = 1 for
        our kernels (and hence the bandwidth `bw' is the stan-
        dard deviation of the kernel) and R(K) = int(K^2(t)
        dt).
        MSE-equivalent bandwidths (for different kernels) are
        proportional to sig(K) R(K) which is scale invariant
        and for our kernels equal to R(K).  This value is
        returned when `give.Rkern = TRUE'.  See the examples
        for using exact equivalent bandwidths.

        Infinite values in `x' are assumed to correspond to a
        point mass at `+/-Inf' and the density estimate is of
        the sub-density on `(-Inf, +Inf)'.

   VVaalluuee::

        If `give.Rkern' is true, the number R(K), otherwise an
        object with class `"density"' whose underlying struc-
        ture is a list containing the following components.

          x: the `n' coordinates of the points where the den-
             sity is estimated.

          y: the estimated density values.

         bw: the bandwidth used.

          N: the sample size after elimination of missing val-
             ues.

       call: the call which produced the result.

   data.name: the deparsed name of the `x' argument.

     has.na: logical, for compatibility (always FALSE).

   RReeffeerreenncceess::

        Silverman, B. W. (1986).  Density Estimation.  London:
        Chapman and Hall.

        Venables, W. N. and B. D. Ripley (1994,7,9).  Modern
        Applied Statistics with S-PLUS.  New York: Springer.

        Scott, D. W. (1992).  Multivariate Density Estimation.
        Theory, Practice and Visualization.  New York: Wiley.

        Sheather, S. J. and M. C. Jones (1991).  ``A reliable
        data-based bandwidth selection method for kernel den-
        sity estimation.  J. Roy. Statist. Soc. B, 683-690.

   SSeeee AAllssoo::

        `convolve', `hist'.

   EExxaammpplleess::

        plot(density(c(-20,rep(0,98),20)),xlim=c(-4,4))# IQR = 0

        # The Old Faithful geyser data
        data(faithful)
        d <- density(faithful$eruptions, bw=0.15)
        d
        plot(d)

        plot(d, type="n")
        polygon(d, col="wheat")

        ## Missing values:
        x <- xx <- faithful$eruptions
        x[i.out <- sample(length(x), 10)] <- NA
        doR <- density(x, bw=0.15, na.rm = TRUE)
        lines(doR, col="blue")
        points(xx[i.out], rep(.01,10))

        (kernels <- eval(formals(density)$kernel))

        plot (density(0,bw=1))
        for(i in 2:length(kernels))
           lines(density(0,bw=1, kern= kernels[i]), col=i)
        mtext(side=3, "R's density() kernels with bw=1")
        legend(1.5,.4, leg=kernels, col=seq(kernels),lty=1, cex=.8, y.int=1)

        (RKs <- cbind(sapply(kernels, function(k)density(kern=k, give.Rkern=TRUE))))
        100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies

        data(precip)
        plot(density(precip, n=2^13))
        for(i in 2:length(kernels))
           lines(density(precip, kern= kernels[i], n=2^13), col=i)
        mtext(side=3, "same scale bandwidths, 7 different kernels")

        ## Bandwidth Adjustment for "Exactly Equivalent Kernels"
        h.f <- sapply(kernels, function(k)density(kern=k, give.Rkern=TRUE))
        (h.f <- (h.f["gaussian"] / h.f)^ .2)
        ## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..

        plot(density(precip, n=2^13))
        for(i in 2:length(kernels))
           lines(density(precip, adjust = h.f[i], kern= kernels[i], n=2^13), col=i)
        mtext(side=3, "equivalent bandwidths, 7 different kernels")
        legend(55,.035, leg=kernels, col=seq(kernels),lty=1)

