

   aggregate {base}                             R Documentation

   CCoommppuuttee SSuummmmaarryy SSttaattiissttiiccss ooff DDaattaa SSuubbsseettss

   DDeessccrriippttiioonn::

        Splits the data into subsets, computes summary
        statistics for each, and returns the result in a form
        convenient for further use (a data frame or a time
        series, depending on the method).

   UUssaaggee::

        aggregate(x, ...)
        aggregate.default(x, ...)
        aggregate.data.frame(x, by, FUN, ...)
        aggregate.ts(x, nfrequency = 1, FUN = sum, ndeltat = 1)

   AArrgguummeennttss::

          x: an R object.

         by: a list of grouping elements, each as long as the
             variables in `x'.  Names for the grouping
             variables are provided if they are not given.

        FUN: a function returning a single (scalar) value,
             used to compute the summary statistics; it must
             be applicable to all data subsets.

   nfrequency: new number of observations per unit of time;
             must be a divisor of the frequency of `x'.

    ndeltat: new fraction of the sampling period between
             successive observations; must be a divisor of the
             sampling interval of `x'.

        ...: further arguments passed to the method used.
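        For a time series, `nfrequency' and `ndeltat' are two
        ways of stating the same target rate: `nfrequency = 4'
        on monthly data should agree with `ndeltat = 1/4'.  The
        sketch below checks this on a made-up monthly series
        `x_month' (a hypothetical name), assuming that when
        `nfrequency' is not supplied it is taken as the
        reciprocal of `ndeltat'.

```r
# Hypothetical monthly series covering two years (24 observations).
x_month <- ts(1:24, start = c(2000, 1), frequency = 12)

# Quarterly totals requested via the new frequency (4 per year) ...
q_nf <- aggregate(x_month, nfrequency = 4, FUN = sum)

# ... and via the new sampling interval (a quarter of a year).
# Assumption: nfrequency defaults to 1 / ndeltat when it is missing.
q_dt <- aggregate(x_month, ndeltat = 1/4, FUN = sum)

print(q_nf)
print(all.equal(q_nf, q_dt))
```

        Each quarterly value is the sum of a block of
        12 / 4 = 3 consecutive months.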

   DDeettaaiillss::

        `aggregate' is a generic function with methods for
        data frames and time series.

        The default method `aggregate.default' uses the time
        series method if `x' is a time series, and otherwise
        coerces `x' to a data frame and calls the data frame
        method.
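        As a small illustration of the default method, a plain
        numeric vector (not a time series) is coerced to a
        one-column data frame and handed to the data frame
        method.  The vector `v' and grouping vector `g' below
        are hypothetical example data.

```r
# A plain numeric vector and a hypothetical grouping vector.
v <- c(1, 2, 3, 4)
g <- c("a", "a", "b", "b")

# `v' is not a time series, so the default method coerces it to a
# data frame and dispatches to the data frame method.
res_vec <- aggregate(v, by = list(group = g), FUN = sum)
print(res_vec)
```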

        `aggregate.data.frame' is the data frame method.  If
        `x' is not a data frame, it is coerced to one.  Then,
        each of the variables (columns) in `x' is split into
        subsets of cases (rows) sharing identical combinations
        of the components of `by', and `FUN' is applied to
        each such subset, with further arguments in `...'
        passed to it.  (That is, `tapply(VAR, by, FUN, ...,
        simplify = FALSE)' is done for each variable `VAR' in
        `x', conveniently wrapped into one call to `lapply()'.)
        Empty subsets are removed, and the result is
        reformatted into a data frame containing the variables
        in `by' and `x'.  The variables arising from `by'
        contain the unique combinations of grouping values
        used to determine the subsets, and the variables
        arising from `x' contain the corresponding summary
        statistics of the respective variables for each
        subset.
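        The sketch below, on a hypothetical data frame `df'
        with two grouping vectors, compares `aggregate' with
        the per-variable `tapply' view described above.  Only
        three of the four possible (`g1', `g2') combinations
        occur in the data, so the empty one is absent from the
        `aggregate' result but shows up as `NA' in the
        `tapply' table.  Row order of the result is not relied
        upon here.

```r
# Hypothetical data: two numeric variables and two grouping vectors.
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(10, 20, 30, 40, 50))
grouping <- list(g1 = c("x", "x", "y", "y", "y"),
                 g2 = c("p", "q", "p", "p", "p"))

# One row per non-empty combination of g1 and g2.
res_df <- aggregate(df, by = grouping, FUN = sum)
print(res_df)

# Per-variable view of the same computation: a g1-by-g2 table for
# each column, with NA in the cell for the empty combination (y, q).
manual <- lapply(df, function(v) tapply(v, grouping, sum))
print(manual$a)
```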

        `aggregate.ts' is the time series method.  If `x' is
        not a time series, it is coerced to one.  Then, the
        variables in `x' are split into appropriate blocks of
        length `frequency(x) / nfrequency', and `FUN' is
        applied to each such block.  The result returned is a
        time series with frequency `nfrequency' holding the
        aggregated values.
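        To illustrate the block splitting for a multivariate
        series, the sketch below uses a hypothetical quarterly
        series `z' with two variables.  With `nfrequency = 1'
        the block length is frequency(z) / nfrequency = 4, so
        each variable is averaged over four quarters per year.

```r
# Hypothetical quarterly data for two variables over two years.
z <- ts(cbind(u = 1:8, w = c(2, 4, 6, 8, 10, 12, 14, 16)),
        start = c(2001, 1), frequency = 4)

# Blocks of 4 quarters per variable; FUN is applied to each block.
annual <- aggregate(z, nfrequency = 1, FUN = mean)
print(annual)
```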

   AAuutthhoorr((ss))::

        Kurt Hornik

   SSeeee AAllssoo::

        `apply', `lapply', `tapply'.

   EExxaammpplleess::

        data(state)

        ## Compute the averages for the variables in `state.x77', grouped
        ## according to the region (Northeast, South, North Central, West) that
        ## each state belongs to.
        aggregate(state.x77, list(Region = state.region), mean)

        ## Compute the averages according to region and the occurrence of more
        ## than 130 days of frost.
        aggregate(state.x77,
                  list(Region = state.region,
                       Cold = state.x77[,"Frost"] > 130),
                  mean)
        ## (Note that no state in `South' is THAT cold.)

        data(presidents)
        ## Compute the average annual approval ratings for American presidents.
        aggregate(presidents, nfrequency = 1, FUN = mean)
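        Because `presidents' contains missing quarters, `FUN =
        mean' yields `NA' for any year with a missing value.
        One possible alternative, sketched below, wraps `mean'
        so that missing values are dropped within each year;
        the name `annual_avg' is hypothetical, and averaging
        only the available quarters is an illustrative choice
        rather than the recommended treatment of gaps.

```r
data(presidents)

# Drop NAs inside each yearly block before averaging. A year with
# all four quarters missing would give NaN rather than NA.
annual_avg <- aggregate(presidents, nfrequency = 1,
                        FUN = function(v) mean(v, na.rm = TRUE))
print(annual_avg)
```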

