

   ppr {modreg}                                 R Documentation

   PPrroojjeeccttiioonn PPuurrssuuiitt RReeggrreessssiioonn

   DDeessccrriippttiioonn::

        Fit a projection pursuit regression model.

   UUssaaggee::

        ppr(formula, data = sys.parent(), weights,
            subset, na.action, contrasts = NULL,
            ww = rep(1,q), nterms, max.terms=nterms, optlevel = 2,
            sm.method = c("supsmu", "spline", "gcvspline"),
            bass = 0, span = 0, df = 5, gcvpen = 1)

        ppr(x, y, weights = rep(1,n),
            ww = rep(1,q), nterms, max.terms = nterms, optlevel = 2,
            sm.method = c("supsmu", "spline", "gcvspline"),
            bass = 0, span = 0, df = 5, gcvpen = 1)

   AArrgguummeennttss::

    formula: a regression formula specifying one or more
             response variables and the explanatory variables.

          x: matrix of explanatory variables.  Rows represent
             observations, and columns represent variables.
             Missing values are not accepted.

     nterms: number of terms to include in the final model.

       data: Data frame from which variables specified in `for-
             mula' are preferentially to be taken.

    weights: a vector of weights for each case.

         ww: a vector of weights for each response, so the fit
             criterion is the sum over case `i' and responses
             `j' of `w_i ww_j (y_ij - fit_ij)^2' divided by the
             sum of `w_i'.

     subset: An index vector specifying the cases to be used in
             the training sample.  (NOTE: If given, this argu-
             ment must be named.)

   na.action: A function to specify the action to be taken if
             `NA's are found. The default action is for the
             procedure to fail.  An alternative is `na.omit',
             which leads to rejection of cases with missing
             values on any required variable.  (NOTE: If given,
             this argument must be named.)

   contrasts: the contrasts to be used when any factor explana-
             tory variables are coded.

   max.terms: maximum number of terms to choose from when
             building the model.

   optlevel: integer from 0 to 3 which determines the thorough-
             ness of an optimization routine in the SMART pro-
             gram. See the Details section.

   sm.method: the method used for smoothing the ridge func-
             tions.  The default is to use Friedman's super
             smoother `supsmu'.  The alternatives are to use
             the smoothing spline code underlying
             `smooth.spline', either with a specified (equiva-
             lent) degrees of freedom for each ridge functions,
             or to allow the smoothness to be chosen by GCV.

       bass: super smoother bass tone control used with auto-
             matic span selection (see `supsmu'); the range of
             values is 0 to 10, with larger values resulting in
             increased smoothing.

       span: super smoother span control (see `supsmu').  The
             default, `0', results in automatic span selection
             by local cross validation. `span' can also take a
             value in `(0, 1]'.

         df: if `sm.method' is `"spline"' specifies the smooth-
             ness of each ridge term via the requested equiva-
             lent degrees of freedom.

     gcvpen: if `sm.method' is `"gcvspline"' this is the
             penalty used in the GCV selection for each degree
             of freedom used.

   DDeettaaiillss::

        The basic method is given by Friedman (1984), and is
        essentially the same code used by S-PLUS's `ppreg'.
        This code is extremely sensitive to the compiler used.

        The algorithm first adds up to `max.terms' ridge terms
        one at a time; it will use less if it is unable to find
        a term to add that makes sufficient difference.  It
        then removes the least "important" term at each step
        until `nterm' terms are left.

        The levels of optimization (argument `optlevel') differ
        in how thoroughly the models are refitted during this
        process.  At level 0 the existing ridge terms are not
        refitted.  At level 1 the projection directions are not
        refitted, but the ridge functions and the regression
        coefficients are.  Levels 2 and 3 refit all the terms
        and are equivalent for one response; level 3 is more
        careful to re-balance the contributions from each
        regressor at each step and so is a little less likely
        to converge to a saddle point of the sum of squares
        criterion.

   VVaalluuee::

        A list with the following components, many of which are
        for use by the method functions.

       call: the matched call

          p: the number of explanatory variables (after any
             coding)

          q: the number of response variables

         ml: the argument `max.terms'

        gof: the overall residual (weighted) sum of squares for
             the selected model

       gofn: the overall residual (weighted) sum of squares
             against the number of terms, up to `max.terms'.
             Will be invalid (and zero) for less than `nterms'.

         df: the argument `df'

        edf: if `sm.method' is `"spline"' or `"gcvspline"' the
             equivalent number of degrees of freedom for each
             ridge term used.

     xnames: the names of the explanatory variables

     ynames: the names of the response variables

      alpha: a matrix of the projection directions, with a col-
             umn for each ridge term

       beta: a matrix of the coefficients applied for each
             response to the ridge terms: the rows are the
             responses and the columns the ridge terms

         yb: the weighted means of each response

         ys: the overall scale factor used: internally the
             responses are divided by `ys' to have unit total
             weighted sum of squares.

   fitted.values: the fitted values, as a matrix if `q > 1'

   residuals: the residuals, as a matrix if `q > 1'

       smod: internal work array, which includes the ridge
             functions evaluated at the training set points.

   RReeffeerreenncceess::

        Friedman, J. H. and Stuetzle, W. (1981) Projection pur-
        suit regression.  Journal of the American Statistical
        Association, 76, 817-823.

        Friedman, J. H. (1984) SMART User's Guide.  Laboratory
        for Computational Statistics, Stanford University Tech-
        nical Report No. 1.

   SSeeee AAllssoo::

        `plot.ppr', `supsmu', `smooth.spline'

   EExxaammpplleess::

        # Note: your numerical values may differ
        data(rock)
        attach(rock)
        area1 <- area/10000; peri1 <- peri/10000
        rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                  data=rock, nterms=2, max.terms=5)
        rock.ppr
        # Call:
        # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
        #     nterms = 2, max.terms = 5)
        #
        # Goodness of fit:
        #  2 terms  3 terms  4 terms  5 terms
        # 8.737806 5.289517 4.745799 4.490378

        summary(rock.ppr)
        # .....  (same as above)
        # .....
        #
        # Projection direction vectors:
        #       term 1      term 2
        # area1  0.34357179  0.37071027
        # peri1 -0.93781471 -0.61923542
        # shape  0.04961846  0.69218595
        #
        # Coefficients of ridge terms:
        #    term 1    term 2
        # 1.6079271 0.5460971

        par(mfrow=c(3,2))# maybe: , pty="s")
        plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
        plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
        plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
             main = "update(..., sm.method=

