ltsReg              package:robustbase              R Documentation

_L_e_a_s_t _T_r_i_m_m_e_d _S_q_u_a_r_e_s _R_o_b_u_s_t (_H_i_g_h _B_r_e_a_k_d_o_w_n) _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Carries out least trimmed squares (LTS) robust (high breakdown
     point) regression.

_U_s_a_g_e:

     ltsReg(x, ...)

     ## S3 method for class 'formula':
     ltsReg(formula, data, subset, weights, na.action,
            model = TRUE, x.ret = FALSE, y.ret = FALSE,
            contrasts = NULL, offset, ...)

     ## Default S3 method:
     ltsReg(x, y, intercept = TRUE, alpha = 1/2, nsamp = 500,
            adjust = FALSE, mcd = TRUE, qr.out = FALSE, yname = NULL,
            seed = NULL, use.correction=TRUE, control, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a 'formula' of the form 'y ~ x1 + x2 + ...'.

    data: data frame from which variables specified in 'formula' are to
          be taken.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

 weights: an optional vector of weights to be used in the fitting
          process. *NOT USED YET*. 

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset.  The
          "factory-fresh" default is 'na.omit'.  Another possible value
          is 'NULL', no action.  Value 'na.exclude' can be useful.

model, x.ret, y.ret: 'logical's indicating if the model frame, the
          model matrix and the response are to be returned,
          respectively.

contrasts: an optional list.  See the 'contrasts.arg' of
          'model.matrix.default'.

  offset: this can be used to specify an _a priori_ known component to
          be included in the linear predictor during fitting.  An
          'offset' term can be included in the formula instead or as
          well, and if both are specified their sum is used.

       x: a matrix or data frame containing the explanatory variables.

       y: the response: a vector of length the number of rows of 'x'.

intercept: if true, a model with a constant term will be estimated;
          otherwise no constant term will be included.  Default is
          'intercept = TRUE'.

   alpha: the fraction (roughly) of squared residuals whose sum will
          be minimized, by default 0.5.  In general, 'alpha' must be
          between 0.5 and 1.

   nsamp: number of subsets used for initial estimates or '"best"' or
          '"exact"'.  Default is 'nsamp = 500'.  For 'nsamp="best"'
          exhaustive enumeration is done, as long as the number of
          trials does not exceed 5000.  For '"exact"', exhaustive
          enumeration will be attempted however many samples are
          needed. In this case a warning message will be displayed
          saying that the computation can take a very long time. 

  adjust: whether to perform intercept adjustment at each step. Since
          this can be time consuming, the default is 'adjust = FALSE'.

     mcd: whether to compute robust distances using Fast-MCD.

  qr.out: whether to return the QR decomposition (see 'qr'); defaults
          to false.

   yname: the name of the dependent variable.  Default is 'yname =
          NULL'.

    seed: initial seed for random generator, see 'rrcov.control'.

use.correction: whether to use finite sample correction factors.
          Default is 'use.correction=TRUE'.

 control: a list with estimation options, the same as those provided
          in the function specification.  If the control object is
          supplied, the parameters from it will be used.  If parameters
          are also passed in the call itself, they will override the
          corresponding elements of the control object.

     ...: arguments passed to or from other methods.
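
     As a rough illustration of the 'alpha' argument described above,
     the following sketch fits the same model with the default
     'alpha = 1/2' and with 'alpha = 0.75'.  It assumes that the
     robustbase package is installed and that extra arguments given to
     the formula method are passed on to the default method via '...';
     the object names 'fit_half' and 'fit_75' are made up for this
     example.

```r
library(robustbase)  # assumed to be installed; provides ltsReg()

data(stackloss)      # example data from the 'datasets' package

## Default setting: alpha = 1/2, so roughly half of the squared
## residuals enter the objective function.
fit_half <- ltsReg(stack.loss ~ ., data = stackloss)

## Larger alpha: more observations enter the objective (less trimming).
## Assumption: 'alpha' reaches the default method through '...'.
fit_75 <- ltsReg(stack.loss ~ ., data = stackloss, alpha = 0.75)

## Number of observations that determine each fit.
c(half = fit_half$quan, three_quarters = fit_75$quan)
```

     A larger 'alpha' trades some robustness for efficiency, since
     fewer observations are trimmed.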

_D_e_t_a_i_l_s:

     The LTS regression method minimizes the sum of the h smallest
     squared residuals, where h > n/2, i.e. at least half the number of
     observations must be used.  The default value of h (when
     'alpha=1/2') is roughly n / 2, more precisely '(n+p+1) %/% 2',
     where n is the total number of observations and p is the number
     of regression coefficients.  By setting 'alpha', the user may
     choose higher values up to n, where h = h(alpha,n,p) =
     'h.alpha.n(alpha,n,p)'.  The LTS estimate of the error scale is
     given by the minimum of the objective function multiplied by a
     consistency factor and a finite sample correction factor; see
     Pison et al. (2002) for details.  The rescaling factors for the
     raw and final estimates are also returned in the vectors
     'raw.cnp2' and 'cnp2', each of length 2.  The finite sample
     corrections can be suppressed by setting 'use.correction=FALSE'.
     The computations are performed using the Fast LTS algorithm
     proposed by Rousseeuw and Van Driessen (1999).
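
     The objective described above can be sketched in base R.  This is
     only an illustration, not the Fast LTS algorithm: the helper
     'lts_objective' is hypothetical, p is assumed to count the
     intercept, and the ordinary least squares coefficients are used
     merely as one candidate at which to evaluate the criterion.

```r
data(stackloss)

## Design matrix (with intercept column) and response.
X <- model.matrix(stack.loss ~ ., data = stackloss)
y <- stackloss$stack.loss

n <- nrow(X)             # number of observations
p <- ncol(X)             # number of coefficients (assumed to include the intercept)
h <- (n + p + 1) %/% 2   # default subset size when alpha = 1/2

## Hypothetical helper: sum of the h smallest squared residuals
## for a given coefficient vector 'beta'.
lts_objective <- function(beta, X, y, h) {
  r2 <- drop(y - X %*% beta)^2
  sum(sort(r2)[seq_len(h)])
}

## Evaluate the criterion at the ordinary least squares fit.
ols      <- lm(stack.loss ~ ., data = stackloss)
crit_ols <- lts_objective(coef(ols), X, y, h)
rss_ols  <- sum(residuals(ols)^2)

c(h = h, trimmed = crit_ols, full = rss_ols)
```

     Because only the h smallest squared residuals are summed, the
     trimmed criterion can never exceed the full residual sum of
     squares for the same coefficients.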

     As always, the formula interface has an implied intercept term
     which can be removed either by 'y ~ x - 1' or 'y ~ 0 + x'.  See
     'formula' for more details.
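
     The two ways of removing the intercept can be checked with base
     R's 'model.matrix', which shows the columns a formula generates.
     This sketch only inspects the design matrix; the variable names
     'with_int', 'no_int_a' and 'no_int_b' are made up for the example.

```r
data(stackloss)

## Columns of the design matrix with the implied intercept ...
with_int <- colnames(model.matrix(stack.loss ~ Air.Flow, data = stackloss))

## ... and with the intercept removed in the two equivalent ways.
no_int_a <- colnames(model.matrix(stack.loss ~ Air.Flow - 1, data = stackloss))
no_int_b <- colnames(model.matrix(stack.loss ~ 0 + Air.Flow, data = stackloss))

with_int
no_int_a
no_int_b
```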

_V_a_l_u_e:

     The function 'ltsReg' returns an object of class '"lts"'.  The
     'summary' method function is used to obtain (and print) a summary
     table of the results, and 'plot()' can be used to plot them; see
     the specific help pages.

     The generic accessor functions 'coefficients', 'fitted.values' and
     'residuals' extract various useful features of the value returned
     by 'ltsReg'.

     An object of class 'lts' is a 'list' containing at least the
     following components: 

    crit: the value of the objective function of the LTS regression
          method, i.e., the sum of the h smallest squared raw
          residuals. 

coefficients: vector of coefficient estimates (including the intercept
          by default when 'intercept=TRUE'), obtained after
          reweighting. 

    best: the best subset found and used for computing the raw
          estimates, with 'length(best) == quan =
          h.alpha.n(alpha,n,p)'. 

fitted.values: vector like 'y' containing the fitted values of the
          response after reweighting.

residuals: vector like 'y' containing the residuals from the weighted
          least squares regression.

   scale: scale estimate of the reweighted residuals.  

   alpha: same as the input parameter 'alpha'.

    quan: the number h of observations which have determined the least
          trimmed squares estimator.

intercept: same as the input parameter 'intercept'.

    cnp2: a vector of length two containing the consistency correction
          factor and the finite sample correction factor of the final
          estimate of the error scale.

raw.coefficients: vector of raw coefficient estimates (including the
          intercept, when 'intercept=TRUE').

raw.scale: scale estimate of the raw residuals.

raw.resid: vector like 'y' containing the raw residuals from the
          regression.

raw.cnp2: a vector of length two containing the consistency correction
          factor and the finite sample correction factor of the raw
          estimate of the error scale.

  lts.wt: vector like 'y' containing weights that can be used in a
          weighted least squares fit.  These weights are 1 for points
          with reasonably small raw residuals, and 0 for points with
          large raw residuals.

  method: character string naming the method (Least Trimmed Squares).

       X: the input data as a matrix (including intercept column if
          applicable).

       Y: the response variable as a vector.
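
     A brief sketch of how some of the components listed above might
     be inspected, assuming the robustbase package is installed.  The
     weighted refit with 'lm' is an illustration of how 'lts.wt' could
     be used, not a statement about how 'ltsReg' computes its final
     estimate; the names 'flagged' and 'wls' are made up here.

```r
library(robustbase)  # assumed to be installed; provides ltsReg()

data(stackloss)
fit <- ltsReg(stack.loss ~ ., data = stackloss)

coef(fit)              # coefficients after reweighting
fit$raw.coefficients   # raw coefficient estimates
fit$quan               # number h of observations determining the fit
length(fit$best)       # size of the best subset found

## Observations given weight 0, i.e. those with large raw residuals.
flagged <- which(fit$lts.wt == 0)
flagged

## Illustrative weighted least squares refit using the 0/1 weights.
wls <- lm(stack.loss ~ ., data = stackloss, weights = fit$lts.wt)
cbind(lts = coef(fit), wls = coef(wls))
```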

_N_o_t_e:

     We strongly recommend using 'lmrob()' instead of 'ltsReg' (_See
     also_ below)!

_A_u_t_h_o_r(_s):

     Valentin Todorov valentin.todorov@chello.at, based on work written
     for S-plus by Peter Rousseeuw and Katrien van Driessen from
     University of Antwerp.

_R_e_f_e_r_e_n_c_e_s:

     Peter J. Rousseeuw (1984), Least Median of Squares Regression.
     _Journal of the American Statistical Association_ *79*, 871-881.

     P. J. Rousseeuw and A. M. Leroy (1987) _Robust Regression and
     Outlier Detection._ Wiley.

     P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for
     the minimum covariance determinant estimator. _Technometrics_
     *41*, 212-223.

     Pison, G., Van Aelst, S., and Willems, G. (2002) Small Sample
     Corrections for LTS and MCD. _Metrika_ *55*, 111-123.

_S_e_e _A_l_s_o:

     'lmrob.S()' provides a fast S estimator with a similar breakdown
     point to 'ltsReg()' but better efficiency.  For data analysis,
     use 'lmrob' instead, which is based on 'lmrob.S'.

     'covMcd'; 'summary.lts' for summaries.

     The generic functions 'coef', 'residuals', 'fitted'.

_E_x_a_m_p_l_e_s:

     library(robustbase)  # provides ltsReg() and the 'heart' data
     data(heart)
     ## Default method works with 'x'-matrix and y-var:
     heart.x <- data.matrix(heart[, 1:2]) # the X-variables
     heart.y <- heart[,"clength"]
     ltsReg(heart.x, heart.y)

     data(stackloss)
     ltsReg(stack.loss ~ ., data = stackloss)

