validate               package:Design               R Documentation

_R_e_s_a_m_p_l_i_n_g _V_a_l_i_d_a_t_i_o_n _o_f _a _F_i_t_t_e_d _M_o_d_e_l'_s _I_n_d_e_x_e_s _o_f _F_i_t

_D_e_s_c_r_i_p_t_i_o_n:

     When applied to a fit object created by one of the 'Design'
     fitting functions, 'validate' performs resampling validation of a
     regression model, with or without backward step-down variable
     deletion.

_U_s_a_g_e:

     # fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
     validate(fit, method="boot", B=40,
              bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0, 
              pr=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

     fit: a fit derived by e.g. 'lrm', 'cph', 'psm', 'ols'. The options
          'x=TRUE' and 'y=TRUE' must have been specified. 

  method: may be '"crossvalidation"', '"boot"' (the default), '".632"',
          or '"randomization"'. See 'predab.resample' for details.  Can
          be abbreviated, e.g. '"cross"', '"b"', '".6"'.

       B: number of repetitions.  For 'method="crossvalidation"', 'B'
          is the number of groups of omitted observations.

      bw: 'TRUE' to do fast step-down using the 'fastbw' function, for
          both the overall model and for each repetition. 'fastbw'
          keeps parameters together that represent the same factor. 

    rule: Applies if 'bw=TRUE'.  '"aic"' to use Akaike's information
          criterion as a stopping rule (i.e., a factor is deleted if
          the chi-square falls below twice its degrees of freedom), or
          '"p"' to use P-values. 

    type: '"residual"' or '"individual"'.  With '"residual"' the
          stopping rule is applied to the residual chi-square for all
          variables deleted; with '"individual"' it is applied to
          individual factors.

     sls: significance level for a factor to be kept in a model, or for
          judging the residual chi-square. 

    aics: cutoff on AIC when 'rule="aic"'. 

      pr: 'TRUE' to print results of each repetition 

     ...: parameters for each specific validate function, and
          parameters to pass to 'predab.resample' (note especially the
          'group', 'cluster', and 'subset' parameters).

          For 'psm', you can pass the 'maxiter' parameter here (passed
          to 'survreg.control', default is 15 iterations) as well as a
          'tol' parameter for judging matrix singularity in 'solvet'
          (default is 1e-12) and a 'rel.tolerance' parameter that is
          passed to 'survreg.control' (default is 1e-5).
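
     The 'rule', 'sls', and 'aics' arguments above describe two ways of
     deciding whether a factor is deleted during step-down.  The
     following base R sketch illustrates that arithmetic only.  The
     helper names 'drop_by_aic' and 'drop_by_p' are hypothetical and are
     not part of 'Design'; the sketch assumes AIC is measured on the
     chi-square scale (chi-square minus twice the degrees of freedom),
     as the description of 'rule' suggests, and it is not the code used
     by 'fastbw'.

```r
# Hypothetical helpers (not part of Design) illustrating the stopping rules.
# chisq: chi-square statistic for a factor; df: its degrees of freedom.

# rule = "aic": the factor is deleted when chisq - 2*df falls below 'aics'.
# With aics = 0 this is "chi-square below twice its degrees of freedom".
drop_by_aic <- function(chisq, df, aics = 0) {
  aic <- chisq - 2 * df
  aic < aics
}

# rule = "p": the factor is deleted when its P-value exceeds 'sls',
# i.e. it is kept only if it is significant at level 'sls'.
drop_by_p <- function(chisq, df, sls = 0.05) {
  p <- pchisq(chisq, df, lower.tail = FALSE)
  p > sls
}

drop_by_aic(chisq = 3.1, df = 2)  # 3.1 is below 2 * 2, so the factor is deleted
drop_by_aic(chisq = 9.5, df = 2)  # 9.5 is above 2 * 2, so the factor is kept
drop_by_p(chisq = 3.1, df = 2)    # P-value is about 0.21, so the factor is deleted
drop_by_p(chisq = 9.5, df = 2)    # P-value is below 0.05, so the factor is kept
```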

_D_e_t_a_i_l_s:

     'validate' provides bias-corrected indexes that are specific to
     each type of model. 'validate.cph' and 'validate.psm' are similar
     to 'validate.lrm'; see its documentation.
     For 'validate.cph' and 'validate.psm', there is an extra argument
     'dxy', which if 'TRUE' causes the 'rcorr.cens' function to be
     invoked to compute Somers' Dxy rank correlation at each resample
     (this takes a bit longer than the likelihood-based statistics).
     The values in the Dxy row are equal to 2 * (C - 0.5), where C is
     the C-index or concordance probability.
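
     The relation Dxy = 2 * (C - 0.5) can be checked by hand on a small
     data set.  The sketch below is base R only and assumes a simple
     uncensored binary outcome, which is a simplification of the
     censored survival setting handled by 'rcorr.cens'.  The function
     'c_index' is a hypothetical helper written for this illustration.

```r
# Concordance probability C for a 0/1 outcome y and a numeric score x:
# among all (event, non-event) pairs, the share in which the event has
# the higher score, with ties counted as one half.
c_index <- function(x, y) {
  d <- outer(x[y == 1], x[y == 0], "-")
  mean((d > 0) + 0.5 * (d == 0))
}

x <- c(0.1, 0.4, 0.35, 0.8)   # predicted scores
y <- c(0,   0,   1,    1)     # observed outcomes

C   <- c_index(x, y)          # 3 of the 4 pairs are concordant: 0.75
Dxy <- 2 * (C - 0.5)          # 0.5
c(C = C, Dxy = Dxy)
```

     C = 0.5 (no discrimination) maps to Dxy = 0, and C = 1 (perfect
     discrimination) maps to Dxy = 1.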

     For 'validate.cph' with 'dxy=TRUE', you must specify an argument
     'u' if the model is stratified, since survival curves can then
     cross and X beta is not 1-1 with predicted survival. 
     There is also a 'validate' method for 'tree', which only does
     cross-validation and which has a different list of arguments.

_V_a_l_u_e:

     a matrix with rows corresponding to the statistical indexes and
     columns for the original index, resample estimates, indexes
     applied to the whole or omitted sample using the model derived
     from the resample, average optimism, corrected index, and number
     of successful re-samples.
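
     To make the columns concrete, here is a minimal base R sketch of
     an optimism-corrected bootstrap for a single index (R-squared of
     an ordinary linear model).  It is an illustration under stated
     assumptions, not the code used by 'validate' or 'predab.resample':
     the data are simulated, the helper 'r2' is hypothetical, and the
     element labels are chosen only to mirror the description above.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)      # x2 is pure noise

# Hypothetical helper: R-squared of a fitted lm evaluated on any data frame
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

B <- 40                                # number of bootstrap repetitions
full <- lm(y ~ x1 + x2, data = d)
index.orig <- r2(full, d)              # apparent index on the original sample

res <- t(replicate(B, {
  boot <- d[sample(n, replace = TRUE), ]   # resample rows with replacement
  fb <- lm(y ~ x1 + x2, data = boot)       # refit on the resample
  c(training = r2(fb, boot),               # index in the resample itself
    test     = r2(fb, d))                  # same model applied to the whole sample
}))

optimism <- mean(res[, "training"] - res[, "test"])   # average optimism
index.corrected <- index.orig - optimism              # bias-corrected index

round(c(index.orig = index.orig,
        training = mean(res[, "training"]),
        test = mean(res[, "test"]),
        optimism = optimism,
        index.corrected = index.corrected,
        n = B), 4)
```

     Each resample contributes one "training" and one "test" value; the
     mean difference between them estimates how much the apparent index
     overstates performance, and subtracting it gives the corrected
     index.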

_S_i_d_e _E_f_f_e_c_t_s:

     prints a summary, and optionally statistics for each re-fit

_A_u_t_h_o_r(_s):

     Frank Harrell
      Department of Biostatistics, Vanderbilt University
      f.harrell@vanderbilt.edu

_S_e_e _A_l_s_o:

     'validate.ols', 'validate.cph', 'validate.lrm', 'validate.tree',
     'predab.resample', 'fastbw', 'Design', 'Design.trans', 'calibrate'

_E_x_a_m_p_l_e_s:

     # See examples for validate.cph, validate.lrm, validate.ols
     # Example of validating a parametric survival model:

     n <- 1000
     set.seed(731)
     age <- 50 + 12*rnorm(n)
     label(age) <- "Age"
     sex <- factor(sample(c('Male','Female'), n, TRUE))
     cens <- 15*runif(n)
     h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
     dt <- -log(runif(n))/h
     e <- ifelse(dt <= cens,1,0)
     dt <- pmin(dt, cens)
     units(dt) <- "Year"
     S <- Surv(dt,e)

     f <- psm(S ~ age*sex, x=TRUE, y=TRUE)  # Weibull model
     # Validate full model fit
     validate(f, B=10)                # usually B=150

     # Validate stepwise model with typical (not so good) stopping rule
     # bw=TRUE does not preserve hierarchy of terms at present
     validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")

