vgam                  package:VGAM                  R Documentation

_F_i_t_t_i_n_g _V_e_c_t_o_r _G_e_n_e_r_a_l_i_z_e_d _A_d_d_i_t_i_v_e _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a vector generalized additive model (VGAM).  This is a large
     class of models that includes generalized additive models (GAMs)
     and vector generalized linear models (VGLMs) as special cases.

_U_s_a_g_e:

     vgam(formula, family, data = list(), weights = NULL, subset = NULL, 
          na.action = na.fail, etastart = NULL, mustart = NULL, 
          coefstart = NULL, control = vgam.control(...), offset = NULL, 
          method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
          contrasts = NULL, constraints = NULL, 
          extra = list(), qr.arg = FALSE, smart = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

     In the following, M is the number of additive predictors.

 formula: a symbolic description of the model to be fit. The RHS of the
          formula is applied to each linear/additive predictor.
          Different variables in each linear/additive predictor can be
          chosen by specifying constraint matrices.

  family: a function of class '"vglmff"'  (see 'vglmff-class')
          describing what statistical model is to be fitted. These are
          called ``'VGAM' family functions''.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from
          'environment(formula)', typically the environment from which
          'vgam' is called.

 weights: an optional vector or matrix of (prior) weights to be used in
          the fitting process. If 'weights' is a matrix, then it must
          be in _matrix-band_ form, whereby the first M columns of the
          matrix are the diagonals, followed by the upper-diagonal
          band, followed by the band above that, etc. In this case,
          there can be up to M(M+1)/2 columns, with the last column
          corresponding to the (1,M) elements of the weight matrices.
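
          The column ordering can be illustrated with a small base-R
          sketch. The helper 'to_band' is hypothetical, written only
          for this illustration, and is not a 'VGAM' function. It
          flattens one symmetric weight matrix into a single row of
          the matrix-band layout described above.

```r
# Hypothetical helper (not part of VGAM): flatten one symmetric M x M
# weight matrix into matrix-band order: the diagonal first, then the
# first upper band, then the next band, ending with the (1, M) element.
to_band <- function(W) {
  M <- nrow(W)
  stopifnot(ncol(W) == M, isSymmetric(W))
  out <- numeric(0)
  for (band in 0:(M - 1)) {
    rows <- seq_len(M - band)
    out <- c(out, W[cbind(rows, rows + band)])
  }
  out
}

M <- 3
W <- matrix(c(4,   1, 0.5,
              1,   5, 2,
              0.5, 2, 6), nrow = M, byrow = TRUE)
band <- to_band(W)
band            # elements in order: 4, 5, 6, 1, 2, 0.5
length(band)    # 3 + 2 + 1 = 6 entries for M = 3
```

          A 'weights' matrix in this form would hold one such row per
          observation.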

  subset: an optional logical vector specifying a subset of
          observations to be used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain 'NA's. The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          ``factory-fresh'' default is 'na.omit'.

etastart: starting values for the linear/additive predictors. It is an
          M-column matrix. If M=1 then it may be a vector.

 mustart: starting values for the fitted values. It can be a vector or
          a matrix. Some family functions do not make use of this
          argument.

coefstart: starting values for the coefficient vector.

 control: a list of parameters for controlling the fitting process. See
          'vgam.control' for details.

  offset: a vector or M-column matrix of offset values. These are _a
          priori_ known and are added to the linear/additive predictors
          during fitting.

  method: the method to be used in fitting the model. The default (and
          presently only) method 'vgam.fit' uses iteratively reweighted
          least squares (IRLS).

   model: a logical value indicating whether the _model frame_ should
          be assigned in the 'model' slot.

x.arg, y.arg: logical values indicating whether the model matrix and
          response vector/matrix used in the fitting process should be
          assigned in the 'x' and 'y' slots.  Note the model matrix is
          the LM model matrix; to get the VGAM model matrix type
          'model.matrix(vgamfit)' where 'vgamfit' is a 'vgam' object.

contrasts: an optional list. See the 'contrasts.arg' of
          'model.matrix.default'.

constraints: an optional list of constraint matrices.  Each component
          of the list must be named after the term it corresponds to
          (and the name must match in character format exactly).  Each
          constraint matrix must have M rows, and be of full-column
          rank. By default, constraint matrices are the M by M identity
          matrix unless arguments in the family function itself
          override these values.  If 'constraints' is used it must
          contain _all_ the terms; an incomplete list is not accepted.
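
          A minimal base-R sketch of assembling such a list follows.
          It assumes M = 2 and reuses the formula from the Examples
          section; the particular choice of matrices (an identity for
          the intercept, a single shared column for the smooth term)
          is only an illustration. The final call is left as a
          comment because it needs 'VGAM' to be attached.

```r
# Build a complete, named list of constraint matrices for M = 2
# additive predictors. The formula and variable names are illustrative.
M <- 2
form <- cbind(normal, mild, severe) ~ s(let)

# Term names exactly as they appear in the formula, plus the intercept.
term_names <- c("(Intercept)", attr(terms(form), "term.labels"))

clist <- list(
  diag(M),              # intercept: a separate value for each predictor
  matrix(1, nrow = M)   # s(let): one smooth shared by both predictors
)
names(clist) <- term_names

# Every constraint matrix must have M rows and full column rank.
ok <- vapply(clist,
             function(C) nrow(C) == M && qr(C)$rank == ncol(C),
             logical(1))
stopifnot(all(ok))
names(clist)    # "(Intercept)" "s(let)"

# The list would then be passed on as, e.g. (assuming VGAM is attached
# and 'pneumo' contains the variable 'let'):
# vgam(form, cumulative, data = pneumo, constraints = clist)
```

          Taking the names from 'terms' rather than typing them by
          hand is one way to satisfy the exact-match requirement.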

   extra: an optional list with any extra information that might be
          needed by the 'VGAM' family function.

  qr.arg: logical value indicating whether the QR decomposition of the
          VLM model matrix is returned in the 'qr' slot of the
          object.

   smart: logical value indicating whether smart prediction
          ('smartpred') will be used.

     ...: further arguments passed into 'vgam.control'.

_D_e_t_a_i_l_s:

     A vector generalized additive model (VGAM) is loosely defined as a
     statistical model that is a function of M additive predictors. The
     central formula is given by

                  eta_j = sum_{k=1}^p f_{(j)k}(x_k)

     where x_k is the kth explanatory variable (almost always x_1=1 for
     the intercept term), and f_{(j)k} are smooth functions of x_k that
     are estimated by smoothers. The first term in the summation is
     just the intercept. Currently only one type of smoother is
     implemented and this is called a _vector (cubic smoothing spline)
     smoother_. Here, j=1,...,M where M is finite. If all the functions
     are constrained to be linear then the resulting model is a vector
     generalized linear model (VGLM). VGLMs are best fitted with
     'vglm'.

     Vector (cubic smoothing spline) smoothers are represented by 's()'
     (see 's'). Local regression via 'lo()' is _not_ supported. The
     results of 'vgam' will differ from the S-PLUS and R 'gam' function
     (in the 'gam' R package) because 'vgam' uses a different knot
     selection algorithm. In general, fewer knots are chosen because
     the computation becomes expensive when the number of additive
     predictors M is large.
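
     The effect of the knot set on a cubic smoothing spline can be
     seen with 'smooth.spline' from base R. This is only a scalar
     analogy on simulated data: it does not reproduce the knot
     selection used by 'vgam', and the sample size, 'df' and 'nknots'
     values are arbitrary choices made for the illustration.

```r
# Scalar cubic smoothing splines from base R (stats), fitted with two
# different knot sets. This only illustrates how knot choice affects a
# smoothing spline; it does not reproduce the knot selection in 'vgam'.
set.seed(1)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

fit_all <- smooth.spline(x, y, df = 5, all.knots = TRUE)  # knot at every x
fit_few <- smooth.spline(x, y, df = 5, nknots = 15)       # 15 knots only

# Number of B-spline coefficients used by each fit.
c(all = fit_all$fit$nk, few = fit_few$fit$nk)

# Largest difference between the two fitted curves at the data points.
max(abs(predict(fit_all, x)$y - predict(fit_few, x)$y))
```

     With the same 'df', the two fits use very different numbers of
     basis functions, which is why the cost saving matters most when
     several additive predictors are smoothed jointly.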

     The underlying algorithm of VGAMs is iteratively reweighted least
     squares (IRLS) and modified vector backfitting using vector
     splines. B-splines are used as the basis functions for the vector
     (smoothing) splines.  'vgam.fit' is the function that actually
     does the work. The smoothing code is based on F. O'Sullivan's BART
     code. 
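
     For readers unfamiliar with IRLS, the following base-R sketch
     shows the iteration in its simplest setting: ordinary logistic
     regression with a single linear predictor (M = 1) and no
     smoothing. It is the textbook GLM algorithm on simulated data,
     not the code inside 'vgam.fit', which additionally performs
     vector backfitting with vector splines.

```r
# IRLS for ordinary logistic regression (M = 1, linear predictor only).
# This is the textbook GLM algorithm, not the code inside 'vgam.fit',
# which also performs vector backfitting with smoothing splines.
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)                      # model matrix with intercept

beta <- c(0, 0)                       # starting coefficients
for (iter in 1:25) {
  eta <- drop(X %*% beta)             # linear predictor
  mu  <- plogis(eta)                  # fitted probabilities
  w   <- mu * (1 - mu)                # working weights
  z   <- eta + (y - mu) / w           # working (adjusted) response
  # Weighted least squares of z on X with weights w.
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# The result should agree with glm(), which also uses IRLS.
beta_glm <- unname(coef(glm(y ~ x, family = binomial)))
all.equal(unname(beta), beta_glm, tolerance = 1e-6)
```

     Each pass is a weighted least squares fit of a working response;
     in a VGAM the weights become M by M matrices and the regression
     step is replaced by smoothing.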

     A closely related methodology based on VGAMs called _constrained
     additive ordination_ (CAO) first forms a linear combination of the
     explanatory variables  (called _latent variables_) and then fits a
     GAM to these. This is implemented in the function 'cao' for a very
     limited choice of family functions.

_V_a_l_u_e:

     An object of class '"vgam"' (see 'vgam-class' for further
     information).

_N_o_t_e:

     This function can fit a wide variety of statistical models. Some
     of these are harder to fit than others because of inherent
     numerical difficulties associated with some of them. Successful
     model fitting benefits from cumulative experience. Varying the
     values of arguments in the 'VGAM' family function itself is a good
     first step if difficulties arise, especially if initial values can
     be supplied. A second, more general step is to vary the values of
     arguments in 'vgam.control'. A third step is to make use of
     arguments such as 'etastart', 'coefstart' and 'mustart'.
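
     The second and third steps might look like the sketch below. It
     assumes the 'VGAM' package is installed and reuses the 'pneumo'
     model from the Examples section; the values 'maxit = 50' and
     'trace = TRUE' are arbitrary illustrations of arguments that are
     passed on to 'vgam.control', not recommended settings.

```r
# Sketch only: needs the VGAM package and its 'pneumo' data set.
if (requireNamespace("VGAM", quietly = TRUE)) {
  library(VGAM)
  pneumo <- transform(pneumo, let = log(exposure.time))

  # Baseline fit.
  fit1 <- vgam(cbind(normal, mild, severe) ~ s(let),
               cumulative(parallel = TRUE), data = pneumo)

  # Second step: raise the iteration limit and print progress; 'maxit'
  # and 'trace' are passed through '...' to vgam.control().
  # Third step: start from the additive predictors of the earlier fit.
  fit2 <- vgam(cbind(normal, mild, severe) ~ s(let),
               cumulative(parallel = TRUE), data = pneumo,
               etastart = predict(fit1), maxit = 50, trace = TRUE)
}
```

     Reusing the additive predictors of a simpler or earlier fit is
     one convenient source of starting values when none can be
     guessed directly.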

     Some 'VGAM' family functions end in '"ff"' to avoid interference
     with other functions, e.g., 'binomialff', 'poissonff',
     'gaussianff', 'gammaff'. This is because 'VGAM' family functions
     are incompatible with 'glm' (and also 'gam' in the 'gam' package
     and 'gam' in the 'mgcv' package).

     The smart prediction ('smartpred') library is bundled with the
     'VGAM' package.

     The theory behind the scaling parameter is currently being made
     more rigorous, but it should give the same value as the scale
     parameter for GLMs.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. and Wild, C. J. (1996) Vector generalized additive
     models. _Journal of the Royal Statistical Society, Series B,
     Methodological_, *58*, 481-493.

     <URL: http://www.stat.auckland.ac.nz/~yee>

_S_e_e _A_l_s_o:

     'vgam.control', 'vgam-class', 'vglmff-class', 'plotvgam', 'vglm',
     's', 'vsmooth.spline', 'cao'.

_E_x_a_m_p_l_e_s:

     # Nonparametric proportional odds model 
     data(pneumo)
     pneumo = transform(pneumo, let=log(exposure.time))
     vgam(cbind(normal,mild,severe) ~ s(let), cumulative(parallel=TRUE), pneumo)

     # Nonparametric logistic regression 
     data(hunua) 
     fit = vgam(agaaus ~ s(altitude, df=2), binomialff, hunua)
     ## Not run: 
     plot(fit, se=TRUE)
     ## End(Not run)

     # Fit two species simultaneously 
     fit2 = vgam(cbind(agaaus, kniexc) ~ s(altitude, df=c(2,3)),
                 binomialff(mv=TRUE), hunua)
     coef(fit2, mat=TRUE)   # Not really interpretable 
     ## Not run: 
     plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)
     with(hunua, {
         o = order(altitude)
         matplot(altitude[o], fitted(fit2)[o,], type="l", lwd=2, las=1,
             xlab="Altitude (m)", ylab="Probability of presence",
             main="Two plant species' response curves", ylim=c(0,.8))
         rug(altitude)
     })
     ## End(Not run)

