vglm                  package:VGAM                  R Documentation

_F_i_t_t_i_n_g _V_e_c_t_o_r _G_e_n_e_r_a_l_i_z_e_d _L_i_n_e_a_r _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     'vglm' is used to fit vector generalized linear models (VGLMs).
     This is a large class of models that includes generalized linear
     models (GLMs) as special cases.

_U_s_a_g_e:

     vglm(formula, family, data = list(), weights = NULL, subset = NULL, 
          na.action = na.fail, etastart = NULL, mustart = NULL, 
          coefstart = NULL, control = vglm.control(...), offset = NULL, 
          method = "vglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
          contrasts = NULL, constraints = NULL, extra = list(), 
          qr.arg = FALSE, smart = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

     In the following, M is the number of linear predictors.

 formula: a symbolic description of the model to be fit. The RHS of the
          formula is applied to each linear predictor. Different
          variables in each linear predictor can be chosen by
          specifying constraint matrices. 

  family: a function of class '"vglmff"' describing what statistical
          model is to be fitted. These are called ``'VGAM' family
          functions''.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from
          'environment(formula)', typically the environment from which
          'vglm' is called.

 weights: an optional vector or matrix of (prior) weights to be used
          in the fitting process. If 'weights' is a matrix, then it
          must be in _matrix-band_ form, whereby the first M columns
          of the matrix are the diagonals, followed by the
          upper-diagonal band, followed by the band above that, etc. In
          this case, there can be up to M(M+1)/2 columns, with the last
          column corresponding to the (1,M) elements of the weight
          matrices.

  subset: an optional logical vector specifying a subset of
          observations to be used in the fitting process. 

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          ``factory-fresh'' default is 'na.omit'.

etastart: starting values for the linear predictors. It is an M-column
          matrix. If M=1 then it may be a vector.

 mustart: starting values for the fitted values. It can be a vector or
          a matrix.  Some family functions do not make use of this
          argument.

coefstart: starting values for the coefficient vector.

 control: a list of parameters for controlling the fitting process. 
          See 'vglm.control' for details.

  offset: a vector or M-column matrix of offset values.  These are _a
          priori_ known and are added to the linear predictors during
          fitting.

  method: the method to be used in fitting the model.  The default (and
          presently only) method 'vglm.fit' uses iteratively reweighted
          least squares (IRLS).

   model: a logical value indicating whether the _model frame_ should
          be assigned in the 'model' slot.

x.arg, y.arg: logical values indicating whether the model matrix and
          response vector/matrix used in the fitting process should be
          assigned in the 'x' and 'y' slots. Note that the model matrix
          stored is the LM model matrix; to get the VGLM model matrix,
          type 'model.matrix(vglmfit)' where 'vglmfit' is a 'vglm'
          object.

contrasts: an optional list. See the 'contrasts.arg' of
          'model.matrix.default'.

constraints: an optional list of constraint matrices. Each component
          of the list must be named after the term it corresponds to,
          and the name must match the term label exactly (as a
          character string). Each constraint matrix must have M rows
          and be of full column rank. By default, constraint matrices
          are the M by M identity matrix unless arguments in the family
          function itself override these values. If 'constraints' is
          used it must contain _all_ the terms; an incomplete list is
          not accepted.

   extra: an optional list with any extra information that might be
          needed by the 'VGAM' family function.

  qr.arg: logical value indicating whether the slot 'qr', which holds
          the QR decomposition of the VLM model matrix, is included in
          the returned object.

   smart: logical value indicating whether smart prediction
          ('smartpred') will be used.

     ...: further arguments passed into 'vglm.control'. 
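     The matrix-band layout described for 'weights' can be sketched in
     base R. The helpers 'mband_index', 'sym_to_mband' and
     'mband_to_sym' are hypothetical names written for this
     illustration only (they are not 'VGAM' functions), and they assume
     the ordering stated above: the M diagonal elements, then each
     successive super-diagonal band, ending with the (1,M) element. One
     result row corresponds to one observation's weight matrix, so an
     n-observation 'weights' matrix would stack n such rows.

```r
# Hypothetical helpers (not VGAM functions) illustrating matrix-band form.
# Ordering assumed from the text: the M diagonal elements first, then the
# first super-diagonal (1,2), (2,3), ..., then the next band, ..., and
# finally the single (1,M) element.
mband_index <- function(M) {
  idx <- cbind(seq_len(M), seq_len(M))          # the diagonal
  for (k in seq_len(M - 1)) {                   # k-th super-diagonal band
    rows <- seq_len(M - k)
    idx <- rbind(idx, cbind(rows, rows + k))
  }
  idx                                           # M(M+1)/2 rows of (row, col)
}

# One M x M symmetric weight matrix -> one row of matrix-band form
sym_to_mband <- function(W) W[mband_index(nrow(W))]

# One row of matrix-band form -> the full symmetric M x M matrix
mband_to_sym <- function(wrow, M) {
  idx <- mband_index(M)
  W <- matrix(0, M, M)
  W[idx] <- wrow
  W[idx[, 2:1, drop = FALSE]] <- wrow           # mirror into lower triangle
  W
}

W <- matrix(c(4,   1,   0.5,
              1,   3,   0.2,
              0.5, 0.2, 2), nrow = 3, byrow = TRUE)
w <- sym_to_mband(W)
print(w)   # diagonal (4, 3, 2), then (1,2) and (2,3), then (1,3)
```

     For M = 3 this yields M(M+1)/2 = 6 columns, the last one holding
     the (1,3) element.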

_D_e_t_a_i_l_s:

     A vector generalized linear model (VGLM) is loosely defined as a
     statistical model that is a function of M linear predictors. The
     central formula is given by

                          eta_j = beta_j^T x

     where x is a vector of explanatory variables (sometimes just a 1
     for an intercept), and beta_j is a vector of regression
     coefficients to be estimated. Here, j=1,...,M where M is finite.
     Then one can write eta=(eta_1,...,eta_M)^T as a vector of linear
     predictors.
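     The central formula can be made concrete with base R matrix
     algebra. In the sketch below (toy numbers, no 'VGAM' calls), row k
     of the p by M coefficient matrix 'B' is H_k times betastar_k, where
     H_k is the constraint matrix of term k and betastar_k its vector of
     free coefficients (our notation for this sketch). Column j of 'B'
     is then beta_j, and eta_j = beta_j^T x for every row x of the LM
     model matrix. The single column of ones used for 'x' forces a
     common slope in both linear predictors, the kind of parallelism an
     argument such as 'cumulative(parallel=TRUE)' is meant to impose.

```r
# Sketch (base R only): M = 2 linear predictors, an intercept and one
# covariate 'x'. Each constraint matrix has M rows and full column rank.
set.seed(1)
n <- 5
M <- 2
X <- cbind("(Intercept)" = 1, x = runif(n))    # LM model matrix, n x p

H <- list("(Intercept)" = diag(M),             # separate intercepts
          x = matrix(1, M, 1))                 # same slope in both etas
betastar <- list("(Intercept)" = c(-1, 0.5),   # one value per column of H_k
                 x = 2)

# Row k of B is H_k %*% betastar_k, so column j of B is beta_j
B <- t(sapply(names(H), function(k) drop(H[[k]] %*% betastar[[k]])))
eta <- X %*% B        # n x M matrix; eta[i, j] = beta_j^T x_i
print(eta)
```

     An offset, if supplied, would simply be added to 'eta'.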

     Most users will find 'vglm' similar in flavour to 'glm'.  The
     function 'vglm.fit' actually does the work.
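     The default fitting method is iteratively reweighted least squares
     (IRLS). For the simplest special case, a Poisson GLM with M = 1 and
     a log link, IRLS can be sketched in a few lines of base R using the
     Dobson data from Example 1. This is a textbook single-predictor
     version written for illustration ('irls_poisson' is a hypothetical
     name); it is not the 'vglm.fit' code, which handles M > 1 with
     working weight matrices rather than scalar weights.

```r
# Minimal IRLS for a Poisson log-linear model (M = 1), base R only.
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
X <- model.matrix(~ outcome + treatment)    # LM model matrix

irls_poisson <- function(X, y, maxit = 25, tol = 1e-10) {
  eta <- log(y + 0.5)                       # crude starting linear predictor
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    mu <- exp(eta)                          # inverse of the log link
    z <- eta + (y - mu) / mu                # adjusted dependent variable
    w <- mu                                 # working weights
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    eta <- drop(X %*% beta_new)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = setNames(beta, colnames(X)), iter = iter)
}

irls_fit <- irls_poisson(X, counts)
ref <- coef(glm(counts ~ outcome + treatment, family = poisson()))
print(cbind(irls = irls_fit$coefficients, glm = ref))
```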

_V_a_l_u_e:

     An object of class '"vglm"', which has the following slots. Some
     of these may not be assigned to save space, and will be recreated
     if necessary later. 

   extra: the list 'extra' at the end of fitting.

  family: the family function (of class '"vglmff"').

    iter: the number of IRLS iterations used.

predictors: an M-column matrix of linear predictors.

  assign: a named list that maps each term to the columns of the (LM)
          model matrix it generates.

    call: the matched call.

coefficients: a named vector of coefficients.

constraints: a named list of constraint matrices used in the fitting. 

contrasts: the contrasts used (if any).

 control: list of control parameters used in the fitting.

criterion: list of convergence criteria evaluated at the final IRLS
          iteration.

df.residual: the residual degrees of freedom.

df.total: the total degrees of freedom.

dispersion: the scaling parameter.

 effects: the effects.

fitted.values: the fitted values, as a matrix. This may be missing or
          consist entirely of 'NA's, e.g., for the Cauchy model, whose
          mean does not exist.

    misc: a list to hold miscellaneous parameters.

   model: the model frame.

na.action: a list holding information about missing values.

  offset: if non-zero, an M-column matrix of offsets.

    post: a list where post-analysis results may be put.

 preplot: plotting parameters used by 'plotvgam' may be stored here.

prior.weights: initially supplied weights.

      qr: the QR decomposition used in the fitting.

       R: the *R* matrix in the QR decomposition used in the fitting.

    rank: numerical rank of the fitted model.

residuals: the _working_ residuals at the final IRLS iteration.

     rss: residual sum of squares at the final IRLS iteration with the
          adjusted dependent vectors and weight matrices.

smart.prediction: a list of data-dependent parameters (if any) that are
          used by smart prediction.

   terms: the 'terms' object used.

 weights: the weight matrices at the final IRLS iteration. This is in
          matrix-band form.

       x: the model matrix (linear model LM, not VGLM).

 xlevels: the levels of the factors, if any, used in fitting.

       y: the response, in matrix form.


     This slot information is repeated at 'vglm-class'.
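     Under one reading of the 'rss' slot, it is the sum over
     observations of r_i^T W_i r_i, where r_i is the M-vector of working
     residuals (the 'residuals' slot) and W_i is the working weight
     matrix stored in matrix-band form (the 'weights' slot). The base-R
     sketch below computes that quantity; 'band_to_matrix' and
     'weighted_rss' are hypothetical helpers, and whether 'vglm.fit'
     computes 'rss' exactly this way is an assumption.

```r
# Hypothetical sketch: weighted RSS = sum_i r_i^T W_i r_i, with each W_i
# stored as one row of matrix-band form (diagonals, then upper bands).
band_to_matrix <- function(wrow, M) {
  W <- diag(wrow[seq_len(M)], M)
  pos <- M
  for (k in seq_len(M - 1)) {             # k-th super-diagonal band
    for (i in seq_len(M - k)) {
      pos <- pos + 1
      W[i, i + k] <- W[i + k, i] <- wrow[pos]
    }
  }
  W
}

weighted_rss <- function(resid, wband) {
  M <- ncol(resid)
  sum(vapply(seq_len(nrow(resid)), function(i) {
    r <- resid[i, ]                       # working residuals, obs i
    drop(t(r) %*% band_to_matrix(wband[i, ], M) %*% r)
  }, numeric(1)))
}

# Two observations, M = 2, so each band row is (w11, w22, w12)
resid <- rbind(c(1, 2), c(0, 1))
wband <- rbind(c(2, 1, 0.5), c(1, 3, 0))
weighted_rss(resid, wband)
```

     With identity weight matrices the quantity reduces to the ordinary
     sum of squared working residuals.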

_N_o_t_e:

     This function can fit a wide variety of statistical models. Some
     of these are harder to fit than others because of inherent
     numerical difficulties. Successful model fitting benefits from
     cumulative experience. If difficulties arise, a good first step is
     to vary the values of arguments in the 'VGAM' family function
     itself, especially those that allow initial values to be supplied.
     A second, more general, step is to vary the values of arguments in
     'vglm.control'. A third step is to make use of arguments such as
     'etastart', 'coefstart' and 'mustart'.

     Some 'VGAM' family functions end in '"ff"' to avoid interference
     with other functions, e.g., 'binomialff', 'poissonff',
     'gaussianff', 'gammaff'. This is because 'VGAM' family functions
     are incompatible with 'glm' (and also with 'gam' in the 'gam'
     package and 'gam' in the 'mgcv' package).

     The smart prediction ('smartpred') facility is bundled with the
     'VGAM' package.

     The theory behind the scaling parameter is currently being made
     more rigorous, but it should give the same value as the scale
     parameter for GLMs.
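     For comparison, the scale parameter of a GLM with a free
     dispersion (e.g., the Gamma family) is conventionally estimated by
     the Pearson statistic divided by the residual degrees of freedom,
     which is the value 'summary.glm' reports. The base-R sketch below
     computes it by hand on arbitrary simulated data and checks it
     against 'summary(glm(...))$dispersion'; it does not call 'VGAM'.

```r
# Pearson-type dispersion estimate for a GLM, base R only:
#   phi_hat = sum((y - mu)^2 / V(mu)) / df.residual
set.seed(123)
n <- 50
x <- runif(n)
y <- rgamma(n, shape = 4, rate = 4 / exp(1 + x))   # mean exp(1 + x)
gfit <- glm(y ~ x, family = Gamma(link = "log"))

mu <- fitted(gfit)
V <- gfit$family$variance(mu)                      # mu^2 for the Gamma
phi_hat <- sum((y - mu)^2 / V) / df.residual(gfit)
print(c(by_hand = phi_hat, summary_glm = summary(gfit)$dispersion))
```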

     Example 5 below uses the 'xij' argument to illustrate covariates
     that are specific to a linear predictor. Here, 'lop'/'rop' are the
     ocular pressures of the left/right eye (artificial data).
     Variables 'leye' and 'reye' might be the presence/absence of a
     particular disease in the left/right eye respectively. See 'fill'
     for more details and examples.
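     The idea behind 'xij' can be illustrated at the level of the VLM
     model matrix: a single covariate, here called 'op', takes the value
     'lop' in the rows belonging to the first linear predictor and 'rop'
     in the rows belonging to the second, while both predictors share
     one coefficient. The base-R sketch below builds such a matrix by
     hand. It ignores the third (odds-ratio) linear predictor of
     'binom2.or', and the row ordering (observation by observation) is a
     choice made for this sketch, not a description of how 'vglm' lays
     out its internal matrices.

```r
# Sketch: one covariate 'op' whose value depends on the linear predictor.
set.seed(2)
n <- 4
M <- 2
lop <- runif(n)
rop <- runif(n)
# Row 2*(i-1) + j of the VLM matrix holds observation i, linear predictor j
Xvlm <- cbind("(Intercept)" = 1,
              op = as.vector(rbind(lop, rop)))   # lop1, rop1, lop2, rop2, ...
beta <- c(-1, 2)                                 # shared by eta_1 and eta_2
eta <- matrix(Xvlm %*% beta, ncol = M, byrow = TRUE)
print(eta)   # column 1 uses lop, column 2 uses rop, same coefficients
```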

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector
     generalized linear models. _Statistical Modelling_, *3*, 15-41.

     Yee, T. W. and Wild, C. J. (1996) Vector generalized additive
     models. _Journal of the Royal Statistical Society, Series B,
     Methodological_, *58*, 481-493.

     The 'VGAM' package can be downloaded starting from <URL:
     http://www.stat.auckland.ac.nz/~yee>. Other 'VGAM' resources and
     documentation can be found there.

_S_e_e _A_l_s_o:

     'vglm.control', 'vglm-class', 'vglmff-class', 'smartpred',
     'vglm.fit', 'fill', 'rrvglm', 'vgam'. Methods functions include 
     'coef.vlm', 'predict.vglm', 'summary.vglm', etc.

_E_x_a_m_p_l_e_s:

     library(VGAM)

     # Example 1. Dobson (1990) Page 93: Randomized Controlled Trial:
     counts = c(18,17,15,20,10,20,25,13,12)
     outcome = gl(3,1,9)
     treatment = gl(3,3)
     print(d.AD <- data.frame(treatment, outcome, counts))
     vglm.D93 = vglm(counts ~ outcome + treatment, family=poissonff, data=d.AD)
     summary(vglm.D93)

     # Example 2. Multinomial logit model
     data(pneumo)
     pneumo = transform(pneumo, let=log(exposure.time))
     vglm(cbind(normal, mild, severe) ~ let, multinomial, pneumo)

     # Example 3. Proportional odds model
     fit = vglm(cbind(normal,mild,severe) ~ let, cumulative(parallel=TRUE), pneumo)
     coef(fit, matrix=TRUE) 
     constraints(fit) 
     fit@x # LM model matrix
     model.matrix(fit) # Larger VGLM model matrix

     # Example 4. Bivariate logistic model 
     data(coalminers)
     fit = vglm(cbind(nBnW, nBW, BnW, BW) ~ age, binom2.or, coalminers, trace=TRUE)
     coef(fit, matrix=TRUE)
     fit@y

     # Example 5. The use of the xij argument
     n = 1000
     eyes = data.frame(lop = runif(n), rop = runif(n)) 
     # plogis() is base R's inverse logit
     eyes = transform(eyes, 
                      leye = ifelse(runif(n) < plogis(-1+2*lop), 1, 0),
                      reye = ifelse(runif(n) < plogis(-1+2*rop), 1, 0))
     fit = vglm(cbind(leye,reye) ~ lop + rop + fill(lop),
                binom2.or(exchangeable=TRUE, zero=3),
                xij = op ~ lop + rop + fill(lop), data=eyes)
     coef(fit)
     coef(fit, matrix=TRUE)
     coef(fit, matrix=TRUE, compress=FALSE)

     # Here's one method to handle the xij argument with a term that
     # produces more than one column in the model matrix. 
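     POLY3 = function(x, ...) {
         # Orthogonal cubic polynomial basis computed from the pooled values
         # of all arguments; only the rows belonging to 'x' are returned
         poly(c(x,...), 3)[seq_along(x),]
     }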

     fit = vglm(cbind(leye,reye) ~ POLY3(lop,rop) + POLY3(rop,lop) + fill(POLY3(lop,rop)),
                binom2.or(exchangeable=TRUE, zero=3),  data=eyes,
                xij = POLY3(op) ~ POLY3(lop,rop) + POLY3(rop,lop) + 
                                  fill(POLY3(lop,rop)))
     coef(fit)
     coef(fit, matrix=TRUE)
     coef(fit, matrix=TRUE, compress=FALSE)
     predict(fit)[1:4,]
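     Example 2 fits a multinomial logit model with three response
     columns and hence M = 2 linear predictors. Assuming the last column
     ('severe') is the reference level, which we believe is the default
     for 'multinomial', eta_j = log(p_j/p_3) for j = 1, 2, and the
     probabilities can be recovered from the linear predictors as in the
     base-R sketch below. 'multilogit_inverse' is a hypothetical helper,
     not a 'VGAM' function.

```r
# Hypothetical helper: map an n x M matrix of multinomial logits
# eta_j = log(p_j / p_{M+1}) to an n x (M+1) matrix of probabilities,
# with the last category as the reference level.
multilogit_inverse <- function(eta) {
  eta <- as.matrix(eta)
  expeta <- exp(eta)
  denom <- 1 + rowSums(expeta)            # reference category contributes 1
  cbind(expeta / denom, 1 / denom)
}

eta <- rbind(c(0, 0), c(log(2), log(3)))
p <- multilogit_inverse(eta)
print(p)   # row 1: equal probabilities; row 2: 2/6, 3/6, 1/6
```

     For very large linear predictors 'exp' can overflow; a more robust
     version would subtract each row's maximum (including the implicit
     0 for the reference level) before exponentiating.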

