cao                   package:VGAM                   R Documentation

_F_i_t_t_i_n_g _C_o_n_s_t_r_a_i_n_e_d _A_d_d_i_t_i_v_e _O_r_d_i_n_a_t_i_o_n (_C_A_O)

_D_e_s_c_r_i_p_t_i_o_n:

     A constrained additive ordination (CAO) model is fitted using the
     _reduced-rank vector generalized additive model_ (RR-VGAM)
     framework.

_U_s_a_g_e:

     cao(formula, family, data = list(),
         weights = NULL, subset = NULL, na.action = na.fail,
         etastart = NULL, mustart = NULL, coefstart = NULL,
         control = cao.control(...), offset = NULL,
         method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
         contrasts = NULL, constraints = NULL,
         extra = NULL, qr.arg = FALSE, smart = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

     The arguments of 'cao' are a mixture of those from 'vgam' and
     'cqo', but with some extras in 'cao.control'. Currently, not all
     of the following arguments work properly.

 formula: a symbolic description of the model to be fit.  The RHS of
          the formula is used to construct the latent variables, upon
          which the smooths are applied.  All the variables in the
          formula are used for the construction of latent variables
          except for those specified by the argument 'Norrr', which is
          itself a formula.  The LHS of the formula contains the
          response variables, which should be a matrix with each column
          being a response (species).

  family: a function of class '"vglmff"' describing what statistical
          model is to be fitted. See 'cqo' for a list of those
          presently implemented,

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from
          'environment(formula)', typically the environment from which
          'cao' is called.

 weights: an optional vector or matrix of (prior) weights to be used in
          the fitting process.  For 'cao', this argument currently
          should not be used.

  subset: an optional logical vector specifying a subset of
          observations to be used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          ``factory-fresh'' default is 'na.omit'.

etastart: starting values for the linear predictors.  It is a M-column
          matrix. If M=1 then it may be a vector.  For 'cao', this
          argument currently should not be used.

 mustart: starting values for the fitted values. It can be a vector or
          a matrix.  Some family functions do not make use of this
          argument. For 'cao', this argument currently should not be
          used.

coefstart: starting values for the coefficient vector.  For 'cao', this
          argument currently should not be used.

 control: a list of parameters for controlling the fitting process. See
          'cao.control' for details.

  offset: a vector or M-column matrix of offset values.  These are _a
          priori_ known and are added to the linear predictors during
          fitting.  For 'cao', this argument currently should not be
          used.

  method: the method to be used in fitting the model.  The default (and
          presently only) method 'cao.fit' uses iteratively reweighted
          least squares (IRLS) within FORTRAN code called from 'optim'.

   model: a logical value indicating whether the _model frame_ should
          be assigned in the 'model' slot.

x.arg, y.arg: logical values indicating whether the model matrix and
          response vector/matrix used in the fitting process should be
          assigned in the 'x' and 'y' slots.  Note the model matrix is
          the linear model (LM) matrix.

contrasts: an optional list. See the 'contrasts.arg' of
          'model.matrix.default'.

constraints: an optional list  of constraint matrices.  For 'cao', this
          argument currently should not be used.  The components of the
          list must be named with the term it corresponds to (and it
          must match in character format).  Each constraint matrix must
          have M rows, and be of full-column rank. By default,
          constraint matrices are the M by M identity matrix unless
          arguments in the family function itself override these
          values.  If 'constraints' is used it must contain _all_ the
          terms; an incomplete list is not accepted.

   extra: an optional list with any extra information that might be
          needed by the family function.  For 'cao', this argument
          currently should not be used.

  qr.arg: For 'cao', this argument currently should not be used.

   smart: logical value indicating whether smart prediction
          ('smartpred') will be used.

     ...: further arguments passed into 'cao.control'.

_D_e_t_a_i_l_s:

     CAO can be loosely be thought of as the result of fitting
     generalized additive models (GAMs) to several responses (e.g.,
     species) against a very small number of latent variables.  Each
     latent variable is a linear combination of the explanatory
     variables; the coefficients *C* (called C below) are called
     _constrained coefficients_ or _canonical coefficients_, and are
     interpreted as weights or loadings. The *C* are estimated by
     maximum likelihood estimation.  It is often a good idea to apply
     'scale' to each explanatory variable first.

     For each response (e.g., species), each latent variable is
     smoothed by a cubic smoothing spline, thus CAO is data-driven. If
     each smooth were a quadratic then CAO would simplify to
     _constrained quadratic ordination_ (CQO; formerly called
     _canonical Gaussian ordination_ or CGO). If each smooth were
     linear then CAO would simplify to _constrained linear ordination_
     (CLO). CLO can theoretically be fitted with 'cao' by specifying
     'df1.nl=0', however it is more efficient to use 'rrvglm'.

     Currently, only 'Rank=1' is implemented, and only 'Norrr = ~1'
     models are handled.

     With binomial data, the default formula is

          logit(P[Y_s=1]) =  eta_s = f_s(nu),    s=1,2,...,S

     where x_2 is a vector of environmental variables, and nu=C^T x_2
     is a R-vector of latent variables. The eta_s is an additive
     predictor for species s, and it models the probabilities of
     presence as an additive model on the logit scale.  The matrix C is
     estimated from the data, as well as the smooth functions f_s.  The
     argument 'Norrr = ~ 1' specifies that the vector x_1, defined for
     RR-VGLMs and QRR-VGLMs, is simply a 1 for an intercept. Here, the
     intercept in the model is absorbed into the functions. A 'cloglog'
     link may be preferable over a 'logit' link.

     With Poisson count data, the formula is

                    log(E[Y_s]) =  eta_s = f_s(nu)

     which models the mean response as an additive models on the log
     scale.

     The fitted latent variables (site scores) are scaled to have unit
     variance.  The concept of a tolerance is undefined for CAO models,
     but the optima and maxima are defined. The generic functions 'Max'
     and 'Opt' should work for CAO objects, but note that if the
     maximum occurs at the boundary then 'Max' will return a 'NA'. 
     Inference for CAO models is currently undeveloped.

_V_a_l_u_e:

     An object of class '"cao"' (this may change to '"rrvgam"' in the
     future). Several generic functions can be applied to the object,
     e.g., 'Coef', 'ccoef', 'lvplot', 'summary'.

_W_a_r_n_i_n_g:

     CAO models present a difficult optimization problem, partly
     because the log-likelihood function contains many local solutions.
      To obtain the (global) solution the user is advised to try many
     initial values. This can be done by setting 'Bestof' some
     appropriate value (see 'cao.control'). Trying many initial values
     becomes progressively more important as the nonlinear degrees of
     freedom of the smooths increase. Use 'set.seed' to make your
     results reproducible.

_N_o_t_e:

     CAO models are computationally expensive, therefore setting 'trace
     = TRUE' is a good idea, as well as running it on a simple random
     sample of the data set instead.

     Sometimes the IRLS algorithm does not converge within the FORTRAN
     code. This results in warnings being issued.  In particular, if an
     error code of 3 is issued, then this indicates the IRLS algorithm
     has not converged. One possible remedy is to increase or decrease
     the nonlinear degrees of freedom so that the curves become more or
     less flexible, respectively.

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. (2006) Constrained additive ordination. _Ecology_,
     *87*, 203-213.

_S_e_e _A_l_s_o:

     'cao.control', 'Coef.cao', 'cqo', 'lv', 'Opt', 'Max', 'lv',
     'persp.cao', 'poissonff', 'binomialff', 'negbinomial', 'gamma2',
     'gaussianff', 'set.seed', 'gam'.

_E_x_a_m_p_l_e_s:

     data(hspider)
     hspider[,1:6] = scale(hspider[,1:6]) # Standardized environmental vars
     set.seed(149)
     ap1 = cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
               WaterCon + BareSand + FallTwig +
               CoveMoss + CoveHerb + ReflLux,
               family = poissonff, data = hspider, Rank = 1,
               df1.nl = c(Pardpull=2.7, 2.5),
               Bestof = 7, Crow1positive = FALSE)
     sort(ap1@misc$deviance.Bestof) # A history of all the iterations

     Coef(ap1)
     ccoef(ap1)

     ## Not run: 
     par(mfrow=c(2,2))
     plot(ap1)   # All the curves are unimodal; some quite symmetric

     par(mfrow=c(1,1), las=1)
     index = 1:ncol(ap1@y)
     lvplot(ap1, lcol=index, pcol=index, y=TRUE)

     trplot(ap1, label=TRUE, col=index)
     abline(a=0, b=1, lty=2)

     trplot(ap1, label=TRUE, col="blue", log="xy", whichSp=c(1,3))
     abline(a=0, b=1, lty=2)

     persp(ap1, col=index, lwd=2, label=TRUE)
     abline(v=Opt(ap1), lty=2, col=index)
     abline(h=Max(ap1), lty=2, col=index)
     ## End(Not run)

