uqo                   package:VGAM                   R Documentation

_F_i_t_t_i_n_g _U_n_c_o_n_s_t_r_a_i_n_e_d _Q_u_a_d_r_a_t_i_c _O_r_d_i_n_a_t_i_o_n (_U_Q_O)

_D_e_s_c_r_i_p_t_i_o_n:

     An _unconstrained quadratic ordination_ (UQO) (equivalently,
     noncanonical Gaussian ordination) model is fitted using the 
     _quadratic unconstrained vector generalized linear model_
     (QU-VGLM) framework. In this documentation, M is the number of
     linear predictors or species.

_U_s_a_g_e:

     uqo(formula, family, data = list(), weights = NULL, subset = NULL,
         na.action = na.fail, etastart = NULL, mustart = NULL,
         coefstart = NULL, control = uqo.control(...), offset = NULL,
         method = "uqo.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
         contrasts = NULL, constraints = NULL, extra = NULL,
         qr.arg = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit. Since there is
          no x_2 vector by definition, the RHS of the formula has all
          terms belonging to the x_1 vector.

  family: a function of class '"vglmff"' describing what statistical
          model is to be fitted. Currently two families are supported:
          Poisson and binomial. 

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from
          'environment(formula)', typically the environment from which
          'uqo' is called. 

 weights: an optional vector or matrix of (prior) weights  to be used
          in the fitting process. This argument should not be used.

  subset: an optional logical vector specifying a subset of
          observations to  be used in the fitting process. 

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          ``factory-fresh'' default is 'na.omit'. 

etastart: starting values for the linear predictors. It is a M-column
          matrix. If M=1 then it may be a vector. 

 mustart: starting values for the  fitted values. It can be a vector or
          a matrix.  Some family functions do not make use of this
          argument. 

coefstart: starting values for the coefficient vector. 

 control: a list of parameters for controlling the fitting process. 
          See 'uqo.control' for details. 

  offset: a vector or M-column matrix of offset values. This argument
          should not be used. 

  method: the method to be used in fitting the model. The default (and
          presently only) method 'uqo.fit' uses iteratively reweighted
          least squares (IRLS). 

   model: a logical value indicating whether the _model frame_ should
          be assigned in the 'model' slot. 

x.arg, y.arg: logical values indicating whether the model matrix and
          response matrix used in the fitting process should be
          assigned in the 'x' and 'y' slots. Note the model matrix is
          the LM model matrix.

contrasts: an optional list. See the 'contrasts.arg' of
          'model.matrix.default'. 

constraints: an optional list  of constraint matrices. This argument
          should not be used. 

   extra: an optional list with any extra information that   might be
          needed by the family function.  

  qr.arg: logical value indicating whether the slot 'qr', which returns
          the QR decomposition of the VLM model matrix, is returned on
          the object. This argument should not be set 'TRUE'. 

     ...: further arguments passed into 'uqo.control'. 

_D_e_t_a_i_l_s:

     _Unconstrained quadratic ordination_ models fit symmetric
     bell-shaped response curves/surfaces to response data, but the
     latent variables are largely free parameters and are not
     constrained to be linear combinations of the environmental
     variables.  This poses a difficult optimization problem.  The
     current algorithm is very simple and will often fail (even for
     'Rank=1') but hopefully this will be improved in the future.

     The central formula is given by

        eta = B_1^T x_1 + A nu + sum_{m=1}^M (nu^T D_m nu) e_m

     where x_1 is a vector (usually just a 1 for an intercept), nu is a
     R-vector of latent variables, e_m is a vector of 0s but with a 1
     in the mth position. The eta are a vector of linear/additive
     predictors, e.g., the mth element is eta_m = log(E[Y_m]) for the
     mth species.  The matrices B_1, A, and D_m are estimated from the
     data, i.e., contain the regression coefficients. Also, nu is
     estimated. The tolerance matrices satisfy T_s = -(0.5 D_s^(-1). 
     Many important UQO details are directly related to arguments in
     'uqo.control'; see also 'cqo' and 'qrrvglm.control'.

     Currently, only Poisson and binomial 'VGAM' family functions are
     implemented for this function, and dispersion parameters for these
     are assumed known.  Thus the Poisson is catered for by
     'poissonff', and the binomial by 'binomialff'. Those beginning
     with '"quasi"' have dispersion parameters that are estimated for
     each species, hence will give an error message here.

_V_a_l_u_e:

     An object of class '"uqo"' (this may change to '"quvglm"' in the
     future).

_W_a_r_n_i_n_g:

     Local solutions are not uncommon when fitting UQO models.  To
     increase the chances of obtaining the global solution, set
     'ITolerances=TRUE' or 'EqualTolerances=TRUE' and increase the
     value of the argument 'Bestof' in 'uqo.control'. For
     reproducibility of the results, it pays to set a different random
     number seed before calling 'uqo' (the function 'set.seed' does
     this).

     The function 'uqo' is very sensitive to initial values, and there
     is a lot of room for improvement here.

     UQO is computationally expensive.  It pays to keep the rank to no
     more than 2, and 1 is much preferred over 2. The data needs to
     conform closely to the statistical model.

     Currently there is a bug with the argument 'Crow1positive' in
     'uqo.control'. This argument might be interpreted as controlling
     the sign of the first site score, but currently this is not done.

_N_o_t_e:

     The site scores are centered. When R>1, they are uncorrelated and
     should be unique up to a rotation.

     The argument 'Bestof' in 'uqo.control' controls the number of
     models fitted (each uses different starting values) to the data.
     This argument is important because convergence may be to a _local_
     solution rather than the _global_ solution. Using more starting
     values increases the chances of finding the global solution. Local
     solutions arise because the optimization problem is highly
     nonlinear.

     In the example below, a CQO model is fitted and used for providing
     initial values for a UQO model.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. (2004) A new technique for maximum-likelihood canonical
     Gaussian ordination. _Ecological Monographs_, *74*, 685-701.

     Yee, T. W. (2005) On constrained and unconstrained quadratic
     ordination. _Manuscript in preparation_.

     Yee, T. W. (2006) Constrained additive ordination. _Ecology_,
     *87*, 203-213.

_S_e_e _A_l_s_o:

     'uqo.control', 'cqo', 'qrrvglm.control', 'rcqo', 'poissonff',
     'binomialff', 'Coef.uqo', 'lvplot.uqo', 'persp.uqo', 'trplot.uqo',
     'vcov.uqo', 'set.seed', 'hspider'.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     data(hspider)
     set.seed(123)  # This leads to the global solution
     hspider[,1:6] = scale(hspider[,1:6]) # Standardized environmental vars
     p1 = cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                    Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                    Trocterr, Zoraspin) ~
              WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
              ITolerances = TRUE, fam = poissonff, data = hspider, 
              Crow1positive=TRUE, Bestof=3, trace=FALSE)
     if(deviance(p1) > 1589.0) stop("suboptimal fit obtained")

     set.seed(111)
     up1 = uqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                     Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                     Trocterr, Zoraspin) ~ 1,
               family = poissonff, data = hspider,
               ITolerances = TRUE,
               Crow1positive = TRUE, lvstart = -lv(p1))
     if(deviance(up1) > 1310.0) stop("suboptimal fit obtained")

     nos = ncol(up1@y) # Number of species
     clr = (1:(nos+1))[-7]  # to omit yellow
     lvplot(up1, las=1, y=TRUE, pch=1:nos, scol=clr, lcol=clr, 
            pcol=clr, llty=1:nos, llwd=2)
     legend(x=2, y=135, dimnames(up1@y)[[2]], col=clr, lty=1:nos,
            lwd=2, merge=FALSE, ncol=1, x.inter=4.0, bty="l", cex=0.9)

     # Compare the site scores between the two models
     plot(lv(p1), lv(up1), xlim=c(-3,4), ylim=c(-3,4), las=1)
     abline(a=0, b=-1, lty=2, col="blue", xpd=FALSE)
     cor(lv(p1, ITol=TRUE), lv(up1))

     # Another comparison between the constrained and unconstrained models
     # The signs are not right so they are similar when reflected about 0 
     par(mfrow=c(2,1))
     persp(up1, main="Red/Blue are the constrained/unconstrained models",
           label=TRUE, col="blue", las=1)
     persp(p1, add=FALSE, col="red")
     1-pchisq(deviance(p1) - deviance(up1), df=52-30)
     ## End(Not run)

