rcqo                  package:VGAM                  R Documentation

_C_o_n_s_t_r_a_i_n_e_d _Q_u_a_d_r_a_t_i_c _O_r_d_i_n_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Random generation for constrained quadratic ordination (CQO).

_U_s_a_g_e:

     rcqo(n, p, S, Rank = 1,
          family = c("poisson", "negbinomial", "binomial-poisson",
                     "Binomial-negbinomial", "ordinal-poisson",
                     "Ordinal-negbinomial", "gamma2"),
          EqualMaxima = FALSE, EqualTolerances = TRUE, ESOptima = FALSE,
          loabundance = if (EqualMaxima) hiabundance else 10,
          hiabundance = 100, sdlv = c(1.5/2^(0:3))[1:Rank],
          sdOptima = ifelse(ESOptima, 1.5/Rank, 1) * sdlv,
          sdTolerances = 0.25, Kvector = 1, Shape = 1,
          sqrt = FALSE, Log = FALSE, rhox = 0.5, breaks = 4,
          seed = NULL, Crow1positive=TRUE)

_A_r_g_u_m_e_n_t_s:

       n: Number of sites. It is denoted by n below.

       p: Number of environmental variables, including an intercept
          term.  It is denoted by p below. Must be no less than 1+R in
          value.

       S: Number of species. It is denoted by S below.

    Rank: The rank or the number of latent variables or true dimension
          of the data on the reduced space. This must be either 1, 2, 3
          or 4. It is denoted by R.

  family: What type of species data is to be returned. The first choice
          is the default. If binomial then a 0 means absence and 1
          means presence. If ordinal then the 'breaks' argument is
          passed into the 'breaks' argument of 'cut'. Note that either
          the Poisson or negative binomial distributions are used to
          generate binomial and ordinal data, and that an upper-case
          choice is used for the negative binomial distribution (this
          makes it easier for the user). If '"gamma2"' then this is the
          2-parameter gamma distribution.

EqualMaxima: Logical. Does each species have the same maxima? See
          arguments 'loabundance' and 'hiabundance'.

EqualTolerances: Logical. Does each species have the same tolerance? If
          'TRUE' then the common value is 1 along every latent
          variable, i.e., all species' tolerance matrices are the
          order-R identity matrix.

ESOptima: Logical. Do the species have equally spaced optima? If 'TRUE'
          then the quantity S^(1/R) must be an integer with value 2 or
          more. That is, there has to be an appropriate number of
          species in total. This is so that a grid of optimum values is
          possible in R-dimensional latent variable space in order to
          place the species' optima. Also see the argument
          'sdTolerances'.

loabundance, hiabundance: Numeric. These are recycled to a vector of
          length S. The species have a maximum between 'loabundance'
          and 'hiabundance'. That is, at their optimal environment, the
          mean abundance of each species is between the two
          componentwise values. If 'EqualMaxima' is 'TRUE' then
          'loabundance' and 'hiabundance' must have the same values. If
          'EqualMaxima' is 'FALSE' then the logarithm of the maxima are
          uniformly distributed between 'log(loabundance)' and
          'log(hiabundance)'.

    sdlv: Numeric, of length R (recycled if necessary). Site scores
          along each latent variable have these standard deviation
          values. This must be a decreasing sequence of values because
          the first ordination axis contains the greatest spread of the
          species' site scores, followed by the second axis, followed
          by the third axis, etc.

sdOptima: Numeric, of length R (recycled if necessary). If
          'ESOptima=FALSE' then, for the rth latent variable axis, the
          optima of the species are generated from a normal
          distribution centered about 0. If 'ESOptima=TRUE' then the S
          optima are equally spaced about 0 along every latent variable
          axis. Regardless of the value of 'ESOptima', the optima are
          then scaled to give standard deviation 'sdOptima[r]'.

sdTolerances: Logical. If 'EqualTolerances=FALSE' then, for the rth
          latent variable, the species' tolerances are chosen from a
          normal distribution with mean 1 and standard deviation
          'sdTolerances[r]'. However, the first species 'y1' has its
          tolerance matrix set equal to the order-R identity matrix.
          All tolerance matrices for all species are diagonal in this
          function. This argument is ignored if 'EqualTolerances' is
          'TRUE', otherwise it is recycled to length R if necessary.

 Kvector: A vector of positive k values (recycled to length S if
          necessary) for the negative binomial distribution (see
          'negbinomial' for details). Note that a natural default value
          does not exist, however the default value here is probably a
          realistic one, and that for large values of mu one has Var(Y)
          = mu^2 / k approximately.

   Shape: A vector of positive lambda values (recycled to length S if
          necessary) for the 2-parameter gamma distribution (see
          'gamma2' for details).  Note that a natural default value
          does not exist, however the default value here is probably a
          realistic one, and that Var(Y) = mu^2 / lambda.

    sqrt: Logical. Take the square-root of the negative binomial
          counts? Assigning 'sqrt=TRUE' when 'family="negbinomial"'
          means that the resulting species data can be considered very
          crudely to be approximately Poisson distributed. They will
          not integers in general but much easier (less numerical
          problems) to estimate using something like 'cqo(...,
          family="poissonff")'.

     Log: Logical. Take the logarithm of the gamma random variates?
          Assigning 'Log=TRUE' when 'family="gamma2"' means that the
          resulting species data can be considered very crudely to be
          approximately Gaussian distributed about its (quadratic)
          mean. The result is that it is much easier (less numerical
          problems) to estimate using something like 'cqo(...,
          family="gaussianff")'.

    rhox: Numeric, less than 1 in absolute value. The correlation
          between the environmental variables. The correlation matrix
          is a matrix of 1's along the diagonal and 'rhox' in the
          off-diagonals. Note that each environmental variable is
          normally distributed with mean 0. The standard deviation of
          each environmental variable is chosen so that the site scores
          have the determined standard deviation, as given by argument
          'sdlv'.

  breaks: If 'family' is assigned an ordinal value then this argument
          is used to define the cutpoints. It is fed into the 'breaks'
          argument of 'cut'.

    seed: Optionally, a single value, interpreted as an integer. If
          given, it is passed into 'set.seed'. This argument can be
          used to obtain reproducible results.

Crow1positive: See 'qrrvglm.control' for details.

_D_e_t_a_i_l_s:

     This function generates data coming from a constrained quadratic
     ordination (CQO) model. In particular, data coming from a _species
     packing model_ can be generated with this function. The species
     packing model states that species have equal tolerances, equal
     maxima, and optima which are uniformly distributed over the latent
     variable space. This can be achieved by assigning the arguments
     'ESOptima = TRUE', 'EqualMaxima = TRUE', 'EqualTolerances = TRUE'.

     At present, the Poisson and negative binomial abundances are
     generated first using 'loabundance' and 'hiabundance', and if
     'family' is binomial or ordinal then it is converted into these
     forms.

     In CQO theory the n * p matrix X is partitioned into two parts X_1
     and X_2. The matrix X_2 contains the `real' environmental
     variables whereas the variables in X_1 are just for adjustment
     purposes; they contain the intercept terms and other variables
     that one wants to adjust for when (primarily) looking at the
     variables in X_2. This function has X_1 only being a matrix of
     ones, i.e., containing an intercept only.

_V_a_l_u_e:

     A n * (p-1+S) data frame with components and attributes. In the
     following the attributes are labelled with double quotes. 

x2, x3, x4, ..., xp: The environmental variables. This makes up the n *
          (p-1) X_2 matrix. Note that 'x1' is not present; it is
          effectively a vector of ones since it corresponds to an
          intercept term when 'cqo' is applied to the data.

y1, y2, x3, ..., yS: The species data. This makes up the n * S matrix
          Y. This will be of the form described by the argument
          'family'.

"ccoefficients": The (p-1) * R matrix of constrained coefficients (or
          canonical coefficients). These are also known as weights or
          loadings.

"formula": The formula involving the species and environmental variable
          names. This can be used directly in the 'formula' argument of
          'cqo'.

"logmaxima": The S-vector of species' maxima, on a log scale. These are
          uniformly distributed between 'log(loabundance)' and
          'log(hiabundance)'.

    "lv": The n * R matrix of site scores. Each successive column
          (latent variable) has sample standard deviation equal to
          successive values of 'sdlv'.

"optima": The S * R matrix of species' optima.

"tolerances": The S * R matrix of species' tolerances. These are the
          square root of the diagonal elements of the tolerance
          matrices (recall that all tolerance matrices are restricted
          to being diagonal in this function).

     Other attributes are '"break"', '"family"', '"Rank"',
     '"loabundance"', '"hiabundance"', '"EqualTolerances"',
     '"EqualMaxima"', '"seed"', as inputted.

_N_o_t_e:

     This function is under development and is not finished yet. There
     may be a few bugs.

     Yet to do: add an argument that allows absences to be equal to the
     first level if ordinal data is requested.

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. (2004) A new technique for maximum-likelihood canonical
     Gaussian ordination. _Ecological Monographs_, *74*, 685-701.

     Yee, T. W. (2006) Constrained additive ordination. _Ecology_,
     *87*, 203-213.

     ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of
     gradient analysis. _Advances in Ecological Research_, *18*,
     271-317.

_S_e_e _A_l_s_o:

     'cqo', 'qrrvglm.control', 'cut', 'binomialff', 'poissonff',
     'negbinomial', 'gamma2', 'gaussianff'.

_E_x_a_m_p_l_e_s:

     # Example 1: Species packing model:
     n = 100; p = 5; S = 5
     mydata = rcqo(n, p, S, ESOpt=TRUE, EqualMax=TRUE)
     names(mydata)
     myform = attr(mydata, "formula")
     fit = cqo(myform, fam=poissonff, ITol=TRUE, data=mydata,
               Bestof=3)  # Fit a CQO model to the data
     ## Not run: 
     matplot(attr(mydata, "lv"), mydata[,-(1:(p-1))], col=1:S)
     persp(fit, col=1:S, add=TRUE)
     lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S)  # The same plot as above
     ## End(Not run)

     # Compare the fitted model with the 'truth'
     ccoef(fit)  # The fitted model
     attr(mydata, "ccoefficients") # The 'truth'

     c(sd(attr(mydata, "lv")), sd(lv(fit))) # Both values should be approx equal

     # Example 2: negative binomial data fitted using a Poisson model:
     n = 200; p = 5; S = 5
     mydata = rcqo(n, p, S, fam="negbin", sqrt=TRUE)
     myform = attr(mydata, "formula")
     fit = cqo(myform, fam=poissonff, ITol=TRUE, dat=mydata)
     ## Not run: 
     lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S)
     ## End(Not run)
     # Compare the fitted model with the 'truth'
     ccoef(fit)  # The fitted model
     attr(mydata, "ccoefficients") # The 'truth'

     # Example 3: gamma2 data fitted using a Gaussian model:
     n = 200; p = 5; S = 3
     mydata = rcqo(n, p, S, fam="gamma2", Log=TRUE)
     fit = cqo(attr(mydata, "formula"), fam=gaussianff, ITol=TRUE, dat=mydata)
     ## Not run: 
     matplot(attr(mydata, "lv"), exp(mydata[,-(1:(p-1))]), col=1:S) # 'raw' data
     lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S)  # Fitted model to transformed data
     ## End(Not run)
     # Compare the fitted model with the 'truth'
     ccoef(fit)  # The fitted model
     attr(mydata, "ccoefficients") # The 'truth'

