betabinomial              package:VGAM              R Documentation

_B_e_t_a-_b_i_n_o_m_i_a_l _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a beta-binomial distribution by maximum likelihood
     estimation. The two parameters here are the mean and correlation
     coefficient.

_U_s_a_g_e:

     betabinomial(lmu="logit", lrho="logit", irho=0.5, zero=2)

_A_r_g_u_m_e_n_t_s:

lmu, lrho: Link functions applied to the two parameters. See 'Links'
          for more choices. The defaults ensure the parameters remain
          in (0,1). 

    irho: Optional initial value for the correlation parameter. If
          given, it must be in (0,1), and is recyled to the necessary
          length.  Assign this argument a value if a convergence
          failure occurs. Setting 'irho=NULL' means an initial value is
          obtained internally, though this can give unsatisfactory
          results.

    zero: An integer specifying which linear/additive predictor is to
          be modelled as an intercept only. If assigned, the single
          value should be either '1' or '2'. The default is to have a
          single correlation parameter. To model both parameters as
          functions of the covariates assign 'zero=NULL'.

_D_e_t_a_i_l_s:

     There are several parameterizations of the beta-binomial
     distribution. This family function directly models the mean and
     correlation parameter, i.e., the probability of success.  The
     model can be written T|P=p ~ Binomial(N,p) where P has a beta
     distribution with shape parameters alpha and beta. Here, N is the
     number of trials (e.g., litter size), T=NY is the number of
     successes, and p is the probability of a success (e.g., a
     malformation). That is, Y is the _proportion_ of successes. Like
     'binomialff', the fitted values are the estimated probability of
     success (i.e., E[Y] and not E[T])  and the prior weights N are
     attached separately on the object in a slot.

     The probability function is

      P(T=t) = choose(N,t) B(alpha+t, beta+N-t) / B(alpha, beta)

     where t=0,1,...,N, and B is the beta function with shape
     parameters alpha and beta. Recall Y = T/N is the real response
     being modelled.

     The default model is eta1 =logit(mu) and eta2 = logit(rho) because
     both parameters lie between 0 and 1. The mean (of Y) is  p = mu =
     alpha / (alpha + beta) and the variance (of Y) is 
     mu(1-mu)(1+(N-1)rho)/N. Here, the correlation rho is given by 1/(1
     + alpha + beta) and is the correlation between the N individuals
     within a litter. A _litter effect_ is typically reflected by a
     positive value of rho. It is known as the _over-dispersion
     parameter_.

     This family function uses Fisher scoring. Elements of the
     second-order expected derivatives with respect to alpha and  beta
     are computed numerically, which may  fail for large alpha, beta, 
     N or else take a long time.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm'.

     Suppose 'fit' is a fitted beta-binomial model. Then 'fit@y'
     contains the sample proportions y, 'fitted(fit)' returns estimates
     of E(Y), and 'weights(fit, type="prior")' returns the number of
     trials N.

_W_a_r_n_i_n_g:

     This family function is prone to numerical difficulties due to the
     expected information matrices not being positive-definite or
     ill-conditioned over some regions of the parameter space. If
     problems occur try setting 'irho' to some other value, or else use
     the 'etastart' argument of 'vglm', etc.

_N_o_t_e:

     This function processes the input in the same way as 'binomialff'.
     But it does not handle  the case N=1 very well because there are
     two parameters to estimate, not one, for each row of the input.
     Cases where N=1 can be omitted via the  'subset' argument of
     'vglm'.

     The _extended_ beta-binomial distribution of Prentice (1986) is
     currently not implemented in the 'VGAM' package as it has
     range-restrictions for the correlation parameter that are
     currently too difficult to handle in this package.

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Moore, D. F. and Tsiatis, A. (1991) Robust estimation of the
     variance in moment methods for extra-binomial and extra-Poisson
     variation. _Biometrics_, *47*, 383-401.

     Prentice, R. L. (1986) Binary regression using an extended
     beta-binomial distribution, with discussion of correlation induced
     by covariate measurement errors. _Journal of the American
     Statistical Association_, *81*, 321-327.

_S_e_e _A_l_s_o:

     'betabin.ab', 'Betabin', 'binomialff', 'betaff', 'dirmultinomial',
     'lirat'.

_E_x_a_m_p_l_e_s:

     # Example 1
     N = 10; mu = 0.5; rho = 0.8 
     y = rbetabin(n=100, size=N, prob=mu, rho=rho)
     fit = vglm(cbind(y,N-y) ~ 1, betabinomial, trace=TRUE)
     coef(fit, matrix=TRUE)
     Coef(fit)
     cbind(fit@y, weights(fit, type="prior"))[1:5,]

     # Example 2
     data(lirat)
     fit = vglm(cbind(R,N-R) ~ 1, betabinomial, data=lirat,
                trace=TRUE, subset=N>1)
     coef(fit, matrix=TRUE)
     Coef(fit)
     t(fitted(fit))
     t(fit@y)
     t(weights(fit, type="prior"))

     # Example 3, which is more complicated
     lirat = transform(lirat, fgrp = factor(grp))
     summary(lirat)   # Only 5 litters in group 3
     fit2 = vglm(cbind(R,N-R) ~ fgrp + hb, betabinomial(zero=2),
                data=lirat, trace=TRUE, subset=N>1)
     coef(fit2, matrix=TRUE)
     ## Not run: 
     plot(lirat$hb[lirat$N>1], fit2@misc$rho,
          xlab="Hemoglobin", ylab="Estimated rho",
          pch=as.character(lirat$grp[lirat$N>1]),
          col=lirat$grp[lirat$N>1])
     ## End(Not run)
     ## Not run: 
     data(lirat)
     attach(lirat)
     # cf. Figure 3 of Moore and Tsiatis (1991)
     plot(hb, R/N, pch=as.character(grp), col=grp, las=1,
          xlab="Hemoglobin level", ylab="Proportion Dead",
          main="Fitted values (lines)")
     detach(lirat)

     smalldf = lirat[lirat$N>1,]
     for(gp in 1:4) {
         xx = smalldf$hb[smalldf$grp==gp]
         yy = fitted(fit2)[smalldf$grp==gp]
         o = order(xx)
         lines(xx[o], yy[o], col=gp)
     }
     ## End(Not run)

