hyperg                 package:VGAM                 R Documentation

_H_y_p_e_r_g_e_o_m_e_t_r_i_c _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Family function for a hypergeometric distribution in which either
     the number of white balls or the total number of white and black
     balls is unknown.

_U_s_a_g_e:

     hyperg(N=NULL, D=NULL, lprob="logit", earg=list(), iprob=NULL)

_A_r_g_u_m_e_n_t_s:

       N: Total number of white and black balls in the urn. Must be a
          vector with positive values, and is recycled, if necessary,
          to the same length as the response. One of 'N' and 'D' must
          be specified. 

       D: Number of white balls in the urn. Must be a vector with
          positive values, and is recycled, if necessary, to the same
          length as the response. One of 'N' and 'D' must be specified. 

   lprob: Link function for the probabilities. See 'Links' for more
          choices.

    earg: List. Extra argument for the link. See 'earg' in 'Links' for
          general information.

   iprob: Optional initial value for the probabilities. The default is
          to choose initial values internally.

_D_e_t_a_i_l_s:

     Consider the scenario from 'Hypergeometric', where there are N=m+n
     balls in an urn, of which m are white and n are black. A simple
     random sample (i.e., _without_ replacement) of k balls is taken.
     The response here is the sample _proportion_ of white balls. In
     this document, 'N' is N=m+n and 'D' is m (for the number of
     "defectives" in quality-control terminology, or equivalently, the
     number of marked individuals). The parameter to be estimated is
     the population proportion of white balls, viz. prob = m/(m+n).
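     The probabilities in this scenario can be written with 'lgamma' in
     place of log-factorials, which is what lets m and n take
     non-integer values. The base R sketch below is an illustration
     only, not VGAM's internal code; the helper names 'lchoose_real'
     and 'dhyper_real' are hypothetical. At integer arguments it should
     reproduce 'dhyper', and the mean sample proportion equals
     prob = m/(m+n).

```r
# Hypothetical helper (not part of VGAM): log binomial coefficient via
# lgamma(), so that a and b need not be integers.
lchoose_real <- function(a, b) {
  lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)
}

# P(X = x) for x white balls in a sample of size k, drawn without
# replacement from an urn with m white and n black balls.
dhyper_real <- function(x, m, n, k, log = FALSE) {
  logp <- lchoose_real(m, x) + lchoose_real(n, k - x) - lchoose_real(m + n, k)
  if (log) logp else exp(logp)
}

m <- 5; n <- 4; k <- 4
x <- 0:k
p_lgamma <- dhyper_real(x, m, n, k)
p_base <- dhyper(x, m, n, k)
print(all.equal(p_lgamma, p_base))  # agrees with base R at integer m, n

# Expected sample proportion E[X/k] equals prob = m/(m+n) = 5/9 here.
print(sum((x / k) * p_lgamma))
```

     Because 'dhyper_real' only uses 'lgamma', it can be evaluated at,
     say, n = 4.3, which 'dhyper' would not accept as a count.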

     Depending on which of 'N' and 'D' is supplied, the estimate of the
     other can be obtained from the equation prob = m/(m+n), or
     equivalently, 'prob = D/N'. However, the log-factorials are
     computed using 'lgamma', so neither m nor n is restricted to
     integer values. Thus, if an integer N is to be estimated, it will
     be necessary to evaluate the likelihood function at the integers
     on either side of the estimate, i.e., at 'trunc(Nhat)' and
     'ceiling(Nhat)', where 'Nhat' is the (real-valued) estimate of N.
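     The integer search described above can be sketched in base R for
     the case where 'D' is known and N is estimated. The data, the
     helper names 'lchoose_real' and 'loglik_N', and the use of the
     simple ratio D/phat as 'Nhat' are assumptions made for
     illustration; with a fitted model, the fitted proportion would
     take the place of 'phat'.

```r
# Hypothetical data: y white balls seen in samples of size k, drawn from
# an urn known to contain D = 5 white balls; N is unknown.
y <- c(2, 3, 1, 2, 3)
k <- rep(4, length(y))
D <- 5

# Log binomial coefficient via lgamma(), valid for non-integer arguments.
lchoose_real <- function(a, b) lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

# Hypergeometric log-likelihood as a function of the urn total N.
loglik_N <- function(N, y, k, D) {
  sum(lchoose_real(D, y) + lchoose_real(N - D, k - y) - lchoose_real(N, k))
}

# Real-valued estimate of N from prob = D/N (assumed ratio estimate).
phat <- sum(y) / sum(k)
Nhat <- D / phat

# Evaluate the likelihood at the integers on either side of Nhat.
candidates <- unique(c(trunc(Nhat), ceiling(Nhat)))
ll <- sapply(candidates, loglik_N, y = y, k = k, D = D)
N_best <- candidates[which.max(ll)]
print(data.frame(N = candidates, loglik = ll))
print(N_best)
```

     Wrapping the candidates in 'unique' covers the case where 'Nhat'
     is already an integer, so the likelihood is evaluated only once.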

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm', 'vgam', 'rrvglm',
     'cqo', and 'cao'.

_W_a_r_n_i_n_g:

     No checking is done to ensure that values are within their valid
     ranges, e.g., that k <= N.
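     Since 'hyperg' does not perform these checks, a user may want to
     validate the data before fitting. The function below is a
     hypothetical helper, not part of VGAM; it returns a character
     vector describing any violated constraints, and checks only those
     involving whichever of 'N' and 'D' is supplied.

```r
# Hypothetical helper (not part of VGAM): report range violations in the
# data, checking only constraints that involve the supplied N or D.
check_hyperg_ranges <- function(y, k, N = NULL, D = NULL) {
  problems <- character(0)
  if (any(y < 0 | y > k))
    problems <- c(problems, "need 0 <= y <= k")
  if (!is.null(N) && any(k > N))
    problems <- c(problems, "need k <= N")
  if (!is.null(D) && any(y > D))
    problems <- c(problems, "need y <= D")
  problems
}

check_hyperg_ranges(y = c(2, 3), k = c(4, 4), N = 9)   # character(0): no problems
check_hyperg_ranges(y = c(2, 3), k = c(4, 12), N = 9)  # "need k <= N"
```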

_N_o_t_e:

     The response can take one of three formats: a factor (first level
     taken as success), a vector of proportions of success, or a
     2-column matrix of counts (first column = successes, second column
     = failures). The argument 'weights' in the modelling function can
     also be specified. In particular, for a vector of proportions you
     will need to specify 'weights', because the number of trials is
     needed.
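     To make the three formats concrete, the hypothetical helper below
     (not VGAM's internal code) converts each of them into success
     counts and numbers of trials. As a simplifying assumption, a factor
     response is treated as one trial per observation and any 'weights'
     are ignored for it.

```r
# Hypothetical helper (not VGAM's internal code): turn a response in any of
# the three formats into success counts and numbers of trials.
as_counts <- function(response, weights = NULL) {
  if (is.factor(response)) {
    # First level = success; simplification: one trial per observation.
    succ <- as.numeric(response == levels(response)[1])
    trials <- rep(1, length(response))
  } else if (is.matrix(response) && ncol(response) == 2) {
    # Column 1 = successes, column 2 = failures.
    succ <- response[, 1]
    trials <- response[, 1] + response[, 2]
  } else {
    # Proportions carry no sample size, so 'weights' must supply it.
    if (is.null(weights))
      stop("'weights' (numbers of trials) are needed for proportions")
    succ <- response * weights
    trials <- weights
  }
  data.frame(successes = succ, trials = trials)
}

y <- c(2, 3); k <- c(4, 4)
as_counts(cbind(y, k - y))       # successes 2, 3; trials 4, 4
as_counts(y / k, weights = k)    # same counts recovered from proportions
```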

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Evans, M., Hastings, N. and Peacock, B. (2000) _Statistical
     Distributions_, New York: Wiley-Interscience, Third edition.

_S_e_e _A_l_s_o:

     'Hypergeometric', 'binomialff'.

_E_x_a_m_p_l_e_s:

     nn = 100
     m = 5   # number of white balls in the population
     k = rep(4, length.out=nn)   # sample sizes
     n = 4   # number of black balls in the population
     y  = rhyper(nn=nn, m=m, n=n, k=k)
     yprop = y / k  # sample proportions

     # N is unknown, D is known. Both models are equivalent:
     fit = vglm(cbind(y,k-y) ~ 1, hyperg(D=m), trace=TRUE, crit="c")
     fit = vglm(yprop ~ 1, hyperg(D=m), weights=k, trace=TRUE, crit="c")

     # N is known, D is unknown. Both models are equivalent:
     fit = vglm(cbind(y,k-y) ~ 1, hyperg(N=m+n), trace=TRUE, crit="l")
     fit = vglm(yprop ~ 1, hyperg(N=m+n), weights=k, trace=TRUE, crit="l")

     coef(fit, matrix=TRUE)
     Coef(fit)  # Should be close to the true population proportion
     unique(m / (m+n))  # The true population proportion
     fit@extra
     fitted(fit)[1:4]
     summary(fit)

