zeroinfl                package:pscl                R Documentation

_Z_e_r_o-_i_n_f_l_a_t_e_d _C_o_u_n_t _D_a_t_a _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit zero-inflated regression models for count data via maximum
     likelihood.

_U_s_a_g_e:

     zeroinfl(formula, data, subset, na.action, weights, offset,
       dist = c("poisson", "negbin", "geometric"),
       link = c("logit", "probit", "cloglog", "cauchit", "log"),
       control = zeroinfl.control(...),
       model = TRUE, y = TRUE, x = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: symbolic description of the model, see details.

data, subset, na.action: arguments controlling formula processing via
          'model.frame'.

 weights: optional numeric vector of weights.

  offset: optional numeric vector with an a priori known component to
          be included in the linear predictor of the count model.

    dist: character specification of count model family (a log link is 
          always used).

    link: character specification of link function in the binary
          zero-inflation model (a binomial family is always used).

 control: a list of control arguments specified via 'zeroinfl.control'.

model, y, x: logicals. If 'TRUE' the corresponding components of the
          fit (model frame, response, model matrix) are returned.

     ...: arguments passed to 'zeroinfl.control' in the default setup.

_D_e_t_a_i_l_s:

     Zero-inflated count models are two-component mixture models
     combining a point mass at zero with a proper count distribution.
     Thus, there are two sources of zeros: zeros may come from both
     the point mass and the count component. Usually the count
     model is a Poisson or negative binomial regression (with log
     link). The geometric distribution is a special case of the
     negative binomial with size parameter equal to 1. For modeling the
     unobserved state (zero vs. count), a binary model is used: in the
     simplest case only with an intercept but potentially containing
     regressors. For this zero-inflation model, a binomial model with
     different links can be used, typically logit or probit.
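
     The two sources of zeros can be made concrete with a small
     base-R sketch for the Poisson case. The helper 'dzip' and the
     values of 'mu' and 'p_zero' are hypothetical, chosen only for
     illustration; they are not part of pscl.

```r
## Zero-inflated Poisson probability mass function (illustrative helper,
## not a pscl function): p_zero is the point-mass probability and mu the
## mean of the Poisson count component.
dzip <- function(y, mu, p_zero) {
  p_zero * (y == 0) + (1 - p_zero) * dpois(y, lambda = mu)
}

mu <- 2          # assumed count-component mean
p_zero <- 0.3    # assumed probability of the point mass at zero

## zeros arise from both components of the mixture
p0_pointmass <- p_zero
p0_count <- (1 - p_zero) * dpois(0, lambda = mu)
p0_total <- dzip(0, mu, p_zero)

## the mean of the mixture is (1 - p_zero) * mu
support <- 0:200
zip_mean <- sum(support * dzip(support, mu, p_zero))
```

     Here 'p0_total' equals 'p0_pointmass + p0_count', which is why
     the observed share of zeros exceeds what the count component
     alone would produce.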

     The 'formula' can be used to specify both components of the model:
     If a 'formula' of type 'y ~ x1 + x2' is supplied, then the same
     regressors are employed in both components. This is equivalent to
     'y ~ x1 + x2 | x1 + x2'. Of course, a different set of regressors
     could be specified for the count and zero-inflation component,
     e.g., 'y ~ x1 + x2 | z1 + z2 + z3' giving the count data model 'y
     ~ x1 + x2' conditional on ('|') the zero-inflation model 'y ~ z1 +
     z2 + z3'. A simple inflation model where all zero counts have the
     same probability of belonging to the zero component can be
     specified by the formula 'y ~ x1 + x2 | 1'.
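
     The formula variants described above might be tried as follows.
     This is a sketch that assumes pscl is installed and uses the
     'bioChemists' data from the Examples section; the choice of
     regressors is arbitrary.

```r
library("pscl")
data("bioChemists", package = "pscl")

## same regressors in both components: the one-part formula ...
fm_a <- zeroinfl(art ~ fem + ment, data = bioChemists)
## ... is equivalent to repeating the regressors after '|'
fm_b <- zeroinfl(art ~ fem + ment | fem + ment, data = bioChemists)
all.equal(coef(fm_a), coef(fm_b))

## different regressors for the count and zero-inflation components
fm_c <- zeroinfl(art ~ fem + ment | kid5, data = bioChemists)
names(coef(fm_c))

## intercept-only zero-inflation component
fm_d <- zeroinfl(art ~ fem + ment | 1, data = bioChemists)
names(coef(fm_d))
```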

     All parameters are estimated by maximum likelihood using 'optim',
     with control options set in 'zeroinfl.control'. Starting values
     can be supplied by the user, estimated by the EM (expectation
     maximization) algorithm, or obtained from 'glm.fit' (the
     default). Standard errors are derived numerically using the
     Hessian matrix returned by 'optim'. See 'zeroinfl.control' for
     details.
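
     The general idea of this estimation strategy can be sketched in
     base R for an intercept-only zero-inflated Poisson model. This is
     an illustration under simplifying assumptions (simulated data, no
     regressors, crude starting values), not the code used inside
     'zeroinfl'; the names 'negll', 'true_mu' and 'true_p' are
     hypothetical.

```r
set.seed(1)
n <- 2000
true_mu <- 3      # assumed mean of the count component
true_p <- 0.25    # assumed probability of a structural zero

## simulate: structural zero with probability true_p, else a Poisson count
y <- ifelse(runif(n) < true_p, 0, rpois(n, true_mu))

## negative log-likelihood, parameterized as par = c(log(mu), logit(p))
negll <- function(par, y) {
  mu <- exp(par[1])
  p <- plogis(par[2])
  dens <- p * (y == 0) + (1 - p) * dpois(y, lambda = mu)
  -sum(log(dens))
}

## minimize the negative log-likelihood and request the Hessian
fit <- optim(c(log(mean(y)), 0), negll, y = y,
             method = "BFGS", hessian = TRUE)

## estimates back-transformed to the natural scale
est <- c(mu = exp(fit$par[1]), p = plogis(fit$par[2]))

## standard errors on the log/logit scale from the inverse Hessian
se <- sqrt(diag(solve(fit$hessian)))
```

     Because 'optim' minimizes the negative log-likelihood, the
     inverse of its Hessian estimates the covariance matrix of the
     parameters on the scale on which they were optimized.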

     The returned fitted model object is of class '"zeroinfl"' and is
     similar to fitted '"glm"' objects. For elements such as
     '"coefficients"' or '"terms"' a list is returned with elements for
     the zero and count component, respectively. For details see below.

     A set of standard extractor functions for fitted model objects is
     available for objects of class '"zeroinfl"', including methods
     for the generic functions 'print', 'summary', 'coef', 'vcov',
     'logLik', 'residuals', 'predict', 'fitted', 'terms', and
     'model.matrix'. See 'predict.zeroinfl' for more details on all
     methods.
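
     A few of these extractors applied to a small fit, as a sketch
     assuming pscl is installed; the model itself is arbitrary and
     serves only to show the calls.

```r
library("pscl")
data("bioChemists", package = "pscl")
fm <- zeroinfl(art ~ fem + ment | 1, data = bioChemists)

coef(fm)                     # coefficients of both components
coef(fm, model = "count")    # count component only
coef(fm, model = "zero")     # zero-inflation component only
sqrt(diag(vcov(fm)))         # standard errors from the covariance matrix
logLik(fm)

## predictions on different scales
p_zero <- predict(fm, type = "zero")      # probability of the point mass
mu_count <- predict(fm, type = "count")   # mean of the count component
mu_resp <- predict(fm, type = "response") # overall mean
head(cbind(p_zero, mu_count, mu_resp))
```

     The overall mean combines both components, so 'mu_resp' should
     agree with '(1 - p_zero) * mu_count'.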

_V_a_l_u_e:

     An object of class '"zeroinfl"', i.e., a list with components
     including 

coefficients: a list with elements '"count"' and '"zero"' containing
          the coefficients from the respective models,

residuals: a vector of raw residuals (observed - fitted),

fitted.values: a vector of fitted means,

   optim: a list with the output from the 'optim' call for minimizing
          the negative log-likelihood,

 control: the control arguments passed to the 'optim' call,

   start: the starting values for the parameters passed to the 'optim'
          call,

 weights: the case weights used,

  offset: the offset vector used (if any),

       n: number of observations,

 df.null: residual degrees of freedom for the null model (= 'n - 2'),

df.residual: residual degrees of freedom for fitted model,

   terms: a list with elements '"count"', '"zero"' and '"full"'
          containing the terms objects for the respective models,

   theta: estimate of the additional theta parameter of the negative
          binomial model (if a negative binomial regression is used),

SE.logtheta: standard error for log(theta),

  loglik: log-likelihood of the fitted model,

    vcov: covariance matrix of all coefficients in the model (derived
          from the Hessian of the 'optim' output),

    dist: character string describing the count distribution used,

    link: character string describing the link of the zero-inflation
          model,

 linkinv: the inverse link function corresponding to 'link',

converged: logical indicating successful convergence of 'optim',

    call: the original function call,

 formula: the original formula,

  levels: levels of the categorical regressors,

contrasts: a list with elements '"count"' and '"zero"' containing the
          contrasts corresponding to 'levels' from the respective
          models,

   model: the full model frame (if 'model = TRUE'),

       y: the response count vector (if 'y = TRUE'),

       x: a list with elements '"count"' and '"zero"' containing the
          model matrices from the respective models (if 'x = TRUE').
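
     Components of the returned list can also be accessed directly.
     The following sketch assumes pscl is installed and fits an
     arbitrary negative binomial model so that 'theta' is present.

```r
library("pscl")
data("bioChemists", package = "pscl")
fm_nb <- zeroinfl(art ~ fem + ment | 1, data = bioChemists,
                  dist = "negbin")

fm_nb$coefficients$count   # coefficients of the count component
fm_nb$coefficients$zero    # coefficients of the zero-inflation component
fm_nb$theta                # negative binomial theta
fm_nb$SE.logtheta          # standard error of log(theta)
fm_nb$converged            # did 'optim' report convergence?
fm_nb$n                    # number of observations
fm_nb$df.residual          # residual degrees of freedom
fm_nb$loglik               # log-likelihood at the optimum
```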

_A_u_t_h_o_r(_s):

     Achim Zeileis <Achim.Zeileis@R-project.org>

_R_e_f_e_r_e_n_c_e_s:

     Cameron, A. Colin and Pravin K. Trivedi. 1998. _Regression
     Analysis of Count Data_. New York: Cambridge University Press.

     Cameron, A. Colin and Pravin K. Trivedi. 2005. _Microeconometrics:
     Methods and Applications_. Cambridge: Cambridge University Press.

     Lambert, Diane. 1992. "Zero-Inflated Poisson Regression, with an
     Application to Defects in Manufacturing." _Technometrics_,
     *34*(1):1-14.

     Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008.
     "Regression Models for Count Data in R." _Journal of Statistical
     Software_, *27*(8). URL <http://www.jstatsoft.org/v27/i08/>.

_S_e_e _A_l_s_o:

     'zeroinfl.control', 'glm', 'glm.fit', 'glm.nb', 'hurdle'

_E_x_a_m_p_l_e_s:

     ## data
     data("bioChemists", package = "pscl")

     ## without inflation
     ## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
     ## (glm.nb is provided by package MASS)
     library("MASS")
     fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
     fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
     fm_nb <- glm.nb(art ~ ., data = bioChemists)

     ## with simple inflation (no regressors for zero component)
     fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
     fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

     ## inflation with regressors
     ## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
     fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists)
     fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

