zeroinfl                package:pscl                R Documentation

_Z_e_r_o-_i_n_f_l_a_t_e_d _C_o_u_n_t _D_a_t_a _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit zero-inflated regression models for count data via maximum
     likelihood.

_U_s_a_g_e:

     zeroinfl(formula, data, subset, na.action,
       dist = c("poisson", "negbin", "geometric"),
       link = c("logit", "probit", "cloglog", "cauchit", "log"),
       control = zeroinfl.control(...),
       model = TRUE, y = TRUE, x = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: symbolic description of the model, see details.

data, subset, na.action: arguments controlling formula processing via
          'model.frame'.

    dist: character specification of count model family (a log link is 
          always used).

    link: character specification of link function in the binary
          zero-inflation model (a binomial family is always used).

 control: a list of control arguments specified via 'zeroinfl.control'.

model, y, x: logicals. If 'TRUE' the corresponding components of the
          fit (model frame, response, model matrix) are returned.

     ...: arguments passed to 'zeroinfl.control' in the default setup.

_D_e_t_a_i_l_s:

     Zero-inflated count models are two-component mixture models
     combining a point mass at zero with a proper count distribution.
     Thus, there are two sources of zeros: zeros may come from either
     the point mass or the count component. Usually the count model
     is a Poisson or negative binomial regression (with log link).
     The geometric distribution is a special case of the negative
     binomial with size parameter equal to 1. For modeling the
     unobserved state (zero vs. count), a binary model is used: in the
     simplest case with only an intercept, but potentially containing
     regressors. For this zero-inflation model, a binomial model with
     different links can be used, typically logit or probit.
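
     The two sources of zeros can be checked numerically. The
     following is a minimal base-R sketch, not code from the package:
     the helper 'dzip' and the values of 'lambda' and 'pi0' are
     illustrative assumptions.

```r
# Zero-inflated Poisson probability mass function (hypothetical helper):
# a point mass at zero with weight pi0, mixed with a Poisson(lambda).
dzip <- function(y, lambda, pi0) {
  (y == 0) * pi0 + (1 - pi0) * dpois(y, lambda)
}

lambda <- 2    # assumed mean of the count component
pi0 <- 0.3     # assumed probability of the point mass

# Zeros arise from the point mass and from the count component.
p_zero_mass  <- pi0
p_zero_count <- (1 - pi0) * dpois(0, lambda)
zero_check <- all.equal(dzip(0, lambda, pi0), p_zero_mass + p_zero_count)

# The mixture is still a proper distribution.
total_mass <- sum(dzip(0:100, lambda, pi0))

# The geometric is the negative binomial with size = 1.
geom_check <- all.equal(dnbinom(0:10, size = 1, mu = lambda),
                        dgeom(0:10, prob = 1 / (1 + lambda)))
```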

     The 'formula' mainly describes the count data model, i.e., 'y ~ x1
     + x2' specifies a count data regression where all zero counts have
     the same probability of belonging to the zero component. This is
     equivalent to the model 'y ~ x1 + x2 | 1', making it more explicit
     that the zero-inflation model only has an intercept. Additionally,
     further regressors can be added to the zero-inflation model so
     that not all zeros have the same probability of belonging to the
     point mass component or to the count component. A typical formula
     is 'y ~ x1 + x2 | z1 + z2'. The regressors in the zero and the
     count component can be overlapping (or identical).
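
     As a hedged illustration of the two-part formula, the sketch
     below fits one model with an intercept-only zero-inflation part
     and one with a regressor in it, using the 'bioChemists' data
     from the examples. The zero-inflation part is always written out
     after '|' so that the specification is unambiguous; the choice
     of regressors is arbitrary.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")

  # Intercept-only zero-inflation: all zeros share one probability
  # of belonging to the point mass.
  fm_const <- pscl::zeroinfl(art ~ fem + ment | 1, data = bioChemists)

  # Zero-inflation probability allowed to vary with 'ment'.
  fm_reg <- pscl::zeroinfl(art ~ fem + ment | ment, data = bioChemists)

  # Compare which coefficients each zero-inflation model contains.
  print(names(coef(fm_const, model = "zero")))
  print(names(coef(fm_reg, model = "zero")))
}
```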

     All parameters are estimated by maximum likelihood using 'optim',
     with control options set in 'zeroinfl.control'. Starting values
     can be supplied, estimated by the EM (expectation maximization)
     algorithm, or by 'glm.fit' (the default). The latter corresponds
     to the first iteration of the EM algorithm and initializes the
     unobserved state as 'y > 0', i.e., all zeros are assigned to the
     zero (point mass) component and only the non-zero counts to the
     count component.
     Standard errors are derived numerically using the Hessian matrix
     returned by 'optim'. See 'zeroinfl.control' for details.
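
     The estimation strategy can be sketched in base R. This is an
     illustration under stated assumptions, not the package's
     implementation: the data are simulated, the zero-inflation model
     has only an intercept with a logit link, and the starting values
     come from two plain 'glm' fits that only approximate the default
     described above. All object names are illustrative.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)
state <- rbinom(n, 1, 0.3)                      # 1 = point mass at zero
y <- ifelse(state == 1, 0L, rpois(n, exp(0.5 + 0.3 * x)))

# Negative log-likelihood of a zero-inflated Poisson model:
# par[1:2] are count coefficients, par[3] is the logit of the
# zero-inflation probability.
negll <- function(par) {
  mu <- exp(par[1] + par[2] * x)
  p  <- plogis(par[3])
  ll_zero <- log(p + (1 - p) * exp(-mu))        # contribution of y == 0
  ll_pos  <- log1p(-p) + dpois(y, mu, log = TRUE)  # contribution of y > 0
  -sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Starting values from separate GLMs for the counts and for 'y == 0'.
start <- c(coef(glm(y ~ x, family = poisson)),
           coef(glm(as.integer(y == 0) ~ 1, family = binomial)))

fit <- optim(start, negll, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse of the numerical Hessian.
se <- sqrt(diag(solve(fit$hessian)))
```

     Because 'negll' is the negative log-likelihood, its Hessian at
     the optimum is the observed information, and inverting it gives
     the estimated covariance matrix of the parameters.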

     The returned fitted model object is of class '"zeroinfl"' and is
     similar to fitted '"glm"' objects. For elements such as
     '"coefficients"' or '"terms"' a list is returned with elements for
     the zero and count component, respectively. For details see below.

     A set of standard extractor functions for fitted model objects is
     available for objects of class '"zeroinfl"', including methods for
     the generic functions 'print', 'summary', 'coef', 'vcov',
     'logLik', 'residuals', 'predict', 'fitted', 'terms', and
     'model.matrix'. See 'predict.zeroinfl' for more details on all
     methods.
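
     A short, hedged usage sketch of some of these extractors follows.
     The model and its regressors are chosen only for illustration.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")
  fm <- pscl::zeroinfl(art ~ fem + ment | 1, data = bioChemists)

  print(summary(fm))                         # coefficient tables for both components

  cf <- coef(fm)                             # count and zero coefficients together
  V  <- vcov(fm)                             # covariance matrix of all coefficients
  ll <- logLik(fm)                           # maximized log-likelihood
  mu_hat <- predict(fm, type = "response")   # fitted means
  print(head(residuals(fm)))
}
```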

_V_a_l_u_e:

     An object of class '"zeroinfl"', i.e., a list with components
     including 

coefficients: a list with elements '"count"' and '"zero"' containing
          the coefficients from the respective models,

residuals: a vector of raw residuals (observed - fitted),

fitted.values: a vector of fitted means,

   optim: a list with the output from the 'optim' call for minimizing
          the negative log-likelihood,

 control: the control arguments passed to the 'optim' call,

   start: the starting values for the parameters passed to the 'optim'
          call,

       n: number of observations,

 df.null: residual degrees of freedom for the null model (= 'n - 2'),

df.residual: residual degrees of freedom for fitted model,

   terms: a list with elements '"count"', '"zero"' and '"full"'
          containing the terms objects for the respective models,

   theta: estimate of the additional theta parameter of the negative
          binomial model (if a negative binomial regression is used),

SE.logtheta: standard error for log(theta),

  loglik: log-likelihood of the fitted model,

    vcov: covariance matrix of all coefficients in the model (derived
          from the Hessian of the 'optim' output),

    dist: character string describing the count distribution used,

    link: character string describing the link of the zero-inflation
          model,

 linkinv: the inverse link function corresponding to 'link',

converged: logical indicating successful convergence of 'optim',

    call: the original function call,

 formula: the original formula,

  levels: levels of the categorical regressors,

contrasts: a list with elements '"count"' and '"zero"' containing the
          contrasts corresponding to 'levels' from the respective
          models,

   model: the full model frame (if 'model = TRUE'),

       y: the response count vector (if 'y = TRUE'),

       x: a list with elements '"count"' and '"zero"' containing the
          model matrices from the respective models (if 'x = TRUE').
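
     Since the fitted object is a list, the components above can be
     accessed with '$'. The sketch below is illustrative only; the
     model specification is an arbitrary choice.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")
  fm <- pscl::zeroinfl(art ~ fem + ment | 1, data = bioChemists,
                       dist = "negbin")

  print(fm$coefficients$count)   # coefficients of the count model
  print(fm$coefficients$zero)    # coefficients of the zero-inflation model
  print(fm$theta)                # negative binomial theta estimate
  print(fm$converged)            # did 'optim' report convergence?
  print(c(n = fm$n, df.null = fm$df.null, df.residual = fm$df.residual))
}
```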

_A_u_t_h_o_r(_s):

     Achim Zeileis <Achim.Zeileis@R-project.org>

_R_e_f_e_r_e_n_c_e_s:

     Cameron, A. Colin and Pravin K. Trivedi. 1998. _Regression
     Analysis of Count Data_. New York: Cambridge University Press.

     Cameron, A. Colin and Pravin K. Trivedi. 2005. _Microeconometrics:
     Methods and Applications_. Cambridge: Cambridge University Press.

     Lambert, Diane. 1992. "Zero-Inflated Poisson Regression, with an
     Application to Defects in Manufacturing." _Technometrics_.
     34(1): 1-14.

_S_e_e _A_l_s_o:

     'zeroinfl.control', 'glm', 'glm.fit', 'glm.nb', 'hurdle'

_E_x_a_m_p_l_e_s:

     ## from Long (1997)
     data("bioChemists", package = "pscl")

     ## without inflation
     ## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
     fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
     fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
     ## glm.nb() lives in package MASS, so it is called with its namespace
     fm_nb <- MASS::glm.nb(art ~ ., data = bioChemists)

     ## with simple inflation
     ## (no regressors for 0 component)
     fm_zip <- zeroinfl(art ~ ., data = bioChemists)
     fm_zinb <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin", EM = TRUE)

     ## inflation with regressors (choose starting values by EM)
     ## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
     fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists, EM = TRUE)
     fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin", EM = TRUE)

