hurdle                 package:pscl                 R Documentation

_H_u_r_d_l_e _M_o_d_e_l_s _f_o_r _C_o_u_n_t _D_a_t_a _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit hurdle regression models for count data via maximum
     likelihood.

_U_s_a_g_e:

     hurdle(formula, data, subset, na.action,
       dist = c("poisson", "negbin", "geometric"),
       zero.dist = c("binomial", "poisson", "negbin", "geometric"),
       link = c("logit", "probit", "cloglog", "cauchit", "log"),
       control = hurdle.control(...),
       model = TRUE, y = TRUE, x = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: symbolic description of the model, see details.

data, subset, na.action: arguments controlling formula processing via
          'model.frame'.

    dist: character specification of count model family.

zero.dist: character specification of the zero hurdle model family.

    link: character specification of link function in the binomial zero
          hurdle (only used if 'zero.dist = "binomial"').

 control: a list of control arguments specified via 'hurdle.control'.

model, y, x: logicals. If 'TRUE' the corresponding components of the
          fit (model frame, response, model matrix) are returned.

     ...: arguments passed to 'hurdle.control' in the default setup.

_D_e_t_a_i_l_s:

     Hurdle count models are two-component models with a truncated
     count component for positive counts and a hurdle component that
     models the zero counts. Thus, unlike zero-inflation models, there
     are _not_ two sources of zeros: the count model is only employed
     if the hurdle for modeling the occurrence of zeros is exceeded.
     The count model is typically a truncated Poisson or negative
     binomial regression (with log link). The geometric distribution
     is a special case of the negative binomial with size parameter
     equal to 1. To model the hurdle (occurrence of positive counts),
     either a binomial model or a censored count distribution can be
     employed. Binomial logit and censored geometric models as the
     hurdle part both lead to the same likelihood function and thus to
     the same coefficient estimates.
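
     As a minimal sketch of this two-part structure (an illustration
     in base R, not the code used inside 'hurdle'), the probability
     mass function of a logit-Poisson hurdle model can be written as
     below. The helper name 'dhurdle_pois' and its arguments are
     hypothetical. The last lines check numerically that a censored
     geometric hurdle with log link gives the same probability of a
     positive count as a binomial logit hurdle.

```r
## Hypothetical helper (not part of pscl): pmf of a hurdle model with a
## binomial hurdle and a zero-truncated Poisson count component.
##   p_pos: probability of crossing the hurdle, P(y > 0)
##   mu:    mean of the untruncated Poisson count component
dhurdle_pois <- function(y, p_pos, mu) {
  ## Poisson density rescaled to the positive integers
  trunc <- dpois(y, lambda = mu) / ppois(0, lambda = mu, lower.tail = FALSE)
  ifelse(y == 0, 1 - p_pos, p_pos * trunc)
}

## the probabilities sum to one over the support (up to numerical error)
total <- sum(dhurdle_pois(0:200, p_pos = 0.7, mu = 2.5))

## censored geometric hurdle (negative binomial with size = 1, log link)
## versus binomial logit hurdle, on a few arbitrary linear predictors
eta <- c(-1, 0, 0.5, 2)
p_geom  <- pnbinom(0, size = 1, mu = exp(eta), lower.tail = FALSE)
p_logit <- plogis(eta)
all.equal(p_geom, p_logit)
```

     Both hurdle specifications imply P(y > 0) = mu / (1 + mu) with
     mu = exp(eta), which is why they share one likelihood.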

     The 'formula' can be used to specify both components of the model:
     If a 'formula' of type 'y ~ x1 + x2' is supplied, then the same
     regressors are employed in both components. This is equivalent to
     'y ~ x1 + x2 | x1 + x2'. Of course, a different set of regressors
     could be specified for the zero hurdle component, e.g., 'y ~ x1 +
     x2 | z1 + z2 + z3' giving the count data model 'y ~ x1 + x2'
     conditional on ('|') the zero hurdle model 'y ~ z1 + z2 + z3'.
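
     The following sketch illustrates the two ways of writing the
     'formula', using the 'bioChemists' data from the Examples. The
     choice of regressors here is arbitrary and only meant to show
     the syntax.

```r
library("pscl")
data("bioChemists", package = "pscl")

## same regressors in both components: short and explicit form
fm_same <- hurdle(art ~ fem + ment, data = bioChemists)
fm_expl <- hurdle(art ~ fem + ment | fem + ment, data = bioChemists)
all.equal(coef(fm_same), coef(fm_expl))

## different regressors for the zero hurdle component (after the '|')
fm_diff <- hurdle(art ~ fem + ment | kid5 + phd, data = bioChemists)
names(coef(fm_diff))
```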

     All parameters are estimated by maximum likelihood using 'optim',
     with control options set in 'hurdle.control'. Starting values can
     be supplied; otherwise they are estimated by 'glm.fit' (the
     default). By default, the two components of the model are
     estimated separately using two 'optim' calls. Standard errors are
     derived numerically using the Hessian matrix returned by 'optim'.
     See 'hurdle.control' for details.
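
     To make the estimation strategy concrete, the sketch below fits
     only a zero-truncated Poisson count component by minimizing its
     negative log-likelihood with 'optim' and derives standard errors
     from the numerical Hessian. It is an illustration in base R
     under simulated data, not the code used inside 'hurdle'; the
     true coefficients (0.5, 0.3), the sample size, and all object
     names are assumptions made for the example.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)

## draw from a zero-truncated Poisson by inverse-CDF sampling:
## uniforms above P(y = 0) always map to counts y >= 1
u <- runif(n, min = dpois(0, mu), max = 1)
y <- qpois(u, lambda = mu)

## negative log-likelihood of the zero-truncated Poisson with log link
X <- cbind(1, x)
negll <- function(beta) {
  mu <- exp(drop(X %*% beta))
  -sum(dpois(y, mu, log = TRUE) -
       ppois(0, mu, lower.tail = FALSE, log.p = TRUE))
}

## minimize and request the numerical Hessian at the optimum
fit <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)

## standard errors from the inverse Hessian of the negative log-likelihood
se <- sqrt(diag(solve(fit$hessian)))
cbind(estimate = fit$par, se = se)
```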

     The returned fitted model object is of class '"hurdle"' and is
     similar to fitted '"glm"' objects. For elements such as
     '"coefficients"' or '"terms"' a list is returned with elements for
     the zero and count components, respectively. For details see
     below.

     A set of standard extractor functions for fitted model objects is
     available for objects of class '"hurdle"', including methods for
     the generic functions 'print', 'summary', 'coef', 'vcov',
     'logLik', 'residuals', 'predict', 'fitted', 'terms', and
     'model.matrix'. See 'predict.hurdle' for more details on all
     methods.
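
     A short usage sketch of some of these extractors, again on the
     'bioChemists' data with an arbitrarily chosen set of regressors:

```r
library("pscl")
data("bioChemists", package = "pscl")
fm <- hurdle(art ~ fem + ment, data = bioChemists)

## coefficients of one component only
cf_count <- coef(fm, model = "count")

## covariance matrix of all coefficients, and the log-likelihood
V <- vcov(fm)
logLik(fm)

## fitted means and Pearson residuals
head(predict(fm, type = "response"))
head(residuals(fm, type = "pearson"))
```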

_V_a_l_u_e:

     An object of class '"hurdle"', i.e., a list with components
     including:

coefficients: a list with elements '"count"' and '"zero"' containing
          the coefficients from the respective models,

residuals: a vector of raw residuals (observed - fitted),

fitted.values: a vector of fitted means,

   optim: a list (of lists) with the output(s) from the 'optim' call(s)
          for minimizing the negative log-likelihood(s),

 control: the control arguments passed to the 'optim' call,

   start: the starting values for the parameters passed to the 'optim'
          call(s),

       n: number of observations,

 df.null: residual degrees of freedom for the null model (= 'n - 2'),

df.residual: residual degrees of freedom for fitted model,

   terms: a list with elements '"count"', '"zero"' and '"full"'
          containing the terms objects for the respective models,

   theta: estimate of the additional theta parameter of the negative
          binomial model(s) (if negative binomial component is used),

SE.logtheta: standard error(s) for log(theta),

  loglik: log-likelihood of the fitted model,

    vcov: covariance matrix of all coefficients in the model (derived
          from the Hessian of the 'optim' output(s)),

    dist: a list with elements '"count"' and '"zero"' with character
          strings describing the respective distributions used,

    link: character string describing the link if a binomial zero
          hurdle model is used,

 linkinv: the inverse link function corresponding to 'link',

converged: logical indicating successful convergence of 'optim',

    call: the original function call,

 formula: the original formula,

  levels: levels of the categorical regressors,

contrasts: a list with elements '"count"' and '"zero"' containing the
          contrasts corresponding to 'levels' from the respective
          models,

   model: the full model frame (if 'model = TRUE'),

       y: the response count vector (if 'y = TRUE'),

       x: a list with elements '"count"' and '"zero"' containing the
          model matrices from the respective models (if 'x = TRUE').
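
     As a brief sketch of how these components can be inspected
     directly (the extractor methods are usually preferable), using
     an arbitrarily chosen logit-negbin model on 'bioChemists':

```r
library("pscl")
data("bioChemists", package = "pscl")
fm <- hurdle(art ~ fem + ment, data = bioChemists, dist = "negbin")

## list with one coefficient vector per component
str(fm$coefficients)

## distributions used, and the theta estimate of the negbin count part
fm$dist
fm$theta

## sample size, residual degrees of freedom, convergence flag
fm$n
fm$df.residual
fm$converged
```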

_A_u_t_h_o_r(_s):

     Achim Zeileis <Achim.Zeileis@R-project.org>

_R_e_f_e_r_e_n_c_e_s:

     Cameron, A. Colin and Pravin K. Trivedi. 1998. _Regression
     Analysis of Count Data_. New York: Cambridge University Press.

     Cameron, A. Colin and Pravin K. Trivedi. 2005. _Microeconometrics:
     Methods and Applications_. Cambridge: Cambridge University Press.

     Mullahy, J. 1986. Specification and Testing of Some Modified Count
     Data Models. _Journal of Econometrics_ 33: 341-365.

_S_e_e _A_l_s_o:

     'hurdle.control', 'glm', 'glm.fit', 'glm.nb', 'zeroinfl'

_E_x_a_m_p_l_e_s:

     ## from Long (1997)
     data("bioChemists", package = "pscl")

     ## logit-poisson
     ## "art ~ ." is the same as "art ~ . | .", i.e.
     ## "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment"
     fm_hp1 <- hurdle(art ~ ., data = bioChemists)
     summary(fm_hp1)

     ## geometric-poisson
     fm_hp2 <- hurdle(art ~ ., data = bioChemists, zero.dist = "geometric")
     summary(fm_hp2)

     ## logit and geometric model are equivalent
     coef(fm_hp1, model = "zero") - coef(fm_hp2, model = "zero")

     ## logit-negbin
     fm_hnb1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")
     summary(fm_hnb1)

     ## negbin-negbin
     fm_hnb2 <- hurdle(art ~ ., data = bioChemists, dist = "negbin",
       zero.dist = "negbin")
     summary(fm_hnb2)

