ecoML                  package:eco                  R Documentation

_F_i_t_t_i_n_g _P_a_r_a_m_e_t_r_i_c _M_o_d_e_l_s _a_n_d _Q_u_a_n_t_i_f_y_i_n_g _M_i_s_s_i_n_g _I_n_f_o_r_m_a_t_i_o_n
_f_o_r _E_c_o_l_o_g_i_c_a_l _I_n_f_e_r_e_n_c_e _i_n _2_x_2 _T_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'ecoML' is used to fit parametric models for ecological inference
     in 2 x 2 tables via Expectation Maximization (EM) algorithms.
     The data are specified in proportions. In its most basic setting,
     the algorithm assumes that the individual-level proportions (i.e.,
     W_1 and W_2) are distributed bivariate normally (after logit
     transformations). The function calculates point estimates of the
     parameters for models based on different assumptions. The standard
     errors of the point estimates are also computed via Supplemented
     EM (SEM) algorithms. Moreover, 'ecoML' quantifies the amount of
     missing information associated with each parameter and allows
     researchers to examine the impact of missing information on
     parameter estimation in ecological inference. The models and
     algorithms are described in Imai, Lu and Strauss (Forthcoming).
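
     The basic setting described above can be illustrated with a small
     simulation. The sketch below does not call 'ecoML'. It assumes the
     usual 2 x 2 accounting identity Y = X * W_1 + (1 - X) * W_2, and
     the values of 'mu', 'Sigma' and 'n' are hypothetical.

```r
## Simulation sketch (base R only); all numeric values are hypothetical.
set.seed(1)
n     <- 200                          # number of 2 x 2 tables (units)
mu    <- c(0.5, -0.5)                 # means of logit(W_1), logit(W_2)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 1.0), 2, 2)    # covariance on the logit scale

## draw bivariate normal variates via the Cholesky factor of Sigma
Z     <- matrix(rnorm(2 * n), n, 2)
Wstar <- sweep(Z %*% chol(Sigma), 2, mu, "+")

## the inverse logit gives the individual-level proportions W_1 and W_2
W <- plogis(Wstar)

## observed margins: X is the row margin, Y the column margin
## (assumed accounting identity, not taken from the package source)
X   <- runif(n, 0.05, 0.95)
Y   <- X * W[, 1] + (1 - X) * W[, 2]
sim <- data.frame(Y = Y, X = X)
```

     A data frame such as 'sim' has the shape that the 'formula' and
     'data' arguments expect, with both margins given as proportions.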

_U_s_a_g_e:

        ecoML(formula, data = parent.frame(), N = NULL, supplement = NULL, 
              theta.start = c(0,0,1,1,0), fix.rho = FALSE,
              context = FALSE, sem = TRUE, epsilon = 10^(-10), 
              maxit = 1000, loglik = TRUE, hyptest = FALSE, verbose = FALSE)

_A_r_g_u_m_e_n_t_s:

 formula: A symbolic description of the model to be fit, specifying the
          column and row margins of 2 x 2 ecological tables. 'Y ~ X'
          specifies 'Y' as the column margin (e.g., turnout) and 'X'
          as the row margin (e.g., percent African-American). Details
          and specific examples are given below. 

    data: An optional data frame in which to interpret the variables in
          'formula'. The default is the environment in which 'ecoML' is
          called.  

       N: An optional variable representing the size of the unit; e.g.,
          the total number of voters. 

supplement: An optional matrix of supplemental data. The matrix has two
          columns, which contain additional individual-level data such
          as survey data for W_1 and W_2, respectively.  If 'NULL', no
          additional individual-level data are included in the model.
          The default is 'NULL'. 

 fix.rho: Logical. If 'TRUE', the correlation (when 'context=FALSE') or
          the partial correlation given X (when 'context=TRUE') between
          W_1 and W_2 is fixed throughout the estimation. For details,
          see Imai, Lu and Strauss (2006). The default is 'FALSE'. 

 context: Logical. If 'TRUE', the contextual effect is also modeled. In
          this case, the row margin (i.e., X) and the individual-level
          rates (i.e., W_1 and W_2) are assumed to be distributed
          tri-variate normally (after logit transformations). See Imai,
          Lu and Strauss (2006) for details. The default is 'FALSE'.  

     sem: Logical. If 'TRUE', the standard errors of the parameter
          estimates, as well as the fractions of missing information,
          are estimated via the SEM algorithm. The default is 'TRUE'.  

theta.start: A numeric vector that specifies the starting values for
          the means, variances, and correlations. When 'context =
          FALSE', the elements of 'theta.start' correspond to (E(W_1),
          E(W_2), var(W_1), var(W_2), corr(W_1,W_2)). When 'context =
          TRUE', the elements of 'theta.start' correspond to (E(W_1),
          E(W_2), var(W_1), var(W_2), corr(W_1, X), corr(W_2, X),
          corr(W_1,W_2)). Moreover, when 'fix.rho=TRUE', corr(W_1,W_2)
          is set to be the correlation between W_1 and W_2 when
          'context = FALSE', and the partial correlation between W_1
          and W_2 given X when 'context = TRUE'. The default is
          'c(0,0,1,1,0)'.  

 epsilon: A positive number that specifies the convergence criterion
          for the EM algorithm. The square root of 'epsilon' is the
          convergence criterion for the SEM algorithm. The default is
          '10^(-10)'.  

   maxit: A positive integer that specifies the maximum number of
          iterations allowed before the convergence criterion is met.
          The default is '1000'. 

  loglik: Logical. If 'TRUE', the value of the log-likelihood function
          at each iteration of EM is saved. The default is 'TRUE'. 

 hyptest: Logical. If 'TRUE', the model is estimated under the null
          hypothesis that the means of W_1 and W_2 are the same. The
          default is 'FALSE'.  

 verbose: Logical. If 'TRUE', the progress of the EM and SEM algorithms
          is printed to the screen. The default is 'FALSE'. 
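
     As a sketch of how the 'theta.start' and 'epsilon' arguments
     described above are laid out, the following base R code builds
     starting-value vectors in the documented order. The element names
     ('EW1', 'rho12', and so on) are hypothetical labels used only for
     readability, and 'ecoML' itself is not called.

```r
## starting values when context = FALSE:
## (E(W_1), E(W_2), var(W_1), var(W_2), corr(W_1, W_2))
start.car <- c(EW1 = 0, EW2 = 0, varW1 = 1, varW2 = 1, rho12 = 0)

## starting values when context = TRUE:
## (E(W_1), E(W_2), var(W_1), var(W_2),
##  corr(W_1, X), corr(W_2, X), corr(W_1, W_2))
start.ncar <- c(EW1 = 0, EW2 = 0, varW1 = 1, varW2 = 1,
                rho1X = 0, rho2X = 0, rho12 = 0)

## convergence criteria implied by the default 'epsilon'
epsilon <- 10^(-10)        # criterion for the EM algorithm
eps.sem <- sqrt(epsilon)   # criterion for the SEM algorithm
```

     Dropping the labels with 'unname()' before supplying such a vector
     as 'theta.start' keeps it a plain numeric vector.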

_D_e_t_a_i_l_s:

     When 'sem' is 'TRUE', 'ecoML' computes the observed-data
     information matrix for the parameters of interest based on the
     Supplemented-EM algorithm. The inverse of the observed-data
     information matrix can be used to estimate the
     variance-covariance matrix of the parameters estimated by the EM
     algorithm. In addition, it also computes the expected
     complete-data information matrix. Based on these two measures,
     one can further calculate the fraction of missing information
     associated with each parameter. See Imai, Lu and Strauss (2006)
     for more details about the fraction of missing information.
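
     The relationship between these quantities can be sketched with toy
     matrices. The 2 by 2 matrices below are hypothetical stand-ins for
     the complete-data and observed-data information matrices, and the
     per-parameter formula 1 - diag(Iobs)/diag(Icom) is one common
     definition that is assumed here rather than taken from the package
     source.

```r
## hypothetical information matrices (toy values, not ecoML output)
Icom <- matrix(c(10, 2,
                  2, 8), 2, 2)   # expected complete-data information
Iobs <- matrix(c(6, 1,
                 1, 4), 2, 2)    # observed-data information

## the inverse of the observed-data information estimates the
## variance-covariance matrix; its diagonal gives standard errors
Vobs <- solve(Iobs)
se   <- sqrt(diag(Vobs))

## missing information and its fraction for each parameter
Imiss  <- Icom - Iobs
Fmis   <- 1 - diag(Iobs) / diag(Icom)
Ieigen <- max(eigen(Imiss, symmetric = TRUE)$values)
```

     With these toy values, 'Fmis' is 0.4 for the first parameter and
     0.5 for the second.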

     Moreover, when 'hyptest=TRUE', 'ecoML' estimates the parametric
     model under the null hypothesis that 'mu_1=mu_2'. One can then
     construct the likelihood ratio test to assess the hypothesis of
     equal means. The associated fraction of missing information for
     the test statistic can also be calculated. For details, see
     Imai, Lu and Strauss (2006).
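
     The likelihood ratio test mentioned above could be carried out
     along the following lines. The two log-likelihood values are
     hypothetical placeholders for the maximized log-likelihoods of
     fits with 'hyptest = FALSE' and 'hyptest = TRUE', and the
     chi-squared reference distribution with one degree of freedom is
     the usual asymptotic approximation, assumed here.

```r
## hypothetical maximized log-likelihoods (not actual ecoML output)
loglik.alt  <- -1234.5   # unrestricted model
loglik.null <- -1237.0   # restricted model with mu_1 = mu_2

## likelihood ratio statistic; one restriction gives df = 1
lr.stat <- 2 * (loglik.alt - loglik.null)
p.value <- pchisq(lr.stat, df = 1, lower.tail = FALSE)
```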

_V_a_l_u_e:

     An object of class 'ecoML' containing the following elements: 

    call: The matched call.

       X: The row margin, X.

       Y: The column margin, Y.

       N: The size of each table, N.

 context: The assumption under which the model is estimated. If
          'context = FALSE', the CAR assumption is adopted and no
          contextual effect is modeled. If 'context = TRUE', the NCAR
          assumption is adopted and the contextual effect is modeled.

     sem: Whether the SEM algorithm is used to estimate the standard
          errors and the observed information matrix for the parameter
          estimates.

 fix.rho: Whether the correlation or the partial correlation between
          W_1 and W_2 is fixed in the estimation.

     r12: If 'fix.rho = TRUE', the value that corr(W_1, W_2) is fixed
          to.

 epsilon: The precision criterion for EM convergence. sqrt(epsilon) is
          the precision criterion for SEM convergence.

theta.em: The ML estimates of E(W_1), E(W_2), var(W_1), var(W_2), and
          cov(W_1,W_2). If 'context = TRUE', E(X), cov(W_1,X), and
          cov(W_2,X) are also reported.

       W: In-sample estimation of W_1 and W_2.

suff.stat: The sufficient statistics for 'theta.em'.

iters.em: Number of EM iterations before convergence is achieved.

iters.sem: Number of SEM iterations before convergence is achieved.

  loglik: The log-likelihood of the model when convergence is achieved.

loglik.log.em: A vector saving the value of the log-likelihood function
          at each iteration of the EM algorithm.

mu.log.em: A matrix saving the unweighted mean estimation of the
          logit-transformed individual-level proportions (i.e., W_1 and
          W_2) at each iteration of the EM process.

Sigma.log.em: A matrix saving the log of the variance estimation of the
          logit-transformed individual-level proportions (i.e., W_1 and
          W_2) at each iteration of the EM process. Note that
          non-transformed variances are displayed on the screen (when
          'verbose = TRUE').

rho.fisher.em: A matrix saving the Fisher transformation of the
          estimation of the correlations between the logit-transformed
          individual-level proportions (i.e., W_1 and W_2) at each
          iteration of the EM process. Note that non-transformed
          correlations are displayed on the screen (when 'verbose =
          TRUE').

      DM: The matrix characterizing the rates of convergence of the EM
          algorithms. Such information is also used to calculate the
          observed-data information matrix.

    Icom: The (expected) complete-data information matrix estimated
          via the SEM algorithm. When 'context=FALSE, fix.rho=TRUE',
          'Icom' is 4 by 4. When 'context=FALSE, fix.rho=FALSE',
          'Icom' is 5 by 5. When 'context=TRUE', 'Icom' is 9 by 9.

    Iobs: The observed information matrix. The dimension of 'Iobs' is
          the same as that of 'Icom'.

   Imiss: The difference between 'Icom' and 'Iobs'. The dimension of
          'Imiss' is the same as that of 'Icom'.

    Vobs: The (symmetrized) variance-covariance matrix of the ML
          parameter estimates. The dimension of 'Vobs' is the same as
          that of 'Icom'.

    Vcom: The (expected) complete-data variance-covariance matrix. The
          dimension of 'Vcom' is the same as that of 'Icom'.

Vobs.original: The estimated variance-covariance matrix of the ML
          parameter estimates. The dimension of 'Vobs.original' is the
          same as that of 'Icom'.

    Fmis: The fraction of missing information associated with each
          parameter estimate. 

   VFmis: The proportion of increased variance associated with each
          parameter estimate due to missing data. 

  Ieigen: The largest eigenvalue of 'Imiss'.

Icom.trans: The complete-data information matrix for the Fisher
          transformed parameters.

Iobs.trans: The observed-data information matrix for the Fisher
          transformed parameters.

Fmis.trans: The fractions of missing information associated with the
          Fisher transformed parameters.
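
     Several of the elements above are stored on a transformed scale
     (log variances and Fisher transformed correlations). The sketch
     below shows how such values could be mapped back to the original
     scale. The input numbers are hypothetical, and the Fisher
     transformation is taken to be the standard 'atanh()'.

```r
## hypothetical stored values for a single EM iteration
log.var  <- c(log(1.5), log(0.8))   # log variances, as in 'Sigma.log.em'
fisher.z <- atanh(0.3)              # Fisher z, as in 'rho.fisher.em'

## back-transform to variances and a correlation
var.hat <- exp(log.var)
rho.hat <- tanh(fisher.z)

## implied covariance between the two logit-transformed proportions
cov.hat <- rho.hat * sqrt(prod(var.hat))
```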

_A_u_t_h_o_r(_s):

     Kosuke Imai, Department of Politics, Princeton University,
     kimai@Princeton.Edu, <URL: http://imai.princeton.edu>; Ying Lu,
     Department of Sociology, University of Colorado at Boulder, 
     ying.lu@Colorado.Edu; Aaron Strauss, Department of Politics,
     Princeton University, abstraus@Princeton.Edu.

_R_e_f_e_r_e_n_c_e_s:

     Imai, Kosuke, Ying Lu and Aaron Strauss. (Forthcoming). "eco: R
     Package for Ecological Inference in 2x2 Tables" Journal of
     Statistical Software, available at <URL:
     http://imai.princeton.edu/research/eco.html>

     Imai, Kosuke, Ying Lu and Aaron Strauss. (Forthcoming). "Bayesian
     and Likelihood Inference for 2 x 2 Ecological Tables: An
     Incomplete Data Approach" Political Analysis, available at <URL:
     http://imai.princeton.edu/research/eiall.html>

_S_e_e _A_l_s_o:

     'eco', 'ecoNP', 'summary.ecoML'

_E_x_a_m_p_l_e_s:

     ## load Robinson's census data
     data(census)

     ## NOTE: convergence has not been properly assessed for the following
     ## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
     ## In the first example below, in the interest of time, only part of the
     ## data set is analyzed and the convergence requirement is less stringent
     ## than the default setting.

     ## In the second example, the program is arbitrarily halted 100 iterations
     ## into the simulation, before convergence.

     ## fit the parametric model with the default model specifications
     ## Not run: res <- ecoML(Y ~ X, data = census[1:100, ], N = census[1:100, 3], epsilon = 10^(-6), verbose = TRUE)
     ## summarize the results
     ## Not run: summary(res)

     ## obtain out-of-sample prediction
     ## Not run: out <- predict(res, verbose = TRUE)
     ## summarize the results
     ## Not run: summary(out)

     ## fit the parametric model with some individual-level
     ## data, modeling the contextual effect
     surv <- 1:600
     ## Not run: 
     res1 <- ecoML(Y ~ X, context = TRUE, data = census[-surv, ],
                   supplement = census[surv, c(4:5, 1)], maxit = 100,
                   verbose = TRUE)
     ## End(Not run)
     ## summarize the results
     ## Not run: summary(res1)

