RegressionTestsInterface     package:fRegression     R Documentation

_R_e_g_r_e_s_s_i_o_n _T_e_s_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     A collection and description of functions to test linear
     regression models, including tests for higher-order serial
     correlation, heteroskedasticity, autocorrelation of disturbances,
     linearity, and functional form.

     The methods are:

       '"bg"'     Breusch-Godfrey test for higher order serial correlation,
       '"bp"'     Breusch-Pagan test for heteroskedasticity,
       '"dw"'     Durbin-Watson test for autocorrelation of disturbances,
       '"gq"'     Goldfeld-Quandt test for heteroskedasticity,
       '"harv"'   Harvey-Collier test for linearity,
       '"hmc"'    Harrison-McCabe test for heteroskedasticity,
       '"rain"'   Rainbow test for linearity, and
       '"reset"'  Ramsey's RESET test for functional relation.

     These functions add nothing new; they are wrappers around the
     underlying test functions from R's contributed package 'lmtest'.
     The functions are available as "Builtin" functions. Nevertheless,
     the user can still install and use the original functions from
     R's 'lmtest' package.

_U_s_a_g_e:

     lmTest(formula, method = c("bg", "bp", "dw", "gq", "harv", "hmc", 
         "rain", "reset"), data = list(), ...)
         
     bgTest(formula, order = 1, type = c("Chisq", "F"), data = list())
     bpTest(formula, varformula = NULL, studentize = TRUE, data = list())
     dwTest(formula, alternative = c("greater", "two.sided", "less"),
         iterations = 15, exact = NULL, tol = 1e-10, data = list())
     gqTest(formula, point=0.5, order.by = NULL, data = list())
     harvTest(formula, order.by = NULL, data = list())
     hmcTest(formula, point = 0.5, order.by = NULL, simulate.p = TRUE, 
         nsim = 1000, plot = FALSE, data = list()) 
     rainTest(formula, fraction = 0.5, order.by = NULL, center = NULL, 
         data = list())
     resetTest(formula, power = 2:3, type = c("fitted", "regressor", "princomp"), 
         data = list())

_A_r_g_u_m_e_n_t_s:

alternative: [dwTest] - 
           a character string specifying the alternative hypothesis,
          either '"greater"', '"two.sided"', or '"less"'. 

  center: [rainTest] - 
           a numeric value. If 'center' is smaller than '1' it is
           interpreted as a percentage of the data, i.e. the subset is
           chosen such that 'n*fraction' observations lie around
           observation number 'n*center'. If 'center' is greater than
           '1' it is interpreted as the index of the center of the
           subset. By default 'center' is '0.5'. If the Mahalanobis
           distance is chosen, 'center' is taken to be the mean
           regressor, but it can be specified as a k-dimensional
           vector, where k is the number of regressors, and should be
           in the range of the respective regressors.

    data: an optional data frame containing the variables in the model.
           By default the variables are taken from the environment from
           which 'lmTest' and the other tests are called.

   exact: [dwTest] - 
           a logical flag. If set to 'FALSE' a normal approximation
           is used to compute the p value; if 'TRUE' the "pan"
           algorithm is used. The default is to use "pan" if the
           sample size is '< 100'.

 formula: a symbolic description for the linear model to be tested. 

fraction: [rainTest] - 
           a numeric value, by default '0.5'. The number of
           observations in the subset is 'fraction*n', where 'n' is
           the number of observations in the model.

iterations: [dwTest] - 
           an integer specifying the number of iterations when
          calculating the p-value with the "pan" algorithm. By default
          15. 

  method: the test method which should be applied. 

    nsim: [hmcTest] - 
           an integer value. Determines how many runs are used to
           simulate the p value, by default 1000.

   order: [bgTest] - 
           an integer. The maximal order of serial correlation to be 
          tested. By default 1. 

order.by: [gqTest][harvTest] - 
           a formula. A formula with a single explanatory variable like
           '~ x'. The observations in the model are then ordered by the
           size of 'x'. If set to 'NULL', the default, the observations
           are assumed to be ordered (e.g. a time series).
           [rainTest] - 
           either a formula or a string. A formula with a single
           explanatory variable like '~ x'. The observations in the
           model are ordered by the size of 'x'. If set to 'NULL', the
           default, the observations are assumed to be ordered (e.g. a
           time series). If set to '"mahalanobis"', the observations
           are ordered by their Mahalanobis distances.

    plot: [hmcTest] - 
           a logical flag. If 'TRUE' the test statistic for all
           possible breakpoints is plotted; the default is 'FALSE'.

   point: [gqTest][hmcTest] - 
           a numeric value. If 'point' is smaller than '1' it is
           interpreted as a percentage of the data, i.e. 'n*point' is
           taken to be the (potential) breakpoint in the variances,
           where 'n' is the number of observations in the model. If
           'point' is greater than '1' it is interpreted as the index
           of the breakpoint. By default '0.5'.

   power: [resetTest] - 
           a vector of positive integers, by default '2:3', indicating
           the powers of the variables that should be included. By
           default a quadratic and cubic influence of the fitted
           response is tested.

simulate.p: [hmcTest] - 
           a logical. If 'TRUE', the default, a p-value will be 
          assessed by simulation, otherwise the p-value is 'NA'.  

studentize: [bpTest] - 
            a logical value. If set to 'TRUE'  Koenker's studentized
          version of the test statistic will  be used. By default set
          to 'TRUE'. 

     tol: [dwTest] - 
           the tolerance value. Eigenvalues computed have to be greater
          than  'tol=1e-10' to be treated as non-zero.  

    type: [bgTest] - 
           the type of test statistic to be returned. Either '"Chisq"' 
          for the Chi-squared test statistic or '"F"' for the F test 
          statistic. 
           [resetTest] - 
           a string indicating whether powers of the '"fitted"' 
          response, the '"regressor"' variables (factors are left  out)
          or the first principal component, '"princomp"', of  the
          regressor matrix should be included in the extended model.  

varformula: [bpTest] - 
           a formula describing only the potential explanatory
          variables  for the variance, no dependent variable needed. By
          default the  same explanatory variables are taken as in the
          main regression  model.  

     ...: [lmTest] - 
           additional arguments passed to the underlying test function.
           Some of the tests accept additional optional arguments such
           as the alternative hypothesis or the type of test statistic
           to be returned. All the optional arguments have default
           settings.

_D_e_t_a_i_l_s:

     *bg - Breusch Godfrey Test:* 

        Under H_0 the test statistic is asymptotically Chi-squared 
     with degrees of freedom as given in 'parameter'. If 'type' is set
     to '"F"' the function returns the exact F statistic which, under
     H_0, follows an F distribution with degrees of freedom as given in
     'parameter'. The starting values for the lagged residuals in the
     supplementary regression are chosen to be 0.
      '[lmtest:bgtest]' 
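     The supplementary regression behind the BG statistic can be
     sketched in a few lines of plain R. This is a minimal first-order
     illustration under the LM form of the test, not the 'lmtest'
     implementation; all variable names are ad hoc:

```r
# First-order Breusch-Godfrey LM statistic: regress the OLS
# residuals on the regressors plus the lagged residuals
# (starting value 0), then use n * R^2.
set.seed(1)
x = rep(c(1, -1), 50)
y = 1 + x + rnorm(100)
e = residuals(lm(y ~ x))
n = length(e)
elag = c(0, e[-n])                  # lagged residuals, start at 0
aux = lm(e ~ x + elag)
stat = n * summary(aux)$r.squared   # asympt. chi-squared(1) under H0
pval = pchisq(stat, df = 1, lower.tail = FALSE)
```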

     *bp - Breusch Pagan Test:* 

      The Breusch-Pagan test fits a linear regression model to the 
     residuals of a linear regression model (by default the same 
     explanatory variables are taken as in the main regression model)
     and rejects if too much of the variance is explained by the
     additional explanatory variables. Under H_0 the test statistic of
     the Breusch-Pagan test  follows a chi-squared distribution with
     'parameter'  (the number of regressors without the constant in the
     model)  degrees of freedom.
        '[lmtest:bptest]' 
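     The studentized (Koenker) form of the statistic can be sketched
     directly from its definition; a rough illustration, not the
     'lmtest' implementation:

```r
# Koenker's studentized Breusch-Pagan statistic: regress the
# squared OLS residuals on the model regressors; the LM
# statistic is n * R^2 of this auxiliary regression.
set.seed(2)
x = rep(c(-1, 1), 50)
y = 1 + x + rnorm(100, sd = rep(c(1, 2), 50))    # heteroskedastic
u2 = residuals(lm(y ~ x))^2
aux = lm(u2 ~ x)
stat = length(u2) * summary(aux)$r.squared
pval = pchisq(stat, df = 1, lower.tail = FALSE)  # df = no. of regressors
```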

     *dw - Durbin Watson Test:* 

      The Durbin-Watson test has the null hypothesis that the
     autocorrelation of the disturbances is 0. It can be tested
     against the alternative that it is greater than, not equal to, or
     less than 0, as specified by the 'alternative' argument. The null
     distribution of the Durbin-Watson test
     statistic is a linear combination of chi-squared distributions.
     The p value is computed using a Fortran version of the Applied
     Statistics Algorithm AS 153 by Farebrother (1980, 1984). This
     algorithm is called "pan" or "gradsol". For large sample sizes the
     algorithm might fail to compute the p value; in that case a 
     warning is printed and an approximate p value will be given; this
     p  value is computed using a normal approximation with mean and
     variance  of the Durbin-Watson test statistic.
      '[lmtest:dwtest]' 
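     The statistic itself is simple to compute from the residuals; a
     minimal sketch (the p value would need the "pan" algorithm or the
     normal approximation, both omitted here):

```r
# Durbin-Watson statistic from OLS residuals.
set.seed(3)
x = rep(c(-1, 1), 50)
e = filter(rnorm(100), 0.9, method = "recursive")  # AR(1) errors
y = 1 + x + e
r = residuals(lm(y ~ x))
dw = sum(diff(r)^2) / sum(r^2)  # near 2 under H0, below 2 for rho > 0
```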

     *gq - Goldfeld Quandt Test:* 

      The Goldfeld-Quandt test compares the variances of two submodels
     divided by a specified breakpoint and rejects if the variances
     differ. Under H_0 the test statistic of the Goldfeld-Quandt test 
     follows an F distribution with the degrees of freedom as given in 
     'parameter'.
      '[lmtest:gqtest]' 
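     The construction can be sketched directly: fit the model
     separately on the two halves and take the ratio of the residual
     variances. A rough illustration with a fixed breakpoint, not the
     'lmtest' implementation:

```r
# Goldfeld-Quandt F statistic with breakpoint at point = 0.5:
# ratio of residual variances, second subsample over first.
set.seed(4)
x = rep(c(-1, 1), 50)
y = 1 + x + c(rnorm(50, sd = 1), rnorm(50, sd = 2))
f1 = lm(y[1:50] ~ x[1:50])
f2 = lm(y[51:100] ~ x[51:100])
stat = (sum(residuals(f2)^2) / f2$df.residual) /
       (sum(residuals(f1)^2) / f1$df.residual)
pval = pf(stat, f2$df.residual, f1$df.residual, lower.tail = FALSE)
```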

     *harv - Harvey Collier Test:* 

      The Harvey-Collier test performs a t-test (with 'parameter' 
     degrees of freedom) on the recursive residuals. If the true
     relationship  is not linear but convex or concave the mean of the
     recursive residuals  should differ from 0 significantly.
      '[lmtest:harvtest]' 

     *hmc - Harrison McCabe Test:* 

        The Harrison-McCabe test statistic is the fraction of the
     residual sum of squares that relates to the fraction of the data
     before the breakpoint. Under H_0 the test statistic should be
     close to the size of this fraction, e.g. in the default case close
     to 0.5. The null hypothesis is rejected if the statistic is too
     small.
      '[lmtest:hmctest]' 
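     The statistic follows directly from this description; a minimal
     sketch without the simulated p value:

```r
# Harrison-McCabe statistic: the share of the residual sum of
# squares that falls before the breakpoint (default point = 0.5).
set.seed(5)
x = rep(c(-1, 1), 50)
y = 1 + x + c(rnorm(50, sd = 1), rnorm(50, sd = 2))
e = residuals(lm(y ~ x))
b = floor(0.5 * length(e))
stat = sum(e[1:b]^2) / sum(e^2)  # close to 0.5 under H0
```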

     *rain - Rainbow Test:* 

        The basic idea of the Rainbow test is that even if the true
     relationship is non-linear, a good linear fit can be achieved on a
     subsample in the "middle" of the data. The null hypothesis is
     rejected whenever the overall fit is significantly inferior to the
     fit of the subsample. The test statistic under H_0 follows an F
     distribution with 'parameter' degrees of freedom.
      '[lmtest:raintest]' 

     *reset - Ramsey's RESET Test* 

        The RESET test is a popular diagnostic for the correctness of
     the functional form. The basic assumption is that, under the
     alternative, the model can be written as the regression y = X *
     beta + Z * gamma. 'Z' is generated by taking powers of either the
     fitted response, the regressor variables, or the first principal
     component of 'X'. A standard F test is then applied to determine
     whether these additional variables have significant influence.
     The test statistic under H_0 follows an F distribution with
     'parameter' degrees of freedom.
      '[lmtest:reset]'
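     With the default settings ('power = 2:3', 'type = "fitted"') the
     augmented regression can be sketched in base R; a rough
     illustration, not the 'lmtest' implementation:

```r
# RESET with powers of the fitted response: augment the model
# with yhat^2 and yhat^3 and F-test the added terms.
set.seed(6)
x = 1:30
y = 1 + x + x^2 + rnorm(30)
fit1 = lm(y ~ x)
yhat = fitted(fit1)
fit2 = lm(y ~ x + I(yhat^2) + I(yhat^3))
tab = anova(fit1, fit2)  # F test for the added powers
```

     Here the quadratic term in the data-generating process should make
     the test reject.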

_V_a_l_u_e:

     A list with class '"htest"' containing the following components:

statistic: the value of the test statistic. 

parameter: the degrees of freedom of the null distribution of the test
           statistic (for 'bgTest' the lag order). 

 p.value: the p-value of the test. 

  method: a character string indicating what type of test was
          performed. 

data.name: a character string giving the name of the data. 

alternative: a character string describing the alternative hypothesis. 

_N_o_t_e:

     The underlying 'lmtest' package comes with many helpful examples.
     We highly recommend installing the 'lmtest' package and studying
     the examples given therein.

_A_u_t_h_o_r(_s):

     Achim Zeileis and Torsten Hothorn for the 'lmtest' package, 
      Diethelm Wuertz for the Rmetrics R-port.

_R_e_f_e_r_e_n_c_e_s:

     Breusch, T.S. (1979); _Testing for Autocorrelation in Dynamic
     Linear Models_,  Australian Economic Papers 17, 334-355.

     Breusch T.S. and Pagan A.R. (1979); _A Simple Test for
     Heteroscedasticity and Random  Coefficient Variation_,
     Econometrica 47, 1287-1294

     Durbin J. and Watson G.S. (1950); _Testing for Serial Correlation
     in Least Squares Regression I_, Biometrika 37, 409-428.

     Durbin J. and Watson G.S. (1951); _Testing for Serial Correlation
     in Least Squares Regression II_, Biometrika 38, 159-178.

     Durbin J. and Watson G.S. (1971); _Testing for Serial Correlation
     in Least Squares Regression III_, Biometrika 58, 1-19.

     Farebrother R.W. (1980); _Pan's Procedure for the Tail
     Probabilities of the Durbin-Watson Statistic_, Applied Statistics
     29, 224-227.

     Farebrother R.W. (1984); _The Distribution of a Linear Combination
     of $\chi^2$ Random Variables_, Applied Statistics 33, 366-369.

     Godfrey, L.G. (1978); _Testing Against General Autoregressive and
     Moving Average Error Models when the Regressors Include Lagged
     Dependent Variables_,  Econometrica 46, 1293-1302.

     Goldfeld S.M. and Quandt R.E. (1965); _Some Tests for
     Homoskedasticity_ Journal of the American Statistical Association
     60, 539-547.

     Harrison M.J. and McCabe B.P.M. (1979); _A Test for
     Heteroscedasticity based on Ordinary Least  Squares Residuals_
     Journal of the American Statistical Association 74, 494-499.

     Harvey A. and Collier P. (1977); _Testing for Functional
     Misspecification in Regression  Analysis_, Journal of Econometrics
     6, 103-119.

     Johnston, J. (1984);  _Econometric Methods_,  Third Edition,
     McGraw Hill Inc.

     Kraemer W. and Sonnberger H. (1986); _The Linear Regression Model
     under Test_,  Heidelberg: Physica.

     Racine J. and Hyndman R. (2002); _Using R To Teach Econometrics_,
     Journal of Applied Econometrics 17, 175-189.

     Ramsey J.B. (1969); _Tests for Specification Error in Classical
     Linear Least  Squares Regression Analysis_, Journal of the Royal
     Statistical Society, Series B 31, 350-371.

     Utts J.M. (1982); _The Rainbow Test for Lack of Fit in
     Regression_, Communications in Statistics - Theory and Methods 11,
     1801-1815.

_E_x_a_m_p_l_e_s:

     ## bg | dw -
        # Generate a Stationary and an AR(1) Series:
        x = rep(c(1, -1), 50)
        y1 = 1 + x + rnorm(100)
        # Perform Breusch-Godfrey Test for 1st order serial correlation:
        lmTest(y1 ~ x, "bg")
        # ... or for fourth order serial correlation:
        lmTest(y1 ~ x, "bg", order = 4)    
        # Compare with Durbin-Watson Test Results:
        lmTest(y1 ~ x, "dw")
        y2 = filter(y1, 0.5, method = "recursive")
        lmTest(y2 ~ x, "bg") 
        
     ## bp -
        # Generate a Regressor:
        x = rep(c(-1, 1), 50)
        # Generate heteroskedastic and homoskedastic Disturbances
        err1 = rnorm(100, sd = rep(c(1, 2), 50))
        err2 = rnorm(100)
        # Generate a Linear Relationship:
        y1 = 1 + x + err1
        y2 = 1 + x + err2
        # Perform Breusch-Pagan Test
        bp = lmTest(y1 ~ x, "bp")
        bp
        # Calculate Critical Value for 0.05 Level
        qchisq(0.95, bp$parameter)
        lmTest(y2 ~ x, "bp")
        
     ## dw -
        # Generate two AR(1) Error Terms 
        # with parameter rho = 0 (white noise) 
        # and rho = 0.9 respectively
        err1 = rnorm(100)
        # Generate Regressor and Dependent Variable
        x = rep(c(-1,1), 50)
        y1 = 1 + x + err1
        # Perform Durbin-Watson Test:
        lmTest(y1 ~ x, "dw")
        err2 = filter(err1, 0.9, method = "recursive")
        y2 = 1 + x + err2
        lmTest(y2 ~ x, "dw")
        
     ## gq -
        # Generate a Regressor:
        x = rep(c(-1, 1), 50)
        # Generate Heteroskedastic and Homoskedastic Disturbances:
        err1 = c(rnorm(50, sd = 1), rnorm(50, sd = 2))
        err2 = rnorm(100)
        # Generate a Linear Relationship:
        y1 = 1 + x + err1
        y2 = 1 + x + err2
        # Perform Goldfeld-Quandt Test:
        lmTest(y1 ~ x, "gq")
        lmTest(y2 ~ x, "gq")
        
     ## harv -
        # Generate a Regressor and Dependent Variable:
        x = 1:50
        y1 = 1 + x + rnorm(50)
        y2 = y1 + 0.3*x^2
        # Perform Harvey-Collier Test:
        harv = lmTest(y1 ~ x, "harv")
        harv
        # Calculate Critical Value for 0.05 Level:
        qt(0.95, harv$parameter)
        lmTest(y2 ~ x, "harv")
        
     ## hmc -
        # Generate a Regressor:
        x = rep(c(-1, 1), 50)
        # Generate Heteroskedastic and Homoskedastic Disturbances:
        err1 = c(rnorm(50, sd = 1), rnorm(50, sd = 2))
        err2 = rnorm(100)
        # Generate a Linear Relationship:
        y1 = 1 + x + err1
        y2 = 1 + x + err2
        # Perform Harrison-McCabe Test:
        lmTest(y1 ~ x, "hmc")
        lmTest(y2 ~ x, "hmc")
        
     ## rain -
        # Generate Series:
        x = c(1:30)
        y = x^2 + rnorm(30, 0, 2)
        # Perform Rainbow Test:
        rain = lmTest(y ~ x, "rain")
        rain
        # Compute Critical Value:
        qf(0.95, rain$parameter[1], rain$parameter[2]) 
        
     ## reset -
        # Generate Series:
        x = c(1:30)
        y1 = 1 + x + x^2 + rnorm(30)
        y2 = 1 + x + rnorm(30)
        # Perform RESET Test:
        lmTest(y1 ~ x , "reset", power = 2, type = "regressor")
        lmTest(y2 ~ x , "reset", power = 2, type = "regressor")          

