MissingValues           package:fMultivar           R Documentation

_H_a_n_d_l_i_n_g _M_i_s_s_i_n_g _V_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     A collection and description of functions for handling missing
     values in 'timeSeries' objects or in objects which can be
     transformed into a vector or a two-dimensional matrix.

     The functions are listed by topic. 

       'removeNA'      removes rows containing NAs from a matrix object,
       'substituteNA'  substitutes NAs by zero, the column mean or median,
       'interpNA'      interpolates NAs using R's "approx" function,
       'knnNA'         imputes NAs by the "knn" algorithm from R's EMV package.

_U_s_a_g_e:

     removeNA(x, ...)
     substituteNA(x, type = c("zeros", "mean", "median"), ...)
     interpNA(x, method = c("linear", "before", "after"), ...)
     knnNA(x, k = max(dim(as.matrix(x))[1]*0.01,2), correlation = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

correlation: [knnNA] - 
           a logical value. If TRUE, the selection of the neighbours is
          based on the sample correlation: the neighbours with the
          highest correlations are selected. 

       k: [knnNA] - 
           the number of neighbours (rows) used to estimate the missing
          values. 

  method: [interpNA] - 
           specifies how to interpolate the matrix column by column.
          One of the character strings 'method="linear"',
          'method="before"' or 'method="after"'. The function 'approx'
          is used for the interpolation. 

    type: [substituteNA] - 
           three alternative methods are provided to replace NAs in
          the data: 'type="zeros"' replaces the missing values by
          zeros, 'type="mean"' replaces the missing values by the
          column mean, 'type="median"' replaces the missing values by
          the column median. 

       x: a numeric matrix, or any other object which can be
          transformed into a matrix through 'x = as.matrix(x, ...)'. If
          'x' is a vector, it will be transformed into a
          one-column matrix. 

     ...: arguments to be passed to the function 'as.matrix'. 

_D_e_t_a_i_l_s:

     *Missing Values in Price and Index Series:*

     Applied to 'timeSeries' objects, the function 'removeNA' simply
     removes rows with NAs from the series. To interpolate time series
     points, one can use the function 'interpNA'. Three different
     methods of interpolation are offered: '"linear"' does a linear
     interpolation, '"before"' uses the previous value, and '"after"'
     uses the following value. Note that the interpolation is done on
     the index scale and not on the time scale.
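
     The three interpolation methods can be illustrated with a minimal
     base R sketch built on 'approx'. The helper 'interp_column' is a
     hypothetical name introduced here for illustration; it is not the
     package's implementation and only assumes that interpolation
     runs over row positions 1..n rather than over time stamps.

```r
# Hypothetical helper, not part of fMultivar: interpolate one column on the
# index scale (row positions 1..n), ignoring any time stamps.
interp_column <- function(y, method = c("linear", "before", "after")) {
  method <- match.arg(method)
  idx <- seq_along(y)
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)            # too few points to interpolate
  f <- if (method == "after") 1 else 0  # f = 0: previous value, f = 1: next value
  approx(x = idx[ok], y = y[ok], xout = idx,
         method = if (method == "linear") "linear" else "constant",
         f = f)$y
}

y <- c(1, NA, 3, NA, NA, 6, NA)
interp_column(y, "linear")  # 1 2 3 4 5 6 NA
interp_column(y, "before")  # 1 1 3 3 3 6 NA
interp_column(y, "after")   # 1 3 3 6 6 6 NA
```

     The trailing NA stays missing in all three cases because
     'approx', with its default 'rule = 1', returns NA outside the
     range of the observed positions.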

     The function 'knnNA' estimates missing values of a 'timeSeries'
     object or of a matrix based on a k-nearest neighbours algorithm.
     Missing values can be either -Inf, Inf, NA, or NaN. For each row
     containing at least one missing value, the algorithm selects the
     k nearest rows that do not contain any missing values, based on
     the Euclidean distance or the sample correlation. The missing
     values are then replaced by the average of the neighbours. Note
     that if a row contains only missing values, the estimation is
     not possible.
      [EMV:knn].
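
     The nearest-neighbour idea can be sketched in base R as follows.
     The function 'knn_impute' is a hypothetical name and a simplified
     illustration, not the EMV implementation: it assumes that
     distances are measured only over the columns observed in the
     incomplete row, and it covers the Euclidean case only, not the
     correlation-based selection.

```r
# Hypothetical sketch, not the EMV code: impute each incomplete row from the
# k nearest complete rows, using Euclidean distance over the observed columns.
knn_impute <- function(x, k = 2) {
  x <- as.matrix(x)
  x[!is.finite(x)] <- NA                      # treat -Inf, Inf, NaN like NA
  complete <- which(rowSums(is.na(x)) == 0)   # candidate neighbour rows
  stopifnot(length(complete) >= k)
  for (i in which(rowSums(is.na(x)) > 0)) {
    miss <- is.na(x[i, ])
    if (all(miss)) next                       # no observed value: skip the row
    target <- matrix(x[i, !miss], nrow = length(complete),
                     ncol = sum(!miss), byrow = TRUE)
    d <- sqrt(rowSums((x[complete, !miss, drop = FALSE] - target)^2))
    nn <- complete[order(d)[seq_len(k)]]      # the k closest complete rows
    x[i, miss] <- colMeans(x[nn, miss, drop = FALSE])
  }
  x
}

X <- rbind(c(1, 10), c(2, 20), c(9, 90), c(1.4, NA))
knn_impute(X, k = 2)[4, 2]  # average of 10 and 20, i.e. 15
```

     Rows that are entirely missing are left unchanged, which matches
     the note above that no estimate is possible for them.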

     *Missing Values in Return Series:*

     For return series the function 'substituteNA' may be useful. The
     function fills missing values with either zeros ('type="zeros"'),
     the mean ('type="mean"') or the median ('type="median"') of the
     appropriate columns.
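
     A column-wise substitution of this kind might look like the
     following base R sketch. The helper 'fill_na' and the small
     return matrix 'R' are hypothetical and serve only to illustrate
     the three documented choices; the package's own code may differ.

```r
# Hypothetical helper, not part of fMultivar: replace NAs column by column.
fill_na <- function(x, type = c("zeros", "mean", "median")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (!any(miss)) next
    # Statistics are computed from the observed values of the same column.
    x[miss, j] <- switch(type,
      zeros  = 0,
      mean   = mean(x[, j], na.rm = TRUE),
      median = median(x[, j], na.rm = TRUE))
  }
  x
}

# Made-up returns for two assets, each with one missing observation.
R <- cbind(a = c(0.01, NA, 0.03, 0.10), b = c(NA, 0.02, 0.02, 0.05))
fill_na(R, "zeros")
fill_na(R, "median")
```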

_A_u_t_h_o_r(_s):

     Raphael Gottardo for the 'knn' function, 
      Diethelm Wuertz for the Rmetrics R-port.

_R_e_f_e_r_e_n_c_e_s:

     Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T.,
     Tibshirani R., Botstein D., Altman R.B. (2001); _Missing Value
     Estimation Methods for DNA Microarrays_, Bioinformatics 17,
     520-525.

_E_x_a_m_p_l_e_s:

     ## SOURCE("fMultivar.6B-MissingValues")

     ## Create a Matrix with NAs:
        X = matrix(rnorm(100), ncol = 5)
        # a single NA inside:
        X[3, 5] = NA
        # three in a row inside:
        X[17, 2:4] = c(NA, NA, NA)
        # three in a column inside:
        X[13:15, 4] = c(NA, NA, NA)
        # two at the right border:
        X[11:12, 5] = c(NA, NA)
        # one in the lower left corner:
        X[20, 1] = NA
        print(X)
          
     ## Remove rows with NA's
        removeNA(X)
        # Now we have only 12 rows!
        
     ## Substitute NA's by zeros or column mean
        substituteNA(X, type = "zeros")
        substituteNA(X, type = "mean")
        
     ## Interpolate NA's linearly:
        interpNA(X, method = "linear")
        # Note that the missing value in the corner cannot be interpolated!
        # Take previous values in a column:
        interpNA(X, method = "before")
        # Here, too, the corner value is excluded
        
     ## Interpolate using the knn Algorithm:
        knnNA(X)

