runquantile             package:caTools             R Documentation

_Q_u_a_n_t_i_l_e _o_f _M_o_v_i_n_g _W_i_n_d_o_w

_D_e_s_c_r_i_p_t_i_o_n:

     Moving (aka running, rolling) Window Quantile calculated over a
     vector

_U_s_a_g_e:

       runquantile(x, k, probs, type=7, 
              endrule=c("quantile", "NA", "trim", "keep", "constant", "func"))

_A_r_g_u_m_e_n_t_s:

       x: numeric vector of length n

       k: width of moving window; must be an integer between one and n 

 endrule: character string indicating how the values at the beginning 
          and the end, of the array, should be treated. Only first and
          last 'k2'  values at both ends are affected, where 'k2' is
          the half-bandwidth  'k2 = k %/% 2'.

             *  '"quantile"' - applies the 'quantile'  function to
                smaller and smaller sections of the array. Equivalent
                to:  'for(i in 1:k2) out[i]=quantile(x[1:(i+k2)])'. 

             *  '"trim"' - trim the ends; output array length is equal
                to  'length(x)-2*k2 (out = out[(k2+1):(n-k2)])'. This
                option mimics  output of 'apply' '(embed(x,k),1,FUN)'
                and other  related functions.

             *  '"keep"' - fill the ends with numbers from 'x' vector 
                '(out[1:k2] = x[1:k2])'

             *  '"constant"' - fill the ends with first and last
                calculated  value in output array '(out[1:k2] =
                out[k2+1])'

             *  '"NA"' - fill the ends with NA's '(out[1:k2] = NA)'

             *  '"func"' - same as '"quantile"' but implimented in R.
                This option could be very slow, and is included mostly
                for testing

          Similar to 'endrule' in 'runmed' function which has the 
          following options: "'c("median", "keep", "constant")'" . 

   probs: numeric vector of probabilities with values in [0,1] range 
          used by 'runquantile'. For example 'Probs=c(0,0.5,1)' would
          be  equivalent to running 'runmin', 'runmed' and 'runmax'.
          Same as 'probs' in 'quantile' function. 

    type: an integer between 1 and 9 selecting one of the nine quantile
           algorithms, same as 'type' in 'quantile' function.  Another
          even more readable description of nine ways to calculate
          quantiles  can be found at <URL:
          http://mathworld.wolfram.com/Quantile.html>. 

_D_e_t_a_i_l_s:

     Apart from the end values, the result of y = runquantile(x, k) is
     the same as  "'for(j=(1+k2):(n-k2))
     y[j]=quintile(x[(j-k2):(j+k2)],na.rm = TRUE)'". It can handle 
     non-finite numbers like NaN's and Inf's (like 'quantile(x, na.rm =
     TRUE)').

     The main incentive to write this set of functions was relative
     slowness of  majority of moving window functions available in R
     and its packages.  With the  exception of 'runmed', a running
     window median function, all  functions listed in "see also"
     section are slower than very inefficient 
     "'apply(embed(x,k),1,FUN)'" approach. Relative  speeds of
     'runquantile' is O(n*k)

     Functions 'runquantile' and 'runmad' are using insertion sort to 
     sort the moving window, but gain speed by remembering results of
     the previous  sort. Since each time the window is moved, only one
     point changes, all but one  points in the window are already
     sorted. Insertion sort can fix that in O(k)  time.

_V_a_l_u_e:

     Function 'runquantile' returns a matrix of size [n x 
     length(probs)]. Only in case of 'endrule="trim"' the output will
     have  less rows.

_A_u_t_h_o_r(_s):

     Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

_R_e_f_e_r_e_n_c_e_s:


        *  About quantiles: Hyndman, R. J. and Fan, Y. (1996) _Sample 
           quantiles in statistical packages, American Statistician_,
           50, 361. 

        *  About quantiles: Eric W. Weisstein. _Quantile_. From
           MathWorld-  A Wolfram Web Resource. <URL:
           http://mathworld.wolfram.com/Quantile.html> 

        *  About insertion sort used in 'runmad' and 'runquantile':  R.
           Sedgewick (1988): _Algorithms_. Addison-Wesley (page 99)

_S_e_e _A_l_s_o:

     Links related to:

        *  Running Quantile - 'quantile', 'runmed',  'smooth',
           'rollmedian' from  'zoo' library

        *  Other moving window functions  from this package: 'runmin', 
           'runmax', 'runmean', 'runmad' and 'runsd'

        *  Running Minimum - 'min', 'rollMin' from  'fSeries' library

        *  Running Maximum - 'max', 'rollMax' from  'fSeries' library,
           'rollmax' from 'zoo' library

        *  generic running window functions: 'apply'' (embed(x,k), 1,
           FUN)' (fastest), 'rollFun'  from 'fSeries' (slow), 'running'
           from 'gtools'  package (extremely slow for this purpose),
           'rapply' from  'zoo' library, 'subsums' from  'magic'
           library can perform running window operations on data with
           any  dimensions. 

_E_x_a_m_p_l_e_s:

       # show plot using runquantile
       k=31; n=200;
       x = rnorm(n,sd=30) + abs(seq(n)-n/4)
       y=runquantile(x, k, probs=c(0.05, 0.25, 0.5, 0.75, 0.95))
       col = c("black", "red", "green", "blue", "magenta", "cyan")
       plot(x, col=col[1], main = "Moving Window Quantiles")
       lines(y[,1], col=col[2])
       lines(y[,2], col=col[3])
       lines(y[,3], col=col[4])
       lines(y[,4], col=col[5])
       lines(y[,5], col=col[6])
       lab = c("data", "runquantile(.05)", "runquantile(.25)", "runquantile(0.5)", 
               "runquantile(.75)", "runquantile(.95)")
       legend(0,230, lab, col=col, lty=1 )

       # show plot using runquantile
       k=15; n=200;
       x = rnorm(n,sd=30) + abs(seq(n)-n/4)
       y=runquantile(x, k, probs=c(0.05, 0.25, 0.5, 0.75, 0.95))
       col = c("black", "red", "green", "blue", "magenta", "cyan")
       plot(x, col=col[1], main = "Moving Window Quantiles (smoothed)")
       lines(runmean(y[,1],k), col=col[2])
       lines(runmean(y[,2],k), col=col[3])
       lines(runmean(y[,3],k), col=col[4])
       lines(runmean(y[,4],k), col=col[5])
       lines(runmean(y[,5],k), col=col[6])
       lab = c("data", "runquantile(.05)", "runquantile(.25)", "runquantile(0.5)", 
               "runquantile(.75)", "runquantile(.95)")
       legend(0,230, lab, col=col, lty=1 )
       
       # basic tests against runmin & runmax
       y = runquantile(x, k, probs=c(0, 1))
       a = runmin(x,k) # test only the inner part 
       stopifnot(all(a==y[,1], na.rm=TRUE));
       a = runmax(x,k) # test only the inner part
       stopifnot(all(a==y[,2], na.rm=TRUE));
       
       # basic tests against runmed, including testing endrules
       a = runquantile(x, k, probs=0.5, endrule="keep")
       b = runmed(x, k, endrule="keep")
       stopifnot(all(a==b, na.rm=TRUE));
       a = runquantile(x, k, probs=0.5, endrule="constant")
       b = runmed(x, k, endrule="constant")
       stopifnot(all(a==b, na.rm=TRUE));

       # basic tests against apply/embed
       a = runquantile(x,k, c(0.3, 0.7), endrule="trim")
       b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7)))
       eps = .Machine$double.eps ^ 0.5
       stopifnot(all(abs(a-b)<eps));
       
       # test against loop approach
       # this test works fine at the R prompt but fails during package check - need to investigate
       k=25; n=200;
       x = rnorm(n,sd=30) + abs(seq(n)-n/4) # create random data
       x[seq(1,n,11)] = NaN;                # add NANs
       k2 = k
       k1 = k-k2-1
       a = runquantile(x, k, probs=c(0.3, 0.8) )
       b = matrix(0,n,2);
       for(j in 1:n) {
         lo = max(1, j-k1)
         hi = min(n, j+k2)
         b[j,] = quantile(x[lo:hi], probs=c(0.3, 0.8), na.rm = TRUE)
       }
       #stopifnot(all(abs(a-b)<eps));
       
       # compare calculation of array ends
       a = runquantile(x, k, probs=0.4, endrule="quantile") # fast C code
       b = runquantile(x, k, probs=0.4, endrule="func")     # slow R code
       stopifnot(all(abs(a-b)<eps));
       
       # test if moving windows forward and backward gives the same results
       k=51;
       a = runquantile(x     , k, probs=0.4)
       b = runquantile(x[n:1], k, probs=0.4)
       stopifnot(all(a[n:1]==b, na.rm=TRUE));

       # Exhaustive testing of runquantile to standard R approach
       numeric.test = function (x, k) {
         probs=c(1, 25, 50, 75, 99)/100
         a = runquantile(x,k, c(0.3, 0.7), endrule="trim")
         b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7), na.rm=TRUE))
         eps = .Machine$double.eps ^ 0.5
         stopifnot(all(abs(a-b)<eps));
       }
       n=50;
       x = rnorm(n,sd=30) + abs(seq(n)-n/4) # nice behaving data
       for(i in 2:5) numeric.test(x, i)     # test small window sizes
       for(i in 1:5) numeric.test(x, n-i+1) # test large window size
       x[seq(1,50,10)] = NaN;               # add NANs and repet the test
       for(i in 2:5) numeric.test(x, i)     # test small window sizes
       for(i in 1:5) numeric.test(x, n-i+1) # test large window size
       
       # Speed comparison
       ## Not run: 
       x=runif(1e6); k=1e3+1;
       system.time(runquantile(x,k,0.5))    # Speed O(n*k)
       system.time(runmed(x,k))             # Speed O(n * log(k)) 
       
     ## End(Not run)

