eachElem                 package:nws                 R Documentation

_A_p_p_l_y _a _F_u_n_c_t_i_o_n _i_n _P_a_r_a_l_l_e_l _o_v_e_r _a _S_e_t _o_f _L_i_s_t_s _a_n_d _V_e_c_t_o_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     'eachElem' executes function 'fun' multiple times in parallel with
     a varying set of arguments, and returns the results in a list.  It
     is functionally similar to the standard R 'lapply' function, but
     is more flexible in the way that the function arguments can be
     specified.

_U_s_a_g_e:

       ## S4 method for signature 'sleigh':
       eachElem(.Object, fun, elementArgs=list(), fixedArgs=list(), eo=NULL, DEBUG=FALSE)

_A_r_g_u_m_e_n_t_s:

 .Object: sleigh class object.

     fun: the function to be evaluated by the sleigh. In the case of
          functions like '+', '%*%', etc., the function name must be
          quoted.

elementArgs: list of vectors, lists, matrices, and data frames that
          specify (some of) the arguments to be passed to 'fun'.

fixedArgs: list of additional arguments to be passed to 'fun'.

      eo: list specifying special options.

   DEBUG: logical; should the 'browser' function be called upon entry
          to 'eachElem'?

_D_e_t_a_i_l_s:

     'eachElem' forms argument sets from objects passed in via
     'elementArgs' and 'fixedArgs'.  Both 'elementArgs' and 'fixedArgs'
     should be lists, with each element of these lists corresponding to
     an argument of 'fun'.  The elements of 'elementArgs' are used to
     specify the arguments that are changing, or varying, from task to
     task, while the elements of 'fixedArgs' are used to specify the
     arguments that do not vary from task to task.  The number of tasks
     that are executed by a call to 'eachElem' is basically equal to
     the length of the longest vector (or list, etc.) in 'elementArgs'.
     If any elements of 'elementArgs' are shorter, then their values
     are recycled, using the standard R rules.
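
     The way argument sets are formed can be sketched in base R
     without a sleigh.  This is only an illustration of the rules
     described above, not the package's implementation; the names
     'nTasks' and 'argSets' are made up for the sketch, and the tasks
     run sequentially here.

```r
# Base R stand-in for the argument sets described above; no sleigh is used.
elementArgs <- list(1:4, c(10, 20))   # lengths 4 and 2
fixedArgs   <- list(100)

# Task count: length of the longest element of elementArgs.
nTasks <- max(lengths(elementArgs))

# Build one argument list per task, recycling the shorter elements
# and appending the fixed arguments.
argSets <- lapply(seq_len(nTasks), function(i) {
  varying <- lapply(elementArgs, function(a) a[[(i - 1) %% length(a) + 1]])
  c(varying, fixedArgs)
})

# Evaluate each task sequentially.
results <- lapply(argSets, function(args) do.call(sum, args))
```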

     The elements of 'elementArgs' may be vectors, lists, matrices, or
     data frames.  The vectors and lists are always iterated over by
     element, or "cell", but matrices and data frames can also be
     iterated over by row or column.  This is controlled by the 'by'
     option, specified via the 'eo' argument.  See below for more
     information.

     For example:

     eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))

     This will submit four tasks, since the length of 1:4 is four.  The
     four tasks will be to add the arguments 1 and 100, 2 and 100, 3
     and 100, and 4 and 100.  The result is a list containing the four
     values 101, 102, 103, and 104.

     Another way to do the same thing is with:

     eachElem(s, '+', elementArgs=list(1:4, 100))

     Since the second element of 'elementArgs' has length one, its
     value is recycled four times, thus specifying the same set of
     tasks as in the previous example.  This method also has the
     advantage of making it easy to put fixed values before varying
     values, without the need for the 'eo$argPermute' option, discussed
     later.  For example:

     eachElem(s, '-', elementArgs=list(100, 1:4))

     is similar to the R statement:

     100 - 1:4

     Note that in simple examples like these, where the results are
     numeric values, the standard R 'unlist' function can be very
     useful for converting the resulting list into a vector.
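
     As a small illustration of that conversion, the sketch below
     uses the base R 'Map' function as a local stand-in for the list
     that the subtraction example would return; 'resultList' and
     'resultVec' are names chosen for the sketch.

```r
# Stand-in for the result list of the subtraction example; no sleigh is used.
resultList <- Map(`-`, 100, 1:4)

# Flatten the list of single numbers into a numeric vector.
resultVec <- unlist(resultList)
```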

     The 'eo' argument is a list that can be used to specify various
     options.  The current options are: 'eo$elementFunc',
     'eo$accumulator', 'eo$by', 'eo$chunkSize', 'eo$loadFactor',
     'eo$blocking', and 'eo$argPermute'.

     The 'eo$elementFunc' option can be used to specify a callback
     function that provides the varying arguments for 'fun' in place of
     'elementArgs' (i.e., you can't specify both 'eo$elementFunc' and
     'elementArgs').  'eachElem' calls the 'eo$elementFunc' function to
     get a list of arguments for one invocation of 'fun', and will keep
     calling it until 'eo$elementFunc' signals that there are no more
     tasks to execute by calling the 'stop' function with no arguments.
     'eachElem' appends any values specified by 'fixedArgs' to the list
     returned by 'eo$elementFunc' just as if 'elementArgs' had been
     specified.

     'eachElem' passes the number of the desired task (starting from 1)
     as the first argument to 'eo$elementFunc', and the value of the
     'eo$by' option as the second argument.  Note that the use of the
     'eo$elementFunc' function is an advanced feature, but is very
     useful when executing a large number of tasks, or when the
     arguments are coming from a database query, for example.  For that
     reason, the 'eo$loadFactor' option should usually be used in
     conjunction with 'eo$elementFunc' (see description below).
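
     The callback protocol described above can be sketched locally.
     'makeElementFunc' and 'collectArgSets' are hypothetical helpers
     written for this illustration; the loop only imitates, in a
     single R session, how argument lists are requested until 'stop'
     is called, and is not the package's own code.

```r
# Hypothetical callback: supplies arguments for tasks 1..nTasks,
# then signals the end by calling stop() with no arguments.
makeElementFunc <- function(nTasks) {
  function(i, by) {
    if (i > nTasks) stop()
    list(i, i * 10)               # varying arguments for task i
  }
}

# Local stand-in for the loop that requests one argument list per task.
collectArgSets <- function(elementFunc, fixedArgs = list(), by = "row") {
  argSets <- list()
  i <- 1
  repeat {
    args <- tryCatch(elementFunc(i, by), error = function(e) NULL)
    if (is.null(args)) break                 # callback called stop()
    if (!is.list(args)) args <- list(args)   # non-list values are wrapped
    argSets[[i]] <- c(args, fixedArgs)       # fixedArgs are appended
    i <- i + 1
  }
  argSets
}

argSets <- collectArgSets(makeElementFunc(5), fixedArgs = list(100))
results <- lapply(argSets, function(a) do.call(sum, a))
```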

     The 'eo$accumulator' option can be used to specify a callback
     function that will receive the results of the task execution as
     soon as they are complete, rather than returning all of the task
     results as a list when 'eachElem' completes.  In other words,
     'eachElem' will call the 'eo$accumulator' function with task
     results as soon as it receives them from the sleigh workers,
     rather than saving them in memory until all the tasks are
     complete.  Note that if the tasks are "chunked" (using the
     'eo$chunkSize' option described below), then the 'eo$accumulator'
     function will receive multiple task results, which is why the task
     results are always passed to the 'eo$accumulator' function in a
     list.

     The first argument to the 'eo$accumulator' function is a list of
     results, where the length of the list is equal to 'eo$chunkSize'.
     The second argument is a vector of task numbers, starting from 1,
     where the length of the vector is also equal to 'eo$chunkSize'.
     The task numbers are very important, because the results are not
     guaranteed to be returned in order.  'eo$accumulator' is another
     advanced feature, and like 'eo$elementFunc', is very useful when
     executing a large number of tasks.  It allows you to process
     results as they finish, rather than forcing you to wait until all
     of the tasks are complete.  In conjunction with 'eo$elementFunc'
     and 'eo$loadFactor', you can set up a pipeline, allowing you to
     process an unlimited number of tasks efficiently.  Note that when
     'eo$accumulator' is specified, 'eachElem' returns NULL, not the
     list of results, since 'eachElem' doesn't save any of the results
     after passing them to the 'eo$accumulator' function.
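
     Because results can arrive out of order, an accumulator usually
     needs the task numbers to put them back in place.  The sketch
     below is one possible shape for such a callback; 'makeAccumulator'
     is a hypothetical helper, and the two calls at the end simulate
     deliveries by hand rather than coming from sleigh workers.

```r
# Hypothetical accumulator that files each result under its task number.
makeAccumulator <- function(nTasks) {
  store <- vector("list", nTasks)
  list(
    accumulator = function(resultList, taskVector) {
      for (k in seq_along(taskVector)) {
        store[[taskVector[k]]] <<- resultList[[k]]
      }
    },
    results = function() store
  )
}

acc <- makeAccumulator(4)

# Simulated deliveries in chunks of two, arriving out of order.
acc$accumulator(list(9, 16), c(3, 4))
acc$accumulator(list(1, 4), c(1, 2))

ordered <- unlist(acc$results())
```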

     The 'eo$by' option specifies the iteration scheme to use for
     matrix and data frame elements in 'elementArgs'.  The default
     value is "row", but it can also be set to "column" or "cell". 
     Vectors and lists in 'elementArgs' are not affected by this
     option.
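
     The effect of the three schemes on the number of tasks can be
     illustrated in base R.  'splitBy' is a hypothetical helper for
     this sketch; the exact form in which a row or column reaches
     'fun' is not specified above, so plain vectors are assumed here.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

# Hypothetical helper: split a matrix into per-task pieces.
splitBy <- function(m, by = c("row", "column", "cell")) {
  by <- match.arg(by)
  switch(by,
    row    = lapply(seq_len(nrow(m)), function(i) m[i, ]),
    column = lapply(seq_len(ncol(m)), function(j) m[, j]),
    cell   = as.list(as.vector(m)))
}

nRow  <- length(splitBy(m, "row"))      # one piece per row
nCol  <- length(splitBy(m, "column"))   # one piece per column
nCell <- length(splitBy(m, "cell"))     # one piece per cell
```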

     The 'eo$chunkSize' option is a tuning parameter that specifies the
     number of tasks that sleigh workers should allocate at a time. 
     The default value is 1, but if the tasks are small, performance
     can be improved by specifying a larger value, which decreases the
     overhead per task.
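
     To see why a larger chunk size reduces overhead, the sketch
     below groups task numbers into chunks in base R.  The values of
     'nTasks' and 'chunkSize' are arbitrary, and the grouping is only
     an illustration of the idea, not how the sleigh hands out work.

```r
nTasks    <- 10
chunkSize <- 4

# Group consecutive task numbers into chunks of at most chunkSize.
taskIds <- seq_len(nTasks)
chunks  <- split(taskIds, ceiling(taskIds / chunkSize))

# Number of allocations needed, compared with nTasks when chunkSize is 1.
nFetches <- length(chunks)
```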

     The 'eo$loadFactor' option is a tuning parameter that specifies
     the maximum number of tasks per worker that are submitted to the
     sleigh at the same time.  If set, no more than '(loadFactor *
     workerCount)' tasks will be submitted at the same time.  This
     helps to control the resource demands that are made on the
     NetWorkSpaces server, which is especially important if there are a
     large number of tasks.  Note that this option is ignored if
     'eo$blocking' is set to 'FALSE', since the two options are
     incompatible with each other.
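
     The limit is simple arithmetic, shown here for concreteness.
     'maxSubmitted' is a hypothetical helper and the worker count of 3
     is an assumed value for the sketch.

```r
# Upper bound on simultaneously submitted tasks, per the rule above.
maxSubmitted <- function(loadFactor, workerCount) loadFactor * workerCount

eo <- list(loadFactor = 10)
workerCount <- 3                      # assumed number of sleigh workers

cap <- maxSubmitted(eo$loadFactor, workerCount)
```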

     The 'eo$blocking' option is used to indicate whether to wait for
     the results, or to return as soon as the tasks have been
     submitted.  If set to 'FALSE', 'eachElem' will return a
     'sleighPending' object that is used to monitor the status of the
     tasks, and to eventually retrieve the results.  You must wait for
     the results to be complete before executing any further tasks on
     the sleigh, or an exception will be raised.  The default value is
     'TRUE'.

     The 'eo$argPermute' option is used to reorder the arguments passed
     to 'fun'.  It is generally only useful if the 'fixedArgs' argument
     has been specified, and some of those arguments need to precede
     the arguments specified via 'elementArgs'.  Note that by using
     recycling of elements in 'elementArgs', the use of 'fixedArgs' and
     'argPermute' can often be avoided entirely.
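
     The reordering can be pictured with a base R sketch.  It assumes
     that the permutation is an integer vector indexing the combined
     argument list (varying arguments followed by fixed arguments);
     that reading is an assumption of this sketch, not something
     stated above.

```r
# One task's arguments: varying value first, fixed value appended.
elementArg <- list(3)
fixedArgs  <- list(100)
combined   <- c(elementArg, fixedArgs)      # list(3, 100)

# Assumed semantics: the permutation indexes the combined list.
argPermute <- c(2, 1)
permuted   <- combined[argPermute]          # list(100, 3)

withoutPermute <- do.call(`-`, combined)    # 3 - 100
withPermute    <- do.call(`-`, permuted)    # 100 - 3
```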

     The 'DEBUG' argument is used to call the 'browser' function upon
     entering 'eachElem'.  The default value is 'FALSE'.

_N_o_t_e:

     If 'elementArgs' or 'fixedArgs' isn't a list, 'eachElem' will
     automatically wrap it in a list.  This is a convenience that only
     works for passing in a single vector or matrix, however.

     If 'elementArgs' or 'fixedArgs' are named lists, then the names
     are used to map the values to the appropriate argument of 'fun'. 
     This can be used as another technique to avoid the use of
     'eo$argPermute'.
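
     Matching by name can be sketched in base R as follows.  The
     function 'fun' and its argument names are made up for the
     illustration, and the tasks are evaluated sequentially with
     'do.call' rather than on a sleigh.

```r
fun <- function(base, exponent) base ^ exponent

# The varying argument is listed first but is matched by name.
elementArgs <- list(exponent = 1:3)
fixedArgs   <- list(base = 2)

results <- lapply(seq_along(elementArgs$exponent), function(i) {
  varying <- lapply(elementArgs, function(a) a[[i]])   # keeps the names
  do.call(fun, c(varying, fixedArgs))
})
```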

     The 'elementArgs' argument can be specified as a data frame. This
     works just like a named list, and therefore, the column names of
     the data frame must all correspond to arguments of 'fun'.  Note
     that if the data frame has many rows, the performance may not be
     good due to the overhead of subsetting data frames in R.
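
     A base R sketch of the data frame case, with one task per row
     and columns matched to arguments by name.  The data frame
     'params' and the function 'fun' are invented for the example,
     and the rows are processed sequentially rather than on a sleigh.

```r
params <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
fun <- function(a, b) a + b

# One call per row; each row becomes a named argument list.
results <- lapply(seq_len(nrow(params)), function(i) {
  do.call(fun, as.list(params[i, , drop = FALSE]))
})
```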

     If the 'fun' function executes very quickly, you may not be able
     to keep your workers busy, giving you poor performance.  In that
     case, consider setting the 'eo$chunkSize' option to a large enough
     number to increase the effective task execution time.

     If you have a huge number of tasks, consider using the
     'eo$elementFunc', 'eo$accumulator', and 'eo$loadFactor' options.

     If in doubt, set the 'eo$loadFactor' option to 10.  That will
     almost certainly avoid putting a big load on the NetWorkSpaces
     server, and if that isn't enough to keep your workers busy, then
     you should really be using the 'eo$chunkSize' option to give the
     workers more to do.

     If 'eo$elementFunc' returns a value that isn't a list, 'eachElem'
     will automatically wrap that value in a list.

     The 'eo$elementFunc' function doesn't have to define a second
     formal argument (the 'by' argument) if it's not needed.

     The 'eo$accumulator' function doesn't have to define a second
     formal argument (the 'taskVector' argument) if it's not needed.
     Just remember that the results are not guaranteed to come back in
     order.

_S_e_e _A_l_s_o:

     'eachWorker', 'sleighPending'

_E_x_a_m_p_l_e_s:

       ## Not run: 
     # create a sleigh
     s <- sleigh()

     # compute the mean of each list element
     x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
     eachElem(s, mean, list(x))

     # median and quartiles for each list element
     eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))

     # use eo$elementFunc to supply 100 random values and eo$accumulator to
     # receive the results
     elementFunc <- function(i, by) {
       if (i <= 100) list(i=i, x=runif(1)) else stop()
     }
     accumulator <- function(resultList, taskVector) {
       if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
       cat(paste(resultList[[1]], collapse=' '), '\n')
     }
     eo <- list(elementFunc=elementFunc, accumulator=accumulator)
     eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
       ## End(Not run)

