Data Fields | |
| apop_data * | data |
| struct loess_struct | lo_s |
| int | want_predict_ci |
| double | ci_level |
The code for the loess system is based on FORTRAN code from 1988 that was overhauled in 1992 and linked into Apophenia in 2009. The structure that does all the work is therefore a loess_struct, which you should treat as opaque.
The useful settings from that struct re-appear in the apop_loess_settings struct so you can set them directly, and then the settings init function will copy your preferences into the working struct.
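The copy-with-defaults step described above can be sketched in plain C. This is only an illustration of the pattern: <tt>mock_loess_struct</tt>, <tt>mock_loess_settings</tt>, and <tt>mock_settings_init</tt> are hypothetical stand-ins, not Apophenia types or functions, and treating a zero or NULL field as "not set" is an assumption of the sketch. The default values are the ones documented on this page.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. The real loess_struct and
   apop_loess_settings are defined by Apophenia and have more fields. */
typedef struct {
    struct { double span; long degree; const char *family; } model;
} mock_loess_struct;

typedef struct {
    mock_loess_struct lo_s;   /* user-visible copy of the working struct */
} mock_loess_settings;

/* Copy the caller's preferences into a working struct. Any field left at
   zero/NULL is assumed to be unset and gets the documented default. */
mock_loess_struct mock_settings_init(mock_loess_settings in) {
    mock_loess_struct out = in.lo_s;
    if (out.model.span == 0)   out.model.span   = 0.75;
    if (out.model.degree == 0) out.model.degree = 2;
    if (!out.model.family)     out.model.family = "gaussian";
    assert(out.model.span > 0);   /* a non-positive span makes no sense */
    return out;
}
```

With this sketch, <tt>mock_settings_init((mock_loess_settings){ .lo_s.model.span = 0.5 })</tt> returns a working struct with span 0.5, degree 2, and family "gaussian": only the field the caller named differs from the defaults.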
The documentation for the elements is cut/pasted/modified from Cleveland, Grosse, and Shyu.
\c .data: mandatory. Your input data set.
<tt>.lo_s.model.span</tt>: smoothing parameter. Default is 0.75.
<tt>.lo_s.model.degree</tt>: overall degree of locally-fitted polynomial. 1 is
locally-linear fitting and 2 is locally-quadratic fitting. Default is 2.
<tt>.lo_s.normalize</tt>: Should numeric predictors
be normalized? If 'y' (the default), the standard normalization
is used. If 'n', no normalization is carried out.
\c .lo_s.model.parametric: for two or more numeric predictors, this argument
specifies those variables that should be
conditionally-parametric. The argument should be a logical
vector of length p, with one entry per predictor, given in the same
order as the predictors in x. Default is a vector of 0's of length p.
\c .lo_s.model.drop_square: for cases with degree = 2, and with two or more
numeric predictors, this argument specifies those numeric
predictors whose squares should be dropped from the set of
fitting variables. The method of specification is the same as
for parametric. Default is a vector of 0's of length p.
\c .lo_s.model.family: the assumed distribution of the errors. The values are
<tt>"gaussian"</tt> or <tt>"symmetric"</tt>. The first value is the default.
If the second value is specified, a robust fitting procedure is used.
\c .lo_s.control.surface: determines whether the fitted surface is computed
<tt>"directly"</tt> at all points or whether an <tt>"interpolation"</tt>
method is used. The default, interpolation, is what most users should use
unless special circumstances warrant.
\c .lo_s.control.statistics: determines whether the statistical quantities are
computed <tt>"exactly"</tt> or approximately, where <tt>"approximate"</tt>
is the default. The former should only be used for testing the approximation in
statistical development and is not meant for routine usage, because it can take
far longer to compute.
\c .lo_s.control.cell: if interpolation is used to compute the surface,
this argument specifies the maximum cell size of the k-d tree. Let k =
floor(n*cell*span), where n is the number of observations. A cell is
further divided if the number of observations within it is greater than or
equal to k. Default is 0.2.
\c .lo_s.control.trace_hat: Options are <tt>"approximate"</tt>, <tt>"exact"</tt>, and <tt>"wait.to.decide"</tt>.
When lo_s.control.surface is <tt>"approximate"</tt>, determines
the computational method used to compute the trace of the hat
matrix, which is used in the computation of the statistical
quantities. If "exact", an exact computation is done; normally
this goes quite fast on the fastest machines until n, the number
of observations is 1000 or more, but for very slow machines,
things can slow down at n = 300. If "wait.to.decide" is selected,
then a default is chosen in loess(); the default is "exact" for
n < 500 and "approximate" otherwise. If surface is "exact", an
exact computation is always done for the trace. Setting trace_hat to
"approximate" for large data sets will substantially reduce the
computation time.
\c .lo_s.model.iterations: if family is <tt>"symmetric"</tt>, the number of iterations
of the robust fitting method. Default is 0 for family <tt>"gaussian"</tt> and 4 for
family <tt>"symmetric"</tt>.
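Three of the rules in the list above are simple enough to write out as code: the k-d tree split threshold for <tt>cell</tt>, the choice made for <tt>trace_hat</tt> when it is "wait.to.decide", and the default iteration count per family. The following is a minimal sketch of those rules as stated on this page. The function names are hypothetical and none of this is taken from the loess source itself.

```c
#include <assert.h>
#include <string.h>

/* k = floor(n * cell * span). A k-d tree cell holding k or more
   observations is divided further. All inputs are non-negative, so the
   truncating cast to long gives the same result as floor(). */
long kd_split_threshold(long n, double cell, double span) {
    assert(n >= 0 && cell >= 0 && span >= 0);
    return (long) ((double) n * cell * span);
}

/* Resolve "wait.to.decide" using the documented rule: "exact" for
   n < 500, "approximate" otherwise. Any other request is returned as is. */
const char *resolve_trace_hat(const char *requested, long n) {
    if (strcmp(requested, "wait.to.decide") != 0) return requested;
    return (n < 500) ? "exact" : "approximate";
}

/* Documented default number of robust-fitting iterations:
   4 for the "symmetric" family, 0 for "gaussian". */
long default_iterations(const char *family) {
    return strcmp(family, "symmetric") == 0 ? 4 : 0;
}
```

For example, with the default cell of 0.2 and span of 0.75, a data set of n = 1000 observations gives k = floor(1000 × 0.2 × 0.75) = 150, and a "wait.to.decide" request resolves to "approximate" because n is not below 500.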
That's all you can set. Here are some output parameters:
\c fitted_values: fitted values of the local regression model
\c fitted_residuals: residuals of the local regression fit
\c enp: equivalent number of parameters.
\c s: estimate of the scale of the residuals.
\c one_delta: a statistical parameter used in the computation of standard errors.
\c two_delta: a statistical parameter used in the computation of standard errors.
\c pseudovalues: adjusted values of the response when robust estimation is used.
\c trace_hat: trace of the operator hat matrix.
\c diagonal: diagonal of the operator hat matrix.
\c robust: robustness weights for robust fitting.
\c divisor: normalization divisor for numeric predictors.
| double apop_loess_settings::ci_level |
If running a prediction, the level at which to calculate the confidence interval. Default is 0.95.
| int apop_loess_settings::want_predict_ci |
If 'y' (the default), calculate the confidence bands for predicted values.