gpd                   package:VGAM                   R Documentation

_G_e_n_e_r_a_l_i_z_e_d _P_a_r_e_t_o _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Maximum likelihood estimation of the 2-parameter generalized 
     Pareto distribution (GPD).

_U_s_a_g_e:

     gpd(threshold = 0, lscale = "loge", lshape = "logoff",
         escale = list(),
         eshape = if(lshape=="logoff") list(offset=0.5) else
                     if(lshape=="elogit") list(min=-0.5, max=0.5) else NULL,
         percentiles = c(90, 95), iscale = NULL, ishape = NULL,
         tshape0=0.001, method.init=1, zero = 2)

_A_r_g_u_m_e_n_t_s:

threshold: Numeric of length 1. The threshold value.  Only values of
          the response which are greater than this value are kept. The
          response actually worked on internally is the difference.
          Only those observations greater than the threshold value are
          returned in the 'y' slot of the object.

  lscale: Parameter link function for the scale parameter sigma. See
          'Links' for more choices.

  lshape: Parameter link function for the shape parameter xi. See
          'Links' for more choices. The default constrains the
          parameter to be greater than -0.5 (the negative of 'Offset').
          This is because if xi <= -0.5 then Fisher scoring does not
          work. However, it can be a little more interpretable if
          'Offset=1'. See the Details section below for more
          information.

escale, eshape: Extra argument for the 'lscale' and 'lshape' arguments.
          See 'earg' in 'Links' for general information. For the shape
          parameter, if the 'logoff' link is chosen then the offset is
          called A below; and then the second linear/additive predictor
          is log(xi+A) which means that xi > -A. The working weight
          matrices are positive definite if A=0.5.

percentiles: Numeric vector of percentiles used for the fitted values.
          Values should be between 0 and 100. See the example below for
          illustration. However, if 'percentiles=NULL' then the mean mu
          + sigma / (1-xi) is returned; this is only defined if xi<1.

iscale, ishape: Numeric. Optional initial values for sigma and xi. The
          default is to use 'method.init' and compute a value
          internally for each parameter. Values of 'ishape' should be
          between -0.5 and 1. Values of 'iscale' should be positive.

 tshape0: Positive numeric. Threshold/tolerance value for resting
          whether xi is zero. If the absolute value of the estimate of
          xi is less than this value then it will be assumed zero and
          exponential distribution derivatives etc. will be used.

method.init: Method of initialization, either 1 or 2. The first is the
          method of moments, and the second is a variant of this.  If
          neither work try assigning values to arguments 'ishape'
          and/or 'iscale'.

    zero: An integer-valued vector specifying which linear/additive
          predictors are modelled as intercepts only. The value must be
          from the set {1,2} corresponding respectively to sigma and
          xi. It is often a good idea for the sigma parameter only to
          be modelled through a linear combination of the explanatory
          variables because the shape parameter is probably best left
          as an intercept only: 'zero=2'. Setting 'zero=NULL' means
          both parameters are modelled with explanatory variables.

_D_e_t_a_i_l_s:

     The distribution function of the GPD can be written

         G(y) = 1 - [1 +  xi (y-mu)/  sigma  ]_{+}^{- 1/  xi}

     where mu is the location parameter (known with value 'threshold'),
     sigma > 0 is the scale parameter, xi is the shape parameter, and
     h_+ = max(h,0). The function 1-G is known as the _survivor
     function_. The limit xi -> 0 gives the _shifted exponential_ as a
     special case:

                  G(y) = 1 -  exp[-(y-mu)/  sigma].

     The support is y>mu for xi>0, and mu < y <mu-sigma / xi for xi<0.

     Smith (1985) showed that if xi <= -0.5 then this is known as the
     nonregular case and problems/difficulties can arise both
     theoretically and numerically. For the (regular) case xi > -0.5
     the classical asymptotic theory of maximum likelihood estimators
     is applicable; this is the default because 'Offset=0.5'.

     Although for xi < -0.5 the usual asymptotic properties do not
     apply, the maximum likelihood estimator generally exists and is
     superefficient for -1 < xi < -0.5, so it is ``better'' than
     normal. To allow for -1 < xi < -0.5 set 'Offset=1'.  When xi < -1
     the maximum likelihood estimator generally does not exist as it
     effectively becomes a two parameter problem.

     The mean of Y does not exist unless xi < 1, and the variance does
     not exist unless xi < 0.5.  So if you want to fit a model with
     finite variance use 'lshape="elogit"'.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'. However,
     for this 'VGAM' family function, 'vglm' is probably preferred over
     'vgam' when there is smoothing.

_W_a_r_n_i_n_g:

     Fitting the GPD by maximum likelihood estimation can be
     numerically fraught. If 1 + xi*(y-mu)/sigma <= 0 then some crude
     evasive action is taken but the estimation process can still fail.
     This is particularly the case if 'vgam' with 's' is used. Then
     smoothing is best done with 'vglm' with regression splines ('bs'
     or 'ns') because 'vglm' implements half-stepsizing whereas 'vgam'
     doesn't. Half-stepsizing helps handle the problem of straying
     outside the parameter space.

_N_o_t_e:

     With functions 'rgpd', 'dgpd', etc., the argument 'location'
     matches with the argument 'threshold' here.

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Coles, S. (2001) _An Introduction to Statistical Modeling of
     Extreme Values_. London: Springer-Verlag.

     Smith, R. L. (1985) Maximum likelihood estimation in a class of
     nonregular cases. _Biometrika_, *72*, 67-90.

_S_e_e _A_l_s_o:

     'rgpd', 'meplot', 'gev', 'pareto1'.

_E_x_a_m_p_l_e_s:

     # Simulated data from an exponential distribution (xi=0)
     y = rexp(n=3000, rate=2)
     fit = vglm(y ~ 1, gpd(threshold=0.5), trace=TRUE)
     fitted(fit)[1:5,]
     coef(fit, matrix=TRUE)   # xi should be close to 0
     Coef(fit)
     summary(fit)

     yt = y[y>fit@extra$threshold]  # Note the threshold is stored here
     all.equal(c(yt), c(fit@y))     # TRUE
     # Check the 90 percentile
     i = yt < fitted(fit)[1,"90%"]
     100*table(i)/sum(table(i))   # Should be 90

     # Check the 95 percentile
     i = yt < fitted(fit)[1,"95%"]
     100*table(i)/sum(table(i))   # Should be 95

     ## Not run: 
     plot(yt, col="blue", las=1, main="Fitted 90% and 95% quantiles")
     matlines(1:length(yt), fitted(fit), lty=2:3, lwd=2)
     ## End(Not run)

     # Another example
     nn = 2000; threshold = 0; x = runif(nn)
     xi = exp(-0.8)-0.5
     y = rgpd(nn, scale=exp(1+0.2*x), shape=xi)
     fit = vglm(y ~ x, gpd(threshold), trace=TRUE)
     Coef(fit)
     coef(fit, matrix=TRUE)

