gpd                   package:VGAM                   R Documentation

_G_e_n_e_r_a_l_i_z_e_d _P_a_r_e_t_o _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Maximum likelihood estimation of the 2-parameter generalized 
     Pareto distribution (GPD).

_U_s_a_g_e:

     gpd(threshold = 0, lscale = "loge", lshape = "logoff",
         escale = list(),
         eshape = if(lshape=="logoff") list(offset=0.5) else
                     if(lshape=="elogit") list(min=-0.5, max=0.5) else NULL,
         percentiles = c(90, 95), iscale = NULL, ishape = NULL,
         tshape0=0.001, method.init=1, zero=2)

_A_r_g_u_m_e_n_t_s:

threshold: Numeric, values are recycled if necessary. The threshold
          value(s), called mu below.

  lscale: Parameter link function for the scale parameter sigma. See
          'Links' for more choices.

  lshape: Parameter link function for the shape parameter xi. See
          'Links' for more choices. The default constrains the
          parameter to be greater than -0.5 because if xi <= -0.5 then
          Fisher scoring does not work. See the Details section below
          for more information.

escale, eshape: Extra argument for the 'lscale' and 'lshape' arguments.
          See 'earg' in 'Links' for general information. For the shape
          parameter, if the 'logoff' link is chosen then the offset is
          called A below; and then the second linear/additive predictor
          is log(xi+A) which means that xi > -A. The working weight
          matrices are positive definite if A=0.5.

percentiles: Numeric vector of percentiles used for the fitted values.
          Values should be between 0 and 100. See the example below for
          illustration. However, if 'percentiles=NULL' then the mean mu
          + sigma / (1-xi) is returned; this is only defined if xi<1.

iscale, ishape: Numeric. Optional initial values for sigma and xi. The
          default is to use 'method.init' and compute a value
          internally for each parameter. Values of 'ishape' should be
          between -0.5 and 1. Values of 'iscale' should be positive.

 tshape0: Positive numeric. Threshold/tolerance value for resting
          whether xi is zero. If the absolute value of the estimate of
          xi is less than this value then it will be assumed zero and
          exponential distribution derivatives etc. will be used.

method.init: Method of initialization, either 1 or 2. The first is the
          method of moments, and the second is a variant of this.  If
          neither work, try assigning values to arguments 'ishape'
          and/or 'iscale'.

    zero: An integer-valued vector specifying which linear/additive
          predictors are modelled as intercepts only. The value must be
          from the set {1,2} corresponding respectively to sigma and
          xi. It is often a good idea for the sigma parameter only to
          be modelled through a linear combination of the explanatory
          variables because the shape parameter is probably best left
          as an intercept only: 'zero=2'. Setting 'zero=NULL' means
          both parameters are modelled with explanatory variables.

_D_e_t_a_i_l_s:

     The distribution function of the GPD can be written

         G(y) = 1 - [1 +  xi (y-mu)/  sigma  ]_{+}^{- 1/  xi}

     where mu is the location parameter (known, with value
     'threshold'), sigma > 0 is the scale parameter, xi is the shape
     parameter, and h_+ = max(h,0). The function 1-G is known as the
     _survivor function_. The limit xi -> 0 gives the _shifted
     exponential_ as a special case:

                  G(y) = 1 -  exp[-(y-mu)/  sigma].

     The support is y>mu for xi>0, and mu < y <mu-sigma / xi for xi<0.

     Smith (1985) showed that if xi <= -0.5 then this is known as the
     nonregular case and problems/difficulties can arise both
     theoretically and numerically. For the (regular) case xi > -0.5
     the classical asymptotic theory of maximum likelihood estimators
     is applicable; this is the default.

     Although for xi < -0.5 the usual asymptotic properties do not
     apply, the maximum likelihood estimator generally exists and is
     superefficient for -1 < xi < -0.5, so it is ``better'' than
     normal. When xi < -1 the maximum likelihood estimator generally
     does not exist as it effectively becomes a two parameter problem.

     The mean of Y does not exist unless xi < 1, and the variance does
     not exist unless xi < 0.5.  So if you want to fit a model with
     finite variance use 'lshape="elogit"'.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'. However,
     for this 'VGAM' family function, 'vglm' is probably preferred over
     'vgam' when there is smoothing.

_W_a_r_n_i_n_g:

     Fitting the GPD by maximum likelihood estimation can be
     numerically fraught. If 1 + xi*(y-mu)/sigma <= 0 then some crude
     evasive action is taken but the estimation process can still fail.
     This is particularly the case if 'vgam' with 's' is used. Then
     smoothing is best done with 'vglm' with regression splines ('bs'
     or 'ns') because 'vglm' implements half-stepsizing whereas 'vgam'
     doesn't. Half-stepsizing helps handle the problem of straying
     outside the parameter space.

_N_o_t_e:

     The response in the formula of 'vglm' and 'vgam' is y. Internally,
     y-mu is computed.

     With functions 'rgpd', 'dgpd', etc., the argument 'location'
     matches with the argument 'threshold' here.

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. and Stephenson, A. G. (2007) Vector generalized linear
     and additive extreme value models. _Extremes_, *10*, 1-19.

     Coles, S. (2001) _An Introduction to Statistical Modeling of
     Extreme Values_. London: Springer-Verlag.

     Smith, R. L. (1985) Maximum likelihood estimation in a class of
     nonregular cases. _Biometrika_, *72*, 67-90.

_S_e_e _A_l_s_o:

     'rgpd', 'meplot', 'gev', 'pareto1', 'vglm', 'vgam', 's'.

_E_x_a_m_p_l_e_s:

     # Simulated data from an exponential distribution (xi=0)
     threshold = 0.5
     y = threshold + rexp(n=3000, rate=2)
     fit = vglm(y ~ 1, gpd(threshold=threshold), trace=TRUE)
     fitted(fit)[1:5,]
     coef(fit, matrix=TRUE)   # xi should be close to 0
     Coef(fit)
     summary(fit)

     fit@extra$threshold  # Note the threshold is stored here

     # Check the 90 percentile
     i = fit@y < fitted(fit)[1,"90%"]
     100*table(i)/sum(table(i))   # Should be 90

     # Check the 95 percentile
     i = fit@y < fitted(fit)[1,"95%"]
     100*table(i)/sum(table(i))   # Should be 95

     ## Not run: 
     plot(fit@y, col="blue", las=1, main="Fitted 90% and 95% quantiles")
     matlines(1:length(fit@y), fitted(fit), lty=2:3, lwd=2)
     ## End(Not run)

     # Another example
     nn = 2000; threshold = 0; x = runif(nn)
     xi = exp(-0.8)-0.5
     y = rgpd(nn, scale=exp(1+0.2*x), shape=xi)
     fit = vglm(y ~ x, gpd(threshold), trace=TRUE)
     coef(fit, matrix=TRUE)

     ## Not run: 
     # Nonparametric fits
     yy = y + rnorm(nn, sd=0.1)
     fit1 = vgam(yy ~ s(x), gpd(threshold), trace=TRUE) # Not so recommended
     par(mfrow=c(2,1))
     plot(fit1, se=TRUE, scol="blue")
     fit2 = vglm(yy ~ bs(x), gpd(threshold), trace=TRUE) # More recommended
     plotvgam(fit2, se=TRUE, scol="blue")
     ## End(Not run)

