gumbel                 package:VGAM                 R Documentation

_G_u_m_b_e_l _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Maximum likelihood estimation of the 2-parameter Gumbel
     distribution.

_U_s_a_g_e:

     gumbel(llocation = "identity", lscale = "loge",
            elocation = list(), escale = list(),
            iscale = NULL, R = NA, percentiles = c(95, 99), mpv = FALSE,
            zero = NULL)
     egumbel(llocation = "identity", lscale = "loge",
             elocation = list(), escale = list(),
             iscale = NULL, R = NA, percentiles = c(95, 99), mpv = FALSE,
             zero = NULL)

_A_r_g_u_m_e_n_t_s:

llocation, lscale: Parameter link functions for mu and sigma. See
          'Links' for more choices.

elocation, escale: Lists. Extra arguments for the links given by
          'llocation' and 'lscale'. See 'earg' in 'Links' for general
          information.

  iscale: Numeric and positive.  Optional initial value for sigma.
          Recycled to the appropriate length. In general, a larger
          value is better than a smaller value. A 'NULL' means an
          initial value is computed internally. 

       R: Numeric. Maximum number of values possible. See *Details* for
          more details. 

percentiles: Numeric vector of percentiles used for the fitted values.
          Values should be between 0 and 100. If 'R' is assigned, it is
          used when computing these percentiles. If 'percentiles=NULL'
          then the mean will be returned as the fitted values.

     mpv: Logical. If 'mpv=TRUE' then the _median predicted value_
          (MPV) is computed and returned as the (last) column of the
          fitted values. This argument is ignored if
          'percentiles=NULL'. See *Details* for more details. 

    zero: An integer-valued vector specifying which linear/additive
          predictors are modelled as intercepts only.  The value
          (possibly values) must be from the set {1,2} corresponding
          respectively to mu and sigma.  By default all linear/additive
          predictors are modelled as a linear combination of the
          explanatory variables.

_D_e_t_a_i_l_s:

     The Gumbel distribution is a generalized extreme value (GEV) 
     distribution with _shape_ parameter xi=0. Consequently it is more
     easily estimated than the GEV. See 'gev' for more details.
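
     As a rough numerical illustration of the xi=0 statement, the
     sketch below compares the GEV distribution function with a very
     small shape parameter against the Gumbel distribution function.
     The helper names 'pgumbel_sketch' and 'pgev_sketch' are
     hypothetical base-R functions written for this illustration only;
     they are not VGAM functions.

```r
# Gumbel CDF: exp(-exp(-(y - mu)/sigma))
pgumbel_sketch <- function(y, mu = 0, sigma = 1) {
  exp(-exp(-(y - mu) / sigma))
}

# GEV CDF for xi != 0: exp(-(1 + xi*(y - mu)/sigma)^(-1/xi)),
# with the bracketed term truncated at zero outside the support
pgev_sketch <- function(y, mu = 0, sigma = 1, xi = 1e-6) {
  z <- pmax(1 + xi * (y - mu) / sigma, 0)
  exp(-z^(-1 / xi))
}

y <- seq(-2, 6, by = 0.5)
# Largest gap between the two CDFs on the grid; shrinks as xi -> 0
max_diff <- max(abs(pgev_sketch(y, xi = 1e-6) - pgumbel_sketch(y)))
print(max_diff)
```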

     The quantity R is the maximum number of observations possible.
     For example, in the Venice data below, the top 10 daily values are
     recorded for each year, so R=365 because there are about 365 days
     per year. The MPV is the value of the response such that the
     probability of obtaining a value greater than the MPV is 0.5 out
     of R observations. For the Venice data, the MPV is the sea level
     such that there is an even chance that the highest level for a
     particular year exceeds the MPV. When 'mpv=TRUE', the column
     labelled '"MPV"' contains the MPVs when 'fitted()' is applied to
     the fitted object.
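
     One way to read this definition: if the yearly maximum follows a
     Gumbel distribution with parameters mu and sigma, the MPV is its
     median, mu - sigma*log(log(2)). The sketch below checks this in
     base R. The parameter values are made up for illustration, and it
     is an assumption (not stated above) that the fitted '"MPV"' column
     is computed from this closed form.

```r
mu <- 100; sigma <- 12               # illustrative values only

# Median of a Gumbel(mu, sigma) distribution
mpv <- mu - sigma * log(log(2))

# Probability that a Gumbel(mu, sigma) variable exceeds the MPV
p_exceed <- 1 - exp(-exp(-(mpv - mu) / sigma))
print(c(MPV = mpv, p_exceed = p_exceed))
```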

     The mean of a response Y is mu + sigma*Euler, where Euler is a
     constant approximately equal to 0.5772. If 'R' is not given, the
     percentiles are mu - sigma*log[-log(P/100)], where P is the
     'percentiles' argument value(s). If 'R' is given, the percentiles
     are mu - sigma*log[R*(1-P/100)].
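
     The sketch below evaluates the mean and percentile formulas in
     base R for made-up values of mu and sigma. For the case where 'R'
     is given it uses mu - sigma*log[R*(1-P/100)], which keeps the
     argument of the logarithm positive for R=365 and P=95 or 99. It is
     an illustration of the arithmetic only, not the code used inside
     'gumbel()'.

```r
mu <- 100; sigma <- 12               # illustrative values only
percentiles <- c(95, 99)
euler <- -digamma(1)                 # Euler's constant, about 0.5772

# Mean of a Gumbel(mu, sigma) response
gumbel_mean <- mu + sigma * euler

# 'R' not given: percentiles of the Gumbel distribution itself
q_noR <- mu - sigma * log(-log(percentiles / 100))

# 'R' given: percentiles referring to the population of R values
R <- 365
q_R <- mu - sigma * log(R * (1 - percentiles / 100))

print(gumbel_mean)
print(rbind(q_noR, q_R))
```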

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'.

_W_a_r_n_i_n_g:

     When 'R' is not given (the default), the fitted percentiles are
     those of the data, not of the overall population. For example, in
     the example below, the 50 percentile is approximately the running
     median through the data. However, the data are the highest sea
     level measurements recorded each year, so the 50 percentile
     equates to the median predicted value (MPV).

_N_o_t_e:

     'egumbel()' only handles a univariate response, and is preferred
     to 'gumbel()' because it is faster.

     'gumbel()' can handle a multivariate response, i.e., a matrix with
     more than one column. Each row of the matrix is sorted into
     descending order. Missing values in the response are allowed but
     require 'na.action=na.pass'. Rows with fewer recorded values must
     be padded with missing values so that the response is a full
     matrix. With a multivariate response one has a matrix 'y', say,
     where 'y[,2]' contains the second order statistics, and so on.
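
     A minimal base-R sketch of how such a response matrix might be
     assembled from raw yearly records. The data and the helper name
     'top_r_row' are hypothetical; each row holds the r largest values
     of one year in descending order, padded with 'NA' when fewer than
     r values are available.

```r
# Keep the r largest values of x in descending order, NA-padded
top_r_row <- function(x, r) {
  x <- sort(x, decreasing = TRUE)   # sort() drops NAs by default
  length(x) <- r                    # truncates, or pads with NA
  x
}

# Made-up yearly records of unequal length
raw <- list("1931" = c(96, 103, 94, 99, 98),
            "1932" = c(74, 78, 78),
            "1933" = c(105, 121, 99, 113, 102, 106))

y <- t(sapply(raw, top_r_row, r = 4))
colnames(y) <- paste("r", 1:4, sep = "")
print(y)
```

     A matrix built this way could then be supplied as the response,
     together with 'na.action=na.pass' as described above.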

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. and Stephenson, A. G. (2007) Vector generalized linear
     and additive extreme value models. _Extremes_, *10*, 1-19.

     Smith, R. L. (1986) Extreme value theory based on the _r_ largest
     annual events. _Journal of Hydrology_, *86*, 27-43.

     Rosen, O. and Cohen, A. (1996) Extreme percentile regression. In:
     Haerdle, W. and Schimek, M. G. (eds.),  _Statistical Theory and
     Computational Aspects of Smoothing: Proceedings of the COMPSTAT
     '94 Satellite Meeting held in Semmering, Austria, 27-28 August
     1994_, pp.200-214, Heidelberg: Physica-Verlag.

     Coles, S. (2001) _An Introduction to Statistical Modeling of
     Extreme Values_. London: Springer-Verlag.

_S_e_e _A_l_s_o:

     'rgumbel', 'cgumbel', 'guplot', 'gev', 'egev', 'venice'.

_E_x_a_m_p_l_e_s:

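     # Example 1: Simulated data
     y = rgumbel(n=1000, loc = 100, scale=exp(1))
     # percentiles=NULL makes the fitted values the estimated mean
     fit = vglm(y ~ 1, egumbel(percentiles=NULL), trace=TRUE)
     coef(fit, matrix=TRUE)
     Coef(fit)
     fitted(fit)[1:4,]
     mean(y)   # Compare with the fitted mean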

     # Example 2: Venice data
     data(venice)
     (fit = vglm(cbind(r1,r2,r3,r4,r5) ~ year, data=venice,
                gumbel(R=365, mpv=TRUE), trace=TRUE))
     fitted(fit)[1:5,]
     coef(fit, mat=TRUE)
     vcov(summary(fit))  
     sqrt(diag(vcov(summary(fit))))   # Standard errors

     # Example 3: Try a nonparametric fit ---------------------
     # Use the entire data set, including missing values
     y = as.matrix(venice[,paste("r",1:10,sep="")])
     fit1 = vgam(y ~ s(year, df=3), gumbel(R=365, mpv=TRUE),
                 data=venice, trace=TRUE, na.action=na.pass)
     fit1@y[4:5,]  # NAs used to pad the matrix

     ## Not run: 
     # Plot the component functions
     par(mfrow=c(2,1), mar=c(5,4,.2,1)+0.1, xpd=TRUE)
     plot(fit1, se=TRUE, lcol="blue", scol="green", lty=1,
          lwd=2, slwd=2, slty="dashed")

     # Quantile plot --- plots all the fitted values
     par(mfrow=c(1,1), bty="l", mar=c(4,4,.2,3)+0.1, xpd=TRUE, las=1)
     qtplot(fit1, mpv=TRUE, lcol=c(1,2,5), tcol=c(1,2,5), lwd=2,
            pcol="blue", tadj=0.1, ylab="Sea level (cm)")

     # Plot the 99 percentile only
     par(mfrow=c(1,1), mar=c(3,4,.2,1)+0.1, xpd=TRUE)
     year = venice[["year"]]
     matplot(year, y, ylab="Sea level (cm)", type="n")
     matpoints(year, y, pch="*", col="blue")
     lines(year, fitted(fit1)[,"99%"], lwd=2, col="red")

     # Check the 99 percentiles with a smoothing spline.
     # Nb. (1-0.99) * 365 = 3.65 is approx. 4, meaning the 4th order 
     # statistic is approximately the 99 percentile.
     par(mfrow=c(1,1), mar=c(3,4,2,1)+0.1, xpd=TRUE, lwd=2)
     plot(year, y[,4], ylab="Sea level (cm)", type="n",
          main="Red is 99 percentile, Green is a smoothing spline")
     points(year, y[,4], pch="4", col="blue")
     lines(year, fitted(fit1)[,"99%"], lty=1, col="red")
     lines(smooth.spline(year, y[,4], df=4), col="darkgreen", lty=2)
     ## End(Not run)

