This section introduces Extreme Value Theory and presents the methodology proposed in this work.
Extreme value theory
Extreme Value Theory (EVT) is concerned with the sample maximum \(M_n = \max (X_1, \ldots , X_n)\), where \((X_1, \ldots , X_n)\) is a set of independent random variables with common distribution function F. In this case, the distribution of the maximum observation is given by \(\Pr (M_n < x) = F^n(x)\). The hypothesis of independence when the X variables represent the wave height over a given threshold is quite acceptable because, for oceanographic data, it is common to adopt a POT scheme which selects extreme wave height events that are approximately independent25. Also, the authors of26 affirm that “The maximum wave heights in successive sea states can be considered independent, in the sense that the maximum height is dependent only on the sea state parameters and not in the maximum height in adjacent sea states”. The variable \(M_n\) is described by one of the three following distributions: Gumbel, Fréchet, and Weibull.
One methodology in EVT is to consider wave height time series with the annual maximum (AM) approach27, where X represents the wave height collected over regular periods of one year, and \(M_n\) is formed by the maximum value of each year. The statistical behaviour of AM can be described by the distribution of the maximum wave height in terms of the Generalized Extreme Value (GEV) distribution:
$$\begin{aligned} G(x)= \left\{ \begin{array}{ll} \exp \left\{ -\left[ 1 + \xi \left( \frac{x - \mu }{\sigma } \right) \right] ^{-\frac{1}{\xi }} \right\} , & \xi \ne 0,\\ \exp \left\{ -\exp \left( - \frac{x - \mu }{\sigma } \right) \right\} , & \xi = 0, \end{array} \right. \end{aligned}$$
(1)
where:
$$\begin{aligned} 0 < 1 + \xi \left( \frac{x-\mu }{\sigma } \right) , \end{aligned}$$
(2)
where \(-\infty< \mu < \infty \), \(\sigma > 0\) and \(-\infty< \xi < \infty \). As can be seen, the model has three parameters: location (\(\mu \)), scale (\(\sigma \)), and shape (\(\xi \)).
The return values corresponding to the return period \(T_p\) are obtained by inverting Eq. (1):
$$\begin{aligned} z_p = \left\{ \begin{array}{ll} \mu - \frac{\sigma }{\xi }\left[ 1 - \left\{ -\log (1 - p) \right\} ^{-\xi } \right] , & \xi \ne 0, \\ \mu - \sigma \log \left\{ -\log (1 - p) \right\} , & \xi = 0, \end{array} \right. \end{aligned}$$
(3)
where \(G(z_p) = 1 - p\). Then, \(z_p\) will be exceeded on average once every 1/p years, which corresponds to \(T_p\).
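For illustration, Eq. (3) can be evaluated directly. The following sketch (our own; the parameter values are purely illustrative and not taken from this work) computes the return level \(z_p\) for both branches:

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Return level z_p from Eq. (3): the level exceeded on average
    once every 1/p years under a GEV(mu, sigma, xi) annual-maximum model."""
    if xi != 0:
        return mu - (sigma / xi) * (1.0 - (-math.log(1.0 - p)) ** (-xi))
    return mu - sigma * math.log(-math.log(1.0 - p))

# Illustrative 100-year return level (mu = 5 m, sigma = 1 m, xi = 0.1)
z_100 = gev_return_level(1 / 100, mu=5.0, sigma=1.0, xi=0.1)
```

Note that the \(\xi \ne 0\) branch tends to the Gumbel branch as \(\xi \rightarrow 0\), which provides a simple sanity check of the implementation.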
The alternative method in the EVT context is Peaks-Over-Threshold (POT), where all values over a threshold predefined by the user are selected to be statistically described, instead of only the maximum values28,29. The POT method has become a standard approach for these predictions13,29,30. Furthermore, several improvements over the basic approach have since been proposed by various authors19,31,32.
The POT method is based on the fact that, if the AM approach follows a GEV distribution (Eq. 1), the peaks over a high threshold approximately follow a related distribution: the Generalized Pareto Distribution (GPD). The GPD fitted to the tail of the distribution gives the conditional non-exceedance probability \(P(X\le x \mid X > u)\), where u is the threshold level. The conditional distribution function can be calculated as:
$$ P(X \le x \mid X > u) = \left\{ \begin{array}{ll} 1 - \left( 1 + \xi ^{*} \left( \frac{x - u}{\sigma ^{*}} \right) \right) ^{-\frac{1}{\xi ^{*}}}, & \xi ^{*} \ne 0, \\ 1 - \exp \left( - \frac{x - u}{\sigma ^{*}} \right) , & \xi ^{*} = 0. \end{array} \right. $$
(4)
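Eq. (4) matches the parameterisation used by `scipy.stats.genpareto` (shape \(c = \xi ^{*}\), location u, scale \(\sigma ^{*}\)), which offers a quick numerical cross-check; the parameter values below are illustrative only:

```python
import numpy as np
from scipy.stats import genpareto

def gpd_cdf(x, u, sigma_s, xi_s):
    """Conditional non-exceedance probability P(X <= x | X > u), Eq. (4)."""
    z = (x - u) / sigma_s
    if xi_s != 0:
        return 1.0 - (1.0 + xi_s * z) ** (-1.0 / xi_s)
    return 1.0 - np.exp(-z)

# Cross-check against scipy's generalized Pareto implementation
x, u, sigma_s, xi_s = 7.0, 5.0, 1.2, 0.15
ours = gpd_cdf(x, u, sigma_s, xi_s)
ref = genpareto.cdf(x, c=xi_s, loc=u, scale=sigma_s)
```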
There is consistency between the GEV and GPD models, in the sense that their parameters are related by \(\xi ^* = \xi \) and \(\sigma ^* = \sigma + \xi (u - \mu )\). The parameters \(\sigma \) and \(\xi \) are the scale and shape parameters, respectively. When \(\xi \ge 0\), the distribution is referred to as long-tailed; when \(\xi < 0\), it is referred to as short-tailed. The methods used to estimate the parameters of the GPD and the selection of the threshold will now be discussed.
The use of the GPD for modelling the tail of the distribution is also justified by asymptotic arguments in14, where the author notes that it is usually more convenient to interpret extreme value models in terms of return levels rather than individual parameters. In order to obtain these return levels, the exceedance rate of the threshold, \(P(X>u)\), has to be determined. In this way, using Eq. (4) (\(P(X>x \mid X>u)=P(X>x)/P(X>u)\)) and considering that \(z_N\) is exceeded on average once every N observations, we have:
$$\begin{aligned} P(X>u)\left[ 1+\xi ^*\left( \frac{z_N - u}{\sigma ^*}\right) \right] ^{-\frac{1}{\xi ^*}}=\frac{1}{N}. \end{aligned}$$
(5)
Then, the N-year return level \(z_N\) is obtained as:
$$\begin{aligned} z_N = u + \frac{\sigma ^*}{\xi ^*}\left[ \left( N \, P(X>u)\right) ^{\xi ^*} - 1 \right] . \end{aligned}$$
(6)
Many techniques have been proposed for the estimation of the parameters of the GEV and GPD. In19, the authors applied the maximum likelihood (ML) methodology described in14. However, the use of this methodology for two-parameter distributions (i.e. Weibull or Gamma) has a very important drawback: these distributions are very sensitive to the distance between the high threshold (\(u_2\)) and the first peak16. For this reason, ML should only be used with a two-parameter distribution when \(u_2\) coincides with a peak: as this peak is excluded, the first exceedance value is then as far from \(u_2\) as possible. A solution would be to use the three-parameter Weibull and Gamma distributions. However, ML estimation for such distributions is very difficult, and the algorithms usually fit two-parameter distributions inside a discrete range of location parameters33.
Proposed methodology
As stated before, in this paper, we present a new methodology to model this kind of time series considering not only extreme values but also the rest of the observations. In this way, instead of selecting the maximum values per period (usually a year) or defining thresholds in the distribution of these extreme wave heights, which has an appreciable subjective component, we model the distribution of all wave heights, considering that it is a mixture formed by a normal distribution and a uniform distribution. The motivation is that the uniform distribution is associated with regular wave height values, which contaminate the normal distribution of extreme values. This theoretical mixed distribution is then used to fix the threshold for the estimation of the POT distributions. Thus, the determination of the threshold is done in a much more objective and probabilistic way.
Let us consider a sequence of independent random variables \((X_1, \ldots , X_n)\) of wave height data. These data follow an unknown continuous distribution. We assume that this distribution is a mixture of two independent distributions: \(Y_1 \sim N(\mu , \sigma )\) and \(Y_2 \sim U(\mu - \delta , \mu + \delta )\), where \(N(\mu , \sigma )\) is a Gaussian distribution, \(U(\mu - \delta , \mu + \delta )\) is a uniform distribution, \(\mu > 0\) is the common mean of both distributions, \(\sigma \) is the standard deviation of \(Y_1\), and \(\delta \) is the radius of \(Y_2\), with \(\mu - \delta > 0\). Then \(f(x) = \gamma f_1(x) + (1-\gamma ) f_2(x)\), where \(\gamma \) is the probability that an observation comes from the normal distribution, and f(x), \(f_1(x)\) and \(f_2(x)\) are the probability density functions (pdf) of X, \(Y_1\) and \(Y_2\), respectively.
For the estimation of the values of the four above-mentioned parameters (\(\mu , \sigma , \delta , \gamma \)), standard statistical theory offers the least squares method, the method of moments and the maximum likelihood (ML) method. In this context, Mathiesen et al.13 found that the least squares methods are sensitive to outliers, although Goda34 recommended this method with modified plotting position formulae.
Clauset et al.35 show that methods based on least-squares fitting for the estimation of probability-distribution parameters can suffer from many problems and usually give biased results. These authors propose the ML method for fitting parametrized models such as power-law distributions to observed data, given that ML provably gives accurate parameter estimates in the limit of large sample size36. The ML method is commonly used in multiple applications, e.g. in metocean applications25, due to its asymptotic properties of being unbiased and efficient. In this regard, White et al.37 conclude that ML estimation outperforms the other fitting methods, as it always yields the lowest variance and bias of the estimator. This is not unexpected, as the ML estimator is asymptotically efficient37,38. Also, Clauset et al.35 show, among other properties, that under mild regularity conditions, when estimating the scaling parameter \(\alpha \) of a power law for continuous data, the ML estimate \({\hat{\alpha }}\) converges almost surely to the true \(\alpha \) and is asymptotically Gaussian with variance \((\alpha -1)^2/n\). However, the ML estimators do not achieve these asymptotic properties until they are applied to large sample sizes: Hosking and Wallis39 showed that the ML estimators are non-optimal for sample sizes up to 500, with higher bias and variance than other estimators, such as moments and probability-weighted-moments estimators.
Deluca and Corral40 also presented the estimation of a single parameter \(\alpha \) associated with a truncated continuous power-law distribution. In order to find the ML estimator of the exponent, they proceed by directly maximizing the log-likelihood \(l(\alpha )\). The reason is practical: their procedure is part of a more general method, valid for arbitrary distributions f(x), for which the derivative of \(l(\alpha )\) can be challenging to evaluate. They note that one needs to be cautious in the maximization algorithm when the value of \(\alpha \) is very close to one, replacing \(l(\alpha )\) by its limit at \(\alpha =1\).
Furthermore, the use of ML estimation for two-parameter distributions such as the Weibull and Gamma distributions has the drawback16 previously discussed. Besides, ML estimation is known to provide poor results when the maximum is at the limit of the interval of validity of one of the parameters. On the other hand, the estimation of the GPD parameters is the subject of ongoing research; a quantitative comparison of recent estimation methods was presented by Kang and Song41. In our case, having to estimate four parameters, we have decided to use the method of moments for its analytical simplicity: it is a well-established estimation method that equates sample and population moments, and it yields adequate estimates in multi-parameter settings and with limited samples, as shown in this work.
Considering \(\phi \) as the pdf of the standard normal distribution N(0, 1), the pdf of \(Y_1\) is defined as:
$$\begin{aligned} f_1(x) = \frac{1}{\sigma }\phi (z_x), \quad z_x = \frac{x - \mu }{\sigma }, \quad x \in {\mathbb {R}}. \end{aligned}$$
(7)
The pdf of \(Y_2\) is:
$$\begin{aligned} f_2(x) = \frac{1}{2\delta }, \quad x \in (\mu -\delta ,\mu +\delta ). \end{aligned}$$
(8)
Consequently, the pdf of X is:
$$\begin{aligned} f(x) = \gamma f_1(x) + (1 - \gamma ) f_2(x), \quad x \in {\mathbb {R}}. \end{aligned}$$
(9)
To infer the values of the four parameters of the wave height time series (\(\mu \), \(\sigma \), \(\delta \), \(\gamma \)), we define, for any random variable that is symmetric with respect to the mean \(\mu \), with pdf g and finite moments, a set of functions of the form:
$$\begin{aligned} U_k(x) = \int _{|t - \mu | \ge x}|t - \mu |^k g(t)\,dt, \quad x \ge 0, \quad k = 1,2,3, \ldots , \end{aligned}$$
(10)
or, because of the symmetry:
$$\begin{aligned} U_k(x) = 2\int _{x+\mu }^{\infty }(t - \mu )^k g(t)\,dt, \quad k = 1,2,3, \ldots . \end{aligned}$$
(11)
These functions are well defined whenever the corresponding moments of the variable are finite, because:
$$\begin{aligned} U_k(x) < \int _{-\infty }^{\infty }|t - \mu |^k g(t)\,dt < \infty , \quad k = 1,2,3, \ldots . \end{aligned}$$
(12)
In particular, for the normal and uniform distributions, all moments are finite, and the same holds for all the \(U_k(x)\) functions. For each pair of values x and k, this function measures the bilateral tail, from the value x, of the moment of order k with respect to the mean of the variable. It is, therefore, a generalization of the concept of a probability tail, which is recovered for \(k = 0\).
Now, if we denote the corresponding functions for the distributions \(Y_1\) and \(Y_2\) by \(U_{k,1}(x)\) and \(U_{k,2}(x)\), it is verified that:
$$\begin{aligned} U_k(x) = \gamma U_{k,1}(x) + (1 - \gamma ) U_{k,2}(x). \end{aligned}$$
(13)
Then, to calculate the function \(U_k(x)\), we just need to calculate the functions \(U_{k,1}(x)\) and \(U_{k,2}(x)\).
Calculation of \(U_k\) for the uniform distribution (\(U_{k,2}\))
From the definition of \(f_2(x)\) and \(U_k(x)\), for \(0 \le x \le \delta \):
$$\begin{aligned} U_{k,2}(x) = 2 \int _{\mu +x}^{\mu +\delta }(t-\mu )^k\frac{1}{2\delta }\,dt = \left. \frac{(t-\mu )^{k+1}}{(k+1)\delta } \right| _{\mu +x}^{\mu +\delta } = \frac{\delta ^{k+1} - x^{k+1}}{(k+1)\delta }, \end{aligned}$$
(14)
then,
$$\begin{aligned} U_{k,2}(x) =\left\{ \begin{array}{ll} \frac{\delta ^{k+1} - x^{k+1}}{(k+1)\delta } & 0 \le x \le \delta ,\\ 0 & x >\delta . \end{array}\right. \end{aligned}$$
(15)
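The closed form in Eq. (15) can be checked numerically against the defining integral of Eq. (11), taking g as the uniform pdf of Eq. (8). This is our own sanity check (with \(\mu = 0\) for simplicity; the values of \(\delta \), x and k are arbitrary):

```python
from scipy.integrate import quad

def U_k2(x, k, delta):
    """Bilateral tail moment of U(-delta, delta) about its mean, Eq. (15)."""
    if x > delta:
        return 0.0
    return (delta ** (k + 1) - x ** (k + 1)) / ((k + 1) * delta)

# Compare with the integral of Eq. (11) using g(t) = 1/(2*delta), mu = 0
delta, x = 2.0, 0.5
checks = []
for k in (1, 2, 3):
    integral, _ = quad(lambda t, k=k: 2 * t**k / (2 * delta), x, delta)
    checks.append(abs(integral - U_k2(x, k, delta)))
```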
Calculation of \(U_k\) for the normal distribution (\(U_{k,1}\))
From the definition of \(f_1(x)\) and \(U_k(x)\), we have:
$$\begin{aligned} U_{k,1}(x)=\frac{2}{\sigma }\int _{\mu +x}^\infty (t-\mu )^k\phi \left( \frac{t-\mu }{\sigma } \right) dt. \end{aligned}$$
(16)
With the change of variable \(u = \frac{t-\mu }{\sigma }\), we obtain:
$$\begin{aligned} U_{k,1}(x)=2\int _{\frac{x}{\sigma }}^\infty (u\sigma )^k\phi (u)\,du = \sigma ^k\Upsilon _k\left( \frac{x}{\sigma } \right) , \end{aligned}$$
(17)
where \(\Upsilon _k(x) = 2 \int _{x}^\infty u^k\phi (u)\,du\). That is, \(\Upsilon _k\) is the \(U_k\) function calculated for a N(0, 1) distribution, which will then be evaluated for \(k=1,2,3\).
Proposition I
The following equations are verified:
$$\begin{aligned} \Upsilon _1(x) = 2 \int _{x}^\infty u\phi (u)\,du = 2\phi (x), \end{aligned}$$
(18)
$$\begin{aligned} \Upsilon _2(x) = 2 \int _{x}^\infty u^2\phi (u)\,du = 2(1-\Phi (x)+x\phi (x)), \end{aligned}$$
(19)
$$\begin{aligned} \Upsilon _3(x) = 2 \int _{x}^\infty u^3\phi (u)\,du = 2(2+x^2)\phi (x), \end{aligned}$$
(20)
where \(\Phi \) is the cumulative distribution function (CDF) of the N(0, 1) distribution. The proof is given below.
The three equations can be obtained using integration by parts, but it is easier to differentiate the functions \(\Upsilon _k(x)\) to check the result. From the definition of the functions, for each value of k, we have:
$$\begin{aligned} \Upsilon _k'(x) = \frac{\partial \Upsilon _k(x)}{\partial x} = -2x^k\phi (x). \end{aligned}$$
(21)
Taking into account that \(\frac{\partial \phi (x)}{\partial x}=-x\phi (x)\) and \(\frac{\partial \Phi (x)}{\partial x}=\phi (x)\):
$$\begin{aligned} \frac{\partial \, 2\phi (x)}{\partial x} = -2x\phi (x)= \Upsilon _1'(x), \end{aligned}$$
(22)
$$\begin{aligned} \frac{\partial \left( 2(1-\Phi (x)+x\phi (x))\right) }{\partial x} = 2(-\phi (x)+\phi (x)-x^2\phi (x)) = -2x^2\phi (x)= \Upsilon _2'(x), \end{aligned}$$
(23)
$$\begin{aligned} \frac{\partial \left( 2(2+x^2)\phi (x)\right) }{\partial x} =2\left( 2x\phi (x)-(2+x^2)x\phi (x)\right) = -2x^3\phi (x)=\Upsilon _3'(x). \end{aligned}$$
(24)
Therefore, the left and right sides of the previous equations differ by, at most, a constant. To verify that they are equal, we check the value \(x=0\):
$$\begin{aligned} \Upsilon _1(0) = 2 \int _0^\infty u \phi (u)\,du = \sqrt{\frac{2}{\pi }}, \end{aligned}$$
(25)
$$\begin{aligned} \Upsilon _2(0) = 2 \int _0^\infty u^2 \phi (u)\,du = 1, \end{aligned}$$
(26)
$$\begin{aligned} \Upsilon _3(0) = 2 \int _0^\infty u^3 \phi (u)\,du = 2 \sqrt{\frac{2}{\pi }}, \end{aligned}$$
(27)
which match the right-hand sides of Eqs. (18)–(20):
$$\begin{aligned} \Upsilon _1(0) = 2 \phi (0) = \sqrt{\frac{2}{\pi }}, \end{aligned}$$
(28)
$$\begin{aligned} \Upsilon _2(0) = 2 (1-\Phi (0)) = 1, \end{aligned}$$
(29)
$$\begin{aligned} \Upsilon _3(0) = 2 \cdot 2\,\phi (0) = 2 \sqrt{\frac{2}{\pi }}. \end{aligned}$$
(30)
Substituting these results into Eq. (17), we have:
$$\begin{aligned} U_{1,1}=\sigma \Upsilon _1 \left( \frac{x}{\sigma }\right) = 2 \sigma \phi \left( \frac{x}{\sigma }\right) , \end{aligned}$$
(31)
$$\begin{aligned} U_{2,1}=\sigma ^2 \Upsilon _2 \left( \frac{x}{\sigma }\right) = 2\sigma ^2 \left( 1 - \Phi \left( \frac{x}{\sigma } \right) + \frac{x}{\sigma }\phi \left( \frac{x}{\sigma } \right) \right) , \end{aligned}$$
(32)
$$\begin{aligned} U_{3,1}=\sigma ^3 \Upsilon _3 \left( \frac{x}{\sigma }\right) = 2 \sigma ^3\left( 2 + \left( \frac{x}{\sigma } \right) ^2 \right) \phi \left( \frac{x}{\sigma }\right) . \end{aligned}$$
(33)
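As a numerical sanity check (our own, not part of the original derivation), Eqs. (31)–(33) can be compared with direct quadrature of Eq. (16), again taking \(\mu = 0\) and arbitrary values of \(\sigma \) and x:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def U_k1(x, k, sigma):
    """Tail moments of N(mu, sigma) about mu, Eqs. (31)-(33)."""
    z = x / sigma
    if k == 1:
        return 2.0 * sigma * norm.pdf(z)
    if k == 2:
        return 2.0 * sigma**2 * (1.0 - norm.cdf(z) + z * norm.pdf(z))
    if k == 3:
        return 2.0 * sigma**3 * (2.0 + z**2) * norm.pdf(z)
    raise ValueError("closed forms derived for k = 1, 2, 3 only")

# Compare with the integral of Eq. (16), taking mu = 0
sigma, x = 1.5, 0.8
errs = [abs(quad(lambda t, k=k: (2.0 / sigma) * t**k * norm.pdf(t / sigma),
                 x, np.inf)[0] - U_k1(x, k, sigma))
        for k in (1, 2, 3)]
```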
These functions will be the basis for estimating the parameters of the distribution of the variable X, except in the case of \(\mu \), as we will comment later. The estimates will be made with the corresponding \(U_k\) sample estimates, defined in the following section.
Sample estimates of \(U_k\)
For each value of k and \(x \ge 0\), the sample estimator of \(U_k\) obtained by the method of moments is:
$$\begin{aligned} u_k(x)=\frac{1}{n} \sum _{|x_i-\mu |\ge x}|x_i - \mu |^k, \end{aligned}$$
(34)
which has the properties described in the following propositions.
Proposition II
The estimator \(u_k(x)\) is an unbiased estimator of \(U_k(x)\). For the proof, we first rewrite \(u_k\) in the form:
$$\begin{aligned} u_k(x)=\frac{1}{n} \sum _{i=1}^n|x_i - \mu |^k I\{|x_i - \mu |\ge x\}, \end{aligned}$$
(35)
where I is the indicator function. Considering the previous expression, we check the condition for an unbiased estimator:
$$\begin{aligned} E(u_k(x)) &= \frac{1}{n} \sum _{i=1}^n E\left( |x_i - \mu |^k I\{|x_i - \mu |\ge x\}\right) = E\left( |t - \mu |^k I\{|t - \mu |\ge x\}\right) \\ &= \int _{|t - \mu |\ge x}|t-\mu |^k g(t)\,dt = U_k(x). \end{aligned}$$
(36)
Proposition III
The estimator \(u_k(x)\) is a consistent estimator of \(U_k(x)\). Considering again Eq. (35), for the variance of \(u_k(x)\) we have:
$$\begin{aligned} V(u_k(x)) &= \frac{1}{n^2} \sum _{i=1}^n V\left( |x_i - \mu |^k I\{|x_i - \mu |\ge x\}\right) = \frac{1}{n}V\left( |t - \mu |^k I\{|t - \mu |\ge x\}\right) \\ &= \frac{1}{n}\left( E (|t - \mu |^{2k}I\{|t - \mu |\ge x\}) - E^2(|t - \mu |^k I\{|t - \mu |\ge x\})\right) \\ &= \frac{1}{n}\left( U_{2k}(x)-U_{k}^2(x)\right) \overset{n\rightarrow \infty }{\longrightarrow }0, \end{aligned}$$
(37)
taking into account that \(I^2\{\cdot \} = I\{\cdot \}\).
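A small simulation (our own illustration, with arbitrary parameter values) shows the estimator of Eq. (34) at work: on a sample drawn purely from the normal component (\(\gamma = 1\), \(\sigma = 1\)), \(u_1(0)\) should approach \(U_{1,1}(0) = \sigma \sqrt{2/\pi } \approx 0.798\):

```python
import numpy as np

def u_k(sample, mu, k, x=0.0):
    """Sample tail-moment estimator u_k(x) of Eq. (34)."""
    dev = np.abs(np.asarray(sample) - mu)
    return np.mean(np.where(dev >= x, dev**k, 0.0))

# Large normal sample: the estimate converges to sqrt(2/pi)
rng = np.random.default_rng(0)
sample = rng.normal(3.0, 1.0, size=200_000)
est = u_k(sample, mu=3.0, k=1)
```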
Parameter estimation of the mixed distribution of X
The estimates are based on the \(u_k(0)\) values, for \(k=1,2,3\), which estimate the corresponding population quantities.
Estimation of \(\mu \). Given that the mean value of both distributions (uniform and normal) is the same, this value is not affected by the mixture. Therefore, the natural estimator is:
$$\begin{aligned} {\hat{\mu }} = {\bar{x}} = \frac{1}{n} \sum _{i=1}^n x_i. \end{aligned}$$
(38)
Estimation of the \(\sigma \), \(\delta \), and \(\gamma \) parameters
Applying the method of moments, we have the following three-equation system:
$$\begin{aligned} U_k(0)=u_k(0), \quad k=1,2,3. \end{aligned}$$
(39)
The reason for choosing the origin is that it retains the maximum amount of information in the \(u_k(x)\) functions defined in Eq. (34): if a nonzero x value were chosen, the estimate would discard all observations in the interval \((\mu - x, \mu + x)\). Substituting Eqs. (15), (31), (32) and (33) into Eq. (13), the resulting equation system is:
$$\begin{aligned} \gamma U_{1,1}(0) + (1-\gamma )U_{1,2}(0)=\gamma \sigma \sqrt{\frac{2}{\pi }}+(1-\gamma )\frac{\delta }{2}=u_1(0), \end{aligned}$$
(40)
$$\begin{aligned} \gamma U_{2,1}(0) + (1-\gamma )U_{2,2}(0) = \gamma \sigma ^2 + (1-\gamma )\frac{\delta ^2}{3} = u_2(0), \end{aligned}$$
(41)
$$\begin{aligned} \gamma U_{3,1}(0) + (1-\gamma )U_{3,2}(0)=2\gamma \sigma ^3 \sqrt{\frac{2}{\pi }}+(1-\gamma )\frac{\delta ^3}{4}=u_3(0), \end{aligned}$$
(42)
where the solution must satisfy \({\hat{\sigma }}, {\hat{\delta }} > 0\) and \({\hat{\gamma }} \in [0,1]\).
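A numerical sketch of solving this system follows (our own illustration; the solver choice, starting point and bounds are assumptions, not the authors' implementation). A bounded least-squares root finder enforces the constraints directly:

```python
import numpy as np
from scipy.optimize import least_squares

C = np.sqrt(2.0 / np.pi)

def moments(sigma, delta, gamma):
    """Left-hand sides of Eqs. (40)-(42)."""
    return np.array([gamma * sigma * C + (1 - gamma) * delta / 2.0,
                     gamma * sigma**2 + (1 - gamma) * delta**2 / 3.0,
                     2.0 * gamma * sigma**3 * C + (1 - gamma) * delta**3 / 4.0])

def solve_mixture(u1, u2, u3):
    """Solve the system for (sigma, delta, gamma) with sigma, delta > 0
    and gamma in [0, 1]."""
    res = least_squares(lambda p: moments(*p) - [u1, u2, u3],
                        x0=[u1, u1, 0.5],
                        bounds=([1e-9, 1e-9, 0.0], [np.inf, np.inf, 1.0]))
    return res.x

# Round trip: moments generated from known parameters are reproduced
target = moments(1.2, 0.8, 0.6)
sigma_h, delta_h, gamma_h = solve_mixture(*target)
```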
Adjustment to the mixed distribution
To check whether the obtained estimates are valid, we can test whether the set of observations \(\{x_1,\ldots ,x_n\}\) fits the pdf of the final distribution:
$$\begin{aligned} {\hat{f}}(x) = {\hat{\gamma }} \hat{f_1}(x)+ (1-{\hat{\gamma }})\hat{f_2}(x), \quad x \in {\mathbb {R}}, \end{aligned}$$
(43)
where:
$$\begin{aligned} \hat{f_1}(x) = \frac{1}{{\hat{\sigma }}}\phi \left( \frac{x - {\hat{\mu }}}{{\hat{\sigma }}}\right) , \quad x \in {\mathbb {R}}, \end{aligned}$$
(44)
and:
$$\begin{aligned} \hat{f_2}(x)=\frac{1}{2{\hat{\delta }}}, \quad x \in ({\hat{\mu }} - {\hat{\delta }}, {\hat{\mu }} + {\hat{\delta }}). \end{aligned}$$
(45)
For this purpose, the Kolmogorov-Smirnov test can be used. The one-sample Kolmogorov-Smirnov test42 is commonly used to examine whether a sample comes from a specific distribution by comparing the observed cumulative distribution function with the assumed theoretical one. The Kolmogorov-Smirnov statistic Z is computed from the largest absolute difference between the observed and theoretical cumulative distributions; that is, Z is the greatest vertical distance between the empirical distribution function S(x) and the hypothesized distribution function \(F^*(x)\):
$$\begin{aligned} Z=\max _x|F^*(x) - S(x)|, \end{aligned}$$
(46)
where the null hypothesis is \(H_0: F(x) = F^*(x)\) for all \(-\infty< x < \infty \), and the alternative hypothesis is \(H_1: F(x) \ne F^*(x)\) for at least one value of x, F(x) being the true distribution. If Z exceeds the \(1-\alpha \) quantile value \(Q(1-\alpha )\), then we reject \(H_0\) at the significance level \(\alpha \). When the number of observations n is large, the value \(Q(1-\alpha )\) can be approximated as43:
$$\begin{aligned} Q(1-\alpha ) = \frac{\sqrt{-0.5 \log \left( \frac{\alpha }{2}\right) }}{\sqrt{n}}. \end{aligned}$$
(47)
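The test can be carried out with `scipy.stats.kstest`, supplying the CDF of the fitted mixture. The sketch below is our own illustration (arbitrary parameters): the sample is drawn from the hypothesized mixture itself, so the statistic Z should stay well below the quantile of Eq. (47):

```python
import numpy as np
from scipy.stats import norm, kstest

def mixture_cdf(x, mu, sigma, delta, gamma):
    """CDF of the normal-uniform mixture whose pdf is Eq. (43)."""
    x = np.asarray(x, dtype=float)
    unif = np.clip((x - (mu - delta)) / (2.0 * delta), 0.0, 1.0)
    return gamma * norm.cdf(x, mu, sigma) + (1.0 - gamma) * unif

# Sample drawn from the hypothesized mixture: H0 should not be rejected
rng = np.random.default_rng(1)
n = 5000
is_normal = rng.random(n) < 0.6
sample = np.where(is_normal, rng.normal(3.0, 1.0, n), rng.uniform(2.2, 3.8, n))
Z, p_value = kstest(sample, lambda x: mixture_cdf(x, 3.0, 1.0, 0.8, 0.6))
Q = np.sqrt(-0.5 * np.log(0.05 / 2)) / np.sqrt(n)  # Eq. (47) with alpha = 0.05
```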
Using the theoretical mixed distribution to fix the threshold of the POT approaches
In this paper, once the mixed distribution has been estimated, we use it to set the threshold for estimating the POT distributions. We assume that using the points situated over a percentile of the theoretical mixed distribution is more reliable than using a threshold value predefined by a trial-and-error procedure. Identifying extreme values when studying a phenomenon relies on the determination of a limit value or a probability threshold. Since an observation is considered extreme when it deviates unusually from the central values of the distribution of the phenomenon under investigation, we understand that the probabilistic approach is preferable. In our work, we consider the 95%, 97.5% and 99% percentiles as possible thresholds.
In this way, a new sample of independent random variables is defined by \(Z = (z_1, z_2, \ldots , z_M)\), containing the values of X that exceed the threshold u, M being the number of exceedances. In this work, three distributions are fitted to the threshold exceedances:
-
The first one is the GPD44, whose cumulative distribution function is defined in Eq. (4).
-
The second is the Gamma distribution, with the following cumulative distribution function:
$$\begin{aligned} F(z;\xi ,\sigma ) = \frac{\gamma (\xi ,\frac{z}{\sigma })}{\Gamma (\xi )}, \end{aligned}$$
(48)
where \(\gamma \) is the lower incomplete gamma function, and \(\Gamma \) is the Gamma function.
-
Finally, the Weibull distribution is also considered:
$$begin{aligned} F(z;xi ,sigma ) = 1 – exp left[ -left( frac{z}{sigma } right) ^xi right] . end{aligned}$$
(49)
These three distributions are fitted to the exceedances using the Maximum Likelihood Estimator (MLE)13. After that, we select the best fit based on two objective criteria: BIC17 and AIC18. On the one hand, BIC minimizes the bias between the fitted model and the unknown true model:
$$\begin{aligned} \text {BIC} = -2\ln L + k_p \ln M, \end{aligned}$$
(50)
where L is the likelihood of the fit, M is the sample size (in our case, the number of exceedances) and \(k_p\) is the number of parameters of the distribution. On the other hand, AIC selects the model providing the best compromise between bias and variance:
$$\begin{aligned} \text {AIC} = -2\ln L + 2 k_p. \end{aligned}$$
(51)
Both criteria are to be minimized.
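The selection step can be sketched as follows (our own illustration with synthetic exceedances; the use of `scipy.stats` fits with the location fixed at the threshold is an assumption, not the authors' code):

```python
import numpy as np
from scipy.stats import genpareto, gamma, weibull_min

def aic_bic(logL, k_p, M):
    """AIC and BIC of Eqs. (51) and (50)."""
    return -2.0 * logL + 2.0 * k_p, -2.0 * logL + k_p * np.log(M)

# Synthetic exceedances over a threshold u = 0, drawn from a GPD
rng = np.random.default_rng(2)
exc = genpareto.rvs(c=0.1, scale=1.0, size=2000, random_state=rng)

scores = {}
for name, dist in [("GPD", genpareto), ("Gamma", gamma), ("Weibull", weibull_min)]:
    params = dist.fit(exc, floc=0)             # location fixed at the threshold
    logL = np.sum(dist.logpdf(exc, *params))
    scores[name] = aic_bic(logL, k_p=2, M=len(exc))  # 2 free parameters each
```

The candidate with the smallest AIC (and BIC) is then retained.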
Once the best-fitting distribution has been obtained, the return level for the return period T, \(Hs_T\), is calculated, and then the confidence intervals are computed. As can be seen in the experimental section, the GPD is the best distribution in all cases. The quantile for the GPD is:
$$\begin{aligned} Hs_T = \mu + \frac{\sigma }{\xi }\left[ 1 - (\lambda T)^{-\xi } \right] , \end{aligned}$$
(52)
where (lambda ) is the number of exceedances per year.
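Eq. (52) is straightforward to evaluate; the lines below (with parameter values that are purely illustrative, not results from this work) compute two return levels:

```python
def gpd_return_level(T, mu, sigma, xi, lam):
    """Return level Hs_T of Eq. (52); lam is the mean number of
    exceedances per year."""
    return mu + (sigma / xi) * (1.0 - (lam * T) ** (-xi))

# Illustrative values: mu = 5 m, sigma = 1.2, xi = 0.1, lam = 3 per year
hs_50 = gpd_return_level(50, mu=5.0, sigma=1.2, xi=0.1, lam=3.0)
hs_100 = gpd_return_level(100, mu=5.0, sigma=1.2, xi=0.1, lam=3.0)
```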
Finally, confidence intervals are also computed. For that, many authors use the classical asymptotic method14. However, Mathiesen et al.13 advocate the use of Monte-Carlo (MC) simulation techniques. Also, Mackay and Johanning26 proposed a storm-based MC method for calculating return periods of individual wave and crest heights, in which a random realisation of the maximum wave height in each sea state is simulated from the metocean parameter time series, and the GPD is fitted to storm peak wave heights exceeding some threshold. Mackay and Johanning26 showed that \(n=1000\) iterations are sufficient to obtain a stable estimation, although in our case we have considered \(n=100000\), following16. In16, as in our work, the MC simulation method is used and, after 100000 iterations, the 90% confidence interval is obtained using the percentiles [\(Hs_{T,5\%};Hs_{T,95\%}\)] of the 100000 \(Hs_T\) values obtained with the procedure.
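A minimal sketch of the MC percentile interval follows (our own simplification: parametric resampling of the fitted GPD with the threshold u held fixed, and far fewer iterations than the 100000 used in this work):

```python
import numpy as np
from scipy.stats import genpareto

def mc_interval(exc, u, T, lam, n_iter=500, seed=0):
    """90% percentile interval for Hs_T: refit the GPD to parametric
    resamples of the exceedances and take the 5% and 95% percentiles
    of the simulated return levels."""
    rng = np.random.default_rng(seed)
    c, _, s = genpareto.fit(exc, floc=u)
    sims = []
    for _ in range(n_iter):
        boot = genpareto.rvs(c, loc=u, scale=s, size=len(exc), random_state=rng)
        cb, _, sb = genpareto.fit(boot, floc=u)
        sims.append(u + (sb / cb) * (1.0 - (lam * T) ** (-cb)))
    return np.percentile(sims, [5, 95])

# Synthetic exceedances above an assumed threshold u = 5
rng = np.random.default_rng(3)
exc = 5.0 + genpareto.rvs(c=0.1, scale=1.0, size=400, random_state=rng)
lo, hi = mc_interval(exc, u=5.0, T=50, lam=3.0, n_iter=200)
```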

