Introduction
The quality of forecasts, and their performance relative to other forecasts, is commonly assessed with scalar summary measures. For probabilistic forecasts, a few verification scores are in general use (e.g. [Wilks, 2011]). The Brier Score (BS) is suited to binary events (e.g. rain/no rain), whereas for discrete multiple-category events (e.g. below normal/normal/above normal precipitation) the Ranked Probability Score (RPS) is preferred. In the continuous case (an infinite number of predictand classes of infinitesimal width) the RPS is extended to the Continuous Ranked Probability Score (CRPS). In addition to these verification scores, the Plugin calculates the associated skill scores, comparing two forecasts or comparing a forecast to the climatological reference. Since the scores are biased for finite ensemble size, ensemble-size-corrected "Fair" (skill) scores [Ferro, 2013] are calculated as well. Unless otherwise mentioned, all scores available in this Plugin are based on the SpecsVerification R package from [Siegert, 2014]. Moreover, the Plugin can calculate the MSE, the MSESS, and the Pearson and Spearman correlation coefficients.
Note
For correlation, no skill scores are calculated. Thus, the skill scores are set to missing value.
Sections Input Parameter and Output Parameter explain the input and output parameters of the ProblEMS Plugin, respectively. Section Scientific Background describes the methods of pre-processing and score calculation. Additional information can be found at the end of the documentation.
User Manual
First, choose the variable you want to analyze. The input data for Forecast (suffix "1") and Reference (suffix "2", optional) can be selected from the data browser via the options project, product, institute, model, experiment, time_frequency and ensemble. If the data is a decadal experiment, set is_decadal to True. If you want to compare the Forecast to the climatology, leave all entries of Reference (suffix "2") empty.
Note
In case you want to compare a decadal experiment against a
historical experiment you have to define the decadal experiment
as the Forecast and the historical experiment as the Reference.
In observation you can choose the Observation/Reanalysis data to compare against. You can select observational and reanalysis data with monthly time frequency (time_frequency=mon), or you can specify a NetCDF file with any time frequency.
If decadal experiments are selected, you have to specify the decadals (starting/initialization years of the decadal experiments) and the leadtimes you want to analyze. Otherwise (e.g. only historical experiments), leave these empty and insert the years you want to process into the first_last_year parameter. Further, define the temporal resolution of the resulting output time series (time_frequency_output). Additionally, you can choose specific months (mon) to be processed.
Note
If you choose seas and you want to analyze decadal experiments, the winter season contains only the months January and February. In combination with the timemean operator it is possible to define extended seasons with months of interest.
With the next parameter (level) you can select a level for variables that contain vertical levels; leave it empty for 2D files. In region you can choose a region box (W,E,S,N) you want to analyze. In grid you have the option to specify a regular lon-lat grid to which all input data are remapped before processing; this is recommended when the input data have different grid sizes. Instead of a gridwise (map) analysis, a field-mean-based (fieldmean) analysis can be chosen with the output_dimension operator.
Next, specify the metrics you want to calculate. For RPSS/BSS you have to configure the parameter threshold and optionally threshold_check; otherwise leave them empty. For a stratified verification (useful, e.g., to evaluate experiments during different initialization times/states such as ENSO phases), you can input different sets of years: provide a .dat file in which every column (at least one) contains the set of years defining a certain initialization state. Leave it empty to evaluate the Forecast experiment in the default way.
With the option mean_or_median you can specify whether the ensemble mean or median should be used for the calculation of MSESS or correlation. Moreover, with time_mean_or_median you can specify whether the climatology for anomaly calculation should be determined from the temporal mean or median.
With the operator statistic_method, the aggregation method used to get from the input time frequency to the output time frequency can be changed; using avg is recommended so that missing values are processed correctly. Further, it is possible to compute anomalies (anomaly) of the time series before the scores are calculated. Calculation of significances can be enabled by entering a number in bootstraps. If you want to reproduce the bootstrap sample used for the significance calculation, you can define a fixed seed for the random number generation.
Finally, you can specify the number of parallel-processing tasks (ntask), the output (outputdir) and cache (cachedir) directories, and whether to remove the cache directories (cacheclear).
Input Parameter
freva plugin problems --doc

Plugin to calculate and plot verification scores of two ensemble forecasts, primarily based on the SPECS project. If the reference experiment is left blank, the forecast is compared to the climatology instead. Available scores are: RPS/FairRPS, RPSS/FairRPSS, BS/FairBS, BSS/FairBSS, EnsCRPS/FairEnsCRPS, EnsCRPSS/FairEnsCRPSS, EnsCRPS decomposition into REL, RES and UNC, GaussCRPS, GaussCRPSS and GaussCRPS decomposition into REL, RES and UNC, GevCRPS, GevCRPSS, MSESS, MSE, Spearman and Pearson correlation coefficients.

Options:
- variable (default: <undefined>) [mandatory]: Choose the variable you want to analyze (check data browser).
- project1 (default: <undefined>) [mandatory]: Choose the project, e.g. CMIP5, Reanalysis, your data, etc. (check data browser).
- product1 (default: <undefined>) [mandatory]: Choose the product (check data browser).
- institute1 (default: <undefined>) [mandatory]: Choose the institute (check data browser).
- model1 (default: <undefined>) [mandatory]: Choose the model (check data browser).
- experiment1 (default: <undefined>) [mandatory]: Choose the experiment (check data browser).
- time_frequency1 (default: <undefined>) [mandatory]: Choose the time_frequency (check data browser).
- ensemble1 (default: <undefined>) [mandatory]: Choose the ensemble member(s); multi-selection possible (comma separated), use * for all members.
- is_decadal1 (default: True): Set "True" in case the forecast experiment is a decadal experiment.
- project2 (default: <undefined>): Choose the project, e.g. CMIP5, Reanalysis, your data, etc. (check data browser).
- product2 (default: <undefined>): Choose the product (check data browser).
- institute2 (default: <undefined>): Choose the institute (check data browser).
- model2 (default: <undefined>): Choose the model (check data browser).
- experiment2 (default: <undefined>): Choose the experiment (check data browser).
- time_frequency2 (default: <undefined>): Choose the time_frequency (check data browser).
- ensemble2 (default: <undefined>): Choose the ensemble member(s); multi-selection possible (shell: comma separated), use * for all members.
- is_decadal2 (default: False): Set "True" in case the reference experiment is a decadal experiment.
- observation (default: <undefined>) [mandatory]: Observation to compare to. Choose an observational or reanalysis experiment with time_frequency=mon (check data browser), OR specify an observational NetCDF file with any time_frequency -> path/2/file.nc.
- obs_complete (default: 90): Percentage of the observation data which needs to be available for the calculation of scores. Default: 90 (%).
- decadals (default: <undefined>): Comma-separated list of decadal experiments: 1960,1961,etc., OR a sequence like 1960:2000 for all years in between.
- leadtimes (default: <undefined>): Lead times to analyze if a decadal experiment is selected. Use e.g. "1,3" for separate lead times or "1-3" for averaging.
- first_last_year (default: <undefined>): First and last year of the analysis: 1960:2000 for all years in between. NOTE: only available if the "is_decadal" option is False.
- time_frequency_output (default: year) [mandatory]: Temporal resolution of the output time series you want to analyze. Valid options are "year", "seas" and "mon". If you choose "mon", the selected months are analyzed individually.
- mon (default: <undefined>): Months to be processed. You can also set "timemean" below to "True" to calculate the yearly average over these months (e.g. insert 5,6,7,8,9 to get a time series of the yearly averaged extended summer season MJJAS). Only available if "time_frequency_output" is set to "mon".
- timemean (default: False): Option to calculate the yearly mean of the months selected above to get the time series. Only available if you insert months in "mon" and "time_frequency_output" is set to "mon".
- level (default: <undefined>): Choose a level [in Pa], e.g. 50000. Only available for files containing levels (check data browser). Leave empty for 2D files.
- region (default: -180,180,-90,90): Region box to be processed. The format has to be W,E,S,N (e.g. -180,180,0,90 for the NH).
- grid (default: r72x36): Grid file or a grid description like r72x36. Required in case of different grid resolutions of the input data.
- output_dimension (default: map) [mandatory]: Option to calculate the output as a map (default) or as the area mean of the region box above (or the region of the file).
- metrics (default: rpss) [mandatory]: Choose the scores you want to calculate. RPSS: RPS/RPSS/FairRPS/FairRPSS. BSS: BS/BSS/FairBS/FairBSS. EnsCRPSS: CRPS/CRPSS/FairCRPS/FairCRPSS for an ensemble forecast as well as the score decomposition into Resolution, Reliability and Uncertainty. GaussCRPSS: CRPS/CRPSS for a Gaussian forecast as well as the score decomposition into Resolution, Reliability and Uncertainty. GevCRPSS: CRPS/CRPSS for a generalized extreme value (GEV) distribution forecast. MSESS: MSE and MSESS. SP-Cor: Spearman correlation coefficient (if the reference forecast is the climatology, the skill score is set to the correlation coefficient between forecast and observations; otherwise the difference between the forecast-observation and reference-observation correlation coefficients is used instead of the skill score). PE-Cor: Pearson correlation coefficient (handled analogously to SP-Cor).
- mean_or_median (default: mean) [mandatory]: Option whether to calculate MSE(SS) or correlation with the ensemble mean or median.
- time_mean_or_median (default: mean) [mandatory]: Option whether to calculate anomalies with respect to the climatology (if the anomaly flag is set to True) with the mean or median over time.
- threshold (default: perc_1/3,2/3): Threshold(s) for separating categories, following the format OPTION_THRESHOLD1,THRESHOLD2. Constant values (OPTION=const) or percentile values (OPTION=perc) are possible. Use 1 threshold for BS(S) and at least 2 for RPS(S) calculation. Leave empty for the CRPSS metrics. E.g. insert perc_1/3,2/3 to separate the data into 3 categories based on the percentiles 33.3% and 66.7%.
- threshold_check (default: <undefined>): Option to define intervals between the chosen thresholds, following the format VALUE4TH1,VALUE4THS. If the conditions are not fulfilled, the score is set to NA. Only valid for BSS/RPSS with the percentile option; leave empty for no threshold restrictions. E.g. if "1,2" is set, the first threshold value must be greater than or equal to 1 and the gap between the following thresholds must be at least 2; otherwise the score is set to NA.
- years_stratified_verification (default: <undefined>): Option to input a .dat file containing at least one column of years (YYYY). The evaluation is then stratified on the years given in the file. Useful for stratified verification (e.g. El Niño years vs. La Niña years; early vs. late time period). Leave blank to evaluate the forecast experiment in the default way (based on all available years). NOTE: stratified verification is in a test phase!
- statistic_method (default: avg) [mandatory]: Statistical method used to get from the input time_frequency to the output time_frequency. "avg" is recommended; "max" can be useful for analyzing extreme values (e.g. GevCRPS).
- anomaly (default: False): Option to calculate anomalies of the time series before calculating scores.
- significance_method (default: Bootstrap) [mandatory]: Select the method for significance estimation.
- bootstraps (default: <undefined>): Number of bootstrap runs for the calculation of significance levels if Bootstrap is selected via significance_method. Leave empty if you do not want to calculate significance levels with the bootstrap. WARNING: this can take a lot of time when enabled!
- seed (default: <undefined>): Seed to initialize the random number generator used to generate the bootstrap sample.
- blocklength (default: 5): If Bootstrap is selected via significance_method: length of the blocks for bootstrapping. If jackknife is selected: length of the block that is deleted for resampling. Usually 5 for decadals and 1 for seasonal or monthly forecasts.
- ntask (default: 6) [mandatory]: Number of tasks on the machine; the default is set by Freva for batch mode.
- outputdir (default: $USER_OUTPUT_DIR/$SYSTEM_DATETIME) [mandatory]: Choose your output directory; the default is set by Freva.
- cachedir (default: $USER_CACHE_DIR) [mandatory]: Choose your cache directory; the default is set by Freva.
- cacheclear (default: True): Switch to NOT clear the cache.
- debug (default: False): Option for debugging purposes.
- extra_scheduler_options (default: --ntasks-per-node=20): Additional options for the job submission to the workload manager (comma separated). Note: batch mode and web only.
Output Parameter
The processed files can be found in the selected outputdir. Pre-processed NetCDF files of the Forecast/Reference/Observation time series are stored in the fc/ref/obs directories, containing the variable with dimensions [LON; LAT; DATE; ENSEMBLE]. NetCDF files [LON; LAT; DATE] of the calculated scores can be found in separate directories depending on your chosen metrics. Output figures showing the gridwise skill score (or the fair version if available) of your chosen metrics are saved in the plot/ directory (only available if output_dimension is set to map).
In case the Plugin is used in the stratified verification setup, additional output files and figures are stored in the associated subfolder stratveri/. The following file prefixes exist (see section Stratified Verification for more information):

bcw<i>_ : Individual contribution of time series subset i to the total skill score
\[\frac{N_i}{N} \cdot \frac{\left( \operatorname{RPS_{ref}^\textit{i}} - \operatorname{RPS_{fc}^\textit{i}} \right)}{\operatorname{RPS_{ref}^{tot}}}\]

bc<i>_ : Coefficient of the individual contribution of time series subset i to the total skill score
\[\frac{\left( \operatorname{RPS_{ref}^\textit{i}} - \operatorname{RPS_{fc}^\textit{i}} \right)}{\operatorname{RPS_{ref}^{tot}}}\]

bcw<i>-reference_ : Expected individual contribution of time series subset i based on the total skill score (expected: every individual time series subset has the same contribution to the total skill score)
\[\frac{ \operatorname{RPSS_{tot}} }{ \text{number of individual time series subsets} }\]

bc<i>-reference_ : Expected coefficient of the individual contribution of time series subset i based on the total skill score (expected: every individual time series subset has the same value as the total skill score)
\[\operatorname{RPSS_{tot}}\]

absdiff-bcw<i>-reference_ : Difference between the individual contribution of time series subset i and its expected individual contribution
\[\frac{N_i}{N} \cdot \frac{\left( \operatorname{RPS_{ref}^\textit{i}} - \operatorname{RPS_{fc}^\textit{i}} \right)}{\operatorname{RPS_{ref}^{tot}}} - \frac{ \operatorname{RPSS_{tot}} }{ \text{number of individual parts} }\]

absdiff-bc<i>-reference_ : Difference between the coefficient of the individual contribution of time series subset i and its expected coefficient
\[\frac{\left( \operatorname{RPS_{ref}^\textit{i}} - \operatorname{RPS_{fc}^\textit{i}} \right)}{\operatorname{RPS_{ref}^{tot}}} - \operatorname{RPSS_{tot}}\]

diff-bcw1-bcw2_ : Difference between the individual contributions of time series subsets 1 and 2
\[\frac{N_1}{N} \cdot \frac{\left( \operatorname{RPS_{ref}^1} - \operatorname{RPS_{fc}^1} \right)}{\operatorname{RPS_{ref}^{tot}}} - \frac{N_2}{N} \cdot \frac{\left( \operatorname{RPS_{ref}^2} - \operatorname{RPS_{fc}^2} \right)}{\operatorname{RPS_{ref}^{tot}}}\]

diff-bc1-bc2_ : Difference between the coefficients of the individual contributions of time series subsets 1 and 2
\[\frac{\left( \operatorname{RPS_{ref}^1} - \operatorname{RPS_{fc}^1} \right)}{\operatorname{RPS_{ref}^{tot}}} - \frac{\left( \operatorname{RPS_{ref}^2} - \operatorname{RPS_{fc}^2} \right)}{\operatorname{RPS_{ref}^{tot}}}\]

rpss<i>_ : Skill score of individual time series subset i
\[\frac{\left( \operatorname{RPS_{ref}^\textit{i}} - \operatorname{RPS_{fc}^\textit{i}} \right)}{\operatorname{RPS_{ref}^{\textit{i}}}}\]

w<i>_ : Time step weighting factor of time series subset i
\[\frac{N_i}{N}\]

d<i>_ : Weighting term of the reference forecast for time series subset i vs. the total time series
\[\frac{ \operatorname{RPS_{ref}^i} }{ \operatorname{RPS_{ref}^{tot}} }\]
Previous Applications and Test
The Plugin was used to verify global and regional hindcast data in a probabilistic way using the fair version of the RPSS (see forecast web page).
Scientific Background
Pre-Processing
Before verification scores can be calculated, the input data have to be pre-processed. The general pre-processing includes yearly, monthly or seasonal averaging of every year to finally obtain yearly time series. In case of decadal experiments, the time series are created according to the Leadtime-Selector Plugin. Afterwards, the data are remapped to the chosen regular longitude-latitude grid if the user enters a grid to remap to. Fig. 2 additionally shows the schematic processing procedure of the ProblEMS Plugin.
Note
Each of the following score calculation procedures is done individually for every grid point, month (season) and/or lead-year.
For the calculation of the BSS and RPSS metrics, \(n_{th}\) threshold(s) must be defined to classify the data into discrete categories. The data are divided by the thresholds \(X_{th}\) into \(n_{th}+1\) categories \(Y\) following the rule

For example, assume two thresholds are selected: category \(Y_{1}\) contains the data below the first threshold \(X_1\), category \(Y_2\) the data greater than or equal to \(X_1\) and below \(X_2\), and category \(Y_3\) the data greater than or equal to the second threshold \(X_2\). In case of percentile-based categorization, the threshold values are derived individually for Observation, Forecast and Reference by separate percentile calculations beforehand.
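This categorization rule can be sketched as follows (a Python illustration only; the Plugin performs this step internally in its own pre-processing). Percentile thresholds from the perc option correspond to sample quantiles:

```python
import numpy as np

def categorize(data, percentiles=(1/3, 2/3)):
    """Split a sample into n_th + 1 categories Y via percentile
    thresholds X_th (sketch of the rule above, not the plugin's code)."""
    thresholds = np.quantile(data, percentiles)
    # np.digitize returns 0 for data < X_1, k for X_k <= data < X_{k+1},
    # and n_th for data >= X_{n_th} -- matching categories Y_1..Y_{n_th+1}
    return np.digitize(data, thresholds), thresholds

rng = np.random.default_rng(0)
cats, thr = categorize(rng.normal(size=300))
assert set(cats.tolist()) == {0, 1, 2}   # two thresholds -> three categories
```

With the default perc_1/3,2/3 setting this yields three roughly equally populated categories.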
Metrics and Scores
Brier Skill Score (BSS-Metric)
For binary events (e.g. rain/no rain separated by a threshold) the Brier Score (BS) essentially gives information about the mean squared error of the probabilistic forecast with ensemble size \(M\) and is defined as
where \(o\) stands for the observed event (\(o=1\) if the event occurs, \(o=0\) if it does not), \(y\) for the forecast probability of the \(t\)-th forecast-event pair and \(T\) for the number of cases (in the context of this Plugin, \(T\) corresponds to the number of years within a given time period). For an ensemble forecast, the probability \(y\) is the relative number \(\frac{e}{M}\) of ensemble members \(e\) predicting that the event occurs. The BS is bounded between 0 and 1, with \(\operatorname{BS}=0\) for a perfect forecast. To evaluate the performance of one forecast against a reference forecast, skill scores are a good choice. The associated skill score for binary events is the Brier Skill Score
where \(\operatorname{BS_{fc}}\) is the Brier Score of the Forecast and \(\operatorname{BS_{ref}}\) the Brier Score of the Reference. If no Reference forecast experiment is selected in the Plugin, the climatology is used as the Reference instead. Since scores are biased for finite ensemble size (positive bias for small ensembles), the Brier Score is adjusted following [Ferro, 2013] to allow a fair comparison of ensemble forecasts with different ensemble sizes. Thus, the Fair Brier Score \(\operatorname{FairBS}\) for one forecast-event pair \(t\) is defined as
In line with the ensemble-size adjustment of the Brier Score, the Fair Brier Skill Score can also be calculated, where the Brier Scores of Forecast (\(\operatorname{BS_{fc}}\)) and Reference (\(\operatorname{BS_{ref}}\)) are calculated according to equation (1).
Note
In the Fair-Score context of this Plugin, the climatology is assigned a hypothetical ensemble size equal to the number of observations.
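As a small numerical sketch of the (Fair) Brier Score under the definitions above (Python for illustration; the Plugin itself uses the SpecsVerification R package):

```python
import numpy as np

def brier_scores(e, o, M):
    """BS and Fair BS [Ferro, 2013] for e_t members (out of M)
    predicting the event and binary observations o_t (sketch)."""
    e, o = np.asarray(e, float), np.asarray(o, float)
    y = e / M                                  # forecast probability e/M
    bs = np.mean((y - o) ** 2)
    # the fair version subtracts the finite-ensemble bias term
    fair = np.mean((y - o) ** 2 - e * (M - e) / (M ** 2 * (M - 1)))
    return bs, fair

def skill_score(score_fc, score_ref):
    """Generic skill score: 1 - score_fc / score_ref."""
    return 1.0 - score_fc / score_ref

bs, fair_bs = brier_scores([8, 2, 9, 1, 7], [1, 0, 1, 0, 1], M=10)
assert fair_bs < bs   # the correction removes the positive bias
```

The same skill_score form (one minus the score ratio) applies to all skill scores in this section.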
Ranked Probability Skill Score (RPSS-Metric)
If you want to divide the data into more than 2 categories of discrete events (e.g. dry/near-normal/wet precipitation conditions), the Ranked Probability Score (RPS) can be chosen. The RPS is basically an extension of the Brier Score to multiple-event situations and is based on squared errors with respect to the cumulative probabilities of forecasts \(Y\) and observations \(O\). The RPS is defined as
where \(K\) is the number of discrete categories. Analogous to the Brier Skill Score, the Ranked Probability Skill Score
can be derived. Regarding the ensemble-size adjustment, the Fair Ranked Probability Score \(\operatorname{FairRPS}\) for one forecast-event pair \(t\) follows
where \(E\) is the cumulative number of ensemble members. The calculation of the Fair Ranked Probability Skill Score is analogous to the computation of the FairBSS.
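A sketch of the RPS and FairRPS for a single forecast-event pair, using the cumulative member counts \(E\) defined above (Python illustration, not the plugin's R code):

```python
import numpy as np

def rps(counts, obs_cat, M):
    """RPS and Fair RPS for one forecast-event pair: counts[k] is the
    number of members in category k, obs_cat the observed category
    (sketch following the cumulative-probability definition)."""
    K = len(counts)
    E = np.cumsum(counts)                        # cumulative member counts
    Y = E / M                                    # cumulative forecast probs
    O = (np.arange(K) >= obs_cat).astype(float)  # cumulative observation
    plain = np.sum((Y - O) ** 2)
    fair = np.sum((Y - O) ** 2 - E * (M - E) / (M ** 2 * (M - 1)))
    return plain, fair

p, f = rps([2, 5, 3], obs_cat=1, M=10)
assert f < p   # fair correction lowers the finite-ensemble score
```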
Continuous Ranked Probability Skill Score (CRPSS-Metric)
The Continuous Ranked Probability Score (CRPS) is essentially the extension of the RPS to the continuous case: the number of predictand classes is replaced by an infinite number of classes of infinitesimal width, so that the summations in eq. (2) are substituted with integrals.
where \(F_Y\) is the cumulative distribution function (CDF) from the forecast and \(F_O\) is the Heaviside function jumping from 0 to 1 when the forecast variable \(Y\) is equal to the observation.
EnsCRPSS Metric
Following [Hersbach, 2000] and [Siegert, 2014], the CRPS derived from ensemble prediction systems (enscrpss metric option in the ProblEMS Plugin) can be written as
The corresponding Continuous Ranked Probability Skill Score is defined as
Analogous to the ensemble-size adjustment above, the Fair Continuous Ranked Probability Score \(\operatorname{FairCRPS}\) follows
The calculation of the Fair Continuous Ranked Probability Skill Score is analogous to equation (3).
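The kernel form of the ensemble CRPS and its fair variant can be sketched as follows (an illustrative Python version; the Plugin relies on SpecsVerification's implementation):

```python
import numpy as np

def ens_crps(x, y):
    """Ensemble CRPS (kernel form) and its fair version for one
    forecast-event pair with members x and observation y (sketch)."""
    x = np.asarray(x, float)
    M = x.size
    term1 = np.mean(np.abs(x - y))                # member-observation distance
    spread = np.sum(np.abs(x[:, None] - x[None, :]))
    crps = term1 - spread / (2 * M ** 2)
    fair = term1 - spread / (2 * M * (M - 1))     # fair ensemble-size factor
    return crps, fair

c, f = ens_crps([0.0, 1.0], 0.5)
assert f <= c   # fair version never exceeds the plain ensemble CRPS
```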
GaussCRPSS Metric
Besides the score version for ensemble forecast systems described above, a parametric version of the CRPS can also be defined for a Gaussian forecast distribution (gausscrpss metric option in the ProblEMS Plugin). Following [Gneiting et al., 2005] and [Wilks, 2011], the \(\operatorname{CRPS_{Gauss}}\) can be written as
where \(\mu_t\) (\({\sigma_t}^2\)) is the mean (variance) of the ensemble at time step \(t\) and \(\Phi\) (\(\phi\)) the CDF (PDF) of the standard Gaussian distribution.
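The Gaussian closed form has a compact implementation; a minimal Python sketch using only the standard library (\(\Phi\) via the error function, not the plugin's R code):

```python
import math

def gauss_crps(mu, sigma, y):
    """Closed-form CRPS for a Gaussian forecast N(mu, sigma^2)
    following [Gneiting et al., 2005] (illustrative sketch)."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf
                    - 1.0 / math.sqrt(math.pi))

# for mu = y and sigma = 1 the score is (sqrt(2) - 1) / sqrt(pi) ~ 0.2337
assert abs(gauss_crps(0.0, 1.0, 0.0) - 0.2337) < 1e-4
```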
GevCRPSS Metric
In case of a generalized extreme value (GEV) distribution forecast (e.g. yearly maximum temperatures), the parametric gevcrpss option in the ProblEMS Plugin can be applied. First, for each forecast time series the GEV parameters location \(\mu\), scale \(\sigma\) and shape \(\xi\) of the distribution \(F_{\operatorname{GEV}}\) are estimated using the ismev R package [Coles, 2001, Gilleland, 2016]. The estimated parameters are then used for the parametric CRPS calculation of generalized extreme value (GEV) distribution forecasts following [Friederichs and Thorarinsdottir, 2012] and the scoringRules R package of [Jordan et al., 2017].
For \(\xi \neq 0\) the CRPS is given by
and for \(\xi = 0\), the CRPS is written as
where \(C \approx 0.5772\) is the Euler-Mascheroni constant, \(Ei(x) =\int^{x}_{-\infty} (e^t/t)\,dt\) is the exponential integral, \(\Gamma\) denotes the gamma function and \(\Gamma_{l}\) the lower incomplete gamma function. More details of the \(\operatorname{CRPS_{GEV}}\) calculation can be found in the references cited above.
CRPS Decomposition
Details of the decomposition of the CRPS into Reliability, Resolution and Uncertainty can be found for the Gaussian CRPS in [Siegert, 2014] and [Siegert, 2017], and for the Ensemble CRPS in [Hersbach, 2000], eq. (3) and (35-37), and the verification R package [NCAR - Research Applications Laboratory, 2015].
MSE(SS)
The Mean Squared Error Skill Score \(MSESS\) is calculated as
where \(MSE_F\) and \(MSE_R\) are the Mean Squared Errors of the forecast and the reference with
where \(f_t\) and \(o_t\) are the forecast and the observation for time step \(t\). Note that \(f_t\) and \(o_t\) are anomalies calculated within a leave-one-out cross-validation setup (i.e. the year for which the anomaly is calculated does not contribute to the calculation of the climatology). Moreover, the climatology that can be used as the reference forecast is also calculated via leave-one-out cross-validation. See [Goddard et al., 2013] for further information.
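The leave-one-out anomaly step and the MSESS can be sketched as follows (a Python illustration of the procedure, assuming the temporal mean is used for the climatology):

```python
import numpy as np

def loo_anomalies(x):
    """Leave-one-out cross-validated anomalies: the climatology for
    year t is the mean over all OTHER years, so year t does not
    contribute to its own climatology (sketch of the step above)."""
    x = np.asarray(x, float)
    clim = (x.sum() - x) / (x.size - 1)
    return x - clim

def msess(fc, obs, ref):
    """MSESS = 1 - MSE_F / MSE_R on cross-validated anomalies of the
    (ensemble-mean) forecast, observations and reference."""
    f, o, r = (loo_anomalies(v) for v in (fc, obs, ref))
    mse_f = np.mean((f - o) ** 2)
    mse_r = np.mean((r - o) ** 2)
    return 1.0 - mse_f / mse_r

# a perfect forecast reaches the maximum skill score of 1
assert msess([1., 2., 3., 4.], [1., 2., 3., 4.], [4., 3., 2., 1.]) == 1.0
```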
Correlation
The Pearson and Spearman correlations are calculated with the R function cor. Cross-validated anomalies are used here as well, for both model and observations.
Significance
The significance information for skill scores is based on the block-bootstrap method applied in [Goddard et al., 2013], but without re-sampling ensemble members. The time series of length \(n\) is divided into \(n-l+1\) overlapping blocks of length \(l\) (here set to five), where the \(k\)-th block contains the time series values \(x_i\) with \(i=k,\dots,k+l-1\). Then a random selection with replacement is applied to generate a new time series of Forecast-Reference-Observation cases, from which the skill score is calculated. This procedure is repeated e.g. 1000 times to build up a sampling distribution of the considered skill score (e.g. [Mason, 2008]). A positive (negative) skill score is significantly different from zero if the 5%-percentile (95%-percentile) of this distribution is greater (less) than zero.
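The resampling step can be sketched as follows (an illustrative Python version of the block bootstrap described above; the skill function passed in is a placeholder, not a plugin API):

```python
import numpy as np

def block_bootstrap_indices(n, l, rng):
    """One resample of time indices from the (n - l + 1) overlapping
    blocks of length l. Forecast, Reference and Observation cases are
    drawn with the same indices so the triplets stay paired (sketch,
    not the plugin's R code)."""
    starts = rng.integers(0, n - l + 1, size=(n + l - 1) // l)
    idx = np.concatenate([np.arange(s, s + l) for s in starts])
    return idx[:n]

def is_significant(skill_fn, n, nboot=1000, l=5, seed=0):
    """A positive skill score is significant if the 5%-percentile of
    the bootstrap distribution is greater than zero."""
    rng = np.random.default_rng(seed)
    scores = [skill_fn(block_bootstrap_indices(n, l, rng))
              for _ in range(nboot)]
    return float(np.percentile(scores, 5)) > 0

# toy skill function: always positive, so the 5%-percentile is > 0
data = np.linspace(0.1, 1.0, 30)
assert is_significant(lambda idx: float(np.mean(data[idx])), n=30)
```

Fixing the seed, as the Plugin's seed parameter does, makes the bootstrap sample reproducible.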
Stratified Verification
Note
Stratified verification is in a test phase and under construction: it is not recommended for operational use!
The forecast skill of decadal predictions is assumed to vary not only with lead time but also with low-frequency climate modes such as the El Niño-Southern Oscillation (ENSO) or the Atlantic Multidecadal Variability (AMV). The state/phase of these variability modes may influence the initialization of decadal predictions and therefore also the overall forecast skill. With the option of stratified verification (Plugin parameter: years_stratified_verification), the total skill score can be decomposed into multiple parts to estimate the skill contribution stratified along a given set of initialization years. For example, the total Ranked Probability Skill Score (\(\operatorname{RPSS_{tot}}\)) can be decomposed into two parts initialized during different years (periods) as follows
Significant differences between the individual parts are also based on the bootstrap method (but without the "block" option) described in section Significance, considering the given time steps of every individual part.
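With hypothetical subset values (labeled as such below), a quick numeric check illustrates that the weighted contributions bcw<i> from the Output Parameter section sum to the total skill score when the total RPS is the time-step-weighted mean of the subset RPS values:

```python
import numpy as np

# Hypothetical example values: N_i cases per initialization state and
# the mean RPS of Forecast and Reference per subset (not real output).
N = np.array([12, 18])
rps_fc = np.array([0.20, 0.35])
rps_ref = np.array([0.40, 0.45])

w = N / N.sum()                               # w_i = N_i / N
rps_ref_tot = np.sum(w * rps_ref)             # weighted total reference RPS
rpss_tot = 1.0 - np.sum(w * rps_fc) / rps_ref_tot
bcw = w * (rps_ref - rps_fc) / rps_ref_tot    # individual contributions

# the contributions decompose the total skill score exactly
assert abs(bcw.sum() - rpss_tot) < 1e-12
```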
Additional Information, Remarks and Installation
Climatology as Reference
Climatologies in ProblEMS are generated as an artificial ensemble Reference experiment with as many ensemble members as there are time steps in the observational data. The generated climatologies are based on all time steps of the Observation; a leave-one-out method is not used here.
In general, every time step of the "climatological experiment" contains all time steps of the Observation as an artificial ensemble; hence, each time step has the same ensemble values based on the observational data. In case of BS and RPS, the fraction of ensemble members in every category is derived from the probability of occurrence based on the user-given threshold parameter.
Handling of Missing Values
General Preparation of Input Data (Temporal Aggregation):
- Statistic method avg: Gridwise calculation of the annual monthly/seasonal/yearly average separately for each Forecast/Reference/Observation experiment. If at least one time step of the input data contains a missing value, the output time step of the processed year is also set to missing value. The same behavior applies to moving-average calculation over multiple lead-years.
- Statistic method mean/min/max: Gridwise calculation of the annual monthly/seasonal/yearly mean/minimum/maximum separately for each Forecast/Reference/Observation experiment. The output time step of the processed year is set to missing value only if all time steps of the input data contain missing values. The same behavior applies to moving mean/minimum/maximum calculation over multiple lead-years.
Areamean:
- Gridwise calculation of the overall monthly/seasonal/yearly average separately for each Forecast/Reference/Observation experiment. If at least one time step (year) of the input time series contains a missing value, the output time step is also set to missing value. The result is an individual missing-value mask for every Forecast/Reference/Observation experiment (separately for each month/season if chosen).
- Gridwise combination of Forecast-Reference-Observation cases. If at least one individual experiment contains a missing value, the resulting grid point is also set to missing value. Hence, the missing-value mask is the same for each experiment (separately for each month/season if chosen).
- Calculation of the area mean over all non-missing-value grid points.
Score Calculation:
- At least 90% of the Forecast-Reference-Observation cases (years) must be completely available (all experiments contain non-missing values in a given year) within the prepared time series (gridwise, for every month/season if chosen) to calculate the resulting score.
Significances:
- Since missing values can also be picked when generating the random time series used for the bootstrap method, the behavior described under Score Calculation also applies to each randomly generated time series. The 95% level is determined by counting the relative number of scores that are not set to missing value. This means that the number of score values describing the sampling distribution can be smaller than the number of bootstraps given by the user.
Installation
Software:
- CDO (>=1.7)
- NetCDF4
- NCO
- plot_engine (2.1.16)
- R (>=3.2)
- relevant R packages plus dependencies: ncdf4, SpecsVerification, abind, fields, RColorBrewer, foreach, SNOW, ismev, verification
Installation Steps:
1. Get the Plugin (incl. the git submodule leadtimeselektor) from the git repository:

git clone --recursive https://gitlab.dkrz.de/freva/plugins4freva/problems.git
cd problems

2. Make sure the latest version of the leadtimeselektor submodule is available in problems/tools4problems/leadtimeselector (branch to_new_freva). If not, update leadtimeselektor with the following:

git submodule sync
git submodule update --init --recursive
cd tools4problems/leadtimeselector
git checkout to_new_freva
git pull # in case it is needed

3. Create the conda environment. You will need a suitable version of conda active beforehand, for example by loading a Freva instance, e.g.:

module load clint <your-projects-freva>

4. Run the Makefile:

make all

To activate/deactivate the environment including its own CDO (not necessary to run the plugin via Freva):

source plugin_env/bin/activate # activate
conda deactivate # deactivate

Updating the Plugin with git:

cd <PATH2PLUGIN>
git checkout -b <new_branch> (commit changes in a new branch <new_branch>)
git push -u origin <new_branch> (push changes upstream to <new_branch>)
pull/merge request into the main/master branch (at e.g. GitLab), wait for reviews, merge