Appendix II: Transitioning guide
The following serves as a guide to transition an existing Freva instance (within the python2 framework) to the new (python3 based) version.
Transition to new Database
We have created a small command line interface (freva-migrate) that
helps to migrate the content of an existing Freva framework to the new one.
The freva-migrate command currently has one sub-command, database.
The new system has witnessed small changes to the database structure. The database
sub-command of the freva-migrate command helps to transition to this new
database structure. To migrate a database of an old installation of the Freva
system to a freshly deployed Freva instance use the following command:
usage: freva-migrate database [-h] [--old-port OLD_PORT] [--old-db OLD_DB] [--old-pw OLD_PW] [--old-user OLD_USER]
new_hostname old_hostname cert-file
Freva database migration
positional arguments:
new_hostname The hostname where the new database is deployed.
old_hostname Hostname of the old database.
cert-file Path to the public certificate file.
options:
-h, --help show this help message and exit
--old-port OLD_PORT The port where the old database server is running on. (default: 3306)
--old-db OLD_DB The name of the old database (default: evaluationsystem)
--old-pw OLD_PW The password to the old database (default: None)
--old-user OLD_USER The old database user (default: evaluationsystem)
The cert-file positional argument refers to the public certificate file that was
created during the deployment process and is needed to establish a connection to
the new database (via the vault). You can either use the one that has been
saved by the deployment or use the one from the freva config directory. By default
the certificate file resides within the freva path of the deployment root_dir,
for example /mnt/project/freva/project.crt. Also don’t forget to set the domain
name for your institution as a unique identifier.
After the command has been applied, the new database with its “old” content from the previous Freva instance will be ready for use.
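As a minimal sketch, the call described above could be assembled from Python. Only the flags and positional arguments listed in the usage text are used; the host names and the certificate path are placeholders and do not refer to a real deployment.

```python
import shlex


def build_migrate_command(
    new_hostname,
    old_hostname,
    cert_file,
    old_port=3306,
    old_db="evaluationsystem",
    old_user="evaluationsystem",
    old_pw=None,
):
    """Assemble the ``freva-migrate database`` call as an argument list."""
    cmd = [
        "freva-migrate",
        "database",
        "--old-port", str(old_port),
        "--old-db", old_db,
        "--old-user", old_user,
    ]
    if old_pw is not None:
        # Only pass the password if one was given (the default is None).
        cmd += ["--old-pw", old_pw]
    # Positional arguments follow in the order given by the usage text.
    cmd += [new_hostname, old_hostname, str(cert_file)]
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    command = build_migrate_command(
        "new-db.example.org",
        "old-db.example.org",
        "/mnt/project/freva/project.crt",
    )
    print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the placeholders have been replaced by the values of your installation.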
Transition to new DRS Config
In the old version the DRS (Data Reference Syntax) file configuration,
that is the definition of dataset metadata, was hard coded into the module
evaluation_system.model.file. In the new version this configuration
is saved in a designated toml file (drs_config.toml).
The ingestion of data is done by the new freva-ingest software, which is
written in Rust. More information on this configuration and the usage of the
ingestion software can be found in the README.
Transitioning of the Plugins
The Freva plugins are an essential part of Freva. Most likely the transition from the old python2 to the new python3 based system needs special care. A complete rewrite of the plugin manager is planned. This section should therefore be seen as an intermediate solution for plugin transitioning.
Currently we recommend creating an anaconda environment for each plugin. This approach has several advantages:
reproducible as each plugin will get an anaconda environment file.
no version and dependency conflicts occur.
once set up easy to maintain.
These are the disadvantages of this method:
an anaconda environment file has to be created for each plugin.
Transitioning steps:
There are multiple ways to get your old plugin running in the new Freva system. We recommend a deployment strategy involving conda. While this strategy is not strictly necessary, it offers the best possible reproducibility and is independent of the host system. This means your plugins can easily be transferred to other institutions and are more likely to work after major updates of your host system. Here we briefly cover the steps that are needed to bring your old plugin back to life in the new Freva system. We will also discuss alternatives to using conda. Regardless of whether you use conda or not, the first two steps will be necessary.
Common steps:
clone the repository of a plugin, change into the directory and create a new branch.
export the user plugin variable:
export EVALUATION_SYSTEM_PLUGINS=$PWD/path_to_wrapper_file,class_name
freva plugin -l
Most likely, the plugin manager will output a warning that the plugin could not be loaded. If it does, change the plugin accordingly until the warning messages disappear.
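The export step can also be scripted. The sketch below assumes the variable format shown above (the path to the wrapper file, a comma, then the class name); the function name `plugin_variable` and the example values are hypothetical.

```python
import os
from pathlib import Path


def plugin_variable(wrapper_file, class_name, base_dir=None):
    """Return the value for the EVALUATION_SYSTEM_PLUGINS variable."""
    # Default to the current working directory, like $PWD in the shell.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return f"{base / wrapper_file},{class_name}"


if __name__ == "__main__":
    # Placeholder names, replace them with those of your plugin.
    value = plugin_variable("path_to_wrapper_file", "class_name")
    # Copy of the current environment with the plugin variable added.
    env = dict(os.environ, EVALUATION_SYSTEM_PLUGINS=value)
    print(env["EVALUATION_SYSTEM_PLUGINS"])
```

The `env` dictionary can then be passed to `subprocess.run(["freva", "plugin", "-l"], env=env)` to list the plugins with the variable set.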
a. Using conda:
As mentioned above, this approach increases the reproducibility of your plugin. Transferring your plugin to other institutions is also easy because all libraries are encapsulated from the host system and hence independent of it. While not strictly necessary, it is a good idea to familiarise yourself with anaconda.
Download the conda environment file template and the Makefile template.
Add all dependencies to the plugin-env.yml file. If in doubt, search anaconda.org for dependencies.
If needed, adjust the build step in the Makefile for compiling plugin dependencies, e.g. Fortran dependencies.
Execute make all to install the conda environment and build the plugin dependencies.
Execute the plugin and check if everything goes well.
Format the plugin using black:
black -t py310 path_to_plugin.py
b. Using the environment of freva
If your plugin doesn’t need many libraries you can simply try to use everything that comes with freva. This is the easiest way as you don’t have to set up anything. Simply try to execute all commands that come with your plugin and see what happens.
c. Using software of the host system
You can also make use of the software installed on the host system, for
example via spack. Many HPC systems offer the module command. Using this
approach will result in a plugin that is tailored to the host system
you are currently using. Future updates may break usage and you definitely won’t be able
to use your plugin at other institutions.
Transitioning python2 plugins
Python plugins (especially python2) need special care. The recommended strategy is to convert the plugin content to python3. If this is not possible an anaconda python2 environment should be created.
If in the original plugin the plugin code is directly executed in the run_tool
method (formerly named runTool), this code has to be split from the new run_tool
method. The transition strategy is to gather the essential information in a json
file that is passed to the actual core part of the plugin. The code below shows a simple
python2 plugin:
def runTool(self, config_dict={}):
    """This is the old `runTool` method.

    The plugin configuration is passed into this method and the code
    is directly executed in this method."""
    from src.plugin import calculate
    from evaluation_system.model.file import DRSFile

    # Collect the search facets from the plugin configuration.
    search_kw = dict()
    search_kw["variable"] = config_dict["variable"]
    search_kw["model"] = config_dict["model"]
    search_kw["experiment"] = config_dict["experiment"]
    search_kw["ensemble"] = config_dict["ensemble"]
    search_kw["project"] = config_dict["project"]
    search_kw["product"] = config_dict["product"]
    search_kw["time_frequency"] = config_dict["time_frequency"]
    files = sorted(DRSFile.solr_search(path_only=True, **search_kw))
    # The actual plugin code runs directly inside the wrapper.
    calculate(search_kw["variable"], files, config_dict["output_dir"])
    return self.prepareOutput(config_dict["output_dir"])
The above code should be split into two components: one that makes use of
evaluation_system to gather the data, and one that executes the actual plugin
code.
def run_tool(self, config_dict={}):
    """This is the wrapper API part of the plugin.

    It gathers the plugin information and passes the needed information
    to the actual plugin code that is split into another python file."""
    import json
    from pathlib import Path
    from tempfile import NamedTemporaryFile

    from evaluation_system.model.file import DRSFile

    # Collect the search facets from the plugin configuration.
    search_kw = dict()
    search_kw["variable"] = config_dict["variable"]
    search_kw["model"] = config_dict["model"]
    search_kw["experiment"] = config_dict["experiment"]
    search_kw["ensemble"] = config_dict["ensemble"]
    search_kw["project"] = config_dict["project"]
    search_kw["product"] = config_dict["product"]
    search_kw["time_frequency"] = config_dict["time_frequency"]
    files = sorted(DRSFile.solr_search(path_only=True, **search_kw))
    compute_kw = dict(
        variable=search_kw["variable"],
        files=files,
        output_dir=config_dict["output_dir"],
    )
    with NamedTemporaryFile(suffix=".json") as tf:
        # Write the configuration for the python2 part into the json file.
        with open(tf.name, "w") as f:
            json.dump(compute_kw, f, indent=3)
        # Call the python2 part while the temporary file still exists.
        self.call(f"python2 {Path(__file__).parent / 'compute_python2'} {tf.name}")
    return self.prepare_output(config_dict["output_dir"])
The code below demonstrates the usage of the json file created above.
"""This file is the python2 plugin part.
This file calls the calculations of the python2 plugin. The configuration
is passed via a json file into this part.
"""
if __name__ == "__main__":
    from src.plugin import calculate
    import sys
    import json

    try:
        # The path to the json file is the first command line argument.
        with open(sys.argv[1]) as f:
            config = json.load(f)
    except IndexError:
        raise ValueError("Usage: {} path_to_json_file.json".format(sys.argv[0]))
    calculate(config["variable"], config["files"], config["output_dir"])
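The json hand-off between the wrapper and the core part can be tried in isolation, without Freva. The following is a minimal sketch; the helper names `write_config` and `read_config` and the example values are hypothetical and not part of the Freva API.

```python
import json
import os
import tempfile


def write_config(compute_kw):
    """Write the plugin configuration to a temporary json file."""
    # delete=False keeps the file on disk so another process can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tf:
        json.dump(compute_kw, tf, indent=3)
    return tf.name


def read_config(path):
    """Read the configuration back, as the core part of the plugin would."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder configuration for illustration only.
    compute_kw = {
        "variable": "tas",
        "files": ["/path/to/a.nc", "/path/to/b.nc"],
        "output_dir": "/tmp/output",
    }
    path = write_config(compute_kw)
    try:
        print(read_config(path) == compute_kw)
    finally:
        # Clean up the temporary file once the core part has finished.
        os.remove(path)
```

Only json-serialisable values survive this round trip, so objects such as `pathlib.Path` have to be converted to strings before they are written.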
If you want to use the json file in a bash script you must install the jq
json parser. Simply add jq to your plugin-env.yml file and read the
docs of jq.
After conda deployment: Increasing reproducibility of your plugin
If you have successfully deployed your plugin environment using conda, you can increase its reproducibility by “freezing” all packages that conda has installed. The package versions will then exactly match the ones you are using at this point in time, so every time you re-install the plugin environment you will get the same dependency versions. To fix the package versions execute the following command:
./plugin_env/bin/conda list --explicit > spec-file.txt
Add and commit the created spec-file.txt to your repository. Afterwards
you can replace
conda env create --prefix ./plugin_env -f plugin-env.yml --force
in the conda section of your Makefile by:
conda env create --prefix ./plugin_env -f spec-file.txt --force
And you’re done.
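The choice between the two commands above can be expressed as a small helper. This is only a sketch: the function name `conda_create_command` is hypothetical, and it assumes that both files sit in the plugin directory next to the Makefile.

```python
from pathlib import Path


def conda_create_command(plugin_dir="."):
    """Return the conda call, preferring the frozen spec file if present."""
    plugin_dir = Path(plugin_dir)
    spec_file = plugin_dir / "spec-file.txt"
    # Fall back to the unpinned environment file if nothing was frozen yet.
    env_file = spec_file if spec_file.is_file() else plugin_dir / "plugin-env.yml"
    return [
        "conda", "env", "create",
        "--prefix", str(plugin_dir / "plugin_env"),
        "-f", str(env_file),
        "--force",
    ]


if __name__ == "__main__":
    print(" ".join(conda_create_command()))
```

With such a helper the environment file is used until a spec-file.txt has been committed, after which the pinned versions take over.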
Problem: conda doesn’t finish resolving dependencies
Sometimes conda is unable to finish resolving all dependencies. You have a
couple of options in that case. First, you can try replacing the conda command
by mamba in your Makefile. mamba is written in C++ and comes with a
different dependency solver. Most of the time switching from conda to mamba
solves the problem. If the issue persists with mamba, you can try to identify
the problematic package(s), usually by guessing which package(s) might be the
offending ones and removing them from the plugin-env.yml file. Once the
package(s) have been identified you can create an additional conda environment
for them, for example by adding the following line to the conda section of the
Makefile:
conda create -c conda-forge -p ./plugin_env2 cdo netcdf-fortran
If you want to use resources from this environment in your plugin you need
to modify the environment in your API wrapper. Following the example
above, this modification could look like this:
import os

# Make the second conda environment visible to the called command.
env = os.environ.copy()
this_dir = os.path.dirname(os.path.abspath(__file__))
env["PATH"] = env["PATH"] + ":" + os.path.join(this_dir, "plugin_env2", "bin")
env["LD_LIBRARY_PATH"] = os.path.join(this_dir, "plugin_env2", "lib")
self.call(your_command_here, env=env)
If you need to compile additional libraries you probably also want to adjust
the PATH and LD_LIBRARY_PATH variables in the file that executes the compile
step, so that it picks up the lib and bin folders of the plugin_env2
conda environment.
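The PATH and LD_LIBRARY_PATH adjustments described above could be collected in one helper that is shared by the wrapper and the compile step. The sketch below is an assumption, not Freva code: the function name `extended_env` is hypothetical, and it appends to an existing LD_LIBRARY_PATH instead of overwriting it.

```python
import os


def extended_env(this_dir, env_name="plugin_env2", base_env=None):
    """Return a copy of the environment that includes the extra conda env."""
    env = dict(os.environ if base_env is None else base_env)
    prefix = os.path.join(this_dir, env_name)
    for var, sub_dir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        new_dir = os.path.join(prefix, sub_dir)
        current = env.get(var, "")
        # Append to an existing value, otherwise start a new search path.
        env[var] = f"{current}{os.pathsep}{new_dir}" if current else new_dir
    return env


if __name__ == "__main__":
    this_dir = os.path.dirname(os.path.abspath(__file__))
    env = extended_env(this_dir)
    print(env["PATH"])
    print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the `env` argument, in the same way as in the wrapper snippet above.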