# Appendix II: Transitioning guide

The following serves as a guide to transition an existing Freva instance (within the python*2* framework) to the new (python*3* based) version.

## Transition to new Database

We have created a small command line interface (`freva-migrate`) that helps migrate the content of an existing Freva framework to the new one. The `freva-migrate` command currently has one sub-command: `database`.

The new system has seen small changes to the database structure. The `database` sub-command of the `freva-migrate` command helps to transition to this new database structure. To migrate the database of an old installation of the Freva system to a freshly deployed Freva instance, use the following command:

```
usage: freva-migrate database [-h] [--old-port OLD_PORT] [--old-db OLD_DB]
                              [--old-pw OLD_PW] [--old-user OLD_USER]
                              new_hostname old_hostname cert-file

Freva database migration

positional arguments:
  new_hostname         The hostname where the new database is deployed.
  old_hostname         Hostname of the old database.
  cert-file            Path to the public certificate file.

options:
  -h, --help           show this help message and exit
  --old-port OLD_PORT  The port where the old database server is running on. (default: 3306)
  --old-db OLD_DB      The name of the old database (default: evaluationsystem)
  --old-pw OLD_PW      The passowrd to the old database (default: None)
  --old-user OLD_USER  The old database user (default: evaluationsystem)
```

The `cert-file` positional argument refers to the public certificate file that was created during the deployment process. It is needed to establish a connection to the new database (via the vault). You can either use the copy saved by the deployment or the one in the Freva config directory. By default the certificate file resides within the `freva` path of the deployment `root_dir`, for example `/mnt/project/freva/project.crt`. Also don't forget to set the domain name for your institution as a unique identifier.
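If you run the migration from a script, for example as part of an automated deployment, the call can be assembled in Python. The following is a minimal sketch: the helper `build_migrate_command` is hypothetical, it uses only the options listed in the usage text above, and the host names in the demo are placeholders. It only prints the command. Passing the result to `subprocess.run(cmd, check=True)` would actually execute the migration.

```python
import shlex
from typing import List, Optional


def build_migrate_command(
    new_hostname: str,
    old_hostname: str,
    cert_file: str,
    old_port: int = 3306,
    old_db: str = "evaluationsystem",
    old_user: str = "evaluationsystem",
    old_pw: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for ``freva-migrate database`` (hypothetical helper)."""
    cmd = [
        "freva-migrate",
        "database",
        "--old-port",
        str(old_port),
        "--old-db",
        old_db,
        "--old-user",
        old_user,
    ]
    # Only pass a password if one was given (the CLI default is None).
    # Note: command line arguments may be visible to other users via the process list.
    if old_pw is not None:
        cmd += ["--old-pw", old_pw]
    # Positional arguments come last, in the order of the usage text.
    cmd += [new_hostname, old_hostname, cert_file]
    return cmd


if __name__ == "__main__":
    # Placeholder values: replace them with your own hosts and certificate path.
    cmd = build_migrate_command(
        "new-db.example.org",
        "old-db.example.org",
        "/mnt/project/freva/project.crt",
    )
    print(shlex.join(cmd))
```

Returning a list instead of a single string avoids shell quoting problems, for instance with passwords containing special characters.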
After the command has been applied, the new database with its "old" content from the previous Freva instance will be ready for use.

## Transition to new DRS Config

In the old version the DRS (Data Reference Syntax) file configuration, that is, the definition of dataset metadata, was hard coded into the module `evaluation_system.model.file`. In the new version this configuration is saved in a designated [toml](https://toml.io/en/) file (`drs_config.toml`). Data ingestion is done by the new `freva-ingest` software, which is written in Rust. More information on this configuration and on the usage of the ingestion software can be found in the [README](https://gitlab.dkrz.de/freva/freva-ingest).

## Transitioning of the Plugins

The Freva plugins are an essential part of Freva. Transitioning them from the old python2 to the new python3 based system will most likely need special care. A complete rewrite of the plugin manager is planned, so this section should be seen as an intermediate solution for plugin transitioning.

Currently we recommend creating an anaconda environment for each plugin. This approach has several advantages:

- it is reproducible, as each plugin gets an anaconda environment file.
- no version or dependency conflicts occur.
- once set up, it is easy to maintain.

These are the disadvantages of this method:

- an anaconda environment file has to be created for each plugin.

### Transitioning steps:

There are multiple ways to get your old plugin working in the new Freva system. We recommend a deployment strategy involving conda. While this strategy is not strictly necessary, it offers the best possible reproducibility and is independent of the host system. This means your plugins can easily be transferred to other institutions and are more likely to keep working after major system updates on your host system. Here we briefly cover the steps needed to bring your old plugin back to life in the new Freva system. We will also discuss alternatives to using conda.
Regardless of whether you use conda, the first two steps are necessary.

#### Common steps:

1. Clone the repository of the plugin, change into the directory and create a new branch.
2. Export the user plugin variable and list the available plugins:

   ```console
   export EVALUATION_SYSTEM_PLUGINS=$PWD/path_to_wrapper_file,class_name
   freva plugin -l
   ```

Most likely the plugin manager will output a warning that the plugin could not be loaded. If it does, change the plugin accordingly until the warning messages go away.

#### a. Using conda:

As mentioned above, this approach increases the reproducibility of your plugin. Transitioning your plugin to other institutions is also easy, because all libraries are encapsulated from the host system and hence independent of it. While not strictly necessary, it is a good idea to familiarise yourself with [anaconda](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html).

3. Download the [conda environment file template](https://swift.dkrz.de/v1/dkrz_3d3c7abc-1681-4012-b656-3cc1058c52a9/k204230/freva-transition/plugin-env.yml) and the [Makefile template](https://swift.dkrz.de/v1/dkrz_3d3c7abc-1681-4012-b656-3cc1058c52a9/k204230/freva-transition/Makefile).
4. Add all dependencies to the `plugin-env.yml` file. If in doubt, search [anaconda.org](https://anaconda.org) for dependencies.
5. If needed, adjust the `build` step in the `Makefile` for compiling plugin dependencies, e.g. Fortran dependencies.
6. Execute `make all` to install the conda environment and build the plugin dependencies.
7. Execute the plugin and check that everything works.
8. Format the plugin using black: `black -t py310 path_to_plugin.py`

#### b. Using the environment of freva

If your plugin doesn't need many libraries, you can simply try to use everything that comes with Freva. This is the easiest way, as you don't have to install anything. Simply try to execute all commands that come with your plugin and see what happens.

#### c. Using software of the host system

You can also make use of the software installed on the host system, for example via spack. Many HPC systems also offer the `module` command. This approach results in a plugin that is tailored to the host system you are currently using. Future updates may break it, and you definitely won't be able to use your plugin at other institutions.

### Transitioning `python2` plugins

Python plugins (especially python2 plugins) need special care. The recommended strategy is to convert the plugin code to python3. If this is not possible, an anaconda python2 environment should be created. If the original plugin executes its code directly in the `runTool` method (now named `run_tool`), this code has to be split out of the new `run_tool` method. The transition strategy is to gather the essential information in a `json` file that is passed to the actual core part of the plugin. The code below shows a simple python2 plugin:

```python
def runTool(self, config_dict={}):
    """This is the old `runTool` method. The plugin configuration is
    passed into this method and the code is directly executed in this
    method."""
    from src.plugin import calculate
    from evaluation_system.model.file import DRSFile

    # Collect the search facets from the plugin configuration.
    search_kw = dict()
    search_kw["variable"] = config_dict["variable"]
    search_kw["model"] = config_dict["model"]
    search_kw["experiment"] = config_dict["experiment"]
    search_kw["ensemble"] = config_dict["ensemble"]
    search_kw["project"] = config_dict["project"]
    search_kw["product"] = config_dict["product"]
    search_kw["time_frequency"] = config_dict["time_frequency"]
    # Search the files and run the calculation directly in this method.
    files = sorted(DRSFile.solr_search(path_only=True, **search_kw))
    calculate(search_kw["variable"], files, config_dict["output_dir"])
    return self.prepareOutput(config_dict["output_dir"])
```

The above code should be split into two components: one that uses `evaluation_system` to gather the data, and one that executes the actual plugin code.
```python
def run_tool(self, config_dict={}):
    """This is the wrapper API part of the plugin. It gathers the plugin
    information and passes the needed information to the actual plugin
    code that is split into another python file."""
    import json
    from pathlib import Path
    from tempfile import NamedTemporaryFile

    from evaluation_system.model.file import DRSFile

    # Collect the search facets from the plugin configuration.
    search_kw = dict()
    search_kw["variable"] = config_dict["variable"]
    search_kw["model"] = config_dict["model"]
    search_kw["experiment"] = config_dict["experiment"]
    search_kw["ensemble"] = config_dict["ensemble"]
    search_kw["project"] = config_dict["project"]
    search_kw["product"] = config_dict["product"]
    search_kw["time_frequency"] = config_dict["time_frequency"]
    files = sorted(DRSFile.solr_search(path_only=True, **search_kw))
    compute_kw = dict(
        variable=search_kw["variable"],
        files=files,
        output_dir=config_dict["output_dir"],
    )
    with NamedTemporaryFile(mode="w", suffix=".json") as tf:
        # Write the configuration into the temporary json file ...
        json.dump(compute_kw, tf, indent=3)
        tf.flush()  # make sure the content is on disk before python2 reads it
        # ... and run the python2 part while the file still exists.
        self.call(f"python2 {Path(__file__).parent / 'compute_python2'} {tf.name}")
    return self.prepare_output(config_dict["output_dir"])
```

The code below is the python2 part of the plugin (the `compute_python2` file called by the wrapper). It reads the `json` file created above.

```python
"""This file is the python2 part of the plugin. It calls the calculations
of the python2 plugin. The configuration is passed into this part via a
json file.
"""
if __name__ == "__main__":
    import json
    import sys

    from src.plugin import calculate

    try:
        with open(sys.argv[1]) as f:
            config = json.load(f)
    except IndexError:
        raise ValueError("Usage: {} path_to_json_file.json".format(sys.argv[0]))
    calculate(config["variable"], config["files"], config["output_dir"])
```

If you want to use the json file in a bash script, you need a command line json parser such as `jq`. Simply add `jq` to your `plugin-env.yml` file and read the [docs of jq](https://stedolan.github.io/jq/tutorial/).
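Before wiring both parts into Freva, you may want to check the `json` hand-off on its own. The following sketch is illustrative only. `write_config` and `read_config` are hypothetical helpers, not part of `evaluation_system`. They mirror the `compute_kw` dictionary of the wrapper and the reading side of the python2 part. The variable name and file paths in the demo are made up.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

REQUIRED_KEYS = {"variable", "files", "output_dir"}


def write_config(variable: str, files: List[str], output_dir: str) -> str:
    """Write the compute configuration to a json file and return its path.

    The file is kept after closing so another process can read it;
    the caller is responsible for deleting it.
    """
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump({"variable": variable, "files": files, "output_dir": output_dir}, f, indent=3)
    return path


def read_config(path: str) -> Dict[str, Any]:
    """Read the configuration back, as the python2 part of the plugin does."""
    with open(path) as f:
        config = json.load(f)
    # Fail early if the wrapper forgot one of the keys the plugin needs.
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError("missing keys in {}: {}".format(path, sorted(missing)))
    return config


if __name__ == "__main__":
    path = write_config("tas", ["/data/a.nc", "/data/b.nc"], "/tmp/plugin_out")
    try:
        print(read_config(path))
    finally:
        os.remove(path)
```

Unlike `NamedTemporaryFile`, `tempfile.mkstemp` does not delete the file when it is closed. This makes it convenient for a standalone test where writing and reading happen at different times.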
### After conda deployment: Increasing reproducibility of your plugin

If you have successfully deployed your plugin environment using conda, you can increase reproducibility by "freezing" all packages that have been installed by conda. The package versions will then exactly match the ones you are using at this point in time, so every time you re-install the plugin environment you will get the same dependency versions. To fix the package versions, execute the following command:

```
./plugin_env/bin/conda list --explicit > spec-file.txt
```

Add and commit the created `spec-file.txt` to your repository. Afterwards you can replace `conda env create --prefix ./plugin_env -f plugin-env.yml --force` in the `conda` section of your `Makefile` with:

```
conda env create --prefix ./plugin_env -f spec-file.txt --force
```

And you're done.

### Problem: conda doesn't finish resolving dependencies

Sometimes conda is unable to resolve all dependencies, or never finishes doing so. You have a couple of options in that case. First, you can try replacing the `conda` command with `mamba` in your Makefile. `mamba` is written in C++ and comes with a different dependency solver. Most of the time, switching from `conda` to `mamba` solves the problem.

If the issue persists with `mamba`, you can try to identify the problematic package(s). This usually means guessing which package(s) might be the offending ones and removing them from the `plugin-env.yml` file. Once the package(s) have been narrowed down, you can create an additional conda environment for them. For example, add the following line to the `conda` section of the `Makefile`:

```
conda create -c conda-forge -p ./plugin_env2 cdo netcdf-fortran
```

If you want to use resources from this environment in your plugin, you need to modify the environment in your API wrapper.
Following the example above, this modification could look like this:

```python
import os

env = os.environ.copy()
this_dir = os.path.dirname(os.path.abspath(__file__))
# Append the bin directory of the second environment to the search path.
env["PATH"] = env["PATH"] + ":" + os.path.join(this_dir, "plugin_env2", "bin")
# Point the dynamic linker to the libraries of the second environment.
env["LD_LIBRARY_PATH"] = os.path.join(this_dir, "plugin_env2", "lib")
self.call(your_command_here, env=env)
```

If you need to compile additional libraries, you probably also want to adjust the `PATH` and `LD_LIBRARY_PATH` variables in the file that executes the compile step. This lets it pick up the `lib` and `bin` folders of the `plugin_env2` conda environment.
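The snippet above replaces any existing `LD_LIBRARY_PATH`. If your plugin also relies on libraries that are already on that path, a variant that keeps the existing entries might be preferable. The following is a sketch under that assumption. The helper `extra_env` is hypothetical. It appends the `bin` folder to `PATH`, as above, and puts the `lib` folder in front of any existing `LD_LIBRARY_PATH` entries. The result can be passed to `self.call(..., env=env)` in the same way.

```python
import os
from typing import Dict, Mapping, Optional


def _join_paths(*parts: Optional[str]) -> str:
    """Join the non-empty path fragments with the OS path separator."""
    return os.pathsep.join(p for p in parts if p)


def extra_env(prefix: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of ``base`` (default: os.environ) that also sees ``prefix``.

    ``prefix`` is the path of an additional conda environment, e.g. plugin_env2.
    Its ``bin`` folder is appended to PATH, so tools found earlier on PATH
    still take precedence, and its ``lib`` folder is put in front of
    LD_LIBRARY_PATH without discarding entries that were already set.
    """
    env = dict(os.environ if base is None else base)
    env["PATH"] = _join_paths(env.get("PATH"), os.path.join(prefix, "bin"))
    env["LD_LIBRARY_PATH"] = _join_paths(
        os.path.join(prefix, "lib"), env.get("LD_LIBRARY_PATH")
    )
    return env


if __name__ == "__main__":
    # Hypothetical location: in a plugin wrapper you would build this path
    # from the directory of the wrapper file, as in the snippet above.
    env = extra_env(os.path.abspath("plugin_env2"))
    print(env["LD_LIBRARY_PATH"])
```

Whether the extra `lib` folder should come first or last depends on which copy of a shared library should win. Putting it first matches the behaviour of the snippet above.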