
Overview

This document describes the workflow, currently under development, for modeling the distribution and level of invasion (species richness) of invasive alien plant species (IAS) across Europe. The IAS prototype Digital Twin (IAS-pDT) is a component of the European BioDT project, which aims to establish a Digital Twin framework for biodiversity in Europe. For a detailed description of the IAS-pDT, see Khan, El-Gabbas, et al. (2024). The complete IAS-pDT workflow is documented on Zenodo.

The IAS-pDT uses the IASDT.R R package to run the full workflow: preparing abiotic data (e.g., climate and land cover) and biotic data (i.e., species distributions), fitting models, post-processing model outputs, and preparing data for the Shiny application. Model outputs from the IAS-pDT are made publicly available to end-users and stakeholders through an OPeNDAP cloud server, with prediction maps retrievable directly in R via the IASDT.R package (currently in development).


Models

Species distribution models are constructed using the Hierarchical Modelling of Species Communities (HMSC) R package, a hierarchical Bayesian framework that incorporates spatial autocorrelation and species associations. Spatial autocorrelation is modeled via the Gaussian Predictive Process (GPP; Tikhonov et al., 2019), offering a flexible and computationally efficient approach to capturing spatial dependencies. Given the substantial computational demands of fitting these spatial models at a European scale, we utilize the HMSC-HPC extension (Rahman et al., 2024) to leverage GPU-based processing for enhanced efficiency.
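As a rough illustration of the modeling approach described above, the sketch below fits a small spatial probit model with a GPP random level using the standard (CPU-based) Hmsc package. It is not the IAS-pDT pipeline: the data are simulated, the single covariate `temperature`, the knot spacing, and the very short MCMC settings are assumptions chosen only so the example runs quickly, and the GPU-based HMSC-HPC step is not shown.

```r
library(Hmsc)

set.seed(1)
n_sites <- 100
n_species <- 5

# Simulated site coordinates (illustrative only); row names identify sites
xy <- matrix(
  runif(n_sites * 2), ncol = 2,
  dimnames = list(paste0("site_", seq_len(n_sites)), c("x", "y"))
)

# One simulated covariate and a simulated presence-absence matrix
x_data <- data.frame(temperature = rnorm(n_sites))
y <- matrix(
  rbinom(n_sites * n_species, size = 1, prob = 0.4), ncol = n_species,
  dimnames = list(NULL, paste0("sp_", seq_len(n_species)))
)

# Knot locations for the Gaussian Predictive Process (spacing is an assumption)
knots <- constructKnots(sData = xy, knotDist = 0.2, minKnotDist = 0.4)

# Spatial random level that uses the GPP approximation
rl_gpp <- HmscRandomLevel(sData = xy, sMethod = "GPP", sKnot = knots)

# Study design: one row per sampling unit, matched to the row names of xy
study_design <- data.frame(site = as.factor(rownames(xy)))

model <- Hmsc(
  Y = y, XData = x_data, XFormula = ~ temperature,
  distr = "probit", studyDesign = study_design,
  ranLevels = list(site = rl_gpp)
)

# Deliberately short MCMC run; real analyses need far longer chains
fit <- sampleMcmc(model, samples = 50, thin = 1, transient = 25, nChains = 2)
```

The number and spacing of knots controls the trade-off between spatial resolution and computation time, which is the main reason GPP is attractive for large grids.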

Models are fitted at the habitat level: a separate model is fitted for each habitat type and includes only the IAS associated with that habitat type. We use the habitat classification of Pyšek et al. (2022) and fit models for eight habitat types (see table below). For each habitat type, predictions of individual species distributions and species richness are generated across multiple climate scenarios; further details are provided in the abiotic data processing section. Model performance is assessed using spatial block cross-validation to ensure spatial independence between training and testing datasets.

| Abbreviation | Habitat Type | Description |
|---|---|---|
| 1 | Forests | closed vegetation dominated by deciduous or evergreen trees |
| 2 | Open forests | woodlands with canopy openings created by environmental stress or disturbance, including forest edges |
| 3 | Scrub | shrublands maintained by environmental stress (aridity) or disturbance |
| 4a | Natural grasslands | grasslands maintained by climate (aridity, unevenly distributed precipitation), herbivores or environmental stress (aridity, instability or toxicity of substrate) |
| 4b | Human-maintained grasslands | grasslands dependent on regular human-induced management (mowing, grazing by livestock, artificial burning) |
| 10 | Wetlands | sites with the permanent or seasonal influence of moisture, ranging from oligotrophic to eutrophic |
| 12a | Ruderal habitats | anthropogenically disturbed or eutrophicated sites, where the anthropogenic disturbance or fertilization is typically a side-product and not the aim of the management |
| 12b | Agricultural habitats | synanthropic habitats directly associated with growing of agricultural products, thus dependent on specific type of management (ploughing, fertilization) |
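The spatial block cross-validation mentioned above can be sketched in base R as follows. This is a minimal illustration, not the procedure implemented in IASDT.R: the simulated cell coordinates, the block width `block_size`, and the number of folds `n_folds` are all hypothetical. The idea is that whole spatial blocks, rather than individual grid cells, are assigned to folds, so neighbouring cells tend to end up on the same side of the train/test split.

```r
set.seed(42)

# Hypothetical grid-cell centroids in projected coordinates (metres)
cells <- data.frame(
  x = runif(500, min = 0, max = 1e6),
  y = runif(500, min = 0, max = 1e6)
)

block_size <- 250000  # hypothetical block width (metres)
n_folds <- 4          # hypothetical number of cross-validation folds

# Identify the square block that contains each cell
block_col <- floor(cells$x / block_size)
block_row <- floor(cells$y / block_size)
cells$block <- paste(block_row, block_col, sep = "_")

# Randomly assign whole blocks (not individual cells) to folds
blocks <- unique(cells$block)
block_fold <- setNames(
  sample(rep_len(seq_len(n_folds), length(blocks))),
  blocks
)
cells$fold <- unname(block_fold[cells$block])

# Train/test split for the first fold
test_idx <- which(cells$fold == 1)
train_idx <- which(cells$fold != 1)
```

Looping over the fold index, fitting on `train_idx` and evaluating on `test_idx`, gives one performance estimate per fold.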

Environment variables

The workflow requires several environment variables to be set. Certain functions in the IASDT.R package take an EnvFile argument, which defaults to .env. The table below lists the environment variables used by the workflow, with their descriptions and default values.


| Variable | Description | Default value |
|---|---|---|
| DP_R_BioReg_interim | Directory path to biogeographical regions interim data | datasets/interim/biogeoregions |
| DP_R_BioReg_processed | Directory path to biogeographical regions processed data | datasets/processed/biogeoregions |
| DP_R_BioReg_raw | Directory path to biogeographical regions raw data | datasets/raw/biogeoregions |
| DP_R_BioReg_url | URL for downloading biogeographical regions data | https://www.eea.europa.eu/en/datahub/datahubitem-view/11db8d14-f167-4cd5-9205-95638dfd9618 |
| DP_R_CHELSA_links | Directory path containing CHELSA download links | references/chelsa/DwnLinks |
| DP_R_CHELSA_processed | Directory path to processed CHELSA data | datasets/processed/chelsa |
| DP_R_CHELSA_raw | Directory path to raw CHELSA data | datasets/raw/chelsa |
| DP_R_CHELSA_url | Base URL for CHELSA data | https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL |
| DP_R_CLC_crosswalk | Path to a text file containing a custom cross-walk between CLC values at level 3 and their corresponding values for EUNIS and SynHab habitat types | references/CrossWalk.txt |
| DP_R_CLC_processed | Directory path to processed CLC data | datasets/processed/corine |
| DP_R_CLC_tif | Path to the input CLC tif file | datasets/raw/corine/u2018_clc2018_v2020_20u1_raster100m/DATA/U2018_CLC2018_V2020_20u1.tif |
| DP_R_Hmsc_ve_win | Directory path to the Hmsc-HPC Python virtual environment under the Windows operating system | D:/Hmsc-HPC |
| DP_R_LUMI_cpu | LUMI project number for CPU computations | project_465000915 |
| DP_R_LUMI_gpu | LUMI project number for GPU computations | project_465001588 |
| DP_R_LUMI_gpu_check | File path to a Python script that reports whether the GPU was used in the running SLURM job | references/LUMI_Check_GPU.py |
| DP_R_Countrycodes | Path to a file containing country ISO codes | references/CountryCodes.csv |
| DP_R_EUBound | Path to RData file containing country boundaries | references/Bound_sf_Eur.RData |
| DP_R_Model_path | Directory path for model fitting | datasets/processed/model_fitting |
| DP_R_Railways_interim | Directory path to interim railways data | datasets/interim/railways |
| DP_R_Railways_processed | Directory path to processed railways data | datasets/processed/railways |
| DP_R_Railways_raw | Directory path to raw railways data | datasets/raw/railways |
| DP_R_Railways_url | URL for railways data | https://download.geofabrik.de/ |
| DP_R_Grid_processed | Directory path for reference grid (resulting from processing CLC data) | datasets/processed/grid |
| DP_R_Grid_raw | Directory path for reference grid (original) | references/grid |
| DP_R_Rivers_interim | Directory path to interim rivers data | datasets/interim/rivers |
| DP_R_Rivers_processed | Directory path to processed rivers data | datasets/processed/rivers |
| DP_R_Rivers_raw | Directory path to raw rivers data | datasets/raw/rivers |
| DP_R_Rivers_zip | Path to zip file containing river data | datasets/raw/rivers/EU_hydro_gpkg_eu.zip |
| DP_R_Roads_interim | Directory path to interim roads data | datasets/interim/roads |
| DP_R_Roads_processed | Directory path to processed roads data | datasets/processed/roads |
| DP_R_Roads_raw | Directory path to raw roads data | datasets/raw/roads |
| DP_R_Roads_url | URL for the Global Roads Inventory Project (GRIP) data | https://dataportaal.pbl.nl/downloads/GRIP4/GRIP4_Region4_vector_fgdb.zip |
| DP_R_Efforts_interim | Directory path to interim sampling efforts data | datasets/interim/SamplingEfforts |
| DP_R_Efforts_processed | Directory path to processed sampling efforts data | datasets/processed/SamplingEfforts |
| DP_R_Efforts_raw | Directory path to raw sampling efforts data | datasets/raw/SamplingEfforts |
| DP_R_EASIN_interim | Directory path to interim EASIN data | datasets/interim/EASIN |
| DP_R_EASIN_processed | Directory path to processed EASIN data | datasets/processed/EASIN |
| DP_R_EASIN_summary | Directory path to summary of processed EASIN data | datasets/processed/EASIN/Summary |
| DP_R_EASIN_taxa_url | URL for EASIN API for downloading taxa list | https://easin.jrc.ec.europa.eu/apixg/catxg |
| DP_R_EASIN_data_url | URL for EASIN API for downloading species data | https://easin.jrc.ec.europa.eu/apixg/geoxg |
| DP_R_eLTER_processed | Path to RData file containing processed eLTER presence-absence data | datasets/processed/IAS_PA/eLTER_IAS.RData |
| DP_R_eLTER_raw | Path to rds file containing processed and standardized eLTER data | references/elter_data_gbif_2024-02-07.rds |
| DP_R_GBIF_interim | Directory path to interim GBIF data | datasets/interim/GBIF |
| DP_R_GBIF_processed | Directory path to processed GBIF data | datasets/processed/GBIF |
| DP_R_GBIF_raw | Directory path to raw GBIF data | datasets/raw/GBIF |
| DP_R_PA | Directory path to species-specific presence-absence data | datasets/processed/IAS_PA |
| DP_R_HabAff | Path to rds file containing species affinity to habitat types | references/taxon-habitat-combined_2024-02-07.rds |
| DP_R_Taxa_country | Path to Excel file containing the number of grid cells per species and country | references/cell_count_per_species_and_country_2024-01-30-IA-VK-MG.xlsx |
| DP_R_Taxa_easin | Path to rds file containing EASIN standardized taxonomy | references/easin_taxon-list-gbif_2024-02-07.rds |
| DP_R_OPeNDAP_url | URL of the OPeNDAP server for the IAS-pDT | http://opendap.biodt.eu/ias-pdt/ |
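To make the role of the environment file concrete, the sketch below writes a small file with three of the variables listed above and loads it into an R session using base R only. It assumes the file uses one `KEY=value` pair per line, which is the format `readRenviron()` expects; how IASDT.R itself parses the file passed to `EnvFile` is not described here, so treat this as an illustration rather than the package's own mechanism. A temporary file stands in for `.env`.

```r
# Write a small environment file (one KEY=value pair per line);
# a temporary file is used here in place of a project-level .env
env_file <- tempfile(fileext = ".env")
writeLines(
  c(
    "DP_R_CHELSA_raw=datasets/raw/chelsa",
    "DP_R_CHELSA_processed=datasets/processed/chelsa",
    "DP_R_Model_path=datasets/processed/model_fitting"
  ),
  con = env_file
)

# Load the variables into the current R session
readRenviron(env_file)

# Retrieve individual values; `unset` is returned if a variable is undefined
chelsa_raw <- Sys.getenv("DP_R_CHELSA_raw", unset = NA)
model_path <- Sys.getenv("DP_R_Model_path", unset = NA)

# Fail early if any required variable is missing or empty
required <- c("DP_R_CHELSA_raw", "DP_R_CHELSA_processed", "DP_R_Model_path")
missing_vars <- required[!nzchar(Sys.getenv(required))]
if (length(missing_vars) > 0) {
  stop("Missing environment variables: ", paste(missing_vars, collapse = ", "))
}
```

Checking for missing variables before starting a long processing step avoids failures partway through the workflow.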

Next articles:
2. Processing abiotic data
3. Processing biotic data
4. Model fitting
5. Model post-processing