IAS-pDT modelling workflow — 1. Overview
Source: vignettes/workflow_1_overview.Rmd
Overview
This document describes the workflow (currently under development)
for modeling the distribution and level of invasion (species richness)
of invasive alien plant species (IAS) across Europe. The IAS prototype
Digital Twin (IAS-pDT) is a component of the European BioDT project,
which seeks to establish a Digital Twin framework for biodiversity in
Europe. For a detailed description of the IAS-pDT, refer to Khan,
El-Gabbas, et al. (2024).
The complete IAS-pDT workflow is documented at Zenodo.
The IAS-pDT leverages the IASDT.R R package to execute a comprehensive
workflow encompassing model fitting, post-processing, and data
preparation for the Shiny application. This package facilitates the
preparation of abiotic data (e.g., climate and land cover) and biotic
data (i.e., species distribution). Model outputs from the IAS-pDT are
made publicly available to end-users and stakeholders through an
OPeNDAP cloud server, with prediction maps retrievable directly in R
via the IASDT.R package (currently in development).
Models
Species distribution models are constructed using the Hierarchical
Modelling of Species Communities (HMSC) R package, a hierarchical
Bayesian framework that incorporates spatial autocorrelation and
species associations. Spatial autocorrelation is modeled via the
Gaussian Predictive Process (GPP; Tikhonov et al., 2019), offering a
flexible and computationally efficient approach to capturing spatial
dependencies. Given the substantial computational demands of fitting
these spatial models at a European scale, we utilize the HMSC-HPC
extension (Rahman et al., 2024) to leverage GPU-based processing for
enhanced efficiency.
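As a minimal sketch of how a GPP spatial random level is declared with the Hmsc package, the example below builds (but does not fit) a small probit model on simulated data. The data, variable names, knot spacing, and the single covariate are illustrative assumptions and do not reflect the settings used in the IAS-pDT; the block is skipped when Hmsc is not installed.

```r
# Sketch only: simulated data, hypothetical names and knot distances.
if (requireNamespace("Hmsc", quietly = TRUE)) {
  library(Hmsc)
  set.seed(1)

  n_sites <- 50
  # Simulated site coordinates on the unit square
  xy <- cbind(x = runif(n_sites), y = runif(n_sites))
  # One simulated covariate and presence-absence data for three species
  XData <- data.frame(temp = rnorm(n_sites))
  Y <- matrix(
    rbinom(n_sites * 3, 1, 0.4), ncol = 3,
    dimnames = list(NULL, c("sp_A", "sp_B", "sp_C"))
  )

  # Each site is one sampling unit; coordinate row names must match it
  studyDesign <- data.frame(site = factor(paste0("site_", seq_len(n_sites))))
  rownames(xy) <- as.character(studyDesign$site)

  # Knots for the Gaussian Predictive Process and the spatial random level
  knots <- constructKnots(sData = xy, knotDist = 0.2, minKnotDist = 0.4)
  rl_gpp <- HmscRandomLevel(sData = xy, sMethod = "GPP", sKnot = knots)

  # Unfitted model object; sampling would follow with sampleMcmc()
  model <- Hmsc(
    Y = Y, XData = XData, XFormula = ~ temp, distr = "probit",
    studyDesign = studyDesign, ranLevels = list(site = rl_gpp)
  )
}
```

The number and spacing of knots control the trade-off between how finely spatial structure is represented and how expensive the model is to fit, which is why the GPP is attractive for large grids.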
Models are fitted at the habitat level: a distinct model is fitted for each habitat type, incorporating only those IAS associated with that habitat type. We use the habitat classification of Pyšek et al. (2022) and fit models for eight habitat types (see the table below). For each habitat type, predictions of individual species distributions and species richness are generated across multiple climate scenarios; further details are provided in the abiotic data processing section. Model performance is assessed using spatial block cross-validation to ensure spatial independence between training and testing datasets.
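The idea behind spatial block cross-validation can be sketched in base R as follows: grid cells are grouped into square blocks, and whole blocks (rather than individual cells) are allocated to folds, so that training and testing cells never share a block. The coordinates, block size, and number of folds below are hypothetical and are not the values used in the workflow.

```r
# Sketch only: simulated coordinates, hypothetical block size and fold count.
set.seed(42)
cells <- data.frame(
  x = runif(200, 0, 100),
  y = runif(200, 0, 100)
)
block_size <- 25  # width of a square block, in coordinate units
n_folds <- 4

# Identify the block containing each cell
block_col <- floor(cells$x / block_size)
block_row <- floor(cells$y / block_size)
cells$block <- paste(block_row, block_col, sep = "_")

# Randomly allocate whole blocks to folds (balanced across folds)
blocks <- unique(cells$block)
fold_of_block <- setNames(
  sample(rep_len(seq_len(n_folds), length(blocks))),
  blocks
)
cells$fold <- unname(fold_of_block[cells$block])

# Example split: fold 1 is held out for testing
test <- cells[cells$fold == 1, ]
train <- cells[cells$fold != 1, ]
```

Because folds are defined at the block level, cells close to each other tend to end up on the same side of the split, which reduces the optimistic bias that spatial autocorrelation introduces into randomly partitioned cross-validation.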
Abbreviation | Habitat Type | Description |
---|---|---|
1 | Forests | closed vegetation dominated by deciduous or evergreen trees |
2 | Open forests | woodlands with canopy openings created by environmental stress or disturbance, including forest edges |
3 | Scrub | shrublands maintained by environmental stress (aridity) or disturbance |
4a | Natural grasslands | grasslands maintained by climate (aridity, unevenly distributed precipitation), herbivores or environmental stress (aridity, instability or toxicity of substrate) |
4b | Human-maintained grasslands | grasslands dependent on regular human-induced management (mowing, grazing by livestock, artificial burning) |
10 | Wetlands | sites with the permanent or seasonal influence of moisture, ranging from oligotrophic to eutrophic |
12a | Ruderal habitats | anthropogenically disturbed or eutrophicated sites, where the anthropogenic disturbance or fertilization is typically a side-product and not the aim of the management |
12b | Agricultural habitats | synanthropic habitats directly associated with growing of agricultural products, thus dependent on specific type of management (ploughing, fertilization) |
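To illustrate how habitat-specific models include only the species associated with each habitat type, the sketch below derives one species list per habitat abbreviation from a small affinity table. The species names and the affinity table are invented for illustration; the actual species-habitat affinities used by the workflow are not reproduced here.

```r
# Habitat abbreviations as listed in the table above
habitats <- c("1", "2", "3", "4a", "4b", "10", "12a", "12b")

# Hypothetical species-habitat affinities in long format (illustration only)
affinity <- data.frame(
  species = c("sp_A", "sp_A", "sp_B", "sp_C", "sp_C", "sp_D"),
  habitat = c("1", "4a", "12a", "1", "12b", "10"),
  stringsAsFactors = FALSE
)

# One species list per habitat type; each list would feed a separate model
species_by_habitat <- lapply(
  setNames(habitats, habitats),
  function(h) sort(unique(affinity$species[affinity$habitat == h]))
)

species_by_habitat[["1"]]
```

A species associated with several habitat types (such as `sp_A` here) appears in the species list of each of those habitats and would therefore be modeled more than once.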
Environment variables
The workflow requires the configuration of multiple environment
variables to ensure proper execution. Certain functions within the
IASDT.R package include an `EnvFile` argument, which defaults to
`.env`. The table below enumerates the environment variables essential
to the workflow, accompanied by their descriptions and default values.
| Variable | Description | Default value |
|---|---|---|
| | Directory path to biogeographical regions interim data | datasets/interim/biogeoregions |
| | Directory path to biogeographical regions processed data | datasets/processed/biogeoregions |
| | Directory path to biogeographical regions raw data | datasets/raw/biogeoregions |
| | URL for downloading biogeographical regions data | https://www.eea.europa.eu/en/datahub/datahubitem-view/11db8d14-f167-4cd5-9205-95638dfd9618 |
| | Directory path containing CHELSA download links | references/chelsa/DwnLinks |
| | Directory path to processed CHELSA data | datasets/processed/chelsa |
| | Directory path to raw CHELSA data | datasets/raw/chelsa |
| | Base URL for CHELSA data | https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL |
| | Path to a text file containing custom cross-walk between CLC values at level 3 and their corresponding values for EUNIS and SynHab habitat types | references/CrossWalk.txt |
| | Directory path to processed CLC data | datasets/processed/corine |
| | Path to the input CLC tif file | datasets/raw/corine/u2018_clc2018_v2020_20u1_raster100m/DATA/U2018_CLC2018_V2020_20u1.tif |
| | Directory path to the Hmsc-HPC Python virtual environment under Windows operating system | D:/Hmsc-HPC |
| | LUMI project number for CPU computations | project_465000915 |
| | LUMI project number for GPU computations | project_465001588 |
| | File path to a Python script for reporting if the GPU was used in the running SLURM job | references/LUMI_Check_GPU.py |
| | Path to a file containing country ISO codes | references/CountryCodes.csv |
| | Path to RData file containing country boundaries | references/Bound_sf_Eur.RData |
| | Directory path for model fitting | datasets/processed/model_fitting |
| | Directory path to interim railways data | datasets/interim/railways |
| | Directory path to processed railways data | datasets/processed/railways |
| | Directory path to raw railways data | datasets/raw/railways |
| | URL for railways data | https://download.geofabrik.de/ |
| | Directory path for reference grid (resulted from processing CLC data) | datasets/processed/grid |
| | Directory path for reference grid (original) | references/grid |
| | Directory path to interim rivers data | datasets/interim/rivers |
| | Directory path to processed rivers data | datasets/processed/rivers |
| | Directory path to raw rivers data | datasets/raw/rivers |
| | Path to zip file containing river data | datasets/raw/rivers/EU_hydro_gpkg_eu.zip |
| | Directory path to interim roads data | datasets/interim/roads |
| | Directory path to processed roads data | datasets/processed/roads |
| | Directory path to raw roads data | datasets/raw/roads |
| | URL for the Global Roads Inventory Project (GRIP) data | https://dataportaal.pbl.nl/downloads/GRIP4/GRIP4_Region4_vector_fgdb.zip |
| | Directory path to interim sampling efforts data | datasets/interim/SamplingEfforts |
| | Directory path to processed sampling efforts data | datasets/processed/SamplingEfforts |
| | Directory path to raw sampling efforts data | datasets/raw/SamplingEfforts |
| | Directory path to EASIN data | datasets/interim/EASIN |
| | Directory path to processed EASIN data | datasets/processed/EASIN |
| | Directory path to summary of processed EASIN data | datasets/processed/EASIN/Summary |
| | URL for EASIN API for downloading taxa list | https://easin.jrc.ec.europa.eu/apixg/catxg |
| | URL for EASIN API for downloading species data | https://easin.jrc.ec.europa.eu/apixg/geoxg |
| | Path to RData containing processed eLTER presence-absence data | datasets/processed/IAS_PA/eLTER_IAS.RData |
| | Path to rds file containing processed and standardized eLTER data | references/elter_data_gbif_2024-02-07.rds |
| | Directory path to interim GBIF data | datasets/interim/GBIF |
| | Directory path to processed GBIF data | datasets/processed/GBIF |
| | Directory path to raw GBIF data | datasets/raw/GBIF |
| | Directory path to species-specific presence-absence data | datasets/processed/IAS_PA |
| | Path to rds file containing species affinity to habitat types | references/taxon-habitat-combined_2024-02-07.rds |
| | Path to Excel file containing the number of grid cells per species and country | references/cell_count_per_species_and_country_2024-01-30-IA-VK-MG.xlsx |
| | Path to rds file containing EASIN standardized taxonomy | references/easin_taxon-list-gbif_2024-02-07.rds |
| | Path to OPeNDAP server for the IAS-pDT | http://opendap.biodt.eu/ias-pdt/ |
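As a rough illustration of how a `.env` file of this kind can be read and exported in base R, the sketch below parses `KEY=VALUE` lines and sets them as environment variables for the current session. The helper `read_env_file()` and the variable names `EXAMPLE_PATH_GRID` and `EXAMPLE_PATH_COUNTRY_CODES` are hypothetical placeholders, not part of the IASDT.R package; only the two path values are taken from the defaults listed above.

```r
# Hypothetical helper: parse KEY=VALUE lines, skipping blanks and comments.
read_env_file <- function(path = ".env") {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Position of the first "=" in each line; drop lines without one
  eq <- regexpr("=", lines, fixed = TRUE)
  keep <- eq > 0
  lines <- lines[keep]
  eq <- eq[keep]
  keys <- trimws(substr(lines, 1, eq - 1))
  values <- trimws(substr(lines, eq + 1, nchar(lines)))
  # Strip one pair of surrounding quotes, if present
  values <- gsub("^[\"']|[\"']$", "", values)
  setNames(as.list(values), keys)
}

# Write a small example file (variable names are placeholders)
env_path <- tempfile(fileext = ".env")
writeLines(c(
  "# example entries; names are hypothetical",
  "EXAMPLE_PATH_GRID=datasets/processed/grid",
  "EXAMPLE_PATH_COUNTRY_CODES = \"references/CountryCodes.csv\""
), env_path)

# Export the parsed variables to the current R session
env_vars <- read_env_file(env_path)
do.call(Sys.setenv, env_vars)

Sys.getenv("EXAMPLE_PATH_GRID")
```

Keeping paths and URLs in a single file of this form means that moving the workflow to another machine or cluster requires editing only that file rather than the function calls.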