IAS-pDT modelling workflow — 4. Model fitting
Source: vignettes/workflow_4_model_fitting.Rmd
This article outlines how input data are prepared for model fitting and how these models are then fitted on GPUs within the IAS-pDT workflow.
Model input data
The primary function for preparing model-fitting data and initializing models is Mod_Prep4HPC(), which relies on the following functions:
- Mod_PrepData(): prepare input data for modeling, with key arguments including:
  - abbreviation of a single habitat type to be modeled
  - directory path for storing all model files
  - minimum number of vascular plant species per grid cell required for inclusion in model fitting. This reflects the total count of vascular plant species (including native species) recorded in GBIF across Europe, as computed during the sampling effort preparation step (Efforts_Process()). This argument filters out grid cells with insufficient sampling effort.
  - whether to exclude countries with cultivated or casual observations for each species
  - whether to exclude grid cells with zero coverage of the respective habitat type
  - minimum number of presence grid cells required for a species to be included in the models, calculated after excluding grid cells with low sampling effort
- Mod_GetCV(): prepare and visualize options for spatial-block cross-validation. In the CV_Dist strategy, block size is governed by the CV_NGrids argument. In the CV_Large strategy, the study area is partitioned into larger blocks based on the CV_NR and CV_NC arguments.
  - number of cross-validation folds
  - number of grid cells in each direction for the CV_Dist cross-validation strategy (default: 20, yielding blocks of 20 × 20 grid cells)
  - number of rows and columns that define the CV_Large cross-validation strategy, which partitions the study area into large blocks (default: CV_NR = CV_NC = 2, resulting in four blocks divided at the median coordinates)
- Mod_PrepKnots(): prepare and visualize knot locations for Gaussian Predictive Process (GPP) models, as described by Tikhonov et al. (2019).
  - whether to incorporate spatial random effects using the Gaussian Predictive Process (GPP)
  - distance (in kilometers) specifying both the spacing between knots and the minimum distance between a knot and the nearest sampling point
  - whether to plot the coordinates of sampling units and knots
  - minimum and maximum number of latent factors to be included
  - prior specification for the alpha parameter
- Mod_SLURM(): generate SLURM scripts to facilitate model fitting on GPUs using the Hmsc-HPC extension.
  - name assigned to the SLURM job
  - number of tasks to execute
  - number of CPUs and GPUs allocated per node
  - memory allocation per CPU
  - maximum duration for job execution
  - name of the HPC partition
  - number of jobs within each SLURM script
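The two filters described for Mod_PrepData() can be sketched in Python. The first filter applies a minimum sampling effort per grid cell. The second applies a minimum number of presence grid cells per species, counted only over the retained cells. This is an illustration, not the package's implementation. The function name filter_cells_and_species, its argument names, and the dictionary inputs (a per-cell count of vascular plant species and a per-cell set of present species) are all hypothetical. The habitat-coverage and cultivated/casual filters are left out.

```python
from typing import Dict, List, Set, Tuple


def filter_cells_and_species(
    plant_richness: Dict[str, int],   # cell id -> vascular plant species recorded (sampling effort)
    presences: Dict[str, Set[str]],   # cell id -> species recorded as present in that cell
    min_efforts: int,                 # hypothetical name: minimum plant species per cell
    min_presences: int,               # hypothetical name: minimum presence cells per species
) -> Tuple[List[str], List[str]]:
    """Return (kept cells, kept species) after two sequential filters."""
    # Step 1: drop grid cells with insufficient sampling effort.
    kept_cells = sorted(c for c, n in plant_richness.items() if n >= min_efforts)

    # Step 2: count presences per species using only the retained cells.
    counts: Dict[str, int] = {}
    for cell in kept_cells:
        for sp in presences.get(cell, set()):
            counts[sp] = counts.get(sp, 0) + 1

    # Keep species that still reach the presence threshold.
    kept_species = sorted(sp for sp, n in counts.items() if n >= min_presences)
    return kept_cells, kept_species


if __name__ == "__main__":
    richness = {"c1": 120, "c2": 5, "c3": 80}
    pres = {"c1": {"sp_a", "sp_b"}, "c2": {"sp_b"}, "c3": {"sp_a"}}
    print(filter_cells_and_species(richness, pres, min_efforts=10, min_presences=2))
```

Because presences are counted after the effort filter, sp_b is dropped in this example even though it occurs in two cells overall: one of those cells (c2) is removed for low sampling effort.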
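The two spatial-block strategies of Mod_GetCV() can be made concrete with a small Python sketch. Grid cells are given as integer (column, row) indices and assigned to blocks in two ways:

- CV_Dist idea: square blocks of n_grids × n_grids cells.
- CV_Large idea: nr × nc large blocks split at coordinate quantiles. With the default 2 × 2, this is a split at the median coordinates.

Whole blocks are then allocated to folds round-robin after shuffling. The function names, the cell-index input, the quantile rule for nr or nc above 2, and the round-robin fold allocation are assumptions; Mod_GetCV() may allocate blocks to folds differently.

```python
import random
from statistics import quantiles
from typing import Dict, List, Tuple

Cell = Tuple[int, int]   # (column index, row index) of a grid cell
Block = Tuple[int, int]  # (block row, block column)


def dist_blocks(cells: List[Cell], n_grids: int = 20) -> Dict[Cell, Block]:
    """CV_Dist idea: every n_grids x n_grids square of cells forms one block."""
    return {(x, y): (y // n_grids, x // n_grids) for x, y in cells}


def large_blocks(cells: List[Cell], nr: int = 2, nc: int = 2) -> Dict[Cell, Block]:
    """CV_Large idea: nr x nc blocks split at coordinate quantiles (median for 2 x 2)."""
    x_cuts = quantiles([x for x, _ in cells], n=nc) if nc > 1 else []
    y_cuts = quantiles([y for _, y in cells], n=nr) if nr > 1 else []

    def band(value: float, cuts: List[float]) -> int:
        # Number of cut points strictly below the value = index of its band.
        return sum(value > c for c in cuts)

    return {(x, y): (band(y, y_cuts), band(x, x_cuts)) for x, y in cells}


def assign_folds(blocks: Dict[Cell, Block], n_folds: int, seed: int = 1) -> Dict[Cell, int]:
    """Allocate whole blocks (never single cells) to folds 1..n_folds."""
    block_ids = sorted(set(blocks.values()))
    random.Random(seed).shuffle(block_ids)
    fold_of_block = {b: i % n_folds + 1 for i, b in enumerate(block_ids)}
    return {cell: fold_of_block[b] for cell, b in blocks.items()}


if __name__ == "__main__":
    cells = [(x, y) for x in range(60) for y in range(40)]
    folds = assign_folds(dist_blocks(cells), n_folds=4)
    print(sorted(set(folds.values())))
```

Assigning whole blocks, rather than individual cells, keeps spatially adjacent cells together in the same fold. This limits the spatial autocorrelation shared between training and test data.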
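The knot-placement rule described for Mod_PrepKnots() can be illustrated with a regular grid in Python:

- Knots are laid out every knot_dist km across the bounding box of the sampling units.
- Knots whose nearest sampling unit is farther than that same distance are dropped.

This is one reading of the description above. It resembles the knotDist and minKnotDist arguments of Hmsc::constructKnots(). The function name grid_knots, the projected kilometre coordinates, and the brute-force nearest-neighbour search are assumptions made to keep the sketch short.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in kilometres (assumed)


def grid_knots(points: List[Point], knot_dist: float) -> List[Point]:
    """Place knots every knot_dist km; keep those within knot_dist of a sampling unit."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    nx = int((max(p[0] for p in points) - min_x) // knot_dist) + 1
    ny = int((max(p[1] for p in points) - min_y) // knot_dist) + 1

    knots: List[Point] = []
    for i in range(nx):
        for j in range(ny):
            kx, ky = min_x + i * knot_dist, min_y + j * knot_dist
            # Brute-force nearest sampling unit: fine for a sketch, slow for large grids.
            nearest = min(math.hypot(kx - px, ky - py) for px, py in points)
            if nearest <= knot_dist:
                knots.append((kx, ky))
    return knots


if __name__ == "__main__":
    sites = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
    print(grid_knots(sites, knot_dist=50.0))
```

A single distance controls both the density of knots and how far knots may lie from the data. Regions without sampling units therefore carry no knots.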
Other arguments of Mod_Prep4HPC():
- selection of predictors:
  - names of CHELSA variables to include in the model
  - names of variables for which quadratic terms are incorporated
  - whether to include the (log10-transformed) sampling effort as a predictor
  - whether to include the (log10-transformed) summed road and railway intensity as a predictor
  - whether to include the (log10-transformed) percentage coverage of the respective habitat type per grid cell as a predictor
  - whether to include the (log10-transformed) total river length per grid cell as a predictor
- model fitting options:
  - number of MCMC chains
  - thinning value(s) in MCMC sampling
  - number of MCMC samples per chain
  - transient multiplication factor. The value of transient (the number of burn-in iterations) equals the product of …
  - interval at which MCMC sampling progress is reported
  - floating-point precision mode for Hmsc-HPC sampling
- NspPerGrid: minimum number of IAS per grid cell for a grid cell to be included in the analysis
- ModelCountry: fit the model for a specific country or countries
- PhyloTree and NoPhyloTree: whether or not to use phylogenetic trees
- Path_Hmsc: directory path to the Hmsc-HPC extension installation
- PrepSLURM: whether to prepare SLURM scripts for model fitting on GPUs via Mod_SLURM()
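The predictor options above amount to assembling a design matrix. The inputs are:

- CHELSA variables;
- optional squared terms;
- optional log10-transformed covariates.

The Python sketch below builds one. The function name, the dictionary input, the use of raw squares without centering or scaling, and the small offset added before the log10 transform (so that zeros, such as cells without roads or rivers, stay finite) are hypothetical choices. They are not the package's exact preprocessing.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_predictors(
    data: Dict[str, Sequence[float]],  # column name -> values, one per grid cell
    bio_vars: List[str],               # CHELSA variables to include
    quadratic: List[str],              # subset of bio_vars that also get a squared term
    log_vars: List[str],               # extra covariates to log10-transform (e.g. effort)
    offset: float = 0.1,               # hypothetical offset so that zeros remain finite
) -> Tuple[List[str], List[List[float]]]:
    """Return (column names, rows) of a simple design matrix."""
    names: List[str] = []
    columns: List[List[float]] = []
    for var in bio_vars:
        names.append(var)
        columns.append([float(v) for v in data[var]])
        if var in quadratic:
            # Raw squared term (assumption: no orthogonal polynomials or scaling).
            names.append(f"{var}^2")
            columns.append([float(v) * float(v) for v in data[var]])
    for var in log_vars:
        names.append(f"log10({var})")
        columns.append([math.log10(v + offset) for v in data[var]])
    # Transpose the column lists into one row per grid cell.
    rows = [list(r) for r in zip(*columns)]
    return names, rows


if __name__ == "__main__":
    demo = {"bio1": [1.0, 2.0], "bio12": [3.0, 4.0], "effort": [9.9, 99.9]}
    print(build_predictors(demo, ["bio1", "bio12"], ["bio1"], ["effort"]))
```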
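The model-fitting options combine into model variants and chains. The Python sketch below does two things:

- It enumerates one entry per chain for every combination of thinning value and number of samples.
- It computes the run length of each chain.

The description of the transient multiplication factor does not say what the factor multiplies. Here it is assumed to multiply the thinning value. The total iteration count follows the usual Hmsc accounting: transient iterations plus samples × thin, with every thin-th iteration after the transient phase stored. The names ChainSpec and chain_specs are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class ChainSpec:
    thin: int
    samples: int
    chain: int
    transient: int
    total_iterations: int


def chain_specs(thin_values: List[int], sample_values: List[int],
                n_chains: int, transient_factor: int) -> List[ChainSpec]:
    """One spec per chain and model variant (thin x samples combination)."""
    specs: List[ChainSpec] = []
    for thin, samples in product(thin_values, sample_values):
        transient = transient_factor * thin  # assumption: the factor multiplies thin
        for chain in range(1, n_chains + 1):
            specs.append(ChainSpec(thin, samples, chain, transient,
                                   transient + samples * thin))
    return specs


if __name__ == "__main__":
    for spec in chain_specs([10, 20], [1000], n_chains=4, transient_factor=300):
        print(spec)
```

With two thinning values, one sample size, and four chains, this yields eight chains. That matches the one-chain-per-line layout of the Commands2Fit.txt file that Mod_Prep4HPC() writes.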
Model fitting on GPUs
Following the preparation of model input data and initialization of models, the next phase involves fitting these models on GPUs. For each habitat type, the Mod_Prep4HPC() function produces:
- Python commands (Commands2Fit.txt) for fitting model chains across all model variants on GPUs, with each line corresponding to a single chain.
- one or more SLURM script files (Bash_Fit.slurm) designed to submit all model-fitting commands (Commands2Fit.txt) as batch jobs on a high-performance computing (HPC) system.
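The Python sketch below shows one way to pack a list of per-chain commands into one or more SLURM scripts. It splits the command lines into chunks of at most jobs_per_script. Each chunk is written as a job array in which every task runs its own line. The #SBATCH options are standard sbatch directives that roughly correspond to the Mod_SLURM() arguments listed earlier. The following are all assumptions:

- the file names Commands_N.txt and Bash_Fit_N.slurm;
- the job-array layout;
- the default resource values;
- mapping "CPUs per node" to --cpus-per-task with a single task.

The Bash_Fit.slurm files generated by the package may be structured differently.

```python
import tempfile
from pathlib import Path
from typing import List


def write_slurm_scripts(
    commands: List[str],
    out_dir: Path,
    job_name: str,
    jobs_per_script: int,
    partition: str,
    time_limit: str = "01:00:00",
    ntasks: int = 1,
    cpus_per_task: int = 1,
    gpus_per_node: int = 1,
    mem_per_cpu: str = "64G",
) -> List[Path]:
    """Write one SLURM job-array script per chunk of commands; return the script paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    scripts: List[Path] = []
    for start in range(0, len(commands), jobs_per_script):
        chunk = commands[start:start + jobs_per_script]
        idx = len(scripts) + 1
        # Each chunk gets its own command file (hypothetical naming).
        cmd_file = (out_dir / f"Commands_{idx}.txt").resolve()
        cmd_file.write_text("\n".join(chunk) + "\n")
        lines = [
            "#!/bin/bash",
            f"#SBATCH --job-name={job_name}_{idx}",
            f"#SBATCH --ntasks={ntasks}",
            f"#SBATCH --cpus-per-task={cpus_per_task}",
            f"#SBATCH --gpus-per-node={gpus_per_node}",
            f"#SBATCH --mem-per-cpu={mem_per_cpu}",
            f"#SBATCH --time={time_limit}",
            f"#SBATCH --partition={partition}",
            f"#SBATCH --array=1-{len(chunk)}",
            "",
            "# Array task N runs the command on line N of the command file.",
            f'eval "$(sed -n "${{SLURM_ARRAY_TASK_ID}}p" "{cmd_file}")"',
            "",
        ]
        script = out_dir / f"Bash_Fit_{idx}.slurm"
        script.write_text("\n".join(lines))
        scripts.append(script)
    return scripts


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    cmds = [f"python fit_chain.py --chain {i}" for i in range(1, 6)]  # placeholder commands
    for path in write_slurm_scripts(cmds, demo_dir, "fit_hab", jobs_per_script=2, partition="gpu"):
        print(path.name)
```

A job array keeps each script short and lets the scheduler run chains independently. The memory, time, and GPU values are placeholders to adapt to the target cluster.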
Batch jobs for model fitting can be submitted using the sbatch command, for example: sbatch Bash_Fit.slurm
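When there are several script files, a short Python loop around sbatch can submit them all. This is a sketch with the following assumptions:

- The helper name submit_all is hypothetical.
- The glob pattern Bash_Fit*.slurm is assumed; it also matches a single Bash_Fit.slurm.
- With dry_run=True (the default), the helper only prints the sbatch commands, so it can be tried on a machine without SLURM.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def submit_all(script_dir: Path, pattern: str = "Bash_Fit*.slurm",
               dry_run: bool = True) -> List[List[str]]:
    """Submit every matching SLURM script with sbatch (or only list the commands)."""
    commands = [["sbatch", str(p)] for p in sorted(script_dir.glob(pattern))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if sbatch rejects a script.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("Bash_Fit_1.slurm", "Bash_Fit_2.slurm", "notes.txt"):
            (Path(tmp) / name).write_text("#!/bin/bash\n")
        submit_all(Path(tmp))
```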