IASDT modelling workflow — 4. Model fitting
Source: vignettes/workflow_4_model_fitting.Rmd
This article outlines the preparation of input data for model fitting
and the subsequent process of fitting these models on GPUs within the
IASDT workflow.
Model input data
The primary function for preparing model-fitting data and
initialising models is mod_prepare_HPC(). It covers the following steps:
- mod_prepare_data(): prepare input data for modelling, with key arguments including:
|
abbreviation of a single habitat type to be modelled |
|
directory path for storing all model files |
|
minimum number of vascular plant species per grid cell required for
inclusion in model fitting. This reflects the total count of vascular
plant species (including native species) recorded in GBIF across Europe,
as computed during the sampling effort preparation step
(efforts_process()). This argument filters out grid cells with
insufficient sampling effort.
|
|
whether to exclude countries with cultivated or casual observations for each species |
|
whether to exclude grid cells with zero habitat coverage of the respective habitat type |
|
minimum number of presence grid cells required for a species to be
included in the models, calculated after excluding grid cells with low
sampling effort ( |
- mod_CV_prepare(): prepare and visualise options for spatial-block cross-validation. In the CV_Dist strategy, block size is governed by the CV_n_grids argument, whereas in the CV_Large strategy, the study area is partitioned into larger blocks based on the CV_n_rows and CV_n_columns arguments.
|
number of cross-validation folds |
|
number of grid cells in each direction for the CV_Dist cross-validation strategy (default: 20, yielding blocks of 20 × 20 grid cells). |
|
number of rows and columns used in the CV_Large cross-validation strategy to partition the study area into large blocks (default: CV_n_rows = CV_n_columns = 2, resulting in four blocks divided at median coordinates). |
- prepare_knots(): prepare and visualise knot locations for Gaussian Predictive Process (GPP) models, as described by Tikhonov et al. (2019).
|
whether to incorporate spatial random effects using the Gaussian Predictive Process (GPP) |
|
distance (in kilometres; controlled by the min_distance
argument of prepare_knots()) specifying both the spacing
between knots and the minimum distance between a knot and the nearest
sampling point
|
|
whether to plot the coordinates of sampling units and knots |
|
minimum and maximum number of latent factors to be included |
|
prior specification for the alpha parameter |
- mod_SLURM(): generate SLURM scripts to facilitate model fitting on GPUs using the Hmsc-HPC extension.
|
name assigned to the SLURM job |
|
number of tasks to execute |
|
number of CPUs and GPUs allocated per node |
|
memory allocation per CPU |
|
maximum duration for job execution |
|
name of the HPC partition |
|
number of jobs within each SLURM script |
Other arguments:
- selection of predictors:
|
names of CHELSA variables to include in the model |
|
names of variables for which quadratic terms are incorporated |
|
whether to include the (log10-transformed) sampling effort as a predictor |
|
whether to include the (log10-transformed) summed road and railway intensity as a predictor |
|
whether to include the (log10-transformed) percentage coverage of the respective habitat type per grid cell as a predictor |
|
whether to include the (log10-transformed) total river length per grid cell as a predictor |
- model fitting options:
|
number of MCMC chains |
|
thinning value(s) in MCMC sampling |
|
number of MCMC samples per chain |
|
transient multiplication factor. The value of transient will equal the
multiplication of |
|
interval at which MCMC sampling progress is reported |
|
floating-point precision mode for Hmsc-HPC sampling |
- n_species_per_grid: minimum number of IAS per grid cell for a grid cell to be included in the analysis
- model_country: fit the model for a specific country or countries
- whether or not to use phylogenetic trees: use_phylo_tree and no_phylo_tree
- path_Hmsc: directory path to the Hmsc-HPC extension installation
- SLURM_prepare: whether to prepare SLURM scripts for model fitting on GPUs via mod_SLURM()
Model fitting on GPUs
Following the preparation of model input data and initialisation of
models, the subsequent phase involves fitting these models on GPUs. For
each habitat type, the mod_prepare_HPC() function produces:
- Python commands (Commands2Fit.txt) for fitting model chains across all model variants on GPUs, with each line corresponding to a single chain.
- one or more SLURM script files (Bash_Fit.slurm) designed to submit all model-fitting commands (Commands2Fit.txt) as batch jobs on a high-performance computing (HPC) system.
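Because each line of the commands file corresponds to one chain, the file can be inspected with ordinary shell tools before submission. The sketch below is illustrative only: the helper names count_chains and nth_command are hypothetical, and it does not claim to reproduce how Bash_Fit.slurm itself dispatches the commands.

```shell
#!/usr/bin/env bash
# Sketch: treat every non-empty line of a commands file as one chain.
# ASSUMPTION: count_chains and nth_command are illustrative helpers,
# not part of the IASDT workflow or of Hmsc-HPC.

count_chains() {
  # Print the number of non-empty lines (one per chain) in file $1
  awk 'NF { n++ } END { print n + 0 }' "$1"
}

nth_command() {
  # Print the N-th non-empty line (1-based) of file $1, where N is $2
  awk -v n="$2" 'NF && ++i == n { print; exit }' "$1"
}

# Example usage (path is a placeholder):
#   count_chains path/to/Commands2Fit.txt
#   nth_command  path/to/Commands2Fit.txt 1
```

In a SLURM array job, the index passed to nth_command could plausibly come from the SLURM_ARRAY_TASK_ID environment variable, although whether the generated scripts work this way is an assumption.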
Batch jobs for model fitting can be submitted using the sbatch command, for example:
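A minimal sketch of the submission step follows. The directory passed to the function and the Bash_Fit*.slurm file-name pattern are assumptions made for illustration; the actual location depends on the directory given to mod_prepare_HPC(). The helper submit_fit_jobs is hypothetical, and by default it only prints the sbatch commands so they can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: submit every SLURM script found in one model directory.
# ASSUMPTIONS: the directory layout and the Bash_Fit*.slurm naming
# pattern are illustrative; adjust them to the files actually produced.

submit_fit_jobs() {
  local model_dir="$1"           # placeholder path to one habitat's model files
  local dry_run="${DRY_RUN:-1}"  # 1 = only print commands, 0 = really submit
  local script

  for script in "$model_dir"/Bash_Fit*.slurm; do
    # Skip the unexpanded glob when no script matches
    [ -e "$script" ] || continue
    if [ "$dry_run" = "1" ]; then
      echo "sbatch $script"
    else
      sbatch "$script"
    fi
  done
}

# Example usage (path is a placeholder):
#   submit_fit_jobs path/to/model_dir            # print the commands
#   DRY_RUN=0 submit_fit_jobs path/to/model_dir  # submit the jobs
```

Printing the commands first is a deliberate choice: it allows the number and names of the scripts to be verified before any job enters the queue.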