Species Distribution Modelling Workflow for Single-Species Models
Source: R/mod_ssdm_fit.R
fit_sdm_models.Rd
This comprehensive workflow implements single-species species distribution
models (sSDMs) for invasive alien plant species in Europe at the habitat
level. It orchestrates the entire process from data preparation to model
fitting, evaluation, and prediction across current and future climate
scenarios. The workflow employs the sdm R package for model fitting and
handles cross-validation, parallel processing, and various environmental
predictors.
Usage
fit_sdm_models(
  sdm_method = NULL,
  model_settings = NULL,
  model_dir = NULL,
  hab_abb = NULL,
  cv_type = "CV_Dist",
  n_cores = 8L,
  future_max_size = 2000L,
  selected_species = NULL,
  excluded_species = NULL,
  env_file = ".env",
  clamp_pred = TRUE,
  fix_efforts = "q90",
  fix_rivers = "q90",
  climate_models = "all",
  climate_scenarios = "all",
  climate_periods = "all",
  copy_maxent_html = TRUE
)
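A call might look like the sketch below. The directory path and the particular argument values are placeholders chosen for illustration, not recommended settings, and the call is guarded so that the snippet does nothing when the function is not available in the session.

```r
# Hypothetical argument values; "path/to/model_dir" is a placeholder path
sdm_args <- list(
  sdm_method = "glm",
  model_dir = "path/to/model_dir",
  hab_abb = "1",
  cv_type = "CV_Dist",
  n_cores = 4L,
  climate_models = c("GFDL-ESM4", "MRI-ESM2-0"),
  climate_scenarios = "ssp126",
  climate_periods = "2041-2070"
)

# Only run the workflow when the function is available in this session
if (exists("fit_sdm_models", mode = "function")) {
  results <- do.call(fit_sdm_models, sdm_args)
}
```

Arguments left out of the list keep the defaults shown in the usage block above.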
Arguments
- sdm_method
Character. A single SDM algorithm to use for fitting models. Valid values: "glm", "glmpoly", "gam", "glmnet", "mars", "gbm", "rf", "ranger", "cart", "rpart", "maxent", "mlp", "rbf", "svm", "mda", and "fda". These correspond to selected methods supported by the sdm package. For details and supported options, see sdm::getmethodNames().
- model_settings
List or NULL. List of model-specific settings. If NULL, defaults to custom settings defined within the workflow.
- model_dir
Character. Path to the directory containing model data and where outputs and results will be saved. Model data are prepared using the mod_prepare_HPC() and mod_prepare_data() functions.
- hab_abb
Character. Abbreviation for a single SynHab habitat type. Valid values: "0", "1", "2", "3", "4a", "4b", "10", "12a", "12b". See mod_prepare_HPC() for more details.
- cv_type
Character. Cross-validation type. One of "CV_Dist" (default) or "CV_Large". See mod_CV_fit() for more details.
- n_cores
Integer. Number of CPU cores for parallel processing. Default is 8.
- future_max_size
Numeric. Maximum allowed total size (in megabytes) of global variables identified. See ecokit::set_parallel() and the future.globals.maxSize argument of future::future.options() for more details.
- selected_species, excluded_species
Character vector or NULL. Names of species to include or exclude for modelling.
- env_file
Character. Path to a file with environment variable definitions for spatial datasets. Default is ".env".
- clamp_pred
Logical. Should clamping be applied to sampling efforts and river length predictors for prediction? Default is TRUE.
- fix_efforts, fix_rivers
Character or numeric (length 1). Method or fixed value for sampling effort and river length (both at log-scale) when clamping is enabled (clamp_pred = TRUE). Valid methods: "identity" (use observed values, with no clamping), summary statistics for the sampling efforts layer ("median", "mean", "max", or "q90" (default; 90th percentile)), or a single numeric value within the observed range.
- climate_models
Character vector or "all". Which climate change models to use for future projections. Valid values (case-sensitive): "GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL", or "all" (default, meaning all available models). If a subset, must be a subset of the listed valid models.
- climate_scenarios
Character vector or "all". Which climate change scenarios to use for future projections. Valid values: "ssp126", "ssp370", "ssp585", or "all" (default, meaning all available scenarios). If a subset, must be a subset of the listed valid scenarios.
- climate_periods
Character vector or "all". Which time periods to use for future projections. Valid values: "2011-2040", "2041-2070", "2071-2100", or "all" (default, meaning all available periods). If a subset, must be a subset of the listed valid periods.
- copy_maxent_html
Logical. Whether to copy the directory containing HTML results from Maxent to the modelling directory. Default is TRUE.
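As an illustration of the fix_efforts and fix_rivers options described above, the following minimal sketch resolves a method name or numeric value to a single number on the log scale. The helper name resolve_fix_value() and the example data are hypothetical, and returning NA for "identity" is an assumption made here to signal "keep observed values"; the package's internal implementation may differ.

```r
# Hypothetical helper: resolve a fix_efforts / fix_rivers style argument
# to a single value, given the observed (log-scale) predictor values.
resolve_fix_value <- function(fix, observed) {
  if (is.numeric(fix)) {
    stopifnot(length(fix) == 1L)
    rng <- range(observed, na.rm = TRUE)
    if (fix < rng[1] || fix > rng[2]) {
      stop("Numeric value must lie within the observed range")
    }
    return(fix)
  }
  switch(
    fix,
    # Assumption: NA is used here to mean "no fixed value, use observed"
    identity = NA_real_,
    median = stats::median(observed, na.rm = TRUE),
    mean = mean(observed, na.rm = TRUE),
    max = max(observed, na.rm = TRUE),
    q90 = unname(stats::quantile(observed, probs = 0.9, na.rm = TRUE)),
    stop("Unknown method: ", fix)
  )
}

# Made-up log-scale sampling effort values
log_efforts <- c(0.5, 1.2, 2.0, 3.1, 4.4)
fixed_q90 <- resolve_fix_value("q90", log_efforts)
fixed_num <- resolve_fix_value(1.5, log_efforts)
```

A numeric value outside the observed range raises an error, which mirrors the "within the observed range" requirement stated for these arguments.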
Value
A tibble summarizing model results for each species, including:
- Evaluation metrics for training and testing data (AUC, TSS, Kappa, etc.)
- Variable importance scores
- Response curves for each environmental variable
- Prediction summaries for current and future climate scenarios
- Paths to generated model files and prediction rasters
Additionally, the function saves various outputs to disk for future use:
- Fitted model objects (as .RData files)
- Extracted model information (evaluation metrics, variable importance, etc.)
- Prediction rasters for each species, cross-validation fold, and climate scenario
- Summary statistics across CV folds (mean, weighted mean, SD, and coefficient of variation)
- Species richness maps for each climate scenario
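The summary statistics across CV folds can be illustrated with a small sketch. It uses a plain matrix (rows are grid cells, columns are folds) in place of prediction rasters, and weights the mean by test AUC as described under Details. The function name summarise_folds() and all numbers are made up for illustration.

```r
# Hypothetical predictions: rows are grid cells, columns are CV folds
preds <- cbind(
  fold1 = c(0.2, 0.8, 0.5),
  fold2 = c(0.4, 0.6, 0.5),
  fold3 = c(0.3, 0.7, 0.5)
)
# Hypothetical test AUC per fold, used as weights
test_auc <- c(0.70, 0.90, 0.80)

summarise_folds <- function(preds, weights) {
  pred_mean <- rowMeans(preds)
  # Weighted mean: each fold contributes in proportion to its weight
  pred_wmean <- as.vector(preds %*% weights) / sum(weights)
  pred_sd <- apply(preds, 1, stats::sd)
  # Coefficient of variation; undefined where the mean is zero
  pred_cv <- ifelse(pred_mean == 0, NA_real_, pred_sd / pred_mean)
  data.frame(
    mean = pred_mean,
    weighted_mean = pred_wmean,
    sd = pred_sd,
    cv = pred_cv
  )
}

fold_summary <- summarise_folds(preds, test_auc)
```

With real outputs the same arithmetic would be applied cell-wise to raster layers rather than to matrix rows.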
Details
The fit_sdm_models function orchestrates a comprehensive workflow
that handles all aspects of single-species distribution modelling for
invasive alien plant species in Europe. The workflow integrates several
internal components that manage different stages of the modelling process:
Overall workflow:
- Input validation: Checks all parameters for validity
- Data preparation: Loads and processes model data
- Parallel processing setup: Configures computational resources
- Model fitting and prediction: For each species and CV fold
- Results summarization: Compiles metrics, variable importance, and predictions
- Species richness calculation: Across all modelled species
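The input validation step can be sketched for arguments that accept either "all" or a subset of fixed values, such as climate_models. The helper resolve_choice() is hypothetical and only illustrates the documented rule (case-sensitive matching, "all" expands to every valid value); it is not the package's own validation code.

```r
# Valid values as listed for the climate_models argument
valid_climate_models <- c(
  "GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL"
)

# Hypothetical helper: expand "all" or check that x is a valid subset
resolve_choice <- function(x, valid, arg_name) {
  if (!is.character(x) || length(x) == 0L) {
    stop(arg_name, " must be a non-empty character vector")
  }
  if (identical(x, "all")) {
    return(valid)
  }
  # setdiff() compares exact strings, so matching is case-sensitive
  unknown <- setdiff(x, valid)
  if (length(unknown) > 0L) {
    stop(
      "Invalid value(s) for ", arg_name, ": ",
      paste(unknown, collapse = ", ")
    )
  }
  unique(x)
}

all_models <- resolve_choice("all", valid_climate_models, "climate_models")
two_models <- resolve_choice(
  c("MRI-ESM2-0", "GFDL-ESM4"), valid_climate_models, "climate_models"
)
```

The same helper could be reused for climate_scenarios and climate_periods by passing their respective valid values.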
Core capabilities:
- Data preparation: The workflow validates and prepares necessary input data including modelling data, environmental predictors, and prediction datasets. It handles species selection, data loading, and preprocessing of spatial predictors (including clamping of sampling efforts and river length when required).
- Model parameterization: The function provides carefully selected default settings for various SDM algorithms, ensuring consistent parameterization across models.
- Model information extraction: After fitting, the workflow automatically extracts key information from fitted SDM objects, including evaluation metrics, variable importance, and response curves.
- Model optimization: The workflow reduces the size of fitted SDM objects by setting formula environments to the base environment, which addresses known issues in the sdm package.
- Parallel prediction: The workflow efficiently generates predictions for each species and cross-validation fold, handling model fitting, information extraction, prediction, and file saving in parallel.
- Statistical summarization: Summary statistics are calculated across cross-validation folds, including mean, weighted mean (by test AUC), standard deviation, and coefficient of variation of predictions.
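The effect of resetting a formula's environment can be shown with a small sketch. A formula created inside a function keeps a reference to that function's environment, so every object in it is serialized along with the formula. The example uses a plain formula rather than an sdm model object, and make_formula() is a hypothetical function written only to reproduce the effect.

```r
# Hypothetical function: the returned formula captures this function's
# environment, including the large unused object
make_formula <- function() {
  big_object <- numeric(1e6)  # about 8 MB of doubles
  y ~ x1 + x2
}

f <- make_formula()

# Serialized size includes the captured environment
size_before <- length(serialize(f, connection = NULL))

# Point the formula at the base environment to drop the captured objects
environment(f) <- baseenv()
size_after <- length(serialize(f, connection = NULL))

c(before = size_before, after = size_after)
```

After the reset, variables named in the formula are no longer looked up in the original environment, so they have to be supplied through the data argument when the formula is used.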