Model pipeline for post-processing fitted Hmsc models
Source: R/mod_postprocess.R, R/mod_postprocess_CV.R
Mod_postprocessing.Rd
These functions post-process fitted Hmsc models on both CPU and GPU. The main
functions in the pipeline include mod_postprocess_1_CPU, mod_prepare_TF, and
mod_postprocess_2_CPU for full models without cross-validation, as well as
mod_postprocess_CV_1_CPU and mod_postprocess_CV_2_CPU for cross-validated
models. See details for more information.
Usage
mod_postprocess_1_CPU(
model_dir = NULL,
hab_abb = NULL,
n_cores = 8L,
strategy = "multisession",
env_file = ".env",
path_Hmsc = NULL,
memory_per_cpu = "64G",
job_runtime = NULL,
from_JSON = FALSE,
GPP_dist = NULL,
use_trees = "Tree",
MCMC_n_samples = 1000L,
MCMC_thin = NULL,
n_omega = 1000L,
CV_name = c("CV_Dist", "CV_Large"),
n_grid = 50L,
use_TF = TRUE,
TF_use_single = FALSE,
LF_n_cores = n_cores,
LF_temp_cleanup = TRUE,
LF_check = FALSE,
temp_cleanup = TRUE,
TF_environ = NULL,
clamp_pred = TRUE,
fix_efforts = "q90",
fix_rivers = "q90",
pred_new_sites = TRUE,
n_cores_VP = 10L,
width_omega = 26,
height_omega = 22.5,
width_beta = 25,
height_beta = 35
)
mod_prepare_TF(
process_VP = TRUE,
process_LF = TRUE,
n_batch_files = 210L,
env_file = ".env",
working_directory = NULL,
partition_name = "small-g",
LF_runtime = "01:00:00",
model_prefix = NULL,
VP_runtime = "02:00:00"
)
mod_postprocess_2_CPU(
model_dir = NULL,
hab_abb = NULL,
n_cores = 8L,
strategy = "multisession",
env_file = ".env",
GPP_dist = NULL,
use_trees = "Tree",
MCMC_n_samples = 1000L,
MCMC_thin = NULL,
use_TF = TRUE,
TF_environ = NULL,
TF_use_single = FALSE,
LF_n_cores = n_cores,
LF_check = FALSE,
LF_temp_cleanup = TRUE,
temp_cleanup = TRUE,
n_grid = 50L,
CC_models = c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0",
"UKESM1-0-LL"),
CC_scenario = c("ssp126", "ssp370", "ssp585"),
RC_n_cores = 8L,
clamp_pred = TRUE,
fix_efforts = "q90",
fix_rivers = "q90",
pred_new_sites = TRUE,
tar_predictions = TRUE,
RC_prepare = TRUE,
RC_plot = TRUE,
VP_prepare = TRUE,
VP_plot = TRUE,
predict_suitability = TRUE,
plot_predictions = TRUE,
plot_LF = TRUE,
plot_internal_evaluation = TRUE
)
mod_postprocess_CV_1_CPU(
model_dir = NULL,
CV_names = NULL,
n_cores = 8L,
strategy = "multisession",
env_file = ".env",
from_JSON = FALSE,
use_TF = TRUE,
TF_use_single = FALSE,
TF_environ = NULL,
LF_n_cores = n_cores,
LF_only = TRUE,
LF_temp_cleanup = TRUE,
LF_check = FALSE,
LF_runtime = "01:00:00",
temp_cleanup = TRUE,
n_batch_files = 210L,
working_directory = NULL,
partition_name = "small-g"
)
mod_postprocess_CV_2_CPU(
model_dir = NULL,
CV_names = NULL,
n_cores = 8L,
strategy = "multisession",
env_file = ".env",
use_TF = TRUE,
TF_use_single = FALSE,
temp_cleanup = TRUE,
LF_temp_cleanup = TRUE,
TF_environ = NULL,
LF_n_cores = n_cores,
LF_check = FALSE
)
Arguments
- model_dir
Character. Path to the root directory of the fitted model.
- hab_abb
Character. Habitat abbreviation indicating the specific SynHab habitat type. Valid values: 0, 1, 2, 3, 4a, 4b, 10, 12a, 12b. See Pysek et al. for details.
- n_cores
Integer. Number of CPU cores to use for parallel processing. Default: 8.
- strategy
Character. The parallel processing strategy to use. Valid options are "sequential", "multisession" (default), "multicore", and "cluster". See future::plan() and ecokit::set_parallel() for details.
- env_file
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- path_Hmsc
Character. Path to the Hmsc-HPC installation.
- memory_per_cpu
Character. Memory allocation per CPU core. Example: "32G" for 32 gigabytes. Defaults to "64G".
- job_runtime
Character. Maximum allowed runtime for the job. Example: "01:00:00" for one hour. Required — if not provided, the function throws an error.
- from_JSON
Logical. Whether to convert loaded models from JSON format before reading. Defaults to FALSE.
- GPP_dist
Integer. Distance in kilometres between knots for the selected model.
- use_trees
Character. Whether a phylogenetic tree was used in the selected model. Accepts "Tree" (default) or "NoTree".
- MCMC_thin, MCMC_n_samples
Integer. Thinning value and the number of MCMC samples of the selected model.
- n_omega
Integer. The number of species to be sampled for the Omega parameter transformation. Defaults to 1000.
- CV_name
Character. Cross-validation strategy. Valid values are CV_Dist, CV_Large, or CV_SAC. Defaults to c("CV_Dist", "CV_Large").
- n_grid
Integer. Number of points along the gradient for continuous focal variables. Higher values result in smoother curves. Default: 50. See Hmsc::constructGradient for details.
- use_TF
Logical. Whether to use TensorFlow for calculations. Defaults to TRUE.
- TF_use_single
Logical. Whether to use single precision for the TensorFlow calculations. Defaults to FALSE.
- LF_n_cores
Integer. Number of cores to use for parallel processing of latent factor prediction. Defaults to 8L.
- LF_temp_cleanup
Logical. Whether to delete temporary files in the temp_dir directory after finishing the LF predictions.
- LF_check
Logical. If TRUE, the function checks if the output files are already created and valid. If FALSE, the function will only check if the files exist without checking their integrity. Default is FALSE.
- temp_cleanup
Logical. Whether to clean up temporary files. Defaults to TRUE.
- TF_environ
Character. Path to the Python environment. This argument is required if use_TF is TRUE under Windows. Defaults to NULL.
- clamp_pred
Logical indicating whether to clamp the sampling efforts at a single value. If TRUE (default), the fix_efforts argument must be provided.
- fix_efforts
Numeric or character. When clamp_pred = TRUE, fixes the sampling efforts predictor at this value during predictions. If numeric, uses the value directly (on log10 scale). If character, must be one of median, mean, max, or q90 (90% quantile). Using max may reflect extreme sampling efforts from highly sampled locations, while q90 captures high sampling areas without extremes. Required if clamp_pred = TRUE.
- fix_rivers
Numeric, character, or NULL. Similar to fix_efforts, but for the river length predictor. If NULL, the river length is not fixed. Default: q90.
- pred_new_sites
Logical. Whether to predict suitability at new sites. Default: TRUE.
- n_cores_VP
Integer. Number of cores to use for processing variance partitioning. Defaults to 10L.
- width_omega, height_omega, width_beta, height_beta
Integer. The width and height of the generated heatmaps of the Omega and Beta parameters in centimetres.
- process_VP
Logical. Whether to prepare batch scripts for variance partitioning computations on GPUs. Defaults to TRUE.
- process_LF
Logical. Whether to prepare batch scripts for latent factor prediction computations on GPUs. Defaults to TRUE.
- n_batch_files
Integer. Number of output batch files to create. Must be less than or equal to the maximum job limit of the HPC environment.
- working_directory
Character. Optionally sets the working directory in batch scripts to this path. If NULL, the directory remains unchanged.
- partition_name
Character. Name of the partition to submit the SLURM jobs to. Default is small-g.
- LF_runtime, VP_runtime
Character. Time limits for latent factor prediction and variance partitioning processing jobs, respectively. Defaults are 01:00:00 and 02:00:00.
- model_prefix
Character. Prefix for the model name. A directory named model_prefix_TF is created in the model_dir to store the TensorFlow running commands. Defaults to NULL, but a value must be provided; it can not be NULL.
- CC_models
Character vector. Climate models for future predictions. Available options are c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL") (default).
- CC_scenario
Character vector. Climate scenarios for future predictions. Available options are c("ssp126", "ssp370", "ssp585") (default).
- RC_n_cores
Integer. The number of cores to use for response curve prediction. Defaults to 8.
- tar_predictions
Logical. Whether to bundle the prediction files into a single *.tar file (without compression). Default: TRUE.
- RC_prepare
Logical. Whether to prepare the data for response curve prediction (using resp_curv_prepare_data). Defaults to TRUE.
- RC_plot
Logical. Whether to plot the response curves as JPEG files (using resp_curv_plot_SR, resp_curv_plot_species, and resp_curv_plot_species_all). Defaults to TRUE.
- VP_prepare
Logical. Whether to prepare the data for variance partitioning (using variance_partitioning_compute). Defaults to TRUE.
- VP_plot
Logical. Whether to plot the variance partitioning results (using variance_partitioning_plot). Defaults to TRUE.
- predict_suitability
Logical. Whether to predict habitat suitability across different climate options (using predict_maps). Defaults to TRUE.
- plot_predictions
Logical. Whether to plot species and species richness predictions as JPEG files (using plot_prediction). Defaults to TRUE.
- plot_LF
Logical. Whether to plot latent factors as JPEG files (using plot_latent_factor). Defaults to TRUE.
- plot_internal_evaluation
Logical. Whether to compute and visualise model internal evaluation (explanatory power) using plot_evaluation. Defaults to TRUE.
- CV_names
Character vector. Names of cross-validation strategies to merge, matching those used during model setup. Defaults to c("CV_Dist", "CV_Large"). The names should be one of CV_Dist, CV_Large, or CV_SAC. Applies only to mod_merge_chains_CV.
- LF_only
Logical. Whether to predict only the latent factor. This is useful for distributing processing load between GPU and CPU. When LF_only = TRUE, latent factor prediction needs to be computed separately on GPU. When the GPU computations are finished, the function can later be rerun with LF_only = FALSE to predict habitat suitability using the already-computed latent factor predictions.
Details
mod_postprocess_1_CPU
This function performs the initial post-processing step for habitat-specific fitted models, automating the following tasks:
- check unsuccessful models: mod_SLURM_refit
- merge chains and save R objects (fitted model object and coda object) to qs2 or RData files: mod_merge_chains
- visualise the convergence of all fitted model variants: convergence_plot_all
- visualise the convergence of the selected model, including Gelman-Rubin-Brooks plots (plot_gelman) and convergence_plot for convergence diagnostics of the rho, alpha, omega, and beta parameters
- extract and save the model summary: mod_summary
- plot model parameters: mod_heatmap_omega, mod_heatmap_beta
- prepare data for cross-validation and fit initial cross-validated models: mod_CV_fit
- prepare scripts for GPU processing, including: predicting latent factors of the response curves (resp_curv_prepare_data), predicting latent factors for new sampling units (predict_maps), and computing variance partitioning (variance_partitioning_compute)
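The sketch below illustrates one possible call for a single habitat type; the model directory, Hmsc-HPC path, SLURM runtime, and model settings (GPP_dist, MCMC_thin) are placeholders and must match the actual fitted model.
# Sketch only: initial CPU post-processing for a single habitat type.
# All paths and model settings below are placeholders.
mod_postprocess_1_CPU(
  model_dir = "models/hab_4a",      # placeholder model root directory
  hab_abb = "4a",                   # SynHab habitat abbreviation
  n_cores = 8L,
  path_Hmsc = "~/Hmsc-HPC",         # placeholder Hmsc-HPC installation path
  job_runtime = "04:00:00",         # required SLURM runtime
  GPP_dist = 100L,                  # knot distance (km) used when fitting
  MCMC_thin = 100L,                 # thinning used when fitting
  MCMC_n_samples = 1000L
)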
mod_prepare_TF
After running mod_postprocess_1_CPU for all habitat types, this function prepares batch scripts for GPU computations covering all habitat types:
- for variance partitioning, the function matches all files with the pattern "VP_.+Command.txt" (created by variance_partitioning_compute) and merges their contents into a single file (model_prefix_TF/VP_Commands.txt). It then prepares a SLURM script for variance partitioning computations (model_prefix_TF/VP_SLURM.slurm).
- for latent factor predictions, the function matches all files with the pattern "^LF_NewSites_Commands_.+.txt|^LF_RC_Commands_.+txt" and splits their contents into multiple scripts in the model_prefix_TF directory for processing as a batch job. The function prepares a SLURM script for latent factor predictions (LF_SLURM.slurm).
This function is tailored for the LUMI HPC environment and assumes that the tensorflow module is installed and correctly configured with all required Python packages. On other HPC systems, users may need to modify the function to load a Python virtual environment or install the required dependencies for TensorFlow and related packages.
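A hypothetical call, assuming all habitat-specific models share one working directory on LUMI; the prefix, path, and partition name are placeholders.
# Sketch only: prepare GPU batch scripts once all habitat types are
# post-processed; model_prefix and working_directory are placeholders.
mod_prepare_TF(
  model_prefix = "Mod_Hab",                        # a Mod_Hab_TF directory is created
  working_directory = "/scratch/project_xyz/models",
  n_batch_files = 210L,                            # keep within the HPC job limit
  partition_name = "small-g",
  LF_runtime = "01:00:00",
  VP_runtime = "02:00:00"
)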
mod_postprocess_2_CPU
This function continues the post-processing pipeline for fitted Hmsc models by automating the following steps:
- process and visualise response curves: response_curves
- predict habitat suitability across different climate options: predict_maps
- plot species and species richness predictions as JPEG files: plot_prediction
- plot latent factors as JPEG files: plot_latent_factor
- process and visualise variance partitioning: variance_partitioning_compute and variance_partitioning_plot
- compute and visualise model internal evaluation (explanatory power): plot_evaluation
- initiate post-processing of fitted cross-validated models: prepare commands for latent factor predictions on GPU (ongoing)
This function should be run after:
- completing mod_postprocess_1_CPU and mod_prepare_TF on CPU,
- running VP_SLURM.slurm and LF_SLURM.slurm on GPU to compute variance partitioning and latent factor predictions (both scripts are generated by mod_prepare_TF),
- submitting SLURM jobs for cross-validated model fitting.
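For illustration, a follow-up call for the same habitat type might look as follows; paths and model settings are placeholders, and the remaining arguments keep their defaults.
# Sketch only: continue CPU post-processing after the GPU jobs have finished.
mod_postprocess_2_CPU(
  model_dir = "models/hab_4a",      # placeholder model root directory
  hab_abb = "4a",
  n_cores = 8L,
  GPP_dist = 100L,                  # must match the fitted model
  MCMC_thin = 100L,
  MCMC_n_samples = 1000L,
  CC_scenario = c("ssp126", "ssp370", "ssp585")
)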
mod_postprocess_CV_1_CPU
This function is similar to mod_postprocess_1_CPU, but it is specifically designed for cross-validated models. It automates merging fitted cross-validated model chains into Hmsc model objects and prepares scripts for latent factor prediction with TensorFlow using predict_maps_CV.
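A sketch of the first cross-validation step (placeholder directory); with LF_only = TRUE, the latent factors are then computed separately on GPU before continuing with mod_postprocess_CV_2_CPU.
# Sketch only: merge cross-validated chains and prepare LF prediction scripts.
mod_postprocess_CV_1_CPU(
  model_dir = "models/hab_4a",            # placeholder model root directory
  CV_names = c("CV_Dist", "CV_Large"),
  n_cores = 8L,
  LF_only = TRUE                          # latent factors computed later on GPU
)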
mod_postprocess_CV_2_CPU
The function 1) processes the *.feather files resulting from latent factor predictions (using TensorFlow) and saves the LF predictions to disk; 2) predicts species-specific mean habitat suitability at the testing cross-validation folds and calculates testing evaluation metrics; 3) generates plots of the evaluation metrics.
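For example (placeholder directory; run only after the GPU latent factor jobs have finished):
# Sketch only: evaluate cross-validated predictions once GPU LF jobs are done.
mod_postprocess_CV_2_CPU(
  model_dir = "models/hab_4a",            # placeholder model root directory
  CV_names = c("CV_Dist", "CV_Large"),
  n_cores = 8L
)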