Model pipeline for post-processing fitted Hmsc models
Source: R/Mod_Postprocess.R (Mod_postprocessing.Rd)
These functions post-process fitted Hmsc models on both CPU and GPU. The
pipeline is under active development and may change in future updates.
Currently, there are three main functions in this script: Mod_Postprocess_1_CPU(), Mod_Prep_TF(), and Mod_Postprocess_2_CPU(). See Details for more information.
Usage
Mod_Postprocess_1_CPU(
ModelDir = NULL,
Hab_Abb = NULL,
NCores = 8L,
EnvFile = ".env",
Path_Hmsc = NULL,
MemPerCpu = NULL,
Time = NULL,
FromJSON = FALSE,
GPP_Dist = NULL,
Tree = "Tree",
Samples = 1000L,
Thin = NULL,
NOmega = 1000L,
CVName = c("CV_Dist", "CV_Large"),
N_Grid = 50L,
UseTF = TRUE,
TF_use_single = FALSE,
LF_NCores = NCores,
LF_Temp_Cleanup = TRUE,
LF_Check = FALSE,
Temp_Cleanup = TRUE,
TF_Environ = NULL,
Pred_Clamp = TRUE,
Fix_Efforts = "q90",
Fix_Rivers = "q90",
Pred_NewSites = TRUE,
NCores_VP = 3,
PlotWidth_Omega = 26,
PlotHeight_Omega = 22.5,
PlotWidth_Beta = 25,
PlotHeight_Beta = 35
)
Mod_Prep_TF(
NumFiles = 210L,
EnvFile = ".env",
WD = NULL,
Partition_Name = "small-g",
LF_Time = "01:00:00",
VP_Time = "01:30:00"
)
Mod_Postprocess_2_CPU(
ModelDir = NULL,
Hab_Abb = NULL,
NCores = 8L,
EnvFile = ".env",
GPP_Dist = NULL,
Tree = "Tree",
Samples = 1000L,
Thin = NULL,
UseTF = TRUE,
TF_Environ = NULL,
TF_use_single = FALSE,
LF_NCores = NCores,
LF_Check = FALSE,
LF_Temp_Cleanup = TRUE,
Temp_Cleanup = TRUE,
N_Grid = 50L,
CC_Models = c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0",
"UKESM1-0-LL"),
CC_Scenario = c("ssp126", "ssp370", "ssp585"),
RC_NCores = 8L,
Pred_Clamp = TRUE,
Fix_Efforts = "q90",
Fix_Rivers = "q90",
Pred_NewSites = TRUE,
Tar = TRUE
)
Arguments
- ModelDir
Character. Path to the root directory of the fitted model.
- Hab_Abb
Character. Habitat abbreviation indicating the specific SynHab habitat type for which data will be prepared. Valid values are 0, 1, 2, 3, 4a, 4b, 10, 12a, and 12b. For more details, see Pysek et al.
- NCores
Integer. Number of CPU cores to use for parallel processing. Default: 8.
- EnvFile
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- Path_Hmsc
Character. Path to the Hmsc-HPC installation.
- MemPerCpu
Character. Memory allocation per CPU core. Example: "32G" for 32 gigabytes. Required — if not provided, the function throws an error.
- Time
Character. Maximum allowed runtime for the job. Example: "01:00:00" for one hour. Required — if not provided, the function throws an error.
- FromJSON
Logical. Whether to convert loaded models from JSON format before reading. Defaults to FALSE.
- GPP_Dist
Integer. Distance in kilometers between knots for the selected model.
- Tree
Character. Whether a phylogenetic tree was used in the selected model. Accepts "Tree" (default) or "NoTree".
- Thin, Samples
Integer. Thinning value and the number of MCMC samples of the selected model.
- NOmega
Integer. The number of species to be sampled for the Omega parameter transformation. Defaults to 1000.
- CVName
Character vector. Column name(s) in the model input data used to cross-validate the models (see Mod_PrepData and Mod_GetCV). The function allows more than one way of assigning grid cells to cross-validation folds; if multiple names are provided, separate cross-validation models are fitted for each cross-validation type. Currently, there are three cross-validation strategies: CV_SAC, CV_Dist, and CV_Large. Defaults to c("CV_Dist", "CV_Large").
- N_Grid
Integer. Number of points along the gradient for continuous focal variables. Higher values result in smoother curves. Default: 50. See Hmsc::constructGradient for details.
- UseTF
Logical. Whether to use TensorFlow for calculations. Defaults to TRUE.
- TF_use_single
Logical. Whether to use single precision for the TensorFlow calculations. Defaults to FALSE.
.- LF_NCores
Integer. Number of cores to use for parallel processing of latent factor prediction. Defaults to the value of NCores (8 by default).
- LF_Temp_Cleanup
Logical. Whether to delete temporary files in the Temp_Dir directory after finishing the LF predictions.
- LF_Check
Logical. If TRUE, the function checks whether the output files are already created and valid. If FALSE, the function only checks whether the files exist, without checking their integrity. Default is FALSE.
- Temp_Cleanup
Logical. Whether to clean up temporary files. Defaults to TRUE.
- TF_Environ
Character. Path to the Python environment. This argument is required if UseTF is TRUE under Windows. Defaults to NULL.
- Pred_Clamp
Logical. Whether to clamp the sampling efforts at a single value. If TRUE (default), the Fix_Efforts argument must be provided.
- Fix_Efforts
Numeric or character. If Pred_Clamp = TRUE, the sampling-efforts predictor is fixed at Fix_Efforts during predictions wherever its values are ≤ Fix_Efforts. If numeric, the value is used directly (log10 scale). If character, it can be one of median, mean, max, or q90 (90% quantile). Using max can reflect extreme values caused by rare, highly sampled locations (e.g., urban centers or popular natural reserves), while using the 90% quantile avoids such extreme grid cells while still capturing areas with high sampling effort. This argument is mandatory when Pred_Clamp is set to TRUE.
- Fix_Rivers
Numeric or character. Similar to Fix_Efforts, but for fixing the length of rivers. If numeric, the value is used directly (log10 scale). If character, it can be one of median, mean, max, or q90 (90% quantile). It can also be NULL for not fixing the river-length predictor. Defaults to q90.
- Pred_NewSites
Logical. Whether to predict habitat suitability at new sites. Default: TRUE. Note: this parameter is temporary and will be removed in future updates.
- NCores_VP
Integer. Number of cores to use for variance partitioning. Defaults to 3.
- PlotWidth_Omega, PlotHeight_Omega, PlotWidth_Beta, PlotHeight_Beta
Integer. The width and height of the generated heatmaps of the Omega and Beta parameters in centimeters.
- NumFiles
Integer. Number of output batch files to create. Must be less than or equal to the maximum job limit of the HPC environment.
- WD
Character. Optionally sets the working directory in batch scripts to this path. If NULL, the directory remains unchanged.
- Partition_Name
Character. Name of the partition to submit the SLURM jobs to. Default is small-g.
- LF_Time, VP_Time
Character. Time limits for the latent factor prediction and variance partitioning jobs, respectively. Defaults: 01:00:00 for LF_Time and 01:30:00 for VP_Time.
- CC_Models
Character vector. Climate models for future predictions. Available options are c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL") (default).
- CC_Scenario
Character vector. Climate scenarios for future predictions. Available options are c("ssp126", "ssp370", "ssp585") (default).
- RC_NCores
Integer. The number of cores to use for response curve prediction. Defaults to 8.
- Tar
Logical. Whether to bundle the files into a single *.tar archive (without compression). Default: TRUE.
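The q90 option for Fix_Efforts and Fix_Rivers corresponds to the 90% quantile of the (log10-scaled) predictor. A minimal sketch in base R of how such a clamping value could be computed and applied; this is illustrative only, and the variable names and the exact clamping rule are assumptions, not the package's actual implementation:

```r
# Illustrative sketch -- not the package's actual code.
# Hypothetical sampling efforts on the log10 scale
log_efforts <- log10(c(1, 3, 10, 25, 80, 400, 1500))

# "q90": the 90% quantile of the predictor
fix_value <- stats::quantile(log_efforts, probs = 0.9, names = FALSE)

# Clamping as described above: values <= fix_value are fixed
# at fix_value during predictions
clamped <- pmax(log_efforts, fix_value)
```

Swapping `probs = 0.9` for `median()`, `mean()`, or `max()` of the predictor mirrors the other character options.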
Details
Mod_Postprocess_1_CPU
This function performs the initial post-processing step for habitat-specific fitted models, automating the following tasks:
- check unsuccessful models: Mod_SLURM_Refit
- merge chains and save R objects (fitted model object and coda object) to qs2 or RData files: Mod_Merge_Chains
- visualize the convergence of all fitted model variants: Convergence_Plot_All
- visualize the convergence of the selected model, including Gelman-Rubin-Brooks plots (PlotGelman) and convergence diagnostics (Convergence_Plot) for the rho, alpha, omega, and beta parameters
- extract and save the model summary: Mod_Summary
- plot model parameters: Mod_Heatmap_Omega, Mod_Heatmap_Beta
- prepare data for cross-validation and fit initial cross-validated models: Mod_CV_Fit
- prepare scripts for GPU processing, including:
  - predicting latent factors of the response curves: RespCurv_PrepData
  - predicting latent factors for new sampling units: Predict_Maps
  - computing variance partitioning: VarPar_Compute
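A hypothetical call illustrating this first step, following the Usage signature above; the paths and numeric values are placeholders, not recommendations:

```r
# Hypothetical example -- ModelDir, Path_Hmsc, and the model settings
# (GPP_Dist, Thin) are placeholders for a real fitted-model setup.
Mod_Postprocess_1_CPU(
  ModelDir  = "models/Hab_4a",   # root directory of the fitted model
  Hab_Abb   = "4a",              # SynHab habitat abbreviation
  Path_Hmsc = "~/Hmsc-HPC",      # Hmsc-HPC installation
  MemPerCpu = "32G",             # required: memory per CPU core
  Time      = "02:00:00",        # required: maximum job runtime
  GPP_Dist  = 100L,              # knot distance (km) of the model
  Thin      = 100L               # thinning value of the model
)
```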
Mod_Prep_TF
After running Mod_Postprocess_1_CPU for all habitat types, this function prepares batch scripts for GPU computations across all habitat types:
- for variance partitioning, the function matches all files with the pattern "VP_.+Command.txt" (created by VarPar_Compute) and merges their contents into a single file (TF_postprocess/VP_Commands.txt). It then prepares a SLURM script for variance partitioning computations (TF_postprocess/VP_SLURM.slurm).
- for latent factor predictions, the function matches all files with the pattern "^LF_NewSites_Commands_.+.txt|^LF_RC_Commands_.+txt" and splits their contents into multiple scripts in the TF_postprocess directory for processing as a batch job. It also prepares a SLURM script for latent factor predictions (LF_SLURM.slurm).
This function is tailored for the LUMI HPC environment and assumes that the
tensorflow
module is installed and correctly configured with all required
Python packages. On other HPC systems, users may need to modify the function
to load a Python virtual environment or install the required dependencies for
TensorFlow and related packages.
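A hypothetical call for this preparation step on LUMI, using the defaults from the Usage section; NumFiles here is only a placeholder and must respect the HPC job limit:

```r
# Hypothetical example -- run after Mod_Postprocess_1_CPU has completed
# for all habitat types. Values mirror the documented defaults.
Mod_Prep_TF(
  NumFiles       = 210L,        # number of batch files (<= HPC job limit)
  Partition_Name = "small-g",   # LUMI GPU partition
  LF_Time        = "01:00:00",  # time limit for latent factor jobs
  VP_Time        = "01:30:00"   # time limit for variance partitioning jobs
)
```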
Mod_Postprocess_2_CPU
This function continues the post-processing pipeline for fitted Hmsc models by automating the following steps:
- process and visualize response curves: Response_curves
- predict habitat suitability across different climate options: Predict_Maps
- plot species and species richness (SR) predictions as JPEG: Mod_Predict_Plot
- plot latent factors as JPEG: Mod_Plot_LF
- process and visualize variance partitioning: VarPar_Compute and VarPar_Plot
- compute and visualize the model's internal evaluation (explanatory power): Mod_Eval_Plot
- initiate post-processing of fitted cross-validated models: prepare commands for latent factor predictions on GPU (ongoing)
This function should be run after:
- completing Mod_Postprocess_1_CPU and Mod_Prep_TF on CPU,
- running VP_SLURM.slurm and LF_SLURM.slurm on GPU to process response curves and latent factor predictions (both scripts are generated by Mod_Prep_TF),
- submitting SLURM jobs for cross-validated model fitting.
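A hypothetical call closing the pipeline, again following the Usage signature; the model directory, model settings, and Python environment path are placeholders:

```r
# Hypothetical example -- run after the GPU jobs prepared by
# Mod_Prep_TF have finished. Paths and settings are placeholders.
Mod_Postprocess_2_CPU(
  ModelDir   = "models/Hab_4a",   # same model directory as step 1
  Hab_Abb    = "4a",
  GPP_Dist   = 100L,
  Thin       = 100L,
  TF_Environ = "~/venvs/tf",      # Python environment (required with
                                  # UseTF = TRUE under Windows)
  CC_Models  = c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR",
                 "MRI-ESM2-0", "UKESM1-0-LL"),
  CC_Scenario = c("ssp126", "ssp370", "ssp585")
)
```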