
These functions post-process fitted Hmsc models on both CPU and GPU. The pipeline is under active development and may change in future updates. Currently, there are three main functions in this script: Mod_Postprocess_1_CPU(), Mod_Prep_TF(), and Mod_Postprocess_2_CPU(). See details for more information.

Usage

Mod_Postprocess_1_CPU(
  ModelDir = NULL,
  Hab_Abb = NULL,
  NCores = 8L,
  EnvFile = ".env",
  Path_Hmsc = NULL,
  MemPerCpu = NULL,
  Time = NULL,
  FromJSON = FALSE,
  GPP_Dist = NULL,
  Tree = "Tree",
  Samples = 1000L,
  Thin = NULL,
  NOmega = 1000L,
  CVName = c("CV_Dist", "CV_Large"),
  N_Grid = 50L,
  UseTF = TRUE,
  TF_use_single = FALSE,
  LF_NCores = NCores,
  LF_Temp_Cleanup = TRUE,
  LF_Check = FALSE,
  Temp_Cleanup = TRUE,
  TF_Environ = NULL,
  Pred_Clamp = TRUE,
  Fix_Efforts = "q90",
  Fix_Rivers = "q90",
  Pred_NewSites = TRUE,
  NCores_VP = 3,
  PlotWidth_Omega = 26,
  PlotHeight_Omega = 22.5,
  PlotWidth_Beta = 25,
  PlotHeight_Beta = 35
)

Mod_Prep_TF(
  NumFiles = 210L,
  EnvFile = ".env",
  WD = NULL,
  Partition_Name = "small-g",
  LF_Time = "01:00:00",
  VP_Time = "01:30:00"
)

Mod_Postprocess_2_CPU(
  ModelDir = NULL,
  Hab_Abb = NULL,
  NCores = 8L,
  EnvFile = ".env",
  GPP_Dist = NULL,
  Tree = "Tree",
  Samples = 1000L,
  Thin = NULL,
  UseTF = TRUE,
  TF_Environ = NULL,
  TF_use_single = FALSE,
  LF_NCores = NCores,
  LF_Check = FALSE,
  LF_Temp_Cleanup = TRUE,
  Temp_Cleanup = TRUE,
  N_Grid = 50L,
  CC_Models = c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0",
    "UKESM1-0-LL"),
  CC_Scenario = c("ssp126", "ssp370", "ssp585"),
  RC_NCores = 8L,
  Pred_Clamp = TRUE,
  Fix_Efforts = "q90",
  Fix_Rivers = "q90",
  Pred_NewSites = TRUE,
  Tar = TRUE
)

Arguments

ModelDir

Character. Path to the root directory of the fitted model.

Hab_Abb

Character. Habitat abbreviation indicating the specific SynHab habitat type for which data will be prepared. Valid values are 0, 1, 2, 3, 4a, 4b, 10, 12a, 12b. For more details, see Pysek et al.

NCores

Integer. Number of CPU cores to use for parallel processing. Default: 8.

EnvFile

Character. Path to the environment file containing paths to data sources. Defaults to .env.

Path_Hmsc

Character. Path to the Hmsc-HPC installation.

MemPerCpu

Character. Memory allocation per CPU core. Example: "32G" for 32 gigabytes. Required — if not provided, the function throws an error.

Time

Character. Maximum allowed runtime for the job. Example: "01:00:00" for one hour. Required — if not provided, the function throws an error.

FromJSON

Logical. Whether to convert loaded models from JSON format before reading. Defaults to FALSE.

GPP_Dist

Integer. Distance in kilometers between knots for the selected model.

Tree

Character. Whether a phylogenetic tree was used in the selected model. Accepts "Tree" (default) or "NoTree".

Thin, Samples

Integer. Thinning value and the number of MCMC samples of the selected model.

NOmega

Integer. The number of species to be sampled for the Omega parameter transformation. Defaults to 1000.

CVName

Character vector. Column name(s) in the model input data used to cross-validate the models (see Mod_PrepData and Mod_GetCV). Grid cells can be assigned to cross-validation folds in more than one way. If multiple names are provided, separate cross-validation models are fitted for each cross-validation type. Currently, there are three cross-validation strategies: CV_SAC, CV_Dist, and CV_Large. Defaults to c("CV_Dist", "CV_Large").

N_Grid

Integer. Number of points along the gradient for continuous focal variables. Higher values result in smoother curves. Default: 50. See Hmsc::constructGradient for details.
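As a rough illustration of what N_Grid controls, the sketch below builds an evenly spaced grid over the observed range of a continuous predictor. It only illustrates the idea and is not taken from Hmsc::constructGradient; the predictor values are made up for the example.

```r
# Hypothetical observed values of a continuous focal predictor
focal_values <- c(2.1, 3.8, 5.0, 7.4, 9.9)
N_Grid <- 50L

# Evenly spaced points spanning the observed range of the focal variable;
# a larger N_Grid gives a denser grid and therefore a smoother curve
gradient <- seq(
  from = min(focal_values), to = max(focal_values), length.out = N_Grid)

length(gradient)
range(gradient)
```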

UseTF

Logical. Whether to use TensorFlow for calculations. Defaults to TRUE.

TF_use_single

Logical. Whether to use single precision for the TensorFlow calculations. Defaults to FALSE.

LF_NCores

Integer. Number of cores to use for parallel processing of latent factor prediction. Defaults to the value of NCores.

LF_Temp_Cleanup

Logical. Whether to delete temporary files in the Temp_Dir directory after finishing the LF predictions.

LF_Check

Logical. If TRUE, the function checks if the output files are already created and valid. If FALSE, the function will only check if the files exist without checking their integrity. Default is FALSE.

Temp_Cleanup

Logical. Whether to clean up temporary files. Defaults to TRUE.

TF_Environ

Character. Path to the Python environment. This argument is required if UseTF is TRUE under Windows. Defaults to NULL.

Pred_Clamp

Logical indicating whether to clamp the sampling efforts at a single value. If TRUE (default), the Fix_Efforts argument must be provided.

Fix_Efforts

Numeric or character. If Pred_Clamp = TRUE, values of the sampling efforts predictor that are ≤ Fix_Efforts are fixed at Fix_Efforts during predictions. If numeric, the value is used directly (log10 scale). If character, it must be one of median, mean, max, or q90 (90% quantile). Using max can reflect extreme values caused by rare, highly sampled locations (e.g., urban centers or popular nature reserves), whereas the 90% quantile avoids such extreme grid cells while still capturing areas with high sampling effort. This argument is mandatory when Pred_Clamp is set to TRUE.
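The clamping rule can be sketched in base R as follows. Both helper functions (resolve_fix_value and clamp_efforts) and the example values are hypothetical and are not part of the package; the sketch only assumes the behaviour described above, where values at or below the threshold are raised to the threshold.

```r
# Hypothetical helper (not part of the package): turn Fix_Efforts into a number
resolve_fix_value <- function(x, fix) {
  if (is.numeric(fix)) {
    # Numeric input is used as is (assumed to be on the log10 scale already)
    return(fix)
  }
  switch(
    fix,
    median = stats::median(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    q90 = unname(stats::quantile(x, probs = 0.9, na.rm = TRUE)),
    stop("`fix` must be numeric or one of: median, mean, max, q90"))
}

# Hypothetical helper: values at or below the threshold are set to the threshold
clamp_efforts <- function(x, fix = "q90") {
  threshold <- resolve_fix_value(x, fix)
  pmax(x, threshold)
}

# Made-up sampling efforts on the log10 scale
efforts_log10 <- c(0.5, 1.2, 2.0, 2.8, 3.1, 4.6)

clamp_efforts(efforts_log10, fix = "median")
clamp_efforts(efforts_log10, fix = 2.5)
```

With fix = "max", every value is raised to the largest observed effort, which is why a quantile-based threshold is often the more conservative choice.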

Fix_Rivers

Numeric or character. Similar to Fix_Efforts, but for fixing the length of rivers. If numeric, the value is used directly (log10 scale). If character, it must be one of median, mean, max, or q90 (90% quantile). It can also be NULL, in which case the river length predictor is not fixed. Defaults to q90.

Pred_NewSites

Logical. Whether to predict habitat suitability at new sites. Default: TRUE. Note: This parameter is temporary and will be removed in future updates.

NCores_VP

Integer. Number of cores to use for variance partitioning. Defaults to 3.

PlotWidth_Omega, PlotHeight_Omega, PlotWidth_Beta, PlotHeight_Beta

Numeric. The width and height, in centimeters, of the generated heatmaps of the Omega and Beta parameters.

NumFiles

Integer. Number of output batch files to create. Must be less than or equal to the maximum job limit of the HPC environment.

WD

Character. Optionally sets the working directory in batch scripts to this path. If NULL, the directory remains unchanged.

Partition_Name

Character. Name of the partition to submit the SLURM jobs to. Default is small-g.

LF_Time, VP_Time

Character. Time limit for latent factor prediction and variance partitioning processing jobs, respectively. Defaults are 01:00:00 for LF_Time and 01:30:00 for VP_Time.

CC_Models

Character vector. Climate models for future predictions. Available options are c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL") (default).

CC_Scenario

Character vector. Climate scenarios for future predictions. Available options are: c("ssp126", "ssp370", "ssp585") (default).

RC_NCores

Integer. The number of cores to use for response curve prediction. Defaults to 8.

Tar

Logical. Whether to bundle the files into a single *.tar archive (without compression). Default: TRUE.
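For orientation, an uncompressed archive of this kind can be created in base R with utils::tar. This is only a sketch of the idea; the directory, file names, and archive name are made up, and the package may build its archive differently.

```r
# Hypothetical output files standing in for the files to be archived
out_dir <- file.path(tempdir(), "tar_demo")
dir.create(out_dir, showWarnings = FALSE)
writeLines("a", file.path(out_dir, "file_1.txt"))
writeLines("b", file.path(out_dir, "file_2.txt"))

# Archive relative to `out_dir` so the tar file stores short relative paths
tar_file <- file.path(tempdir(), "tar_demo.tar")
old_wd <- setwd(out_dir)
utils::tar(
  tarfile = tar_file, files = c("file_1.txt", "file_2.txt"),
  compression = "none", tar = "internal")
setwd(old_wd)

# List the archive contents without extracting them
utils::untar(tar_file, list = TRUE, tar = "internal")
```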

Details

Mod_Postprocess_1_CPU

This function performs the initial post-processing step for habitat-specific fitted models, automating the following tasks:


Mod_Prep_TF

After running Mod_Postprocess_1_CPU for all habitat types, this function prepares batch scripts for GPU computations of all habitat types:

  • for variance partitioning, the function matches all files with the pattern "VP_.+Command.txt" (created by VarPar_Compute) and merges their contents into a single file (TF_postprocess/VP_Commands.txt). Then, it prepares a SLURM script for variance partitioning computations (TF_postprocess/VP_SLURM.slurm).

  • for latent factor predictions, the function matches all files with the pattern "^LF_NewSites_Commands_.+.txt|^LF_RC_Commands_.+txt" and splits their contents into multiple scripts in the TF_postprocess directory for processing as a batch job. The function then prepares a SLURM script for latent factor predictions (LF_SLURM.slurm).

This function is tailored for the LUMI HPC environment and assumes that the tensorflow module is installed and correctly configured with all required Python packages. On other HPC systems, users may need to modify the function to load a Python virtual environment or install the required dependencies for TensorFlow and related packages.
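The matching and splitting of latent factor command files described above can be sketched in base R as follows. The directory layout, the placeholder command lines, and the chunk file names (LF_chunk_001.txt and so on) are assumptions made for this example; only the file name pattern and the NumFiles argument come from the documentation.

```r
# Hypothetical directory with command files, standing in for the model outputs
cmd_dir <- file.path(tempdir(), "tf_demo")
dir.create(cmd_dir, showWarnings = FALSE)
writeLines(
  paste("echo task", 1:7), file.path(cmd_dir, "LF_NewSites_Commands_1.txt"))
writeLines(
  paste("echo task", 8:10), file.path(cmd_dir, "LF_RC_Commands_1.txt"))

# Match the command files by name (pattern as given in the documentation)
lf_pattern <- "^LF_NewSites_Commands_.+.txt|^LF_RC_Commands_.+txt"
cmd_files <- list.files(cmd_dir, pattern = lf_pattern, full.names = TRUE)

# Pool all command lines, then split them into NumFiles contiguous chunks
commands <- unlist(lapply(cmd_files, readLines))
NumFiles <- 3L
chunk_id <- cut(seq_along(commands), breaks = NumFiles, labels = FALSE)
chunks <- split(commands, chunk_id)

# Write one file per chunk; the file names here are illustrative only
out_dir <- file.path(cmd_dir, "TF_postprocess")
dir.create(out_dir, showWarnings = FALSE)
out_files <- file.path(
  out_dir, sprintf("LF_chunk_%03d.txt", seq_along(chunks)))
invisible(Map(writeLines, chunks, out_files))

lengths(chunks)
```

Each chunk file can then be handled by one task of a batch job, which is why NumFiles should not exceed the job limit of the HPC environment.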


Mod_Postprocess_2_CPU

This function continues running the analysis pipeline for post-processing Hmsc by automating the following steps:

  • process and visualize response curves: Response_curves

  • predict habitat suitability across different climate options: Predict_Maps

  • plot species & SR predictions as JPEG: Mod_Predict_Plot

  • plot latent factors as JPEG: Mod_Plot_LF

  • process and visualize variance partitioning: VarPar_Compute and VarPar_Plot

  • compute and visualize model internal evaluation (explanatory power): Mod_Eval_Plot

  • initiate post-processing of fitted cross-validated models: prepare commands for latent factor predictions on GPU — Ongoing

This function should be run after:

  • completing Mod_Postprocess_1_CPU and Mod_Prep_TF on CPU,

  • running VP_SLURM.slurm and LF_SLURM.slurm on GPU to compute variance partitioning and latent factor predictions for new sites and response curves (both scripts are generated by Mod_Prep_TF), and

  • submitting SLURM jobs for cross-validated model fitting.

Author

Ahmed El-Gabbas