
These functions post-process fitted Hmsc models on both CPU and GPU. The main functions in the pipeline include mod_postprocess_1_CPU, mod_prepare_TF, and mod_postprocess_2_CPU for full models without cross-validation, and mod_postprocess_CV_1_CPU and mod_postprocess_CV_2_CPU for cross-validated models. See Details for more information.
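A minimal sketch of the full-model pipeline is shown below. All paths and argument values are placeholders; required arguments such as path_Hmsc, job_runtime, GPP_dist, and MCMC_thin must match your own model setup.

```r
# Hypothetical paths and values for illustration only
model_dir <- "models/Hab4a_GPP100"

# 1. Initial CPU post-processing of the fitted full model
mod_postprocess_1_CPU(
  model_dir = model_dir, hab_abb = "4a",
  path_Hmsc = "~/Hmsc-HPC", job_runtime = "02:00:00",
  GPP_dist = 100L, MCMC_thin = 10L)

# 2. After step 1 has been run for all habitat types,
#    prepare the GPU batch scripts
mod_prepare_TF(model_prefix = "Mod_Q")

# 3. Run the generated VP_SLURM.slurm and LF_SLURM.slurm scripts
#    on GPU, then continue post-processing on CPU
mod_postprocess_2_CPU(
  model_dir = model_dir, hab_abb = "4a",
  GPP_dist = 100L, MCMC_thin = 10L)
```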

Usage

mod_postprocess_1_CPU(
  model_dir = NULL,
  hab_abb = NULL,
  n_cores = 8L,
  strategy = "multisession",
  env_file = ".env",
  path_Hmsc = NULL,
  memory_per_cpu = "64G",
  job_runtime = NULL,
  from_JSON = FALSE,
  GPP_dist = NULL,
  use_trees = "Tree",
  MCMC_n_samples = 1000L,
  MCMC_thin = NULL,
  n_omega = 1000L,
  CV_name = c("CV_Dist", "CV_Large"),
  n_grid = 50L,
  use_TF = TRUE,
  TF_use_single = FALSE,
  LF_n_cores = n_cores,
  LF_temp_cleanup = TRUE,
  LF_check = FALSE,
  temp_cleanup = TRUE,
  TF_environ = NULL,
  clamp_pred = TRUE,
  fix_efforts = "q90",
  fix_rivers = "q90",
  pred_new_sites = TRUE,
  n_cores_VP = 10L,
  width_omega = 26,
  height_omega = 22.5,
  width_beta = 25,
  height_beta = 35
)

mod_prepare_TF(
  process_VP = TRUE,
  process_LF = TRUE,
  n_batch_files = 210L,
  env_file = ".env",
  working_directory = NULL,
  partition_name = "small-g",
  LF_runtime = "01:00:00",
  model_prefix = NULL,
  VP_runtime = "02:00:00"
)

mod_postprocess_2_CPU(
  model_dir = NULL,
  hab_abb = NULL,
  n_cores = 8L,
  strategy = "multisession",
  env_file = ".env",
  GPP_dist = NULL,
  use_trees = "Tree",
  MCMC_n_samples = 1000L,
  MCMC_thin = NULL,
  use_TF = TRUE,
  TF_environ = NULL,
  TF_use_single = FALSE,
  LF_n_cores = n_cores,
  LF_check = FALSE,
  LF_temp_cleanup = TRUE,
  temp_cleanup = TRUE,
  n_grid = 50L,
  CC_models = c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0",
    "UKESM1-0-LL"),
  CC_scenario = c("ssp126", "ssp370", "ssp585"),
  RC_n_cores = 8L,
  clamp_pred = TRUE,
  fix_efforts = "q90",
  fix_rivers = "q90",
  pred_new_sites = TRUE,
  tar_predictions = TRUE,
  RC_prepare = TRUE,
  RC_plot = TRUE,
  VP_prepare = TRUE,
  VP_plot = TRUE,
  predict_suitability = TRUE,
  plot_predictions = TRUE,
  plot_LF = TRUE,
  plot_internal_evaluation = TRUE
)

mod_postprocess_CV_1_CPU(
  model_dir = NULL,
  CV_names = NULL,
  n_cores = 8L,
  strategy = "multisession",
  env_file = ".env",
  from_JSON = FALSE,
  use_TF = TRUE,
  TF_use_single = FALSE,
  TF_environ = NULL,
  LF_n_cores = n_cores,
  LF_only = TRUE,
  LF_temp_cleanup = TRUE,
  LF_check = FALSE,
  LF_runtime = "01:00:00",
  temp_cleanup = TRUE,
  n_batch_files = 210L,
  working_directory = NULL,
  partition_name = "small-g"
)

mod_postprocess_CV_2_CPU(
  model_dir = NULL,
  CV_names = NULL,
  n_cores = 8L,
  strategy = "multisession",
  env_file = ".env",
  use_TF = TRUE,
  TF_use_single = FALSE,
  temp_cleanup = TRUE,
  LF_temp_cleanup = TRUE,
  TF_environ = NULL,
  LF_n_cores = n_cores,
  LF_check = FALSE
)

Arguments

model_dir

Character. Path to the root directory of the fitted model.

hab_abb

Character. Habitat abbreviation indicating the specific SynHab habitat type. Valid values: 0, 1, 2, 3, 4a, 4b, 10, 12a, 12b. See Pysek et al. for details.

n_cores

Integer. Number of CPU cores to use for parallel processing. Default: 8.

strategy

Character. The parallel processing strategy to use. Valid options are "sequential", "multisession" (default), "multicore", and "cluster". See future::plan() and ecokit::set_parallel() for details.

env_file

Character. Path to the environment file containing paths to data sources. Defaults to .env.

path_Hmsc

Character. Path to the Hmsc-HPC installation.

memory_per_cpu

Character. Memory allocation per CPU core. Example: "32G" for 32 gigabytes. Defaults to "64G".

job_runtime

Character. Maximum allowed runtime for the job. Example: "01:00:00" for one hour. Required — if not provided, the function throws an error.

from_JSON

Logical. Whether to convert loaded models from JSON format before reading. Defaults to FALSE.

GPP_dist

Integer. Distance in kilometres between knots for the selected model.

use_trees

Character. Whether a phylogenetic tree was used in the selected model. Accepts "Tree" (default) or "NoTree".

MCMC_thin, MCMC_n_samples

Integer. Thinning value and the number of MCMC samples of the selected model.

n_omega

Integer. The number of species to be sampled for the Omega parameter transformation. Defaults to 1000.

CV_name

Character. Cross-validation strategy. Valid values are CV_Dist, CV_Large, or CV_SAC. Defaults to c("CV_Dist", "CV_Large").

n_grid

Integer. Number of points along the gradient for continuous focal variables. Higher values result in smoother curves. Default: 50. See Hmsc::constructGradient for details.

use_TF

Logical. Whether to use TensorFlow for calculations. Defaults to TRUE.

TF_use_single

Logical. Whether to use single precision for the TensorFlow calculations. Defaults to FALSE.

LF_n_cores

Integer. Number of cores to use for parallel processing of latent factor prediction. Defaults to n_cores.

LF_temp_cleanup

Logical. Whether to delete temporary files in the temp_dir directory after finishing the LF predictions.

LF_check

Logical. If TRUE, the function checks if the output files are already created and valid. If FALSE, the function will only check if the files exist without checking their integrity. Default is FALSE.

temp_cleanup

Logical. Whether to clean up temporary files. Defaults to TRUE.

TF_environ

Character. Path to the Python environment. This argument is required if use_TF is TRUE under Windows. Defaults to NULL.

clamp_pred

Logical. Whether to clamp the sampling efforts predictor at a single value during prediction. If TRUE (default), the fix_efforts argument must be provided.

fix_efforts

Numeric or character. When clamp_pred = TRUE, fixes the sampling efforts predictor at this value during predictions. If numeric, uses the value directly (on log10 scale). If character, must be one of median, mean, max, or q90 (90% quantile). Using max may reflect extreme sampling efforts from highly sampled locations, while q90 captures high sampling areas without extremes. Required if clamp_pred = TRUE.

fix_rivers

Numeric, character, or NULL. Similar to fix_efforts, but for the river length predictor. If NULL, the river length is not fixed. Default: q90.
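For example (a sketch with placeholder paths and model settings; only the clamping-related arguments are the point here):

```r
# Clamp sampling effort and river length at their 90% quantiles (defaults)
mod_postprocess_2_CPU(
  model_dir = "models/Hab3", hab_abb = "3",
  GPP_dist = 100L, MCMC_thin = 10L,
  clamp_pred = TRUE, fix_efforts = "q90", fix_rivers = "q90")

# Fix sampling effort at 100 (supplied on the log10 scale);
# leave river length unclamped
mod_postprocess_2_CPU(
  model_dir = "models/Hab3", hab_abb = "3",
  GPP_dist = 100L, MCMC_thin = 10L,
  clamp_pred = TRUE, fix_efforts = log10(100), fix_rivers = NULL)
```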

pred_new_sites

Logical. Whether to predict suitability at new sites. Default: TRUE.

n_cores_VP

Integer. Number of cores to use for processing variance partitioning. Defaults to 10L.

width_omega, height_omega, width_beta, height_beta

Integer. The width and height of the generated heatmaps of the Omega and Beta parameters in centimetres.

process_VP

Logical. Whether to prepare batch scripts for variance partitioning computations on GPUs. Defaults to TRUE.

process_LF

Logical. Whether to prepare batch scripts for latent factor prediction computations on GPUs. Defaults to TRUE.

n_batch_files

Integer. Number of output batch files to create. Must be less than or equal to the maximum job limit of the HPC environment.

working_directory

Character. Optionally sets the working directory in batch scripts to this path. If NULL, the directory remains unchanged.

partition_name

Character. Name of the partition to submit the SLURM jobs to. Default is small-g.

LF_runtime, VP_runtime

Character. Time limits for the latent factor prediction and variance partitioning processing jobs, respectively. Defaults: 01:00:00 and 02:00:00.

model_prefix

Character. Prefix for the model name. A directory named model_prefix_TF is created in model_dir to store the TensorFlow running commands. Defaults to NULL, but a value must be provided; the function throws an error if it is NULL.

CC_models

Character vector. Climate models for future predictions. Available options are c("GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL") (default).

CC_scenario

Character vector. Climate scenarios for future predictions. Available options are: c("ssp126", "ssp370", "ssp585") (default).

RC_n_cores

Integer. The number of cores to use for response curve prediction. Defaults to 8.

tar_predictions

Logical. Whether to bundle the prediction files into a single *.tar archive (without compression). Default: TRUE.

RC_prepare

Logical. Whether to prepare the data for response curve prediction (using resp_curv_prepare_data). Defaults to TRUE.

RC_plot

Logical. Whether to plot the response curves as JPEG files (using resp_curv_plot_SR, resp_curv_plot_species, and resp_curv_plot_species_all). Defaults to TRUE.

VP_prepare

Logical. Whether to prepare the data for variance partitioning (using variance_partitioning_compute). Defaults to TRUE.

VP_plot

Logical. Whether to plot the variance partitioning results (using variance_partitioning_plot). Defaults to TRUE.

predict_suitability

Logical. Whether to predict habitat suitability across different climate options (using predict_maps). Defaults to TRUE.

plot_predictions

Logical. Whether to plot species and species richness predictions as JPEG files (using plot_prediction). Defaults to TRUE.

plot_LF

Logical. Whether to plot latent factors as JPEG files (using plot_latent_factor). Defaults to TRUE.

plot_internal_evaluation

Logical. Whether to compute and visualise model internal evaluation (explanatory power) using plot_evaluation. Defaults to TRUE.

CV_names

Character vector. Names of cross-validation strategies to process, matching those used during model setup. Each name must be one of CV_Dist, CV_Large, or CV_SAC.

LF_only

Logical. Whether to predict only the latent factors. This is useful for distributing processing load between GPU and CPU. When LF_only = TRUE, latent factor prediction needs to be computed separately on GPU. Once those computations have finished, the function can be rerun with LF_only = FALSE to predict habitat suitability using the already-computed latent factor predictions.
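The two-phase workflow can be sketched as follows (placeholder paths; the GPU step in between is run outside R):

```r
# Phase 1 (CPU): prepare latent factor prediction commands only
mod_postprocess_CV_1_CPU(
  model_dir = "models/Hab3", CV_names = "CV_Dist", LF_only = TRUE)

# ... run the generated latent factor batch scripts on GPU ...

# Phase 2 (CPU): rerun with LF_only = FALSE to predict habitat
# suitability from the already-computed latent factor predictions
mod_postprocess_CV_1_CPU(
  model_dir = "models/Hab3", CV_names = "CV_Dist", LF_only = FALSE)
```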

Details

mod_postprocess_1_CPU

This function performs the initial post-processing steps for habitat-specific fitted models.


mod_prepare_TF

After running mod_postprocess_1_CPU for all habitat types, this function prepares batch scripts for GPU computations of all habitat types:

  • for variance partitioning, the function matches all files with the pattern "VP_.+Command.txt" (created by variance_partitioning_compute) and merges their contents into a single file (model_prefix_TF/VP_Commands.txt). It then prepares a SLURM script for variance partitioning computations (model_prefix_TF/VP_SLURM.slurm).

  • for latent factor predictions, the function matches all files with the pattern "^LF_NewSites_Commands_.+.txt|^LF_RC_Commands_.+txt" and splits their contents into multiple scripts in the model_prefix_TF directory for processing as a batch job. The function prepares a SLURM script for latent factor predictions (LF_SLURM.slurm).

This function is tailored for the LUMI HPC environment and assumes that the tensorflow module is installed and correctly configured with all required Python packages. On other HPC systems, users may need to modify the function to load a Python virtual environment or install the required dependencies for TensorFlow and related packages.
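A hedged sketch of a typical call (all values are illustrative; model_prefix is required):

```r
mod_prepare_TF(
  model_prefix = "Mod_Q",       # creates the Mod_Q_TF directory
  n_batch_files = 210L,         # must not exceed the HPC job limit
  partition_name = "small-g",
  LF_runtime = "01:00:00",
  VP_runtime = "02:00:00")

# The generated SLURM scripts are then submitted from the shell, e.g.:
#   sbatch Mod_Q_TF/VP_SLURM.slurm
#   sbatch Mod_Q_TF/LF_SLURM.slurm
```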



mod_postprocess_2_CPU

This function continues the analysis pipeline for post-processing fitted Hmsc models.

This function should be run after:

  • completing mod_postprocess_1_CPU and mod_prepare_TF on CPU,

  • running VP_SLURM.slurm and LF_SLURM.slurm on GPU to process variance partitioning and latent factor predictions (both scripts are generated by mod_prepare_TF), and

  • submitting SLURM jobs for cross-validated model fitting.


mod_postprocess_CV_1_CPU

This function is similar to mod_postprocess_1_CPU, but it is specifically designed for cross-validated models. It automates merging fitted cross-validated model chains into Hmsc model objects and prepares scripts for latent factor prediction with TensorFlow using predict_maps_CV.



mod_postprocess_CV_2_CPU

The function 1) processes the *.feather files resulting from latent factor predictions (computed with TensorFlow) and saves the LF predictions to disk; 2) predicts species-specific mean habitat suitability at the testing cross-validation folds and calculates testing evaluation metrics; and 3) generates plots of the evaluation metrics.
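For example (placeholder values; run only after the GPU latent factor jobs for the cross-validated models have finished):

```r
mod_postprocess_CV_2_CPU(
  model_dir = "models/Hab3",
  CV_names = c("CV_Dist", "CV_Large"),
  n_cores = 8L, LF_n_cores = 8L)
```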

Author

Ahmed El-Gabbas