Post-processing of fitted models in the IASDT workflow involves multiple steps, utilising both CPU and GPU computations to optimise performance and manage memory constraints effectively.


Step 1: CPU

The mod_postprocess_1_CPU() function begins the post-processing phase for each habitat type by automating the following tasks:

mod_SLURM_refit() checks for unsuccessful model fits.
mod_merge_chains() merges MCMC chains and saves the fitted model and coda objects to .qs2 or .RData files.
convergence_plot_all() visualises the convergence of rho, alpha, omega, and beta parameters across all model variants. This function is particularly useful for comparing convergence across models with different thinning values, with and without phylogenetic relationships, or varying GPP knot distances. It is unnecessary when only a single model variant is considered.
convergence_plot() displays the convergence of rho, alpha, omega, and beta parameters for a selected model, providing a more detailed view compared to convergence_plot_all().
plot_gelman() visualises the Gelman-Rubin-Brooks diagnostics for the selected model.
mod_summary() extracts and saves a summary of the model.
mod_heatmap_beta() generates heatmaps of the beta parameters.
mod_heatmap_omega() generates heatmaps of the omega parameter, which represents residual species associations.
mod_CV_fit() prepares the necessary data for fitting cross-validated models.
  • output files are saved in the Model_Fitting_CV subdirectory.
  • the type of cross-validation strategy is controlled by the CV_name argument, which defaults to both CV_Dist and CV_Large.
  • unfitted model objects are saved in the Model_Init subdirectory.
  • commands for model fitting are saved as text files, with a separate file for each cross-validation strategy (e.g., Commands2Fit_CV_Dist.txt, Commands2Fit_CV_Large.txt).
  • model fitting commands are submitted as batch jobs using SLURM scripts, with a separate script for each strategy (e.g., CV_Bash_Fit_Dist.slurm, CV_Bash_Fit_Large.slurm).
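A common pattern for pairing a commands file with a SLURM script is a job array in which task N executes line N of the file. The sketch below illustrates only that pattern, with a stand-in commands file and a manually set task ID; the actual contents of the generated Commands2Fit_*.txt and CV_Bash_Fit_*.slurm files may differ.

```shell
# Hypothetical sketch: in a real array job, SLURM sets
# SLURM_ARRAY_TASK_ID and the commands file is one of the
# generated Commands2Fit_CV_*.txt files.
printf '%s\n' 'echo fit CV_Dist fold 1' 'echo fit CV_Dist fold 2' \
  > Commands2Fit_demo.txt
SLURM_ARRAY_TASK_ID=2                       # set by SLURM in practice
cmd=$(sed -n "${SLURM_ARRAY_TASK_ID}p" Commands2Fit_demo.txt)
eval "$cmd"                                 # runs the second command
```

This keeps each model fit as an independent SLURM task, so failed fits can be resubmitted individually.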



Computationally intensive tasks offloaded to GPU

Previous attempts to prepare response curve data, predict at new sites, and compute variance partitioning using R on CPUs (such as the UFZ Windows server and LUMI HPC) were limited by memory constraints. As a result, these tasks are now offloaded to GPU-based computations using Python and TensorFlow. The mod_postprocess_1_CPU() function calls the following sub-functions to generate the necessary commands for GPU execution:

resp_curv_prepare_data() prepares data for predicting latent factors for response curves.
predict_maps() prepares data for predicting latent factors at new sampling units.
variance_partitioning_compute() prepares data for computing variance partitioning.



Preparing commands for GPU computations

Predicting latent factors:

  • Predictions of latent factors for response curves and new sampling units are performed using a TensorFlow script located at inst/crossprod_solve.py.
  • For these tasks, the corresponding R functions export multiple .qs2 and .feather data files to the TEMP_Pred subdirectory, which are essential for GPU computations. Additionally, they generate execution commands saved as LF_RC_Commands_.txt (for response curves) and LF_NewSites_Commands_.txt (for new sites).
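Each line of such a commands file is an independent call to the TensorFlow script, so a file can be executed line by line on a GPU node. A minimal sketch with a stand-in file (the real files contain one python call to inst/crossprod_solve.py per line):

```shell
# Stand-in commands file; real LF_*_Commands_.txt files contain
# python calls to inst/crossprod_solve.py instead of echo.
printf '%s\n' 'echo task 1 done' 'echo task 2 done' > demo_commands.txt
while IFS= read -r cmd; do
  eval "$cmd"          # execute each command in order
done < demo_commands.txt
```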






Computing variance partitioning:

  • Variance partitioning is likewise computed on GPUs using Python and TensorFlow. The variance_partitioning_compute() function exports the data required for these computations and saves the corresponding execution commands to a text file for each habitat type.


Combining commands for GPU computations

Once mod_postprocess_1_CPU() has been executed for all habitat types, the mod_prepare_TF() function consolidates the batch scripts for GPU computations across all habitat types:

  • It aggregates the script files containing the latent factor prediction commands (for response curves and new sampling units), splitting them into multiple scripts (TF_Chunk_*.txt) for batch processing. It also generates a SLURM script (LF_SLURM.slurm) for executing the latent factor predictions.



  • It combines the variance partitioning command files into a single VP_Commands.txt file and prepares a SLURM script (VP_SLURM.slurm) for the variance partitioning computations.
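The aggregation can be pictured with standard shell tools. The sketch below uses stand-in file names and GNU coreutils split; mod_prepare_TF() performs the equivalent consolidation and chunking internally when producing the TF_Chunk_*.txt files.

```shell
# Stand-in per-habitat command files (names are illustrative).
printf 'cmd %s\n' 1 2 3 4 5 > LF_commands_hab1.txt
printf 'cmd %s\n' 6 7 8     > LF_commands_hab2.txt
# Concatenate, then split into fixed-size chunks for batch processing;
# 8 commands at 4 per chunk yield two chunk files.
cat LF_commands_hab*.txt > all_LF_commands.txt
split -l 4 -d --additional-suffix=.txt all_LF_commands.txt TF_Chunk_
ls TF_Chunk_*.txt        # two chunk files are produced
```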



Step 2: GPU

In this step, latent factor predictions and variance partitioning are computed on GPUs. The batch jobs for these computations can be submitted using the sbatch command:

# Submit SLURM jobs for variance partitioning and latent factor predictions
sbatch datasets/processed/model_fitting/Mod_Q_Hab_TF/VP_SLURM.slurm
sbatch datasets/processed/model_fitting/Mod_Q_Hab_TF/LF_SLURM.slurm

Additionally, cross-validated models are fitted by submitting the corresponding SLURM scripts for each cross-validation strategy:

# Submit SLURM jobs for cross-validated model fitting
#
# cross-validation method "CV_Dist"
sbatch datasets/processed/model_fitting/HabX/Model_Fitting_CV/CV_Bash_Fit_Dist.slurm
# cross-validation method "CV_Large"
sbatch datasets/processed/model_fitting/HabX/Model_Fitting_CV/CV_Bash_Fit_Large.slurm

Step 3: CPU

To continue the post-processing of the fitted models on CPUs, two functions need to be executed:

1. mod_postprocess_2_CPU()

This function progresses the post-processing pipeline for HMSC models on the CPU by automating the following tasks:

resp_curv_prepare_data(), resp_curv_plot_SR(), resp_curv_plot_species(), and resp_curv_plot_species_all() continue the processing and visualisation of response curves.
predict_maps()
  • predicts habitat suitability across various climate scenarios.
  • computes the model’s explanatory power (internal evaluation without cross-validation) using four metrics: AUC (area under the ROC curve), RMSE (root mean square error), continuous Boyce index, and Tjur R2.
  • prepares GeoTIFF maps for free access via the IASDT OPeNDAP server and the IASDT Shiny app.
plot_prediction() visualises predictions of species and species richness as JPEG images.
plot_latent_factor() visualises the spatial variation in site loadings of HMSC models as JPEG images.
variance_partitioning_compute() and variance_partitioning_plot() continue the processing and visualisation of variance partitioning.
plot_evaluation() visualises the explanatory power of the model (internal evaluation).


2. mod_postprocess_CV_1_CPU()

This function begins the post-processing of cross-validated models on the CPU by automating the following tasks:

mod_merge_chains_CV() merges fitted cross-validated model chains into Hmsc model objects and saves them to disk.
predict_maps_CV() prepares scripts for predicting latent factors for each cross-validation strategy at new sampling units (evaluation folds). The arguments LF_only and LF_commands_only are set to TRUE to prepare only the necessary script files.

Once predict_maps_CV() has finished, the function combines the computation commands into multiple text script files (TF_Chunk_*.txt) in each model’s Model_Fitting_CV/LF_TF_commands subdirectory. These scripts are then executed on GPUs as a single batch job, submitted via the LF_SLURM.slurm SLURM script in the same directory.


Step 4: GPU

In this step, the computation of latent factors for cross-validated models is performed on GPUs using SLURM scripts.

sbatch datasets/processed/model_fitting/HabX/Model_Fitting_CV/LF_TF_commands/LF_SLURM.slurm

Step 5: CPU

The final step of the post-processing pipeline is carried out on CPUs using the mod_postprocess_CV_2_CPU() function. This function automates the following tasks:

  • Predicting habitat suitability at the testing cross-validation folds using the predict_maps_CV() function.
  • Computing the model’s predictive power (using spatially independent testing data) with the same function, based on four metrics: AUC (area under the ROC curve); RMSE (root mean square error); continuous Boyce index; and Tjur R2.
  • Plotting the model’s evaluation, including:
    • Predictive power values for each evaluation metric versus the mean number of testing presences
    • Explanatory versus predictive power for each evaluation metric

Previous articles:
1. Overview
2. Processing abiotic data
3. Processing biotic data
4. Model fitting