Post-processing of fitted models in the IASDT workflow involves multiple steps, utilising both CPU and GPU computations to optimise performance and manage memory constraints effectively.
Step 1: CPU
The mod_postprocess_1_cpu() function begins the post-processing phase for each habitat type by automating the following tasks (a hypothetical invocation follows the list):
- visualising the convergence of the rho, alpha, omega, and beta parameters across all model variants. This comparison is particularly useful for models with different thinning values, with and without phylogenetic relationships, or with varying GPP knot distances; it is unnecessary when only a single model variant is considered.
- displaying the convergence of the rho, alpha, omega, and beta parameters for a selected model, providing a more detailed view than the cross-variant comparison of convergence_plot_all().
- preparing the necessary data for fitting cross-validated models:
  - output files are saved in the model_fitting_cv subdirectory;
  - the cross-validation strategy is controlled by the cv_name argument, which defaults to both cv_dist and cv_large;
  - unfitted model objects are saved in the model_init subdirectory;
  - commands for model fitting are saved as text files, one file per cross-validation strategy (e.g., commands_to_fit_cv_dist.txt, commands_to_fit_cv_large.txt);
  - model-fitting commands are submitted as batch jobs using SLURM scripts, one script per strategy (e.g., cv_bash_fit_dist.slurm, cv_bash_fit_large.slurm).
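A hypothetical invocation for a single habitat type is shown below; apart from cv_name, which is documented above, the argument names are placeholders rather than the real function signature:

# Hypothetical invocation; only `cv_name` is documented above, and the
# other argument name is a placeholder, not the real signature.
mod_postprocess_1_cpu(
  model_dir = "datasets/processed/model_fitting/HabX",  # placeholder path
  cv_name   = c("cv_dist", "cv_large")  # default: both CV strategies
)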
Computationally intensive tasks offloaded to GPU
Previous attempts to prepare response curve data, predict at new
sites, and compute variance partitioning using R on CPUs (such as the
UFZ Windows server and LUMI HPC) were limited by memory constraints. As
a result, these tasks are now offloaded to GPU-based computations using
Python and TensorFlow. The
mod_postprocess_1_cpu() function calls the following
sub-functions to generate the necessary commands for GPU execution:
Predictions of latent factors for response curves and new sampling
units are performed using a TensorFlow script located at
inst/crossprod_solve.py.
For these tasks, the corresponding R functions export multiple .qs2 and .feather data files, required by the GPU computations, to the temp_pred subdirectory. They also generate execution commands saved as lf_rc_commands_.txt (for response curves) and lf_new_sites_commands_.txt (for new sites).
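As an illustration of this R-to-Python handoff (object and file names below are hypothetical), model objects can be serialised with the qs2 package and tabular data shared via Feather files, which both R and Python can read:

# Illustrative only: the real export is internal to the package and
# uses its own object and file names. qs2 files are read back in R,
# while Feather files are read by both R (arrow) and Python (pyarrow).
dir.create("temp_pred", showWarnings = FALSE)
post_eta  <- matrix(rnorm(20), nrow = 5)             # dummy posterior draws
coords_df <- data.frame(x = runif(5), y = runif(5))  # dummy coordinates
qs2::qs_save(post_eta, "temp_pred/post_eta.qs2")
arrow::write_feather(coords_df, "temp_pred/coords.feather")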
Response curves
For predictions at mean coordinates (as specified by the coordinates argument of Hmsc::constructGradient()), latent factor predictions, which are typically memory-intensive with Hmsc::predictLatentFactor(), are computed on GPUs.
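The underlying linear algebra is a cross-product solve: latent factors at new locations are interpolated from the posterior draws at the fitted sites through the spatial covariance. A minimal R sketch, assuming an exponential covariance exp(-d/alpha) as used for Hmsc's spatial latent factors (all object names are hypothetical):

# Minimal sketch of the cross-product solve performed on the GPU, in
# plain R for illustration. Hmsc::predictLatentFactor() does this
# (and more) per posterior sample, which is what exhausts CPU memory.
cross_dist <- function(a, b) {
  # Euclidean distances between rows of a and rows of b
  sqrt(pmax(outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b), 0))
}
predict_lf_sketch <- function(coords_old, coords_new, eta_old, alpha) {
  K_oo <- exp(-as.matrix(dist(coords_old)) / alpha)         # old-by-old covariance
  K_no <- exp(-cross_dist(coords_new, coords_old) / alpha)  # new-by-old covariance
  K_no %*% solve(K_oo, eta_old)                             # interpolated factors
}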
Predicting at new sites
predict_maps() sets up GPU computations for predictions
at new sites when both lf_only = TRUE and
lf_commands_only = TRUE.
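For example, a hypothetical call preparing only the GPU command files (argument names other than the two flags above are placeholders):

# Hypothetical call: the two flags are documented above, while the
# remaining argument name is a placeholder.
predict_maps(
  model_dir        = "datasets/processed/model_fitting/HabX",  # placeholder
  lf_only          = TRUE,  # prepare latent-factor predictions only
  lf_commands_only = TRUE   # write GPU command files without running them
)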
Variance partitioning
The variance_partitioning_compute() function exports the necessary files, including numerous .qs2 and .feather files, to the temp_vp subdirectory. It also generates execution commands saved as VP_A_Command.txt, VP_F_Command.txt, and VP_mu_Command.txt.
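For context, the CPU-bound equivalent of this step in the Hmsc package is computeVariancePartitioning(); a minimal, self-contained sketch on toy data (tiny MCMC run, illustration only):

# CPU-bound Hmsc equivalent, shown for context only; the IASDT
# pipeline replaces this with GPU computations for large models.
library(Hmsc)
set.seed(1)
Y     <- matrix(rbinom(250, 1, 0.5), nrow = 50, ncol = 5)  # toy data
XData <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
hM    <- Hmsc(Y = Y, XData = XData, XFormula = ~ x1 + x2, distr = "probit")
hM    <- sampleMcmc(hM, samples = 50, transient = 50, nChains = 1, verbose = 0)
VP    <- computeVariancePartitioning(hM, group = c(1, 2, 3),
                                     groupnames = c("intercept", "x1", "x2"))
plotVariancePartitioning(hM, VP)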
Combining commands for GPU computations
Once mod_postprocess_1_cpu() has been executed for all habitat types, the mod_prepare_tf() function consolidates the resulting batch scripts for GPU computations:
It aggregates the script files containing the commands for response curves and latent factor predictions, splitting them into multiple scripts (tf_chunk_*.txt) for batch processing, and generates a SLURM script (lf_SLURM.slurm) for executing the latent factor predictions. A sketch of this chunking follows the example script below.
It combines the variance partitioning command files into a single VP_Commands.txt file and prepares a SLURM script (VP_SLURM.slurm) for the variance partitioning computations. An example VP_SLURM.slurm script:
#!/bin/bash
#SBATCH --job-name=VP_TF
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --account=project_465001588
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-node=1
#SBATCH --time=01:30:00
#SBATCH --partition=small-g
#SBATCH --output=datasets/processed/model_fitting/Mod_Q_Hab_TF/log/%x-%A-%a.out
#SBATCH --error=datasets/processed/model_fitting/Mod_Q_Hab_TF/log/%x-%A-%a.out
#SBATCH --array=1-24
# File containing commands to be executed
File=datasets/processed/model_fitting/Mod_Q_Hab_TF/VP_Commands.txt
# Load TensorFlow module and configure environment
ml use /appl/local/csc/modulefiles
ml tensorflow
export TF_CPP_MIN_LOG_LEVEL=3
export TF_ENABLE_ONEDNN_OPTS=0
# Verify GPU availability
python3 -c "import tensorflow as tf; print(\"Num GPUs Available:\", len(tf.config.list_physical_devices(\"GPU\")))"
# Run array job
head -n $SLURM_ARRAY_TASK_ID $File | tail -n 1 | bash
echo End of program at `date`
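For illustration, the chunking performed by mod_prepare_tf() can be sketched in R as follows (paths, file patterns, and the chunk count are placeholders):

# Sketch of the command-file chunking with hypothetical paths: gather
# all GPU commands, split them into roughly equal chunks, and write one
# tf_chunk_*.txt file per chunk.
cmd_files <- Sys.glob("datasets/processed/model_fitting/*/lf_*commands*.txt")
cmds      <- unlist(lapply(cmd_files, readLines))
n_chunks  <- 24   # placeholder; would match the SLURM --array range
chunk_id  <- cut(seq_along(cmds), n_chunks, labels = FALSE)
for (i in seq_len(n_chunks)) {
  writeLines(cmds[chunk_id == i], sprintf("tf_chunk_%d.txt", i))
}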
Step 2: GPU
In this step, latent factor predictions and variance partitioning are
computed on GPUs. The batch jobs for these computations can be submitted
using the sbatch command:
# Submit SLURM jobs for variance partitioning and latent factor predictions
sbatch datasets/processed/model_fitting/Mod_Q_Hab_TF/VP_SLURM.slurm
sbatch datasets/processed/model_fitting/Mod_Q_Hab_TF/lf_SLURM.slurm
Additionally, cross-validated models are fitted by submitting the
corresponding SLURM scripts for each cross-validation strategy:
# Submit SLURM jobs for cross-validated model fitting

# cross-validation method "cv_dist"
sbatch datasets/processed/model_fitting/HabX/model_fitting_cv/cv_bash_fit_dist.slurm

# cross-validation method "cv_large"
sbatch datasets/processed/model_fitting/HabX/model_fitting_cv/cv_bash_fit_large.slurm
Step 3: CPU
To continue the post-processing of the fitted models on CPUs, two
functions need to be executed:
1. mod_postprocess_2_cpu()
The mod_postprocess_2_cpu() function continues the post-processing pipeline for HMSC models on the CPU by automating the following tasks (the evaluation metrics are illustrated after the list):
- predicting habitat suitability across various climate scenarios;
- computing the model's explanatory power (internal evaluation, without cross-validation) using four metrics: AUC (area under the ROC curve), RMSE (root mean square error), the continuous Boyce index, and Tjur R²;
- preparing scripts for predicting latent factors at new sampling units (evaluation folds) for each cross-validation strategy. The lf_only and lf_commands_only arguments are set to TRUE so that only the necessary script files are prepared.
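To make the four metrics concrete, here is an illustrative computation on simulated data; the package computes these internally for each species, and the Boyce index here relies on the ecospat package:

# Illustrative computation of the four evaluation metrics on simulated
# data; variable names are hypothetical.
set.seed(1)
pred     <- runif(500)             # predicted habitat suitability
observed <- rbinom(500, 1, pred)   # simulated presence/absence

auc   <- as.numeric(pROC::auc(observed, pred))                  # discrimination
rmse  <- sqrt(mean((observed - pred)^2))                        # calibration error
tjur  <- mean(pred[observed == 1]) - mean(pred[observed == 0])  # Tjur R²
boyce <- ecospat::ecospat.boyce(fit = pred,
                                obs = pred[observed == 1],
                                PEplot = FALSE)$cor             # continuous Boyce index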
2. mod_prepare_tf()
Once predict_maps_cv() has completed, the mod_prepare_tf() function combines the computation commands into multiple text script files (tf_chunk_*.txt) in each model's model_fitting_cv/lf_TF_commands subdirectory. These scripts need to be executed on GPUs as a single batch job submitted via a SLURM script in the same directory, lf_SLURM.slurm.
Step 4: GPU
In this step, the latent factors for the cross-validated models are computed on GPUs by submitting the lf_SLURM.slurm scripts prepared in the previous step.
Step 5: CPU
The final step of the post-processing pipeline is carried out on CPUs using the mod_postprocess_cv_2_cpu() function, which automates the following tasks:
- Predicting habitat suitability at the testing cross-validation folds using the predict_maps_cv() function.
- Computing the model's predictive power (using spatially independent testing data) with the same function, based on the same four metrics: AUC (area under the ROC curve), RMSE (root mean square error), the continuous Boyce index, and Tjur R².
- Plotting the model's evaluation (a plotting sketch follows the list), including:
  - predictive power for each evaluation metric versus the mean number of testing presences;
  - explanatory versus predictive power for each evaluation metric.
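A minimal sketch of the explanatory-versus-predictive plot, using simulated evaluation values (the data frame and its column names are hypothetical; the real values come from the pipeline's evaluation output):

# Sketch of the explanatory-vs-predictive comparison on simulated
# values, one point per species, one panel per metric.
library(ggplot2)
set.seed(1)
eval_df <- data.frame(
  metric      = rep(c("AUC", "RMSE", "Boyce", "Tjur R2"), each = 50),
  explanatory = runif(200, 0.4, 1),
  predictive  = runif(200, 0.2, 0.9)
)
ggplot(eval_df, aes(explanatory, predictive)) +
  geom_abline(linetype = "dashed") +   # 1:1 reference line
  geom_point(alpha = 0.5) +
  facet_wrap(~metric, scales = "free") +
  labs(x = "Explanatory power", y = "Predictive power")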