
This function assigns modelling input data to spatial-block cross-validation folds using three strategies (see below), based on blockCV::cv_spatial. It is intended to be called from within the Mod_Prep4HPC function.


  EnvFile = ".env",
  CV_NFolds = 4,
  CV_NGrids = 20,
  CV_NR = 2,
  CV_NC = 2,
  OutPath = NULL,
  FromHPC = TRUE,
  CV_Plot = TRUE



A data frame or tibble containing the input dataset. It should include two columns for the x and y coordinates, as well as columns matching the names of the predictors listed in the XVars argument.


String specifying the path to read environment variables from, with a default value of .env.


Vector of strings specifying the variables to be used in the model. This argument is mandatory and cannot be empty.


Number of cross-validation folds. Default: 4.


For the CV_Dist cross-validation strategy (see below), this argument determines the size of the blocks, expressed as the number of grid cells in each direction. Default: 20.


Integer. The number of rows and columns used in the CV_Large cross-validation strategy (see below), in which the study area is divided into large blocks according to the provided CV_NR and CV_NC values. Both default to 2, which splits the study area into four large blocks at the median latitude and longitude.


Logical. Indicates whether to use the spatial autocorrelation range to determine the block size. Default: FALSE.


String specifying the folder path to save the cross-validation results. Default: NULL.


Logical. Indicates whether the function is being run on an HPC environment, affecting file path handling. Default: TRUE.


Logical. Indicates whether to plot the block cross-validation folds. Default: TRUE.


The function returns a modified version of the input dataset DT with three additional integer columns indicating the cross-validation folds:

  1. CV_SAC in which the size of the blocks is determined by the median spatial autocorrelation range in the predictor data (estimated using blockCV::cv_spatial_autocor). This requires the availability of the automap R package.

  2. CV_Dist in which the size of the spatial cross-validation blocks is determined by the CV_NGrids argument. The default CV_NGrids value is 20, which means blocks of 20x20 grid cells each.

  3. CV_Large which splits the study area into large blocks, as determined by the CV_NR and CV_NC arguments. If CV_NR = CV_NC = 2 (default), four large blocks are used, splitting the study area at the median coordinates.
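
The CV_Large idea of splitting at the median coordinates can be illustrated with a minimal base R sketch. This is not the function's own implementation, which relies on blockCV::cv_spatial. The toy data and the row-by-row block numbering are assumptions made for illustration only, and the sketch assigns block identifiers rather than final fold numbers.

```r
# Hypothetical toy data (not from the package): x/y coordinates of 8 points
DT <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(8, 1, 6, 3, 5, 2, 7, 4)
)
CV_NR <- 2  # number of block rows
CV_NC <- 2  # number of block columns

# Break points at equally spaced quantiles of the coordinates;
# with 2 rows and 2 columns the single inner break is the median
x_breaks <- quantile(DT$x, probs = seq(0, 1, length.out = CV_NC + 1))
y_breaks <- quantile(DT$y, probs = seq(0, 1, length.out = CV_NR + 1))

# Integer index of the column and row that each point falls into
col_id <- cut(DT$x, breaks = x_breaks, include.lowest = TRUE, labels = FALSE)
row_id <- cut(DT$y, breaks = y_breaks, include.lowest = TRUE, labels = FALSE)

# Combine row and column indices into one block identifier (1 to CV_NR * CV_NC);
# this numbering scheme is an assumption of the sketch
DT$CV_Large <- as.integer((row_id - 1L) * CV_NC + col_id)

# Number of points per block
table(DT$CV_Large)
```

Splitting at quantiles rather than at equal-width intervals tends to balance the number of observations per block, which is one plausible reason for using the median coordinates.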


The function reads the following environment variables:

  • DP_R_Grid (if FromHPC = TRUE) or DP_R_Grid_Local (if FromHPC = FALSE): path from which the function reads the Grid_10_Land_Crop.RData file.

  • DP_R_EUBound_sf (if FromHPC = TRUE) or DP_R_EUBound_sf_Local (if FromHPC = FALSE): path for the RData file containing the country boundaries (sf object).
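
The dependence of the variable names on FromHPC can be sketched in base R as follows. The helper env_var_name and the placeholder path are hypothetical and not part of the package. How the function actually loads EnvFile is not documented here; base R's readRenviron() is one way such a file could be loaded, which is an assumption.

```r
# Hypothetical helper (not part of the package): append "_Local" to the
# variable name when the function is not running on the HPC
env_var_name <- function(base, FromHPC = TRUE) {
  if (FromHPC) base else paste0(base, "_Local")
}

FromHPC <- FALSE
grid_var  <- env_var_name("DP_R_Grid", FromHPC)
bound_var <- env_var_name("DP_R_EUBound_sf", FromHPC)

# Placeholder value for illustration only; in practice the variable would be
# defined in the file given by EnvFile (for example, loaded via readRenviron())
Sys.setenv(DP_R_Grid_Local = "path/to/grid")

# Read the variable and fail early if it is not defined
grid_dir <- Sys.getenv(grid_var, unset = NA)
if (is.na(grid_dir)) {
  stop("Environment variable not set: ", grid_var)
}

# Path of the grid file that the function reads from this location
grid_file <- file.path(grid_dir, "Grid_10_Land_Crop.RData")
```

Checking for an unset variable before building the path gives a clearer error than a later failure when loading the file.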


Ahmed El-Gabbas