This function assigns modelling input data to spatial-block cross-validation folds using three strategies (see below), each based on blockCV::cv_spatial. It is intended to be used inside the Mod_Prep4HPC function.
Usage
GetCV(
  DT,
  EnvFile = ".env",
  XVars,
  CV_NFolds = 4,
  CV_NGrids = 20,
  CV_NR = 2,
  CV_NC = 2,
  CV_SAC = FALSE,
  OutPath = NULL,
  FromHPC = TRUE,
  CV_Plot = TRUE
)
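As a minimal sketch of preparing inputs for this signature, the snippet below builds a toy data frame with x and y columns plus two predictor columns and checks it against the requirements stated for DT and XVars. The predictor names Pred1 and Pred2 and the coordinate ranges are hypothetical. The GetCV call itself is left commented out because it needs the package that provides GetCV and a valid environment file.

```r
# Toy input: coordinates plus two hypothetical predictor columns
set.seed(1)
DT <- data.frame(
  x = runif(50, min = 4000000, max = 4500000),
  y = runif(50, min = 2500000, max = 3000000),
  Pred1 = rnorm(50),  # hypothetical predictor
  Pred2 = rnorm(50)   # hypothetical predictor
)
XVars <- c("Pred1", "Pred2")

# Documented requirements: x and y columns, plus one column per XVars entry
inputs_ok <- length(XVars) > 0 && all(c("x", "y", XVars) %in% names(DT))

# Not run here: requires the package providing GetCV and a valid .env file
# DT_CV <- GetCV(DT = DT, EnvFile = ".env", XVars = XVars, FromHPC = FALSE)
```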
Arguments
- DT
A data frame or tibble containing the input dataset. It should include two columns for the x and y coordinates, as well as columns matching the names of the predictors listed in the XVars argument.
- EnvFile
String specifying the path of the file to read environment variables from. Default: .env.
- XVars
Vector of strings specifying the variables to be used in the model. This argument is mandatory and cannot be empty.
- CV_NFolds
Number of cross-validation folds. Default: 4.
- CV_NGrids
For the CV_Dist cross-validation strategy (see below), this argument determines the size of the blocks, expressed as the number of grid cells in each direction. Default: 20.
- CV_NR, CV_NC
Integer, the number of rows and columns used in the CV_Large cross-validation strategy (see below), in which the study area is divided into large blocks according to the provided CV_NR and CV_NC values. Both default to 2, which splits the study area into four large blocks at the median latitude and longitude.
- CV_SAC
Logical indicating whether to use spatial autocorrelation to determine the block size. Default: FALSE.
- OutPath
String specifying the folder path in which to save the cross-validation results. Default: NULL.
- FromHPC
Logical indicating whether the function is being run on an HPC environment, which affects file path handling. Default: TRUE.
- CV_Plot
Logical indicating whether to plot the block cross-validation folds. Default: TRUE.
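To make the CV_NGrids argument concrete, here is a rough sketch of the block-size arithmetic. It assumes projected coordinates in metres and a grid cell width of 10 km; that cell size is an assumption, not something stated here. The block labelling is a plain base R illustration and is not necessarily how blockCV::cv_spatial builds its blocks.

```r
cell_size <- 10000  # assumed grid cell width in metres (hypothetical)
CV_NGrids <- 20     # default: blocks of 20 x 20 grid cells

# Side length of one square block in metres
block_size <- cell_size * CV_NGrids

# Toy coordinates, measured from the lower-left corner of the study area
x <- c(5000, 150000, 250000, 410000)
y <- c(5000, 390000, 10000, 399999)

# Row and column index of the block containing each point
block_col <- floor(x / block_size) + 1
block_row <- floor(y / block_size) + 1
block_id <- paste(block_row, block_col, sep = "_")
```

Under these assumptions each block is 200 km wide, so points closer together than that often share a block and are then kept in the same fold.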
Value
The function returns a modified version of the input dataset DT with three additional integer columns indicating the cross-validation folds:
- CV_SAC
The size of the blocks is determined by the median spatial autocorrelation range in the predictor data (estimated using blockCV::cv_spatial_autocor). This requires the automap R package.
- CV_Dist
The size of the spatial cross-validation blocks is determined by the CV_NGrids argument. The default CV_NGrids value is 20, which means blocks of 20 x 20 grid cells each.
- CV_Large
The study area is split into large blocks, as determined by the CV_NR and CV_NC arguments. If CV_NR = CV_NC = 2 (default), four large blocks are used, splitting the study area at the median coordinates.
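The median split behind the default CV_Large setting can be sketched in base R as follows. This only illustrates the idea of cutting at the median x and y coordinates. The toy coordinates and the block numbering are assumptions, and the function itself may derive the split differently (for example, from the reference grid rather than from the data points).

```r
# Toy coordinates for six locations
x <- c(1, 2, 3, 4, 10, 11)
y <- c(5, 1, 9, 2, 8, 3)

# Split at the median of each coordinate
east <- x > median(x)   # TRUE for points right of the median x
north <- y > median(y)  # TRUE for points above the median y

# Label the four quadrants 1 to 4 (the numbering scheme is arbitrary)
block <- 1 + east + 2 * north
```

With this numbering, block 1 is the south-west quadrant, 2 the south-east, 3 the north-west, and 4 the north-east.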
Details
The function reads the following environment variables:
- DP_R_Grid (if FromHPC = TRUE) or DP_R_Grid_Local (if FromHPC = FALSE)
The function reads the content of the Grid_10_Land_Crop.RData file from this path.
- DP_R_EUBound_sf (if FromHPC = TRUE) or DP_R_EUBound_sf_Local (if FromHPC = FALSE)
Path of the RData file containing the country boundaries (sf object).
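The choice between the HPC and local variable names can be sketched with base R as shown below. The paths are made-up placeholders, and setting the variables with Sys.setenv stands in for whatever mechanism loads the environment file; how GetCV actually parses that file is not described here.

```r
FromHPC <- FALSE

# Placeholder values; normally these would come from the environment file
Sys.setenv(
  DP_R_Grid = "/hpc/path/grid",
  DP_R_Grid_Local = "local/path/grid"
)

# Pick the variable name that matches the run environment
grid_var <- if (FromHPC) "DP_R_Grid" else "DP_R_Grid_Local"
grid_dir <- Sys.getenv(grid_var, unset = NA)
if (is.na(grid_dir)) {
  stop("Environment variable not set: ", grid_var)
}

# Full path of the reference grid file named in the documentation
grid_file <- file.path(grid_dir, "Grid_10_Land_Crop.RData")
```

Failing early when the variable is missing gives a clearer error than a later failure while loading the file.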