Prepare spatial-block cross-validation folds for spatial analysis
Source: R/Mod_GetCV.R
This function assigns modelling input data to spatial-block cross-validation folds under three strategies (see below), using blockCV::cv_spatial. The function is intended to be used inside the Mod_Prep4HPC function.
Usage
Mod_GetCV(
  Data = NULL,
  EnvFile = ".env",
  XVars = NULL,
  CV_NFolds = 4L,
  CV_NGrids = 20L,
  CV_NR = 2,
  CV_NC = 2L,
  CV_SAC = FALSE,
  OutPath = NULL,
  CV_Plot = TRUE
)
Arguments
- Data
data.frame. A data frame or tibble containing the input dataset. This data frame should include two columns for x and y coordinates, as well as other columns matching the names of the predictors listed in the XVars argument. This argument is mandatory and cannot be empty.
- EnvFile
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- XVars
Character vector. Variables to be used in the model. This argument is mandatory and cannot be empty.
- CV_NFolds
Integer. Number of cross-validation folds. Default: 4L.
- CV_NGrids
Integer. Number of grid cells in both directions used in the CV_Dist cross-validation strategy (see below). Default: 20L.
- CV_NR, CV_NC
Integer. Number of rows and columns used in the CV_Large cross-validation strategy (see below), in which the study area is divided into large blocks according to the provided CV_NR and CV_NC values. Both default to 2L, which splits the study area into four large blocks at the median latitude and longitude.
- CV_SAC
Logical. Whether to use the spatial autocorrelation range to determine the block size. Defaults to FALSE.
- OutPath
Character. Path to the directory in which to save the cross-validation results. This argument is mandatory and cannot be empty.
- CV_Plot
Logical. Whether to plot the block cross-validation folds. Defaults to TRUE.
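To make the default CV_NR = CV_NC = 2L behaviour more concrete, the following base-R sketch splits a set of points into four blocks at the median x and y coordinates. It is only an illustration of the idea: the toy data frame `pts` and the helper columns are hypothetical, and the code is not necessarily how Mod_GetCV implements the split internally.

```r
# Hypothetical toy data: x/y coordinates of 8 points (not from the package)
pts <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(8, 1, 7, 2, 6, 3, 5, 4)
)

# Split each axis at its median: 1 = at or below the median, 2 = above it
col_id <- ifelse(pts$x <= median(pts$x), 1L, 2L)
row_id <- ifelse(pts$y <= median(pts$y), 1L, 2L)

# Combine the row and column indices into a single block ID (1 to 4)
pts$block <- (row_id - 1L) * 2L + col_id

# Number of points falling in each of the four blocks
table(pts$block)
```

Splitting at the median rather than at the midpoint of the extent keeps the number of observations per block roughly balanced, even when points are unevenly distributed across the study area.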
Value
The function returns a modified version of the input dataset with additional integer columns holding the fold assignments under each cross-validation strategy used.
Note
The function uses the following cross-validation strategies:
- CV_Dist, in which the size of the spatial cross-validation blocks is determined by the CV_NGrids argument. The default CV_NGrids value is 20L, which means blocks of 20×20 grid cells each.
- CV_Large, which splits the study area into large blocks, as determined by the CV_NR and CV_NC arguments. If CV_NR = CV_NC = 2L (default), four large blocks are used, splitting the study area at the median coordinates.
- CV_SAC, in which the size of the blocks is determined by the median spatial autocorrelation range in the predictor data (estimated using blockCV::cv_spatial_autocor). This requires the automap R package. This strategy is currently skipped by default.
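As a rough illustration of the CV_Dist idea, the base-R sketch below groups grid cells into blocks of 20×20 cells and then assigns whole blocks to four folds. Everything here is an assumption made for the example: the 100×100 toy grid, the unit cell size, the variable names, and the random assignment of blocks to folds. In the function itself the blocking is delegated to blockCV::cv_spatial, so this is not its actual implementation.

```r
# Hypothetical inputs (not taken from the package)
cell_size <- 1     # assumed grid resolution, in map units
n_grids   <- 20L   # mirrors the CV_NGrids default
n_folds   <- 4L    # mirrors the CV_NFolds default

# Cell centres of a toy 100 x 100 grid
cells <- expand.grid(
  x = seq(0.5, 99.5, by = cell_size),
  y = seq(0.5, 99.5, by = cell_size)
)

# Each block spans n_grids cells in both directions
block_size <- n_grids * cell_size
block_col  <- floor((cells$x - min(cells$x)) / block_size) + 1L
block_row  <- floor((cells$y - min(cells$y)) / block_size) + 1L

# Combine the row and column indices into a single block ID
n_block_cols <- max(block_col)
cells$block  <- (block_row - 1L) * n_block_cols + block_col

# Assign whole blocks (not single cells) to folds; random assignment is
# an assumption of this sketch
set.seed(1)
blocks        <- sort(unique(cells$block))
fold_of_block <- sample(rep_len(seq_len(n_folds), length(blocks)))
cells$fold    <- fold_of_block[match(cells$block, blocks)]

# Cells per fold
table(cells$fold)
```

Because all cells in a block share one fold, neighbouring observations tend to end up on the same side of the train/test split, which is the point of spatial-block cross-validation.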