Prepare spatial-block cross-validation folds for spatial analysis
Source: R/mod_cv_prepare.R
This function assigns modelling input data to spatial-block cross-validation folds using three strategies (see below), relying on blockCV::cv_spatial. It is intended to be called inside the mod_prepare_hpc function.
Usage
mod_cv_prepare(
  input_data = NULL,
  env_file = ".env",
  x_vars = NULL,
  cv_n_folds = 4L,
  cv_n_grids = 20L,
  cv_n_rows = 2L,
  cv_n_columns = 2L,
  cv_sac = FALSE,
  out_path = NULL
)
Arguments
- input_data
data.frame. A data frame or tibble containing the input dataset. It should include two columns for the x and y coordinates, as well as columns matching the names of the predictors listed in the x_vars argument. This argument is mandatory and cannot be empty.
- env_file
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- x_vars
Character vector. Variables to be used in the model. This argument is mandatory and cannot be empty.
- cv_n_folds
Integer. Number of cross-validation folds. Default: 4L.
- cv_n_grids
Integer. Number of grid cells in both directions used in the cv_dist cross-validation strategy (see below). Default: 20L.
- cv_n_rows, cv_n_columns
Integer. Number of rows and columns used in the cv_large cross-validation strategy (see below), in which the study area is divided into large blocks according to the provided cv_n_rows and cv_n_columns values. Both default to 2L, which splits the study area into four large blocks at the median latitude and longitude.
- cv_sac
Logical. Whether to use spatial autocorrelation to determine the block size. Defaults to FALSE.
- out_path
Character. Path to the directory in which to save the cross-validation results. This argument is mandatory and cannot be empty.
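The input requirements above can be checked before calling the function. The following is a minimal sketch, not part of the package: the predictor names (bio1, bio12), the coordinate ranges, and the simulated values are illustrative assumptions, and the call to mod_cv_prepare is only attempted if the function is already available in the session (it also assumes a valid environment file at the default env_file path).

```r
# Simulated input: x/y coordinates plus two predictors.
# All column names other than x and y are illustrative assumptions.
set.seed(42)
input_data <- data.frame(
  x = runif(50, 4e6, 5e6),
  y = runif(50, 2e6, 3e6),
  bio1 = rnorm(50),
  bio12 = rnorm(50)
)
x_vars <- c("bio1", "bio12")

# Checks mirroring the documented requirements: the data must be non-empty
# and contain x, y, and every predictor listed in x_vars.
required_cols <- c("x", "y", x_vars)
missing_cols <- setdiff(required_cols, names(input_data))
stopifnot(nrow(input_data) > 0L, length(missing_cols) == 0L)

# Only attempt the call if mod_cv_prepare is available in this session.
if (exists("mod_cv_prepare", mode = "function")) {
  cv_data <- mod_cv_prepare(
    input_data = input_data,
    x_vars = x_vars,
    cv_n_folds = 4L,
    out_path = tempdir()
  )
}
```

The three mandatory arguments (input_data, x_vars, out_path) are passed explicitly; the remaining arguments are left at their documented defaults.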
Value
The function returns a modified version of the input dataset with additional integer columns, one for each cross-validation strategy used, holding the fold assignments.
Note
The function uses the following cross-validation strategies:
- cv_dist
The size of the spatial cross-validation blocks is determined by the cv_n_grids argument. The default cv_n_grids value is 20L, which means blocks of 20×20 grid cells each.
- cv_large
The study area is split into large blocks, as determined by the cv_n_rows and cv_n_columns arguments. If cv_n_rows = cv_n_columns = 2L (default), four large blocks are used, splitting the study area at the median coordinates.
- cv_sac
The size of the blocks is determined by the median spatial autocorrelation range in the predictor data (estimated using blockCV::cv_spatial_autocor). This requires the automap R package to be available. This strategy is currently skipped by default.
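The default cv_large layout (four blocks split at the median coordinates) can be illustrated with base R alone. This is a minimal sketch of the idea, not the function's implementation, which relies on blockCV::cv_spatial. It assumes a data frame with x and y columns; the helper name assign_median_blocks and the block numbering are hypothetical.

```r
# Hypothetical helper: split points into four blocks at the median x and y
# coordinates, mimicking a 2 rows x 2 columns layout.
assign_median_blocks <- function(data) {
  stopifnot(all(c("x", "y") %in% names(data)))
  east <- data$x > stats::median(data$x)   # TRUE for the eastern half
  north <- data$y > stats::median(data$y)  # TRUE for the northern half
  # Block IDs: 1 = south-west, 2 = south-east, 3 = north-west, 4 = north-east
  data$block <- 1L + as.integer(east) + 2L * as.integer(north)
  data
}

# Simulated coordinates (illustrative values only)
set.seed(1)
pts <- data.frame(x = runif(200, 0, 100), y = runif(200, 40, 70))
pts <- assign_median_blocks(pts)

# Number of points falling in each of the four blocks
table(pts$block)
```

Splitting at the medians rather than at the midpoint of the extent places half of the points on each side of each dividing line, although the four blocks themselves need not contain equal numbers of points.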