Prepare spatial-block cross-validation folds for spatial analysis
Source: R/mod_CV_prepare.R
This function assigns modelling input data to spatial-block cross-validation folds using blockCV::cv_spatial and the three strategies described below. It is intended to be used inside the mod_prepare_HPC function.
Usage
mod_CV_prepare(
input_data = NULL,
env_file = ".env",
x_vars = NULL,
CV_n_folds = 4L,
CV_n_grids = 20L,
CV_n_rows = 2,
CV_n_columns = 2L,
CV_SAC = FALSE,
out_path = NULL,
CV_plot = TRUE
)
Arguments
- input_data
data.frame. A data frame or tibble containing the input dataset. It should include two columns for the x and y coordinates, as well as columns matching the names of the predictors listed in the x_vars argument. This argument is mandatory and cannot be empty.
- env_file
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- x_vars
Character vector. Variables to be used in the model. This argument is mandatory and cannot be empty.
- CV_n_folds
Integer. Number of cross-validation folds. Default: 4L.
- CV_n_grids
Integer. Number of grid cells in both directions used in the CV_Dist cross-validation strategy (see below). Default: 20L.
- CV_n_rows, CV_n_columns
Integer. Number of rows and columns used in the CV_Large cross-validation strategy (see below), in which the study area is divided into large blocks according to the provided CV_n_rows and CV_n_columns values. Both default to 2L, which splits the study area into four large blocks at the median latitude and longitude.
- CV_SAC
Logical. Whether to use spatial autocorrelation to determine the block size. Defaults to FALSE.
- out_path
Character. Path to the directory in which to save the cross-validation results. This argument is mandatory and cannot be empty.
- CV_plot
Logical. Whether to plot the block cross-validation folds. Defaults to TRUE.
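The median split described for CV_n_rows and CV_n_columns can be illustrated with a minimal base R sketch. It is only an illustration under stated assumptions, not the code used by mod_CV_prepare or blockCV: the data frame pts, its random coordinates, and the block column are hypothetical, and only the 2 × 2 case is shown.

```r
# Hypothetical illustration only: this is not the package's internal code.
set.seed(1)
pts <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 50))

# With 2 rows and 2 columns, the break points are the median coordinates
col_id <- ifelse(pts$x <= median(pts$x), 1L, 2L)
row_id <- ifelse(pts$y <= median(pts$y), 1L, 2L)

# Combine row and column indices into a single block label (1 to 4)
pts$block <- (row_id - 1L) * 2L + col_id

# Number of points falling in each of the four large blocks
table(pts$block)
```

Because the split is at the medians, each half of the study area along either axis holds the same number of points, although the four blocks themselves need not be equally populated.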
Value
The function returns a modified version of the input dataset with additional integer columns indicating the cross-validation fold assigned under each strategy used.
Note
The function uses the following cross-validation strategies:
- CV_Dist, in which the size of the spatial cross-validation blocks is determined by the CV_n_grids argument. The default CV_n_grids value is 20L, which means blocks of 20×20 grid cells each.
- CV_Large, which splits the study area into large blocks, as determined by the CV_n_rows and CV_n_columns arguments. If CV_n_rows = CV_n_columns = 2L (default), four large blocks are used, splitting the study area at the median coordinates.
- CV_SAC, in which the size of the blocks is determined by the median spatial autocorrelation range in the predictor data (estimated using blockCV::cv_spatial_autocor). This requires the automap R package to be available. This strategy is currently skipped by default.
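The idea behind the CV_Dist strategy, grouping grid cells into square blocks and assigning whole blocks to folds, can be sketched in base R as follows. This is a simplified illustration and not the implementation in blockCV::cv_spatial or mod_CV_prepare. The regular grid with a cell size of 1, the object names (cells, block, fold), and the balanced random assignment of blocks to folds are all assumptions made for the example.

```r
# Hypothetical illustration only: not the package's or blockCV's implementation.
set.seed(42)
n_grids <- 20L   # cells per block side, mirroring the CV_n_grids default
n_folds <- 4L    # mirroring the CV_n_folds default

# A regular 100 x 80 grid of cell centres with a cell size of 1 (assumed)
cells <- expand.grid(x = seq(0.5, 99.5, by = 1), y = seq(0.5, 79.5, by = 1))

# Index of the block containing each cell, along each axis
block_col <- floor(cells$x / n_grids) + 1
block_row <- floor(cells$y / n_grids) + 1
n_block_cols <- max(block_col)
cells$block <- (block_row - 1) * n_block_cols + block_col

# Randomly assign whole blocks (not single cells) to folds, keeping folds balanced
block_ids <- sort(unique(cells$block))
fold_of_block <- sample(rep_len(seq_len(n_folds), length(block_ids)))
cells$fold <- fold_of_block[match(cells$block, block_ids)]

# Number of grid cells in each fold
table(cells$fold)
```

Assigning folds at the block level keeps neighbouring cells in the same fold, which is the point of spatial-block cross-validation: training and testing data are separated in space rather than interleaved.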