Process GBIF sampling effort data for the IAS-pDT
Source: R/DWF_Efforts_Process.R, R/DWF_Efforts_Request.R, R/DWF_Efforts_Download.R, and 3 more
Efforts_data.Rd
Downloads and processes GBIF sampling effort data for vascular plants in
Europe, supporting the Invasive Alien Species prototype Digital Twin
(IAS-pDT). Orchestrated by Efforts_Process(), it uses helper functions to
request, download, split, summarize, and visualize data at the Order level.
The functions prepare raster maps of the number of vascular plant
observations and species per grid cell.
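To illustrate what "observations and species per grid cell" means, the following base R sketch counts both quantities on a toy data set. The column names (species, lon, lat) and the 1-degree cell size are assumptions made for this example only; they are not taken from the package, which works on its own reference grid and writes rasters.

```r
# Toy occurrence records; column names are assumptions for illustration
occ <- data.frame(
  species = c("A", "A", "B", "C", "A"),
  lon = c(10.2, 10.7, 10.4, 12.1, 12.9),
  lat = c(50.1, 50.3, 50.8, 51.5, 51.2)
)

# Assign each record to a 1-degree cell by flooring its coordinates
occ$cell <- paste(floor(occ$lon), floor(occ$lat), sep = "_")

# Number of observations and number of distinct species per cell
n_obs <- tapply(occ$species, occ$cell, length)
n_sp <- tapply(occ$species, occ$cell, function(x) length(unique(x)))

effort <- data.frame(
  cell = names(n_obs),
  n_obs = as.integer(n_obs),
  n_sp = as.integer(n_sp)
)
print(effort)
```

A cell with many observations but few species points to repeated sampling of the same taxa, which is why the two counts are kept as separate layers.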
Usage
Efforts_Process(
  EnvFile = ".env",
  Renviron = ".Renviron",
  Request = TRUE,
  Download = TRUE,
  NCores = 6L,
  StartYear = 1981L,
  Boundaries = c(-30, 50, 25, 75),
  ChunkSize = 100000L,
  DeleteChunks = TRUE,
  DeleteProcessed = TRUE
)

Efforts_Request(
  EnvFile = ".env",
  NCores = 3L,
  StartYear = 1981L,
  Renviron = ".Renviron",
  Boundaries = c(-30, 50, 25, 75)
)

Efforts_Download(NCores = 6L, EnvFile = ".env")

Efforts_Summarize(
  EnvFile = ".env",
  NCores = 6L,
  ChunkSize = 100000L,
  DeleteChunks = TRUE
)

Efforts_Split(Path_Zip = NULL, EnvFile = ".env", ChunkSize = 100000L)

Efforts_Plot(EnvFile = ".env")
Arguments
- EnvFile
  Character. Path to the environment file containing paths to data sources. Default: ".env".
- Renviron
  Character. Path to the .Renviron file with GBIF credentials (GBIF_EMAIL, GBIF_USER, GBIF_PWD). Default: ".Renviron". The credentials must be in the format:
  GBIF_EMAIL=your_email
  GBIF_USER=your_username
  GBIF_PWD=your_password
- Request
  Logical. If TRUE (default), requests GBIF data; otherwise, loads existing data.
- Download
  Logical. If TRUE (default), downloads and saves GBIF data; otherwise, skips download.
- NCores
  Integer. Number of CPU cores to use for parallel processing. Default: 6, except for Efforts_Request(), which defaults to 3 and accepts at most 3.
- StartYear
  Integer. Earliest year for GBIF records (matches CHELSA climate data). Default: 1981.
- Boundaries
  Numeric vector (length 4). GBIF data bounds (Left, Right, Bottom, Top). Default: c(-30, 50, 25, 75).
- ChunkSize
  Integer. Rows per chunk file. Default: 100000.
- DeleteChunks
  Logical. If TRUE (default), deletes chunk files post-processing.
- DeleteProcessed
  Logical. If TRUE (default), removes raw GBIF files after processing (>22 GB).
- Path_Zip
  Character. Path to zip file with CSV for splitting.
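Because a missing credential would only surface after a long-running request has started, it may help to validate the .Renviron file first. The sketch below uses base R's readRenviron() and Sys.getenv(). The helper name check_gbif_credentials() is hypothetical and not part of the package, and the values written to the temporary file are the placeholders from the format shown for the Renviron argument.

```r
# Hypothetical helper (not part of the package): check that a .Renviron-style
# file defines the three GBIF credentials named in the Renviron argument.
check_gbif_credentials <- function(renviron = ".Renviron") {
  if (!file.exists(renviron)) {
    stop("File not found: ", renviron)
  }
  # readRenviron() loads KEY=value pairs into the current session
  readRenviron(renviron)
  required <- c("GBIF_EMAIL", "GBIF_USER", "GBIF_PWD")
  values <- Sys.getenv(required, unset = "")
  missing_vars <- required[!nzchar(values)]
  if (length(missing_vars) > 0) {
    stop("Missing GBIF credentials: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with a temporary file and placeholder values
tmp <- tempfile(fileext = ".Renviron")
writeLines(
  c(
    "GBIF_EMAIL=your_email",
    "GBIF_USER=your_username",
    "GBIF_PWD=your_password"
  ),
  tmp
)
check_gbif_credentials(tmp)
```

Note that readRenviron() overwrites variables of the same name that are already set in the session, so the check reflects the file content rather than earlier settings.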
Note
- Efforts_Process() is the main entry point for processing sampling effort data.
- Time-intensive (>9 hours on 6-core Windows PC; GBIF request ~5 hours).
- Detects and processes only new/missing data by order.
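One simple way to process only new or missing data by order is to compare the list of orders against the output files already on disk. The sketch below shows that idea; the helper name orders_to_process() and the file naming scheme (one <order>.RData file per order) are assumptions for illustration and may differ from what the package does internally.

```r
# Hypothetical sketch: return the orders that still lack an output file.
# The naming scheme (<order>.RData) is an assumption for illustration.
orders_to_process <- function(orders, out_dir) {
  expected <- file.path(out_dir, paste0(orders, ".RData"))
  orders[!file.exists(expected)]
}

out_dir <- file.path(tempdir(), "efforts_demo")
dir.create(out_dir, showWarnings = FALSE)
orders <- c("Asterales", "Poales", "Fabales")

# Pretend that one order was already processed in an earlier run
invisible(file.create(file.path(out_dir, "Poales.RData")))

pending <- orders_to_process(orders, out_dir)
print(pending)
```

A check of this kind lets an interrupted run resume without repeating completed orders.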
Functions details
- Efforts_Process(): Manages the workflow for requesting, downloading, processing, and plotting GBIF vascular plant data.
- Efforts_Request(): Requests GBIF data by order in parallel. Stores results to disk.
- Efforts_Download(): Downloads GBIF data, validates files, and loads existing data if available. Returns a dataframe (Efforts_AllRequests) with paths.
- Efforts_Split(): Splits zipped CSV data by order into chunks, saving each separately.
- Efforts_Summarize(): Processes and summarizes data into RData and TIFF rasters.
- Efforts_Plot(): Plots observation efforts (raw and log10 scales).
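The role of ChunkSize in the splitting step can be illustrated with a small base R sketch that writes a data frame to CSV files of at most chunk_size rows each. The helper split_into_chunks(), its arguments, and the file naming are hypothetical. The sketch also splits an in-memory data frame, which is a simplification chosen for the example; it would not be suitable for raw downloads of the size mentioned above (>22 GB), and it is not a description of how Efforts_Split() is implemented.

```r
# Hypothetical helper: write a data frame to CSV files of at most
# chunk_size rows each and return the file paths.
split_into_chunks <- function(df, chunk_size, out_dir, prefix = "chunk") {
  # Chunk index for every row: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  pieces <- split(df, chunk_id)
  paths <- file.path(
    out_dir,
    sprintf("%s_%03d.csv", prefix, seq_along(pieces))
  )
  for (i in seq_along(pieces)) {
    write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  paths
}

out_dir <- file.path(tempdir(), "chunks_demo")
dir.create(out_dir, showWarnings = FALSE)
demo <- data.frame(
  id = 1:25,
  order = rep(c("Poales", "Fabales"), length.out = 25)
)

# 25 rows with chunk_size = 10 gives three files (10, 10, and 5 rows)
chunk_files <- split_into_chunks(demo, chunk_size = 10L, out_dir = out_dir)
print(basename(chunk_files))
```

Smaller chunks lower the memory needed per worker during summarizing, at the cost of more files to manage and delete afterwards.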
References
Data source: https://www.gbif.org