Process GBIF occurrence data for the IAS-pDT
Source: R/DWF_GBIF_Process.R, R/DWF_GBIF_Check.R, R/DWF_GBIF_Download.R, and 2 more
GBIF_data.Rd
Extracts, processes, and visualizes occurrence data from the Global Biodiversity Information Facility (GBIF) for the Invasive Alien Species prototype Digital Twin (IAS-pDT). Orchestrated by GBIF_Process(), it requests, downloads, cleans, chunks, and maps species data using helper functions.
Usage
GBIF_Process(
  EnvFile = ".env",
  Renviron = ".Renviron",
  NCores = 6L,
  Request = TRUE,
  Download = TRUE,
  SplitChunks = TRUE,
  Overwrite = FALSE,
  DeleteChunks = TRUE,
  ChunkSize = 50000L,
  Boundaries = c(-30, 50, 25, 75),
  StartYear = 1981L
)

GBIF_Check(Renviron = ".Renviron")

GBIF_Download(
  EnvFile = ".env",
  Renviron = ".Renviron",
  Request = TRUE,
  Download = TRUE,
  SplitChunks = TRUE,
  ChunkSize = 50000L,
  Boundaries = c(-30, 50, 25, 75),
  StartYear = 1981L
)

GBIF_ReadChunk(
  ChunkFile,
  EnvFile = ".env",
  MaxUncert = 10L,
  StartYear = 1981L,
  SaveRData = TRUE,
  ReturnData = FALSE,
  Overwrite = FALSE
)

GBIF_SpData(Species = NULL, EnvFile = ".env", Verbose = TRUE, PlotTag = NULL)
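The Boundaries default in the signatures above is ordered (Left, Right, Bottom, Top), which is easy to mix up with the more common (xmin, ymin, xmax, ymax) convention. As a minimal sketch, the helper below turns such a vector into a WKT polygon with counter-clockwise vertices, a form commonly used for spatial filters in GBIF queries. The function name bounds_to_wkt() is hypothetical, and whether GBIF_Download() builds its spatial filter this way is an assumption, not something this page states.

```r
# Hypothetical helper (not part of the documented API): convert a
# c(Left, Right, Bottom, Top) vector into a WKT polygon string.
bounds_to_wkt <- function(bounds) {
  stopifnot(is.numeric(bounds), length(bounds) == 4L)
  left <- bounds[1]
  right <- bounds[2]
  bottom <- bounds[3]
  top <- bounds[4]
  stopifnot(left < right, bottom < top)

  # Vertices listed counter-clockwise, closing on the first vertex
  sprintf(
    "POLYGON((%s %s, %s %s, %s %s, %s %s, %s %s))",
    left, bottom, right, bottom, right, top, left, top, left, bottom
  )
}

bounds_to_wkt(c(-30, 50, 25, 75))
# "POLYGON((-30 25, 50 25, 50 75, -30 75, -30 25))"
```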
Arguments
- EnvFile
Character. Path to the environment file containing paths to data sources. Default: ".env".
- Renviron
Character. Path to the .Renviron file with GBIF credentials (GBIF_EMAIL, GBIF_USER, GBIF_PWD). Default: ".Renviron". The credentials must be in the format:
GBIF_EMAIL=your_email
GBIF_USER=your_username
GBIF_PWD=your_password
- NCores
Integer. Number of CPU cores to use for parallel processing. Default: 6.
- Request
Logical. If TRUE (default), requests GBIF data; otherwise, loads from disk.
- Download
Logical. If TRUE (default), downloads and saves GBIF data.
- SplitChunks
Logical. If TRUE (default), splits data into chunks for easier processing.
- Overwrite
Logical. If TRUE, reprocesses existing .RData chunks. Default: FALSE. Keeping the default lets an interrupted run (e.g. one that failed due to a memory issue) resume without reprocessing chunks that were already completed.
- DeleteChunks
Logical. If TRUE (default), deletes chunk files.
- ChunkSize
Integer. Records per data chunk. Default: 50000.
- Boundaries
Numeric vector (length 4). GBIF data bounds (Left, Right, Bottom, Top). Default: c(-30, 50, 25, 75).
- StartYear
Integer. Earliest collection year to be included. Default: 1981.
- ChunkFile
Character. Path of chunk file for processing.
- MaxUncert
Numeric. Maximum spatial uncertainty in kilometers. Default: 10.
- SaveRData
Logical. If TRUE (default), saves chunk data as .RData.
- ReturnData
Logical. If TRUE, returns chunk data; otherwise, returns invisible(NULL). Default: FALSE.
- Species
Character. Species name for processing.
- Verbose
Logical. If TRUE (default), prints progress messages.
- PlotTag
Character. Tag for plot titles.
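The credential format described for Renviron can be illustrated with base R alone. This is a minimal sketch, not the implementation of GBIF_Check(): the function name check_gbif_credentials() is hypothetical, and it only tests that the three variables are non-empty, not that GBIF accepts them.

```r
# Hypothetical helper: load a .Renviron-style file (if it exists) and
# report whether all three GBIF credential variables are set.
check_gbif_credentials <- function(renviron = ".Renviron") {
  required <- c("GBIF_EMAIL", "GBIF_USER", "GBIF_PWD")

  # readRenviron() sets the KEY=value pairs of the file in this session
  if (file.exists(renviron)) {
    readRenviron(renviron)
  }

  values <- Sys.getenv(required, unset = "")
  all(nzchar(values))
}

# Demonstration with a temporary file and placeholder values
tmp <- tempfile(fileext = ".Renviron")
writeLines(
  c(
    "GBIF_EMAIL=your_email",
    "GBIF_USER=your_username",
    "GBIF_PWD=your_password"
  ),
  tmp
)
check_gbif_credentials(tmp)
```

Note that readRenviron() changes the environment of the running R session, so variables loaded this way remain set until they are removed with Sys.unsetenv().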
Note
Relies on a static RDS file listing IAS species, GBIF keys, and metadata, standardized by Marina Golivets (Feb 2024).
Functions details
- GBIF_Process(): Orchestrates GBIF data requests, downloads, processing, and mapping. Saves RData, Excel, and JPEG summary files.
- GBIF_Check(): Verifies GBIF credentials in the environment or .Renviron. Returns TRUE if valid, else FALSE.
- GBIF_Download(): Requests and downloads GBIF data (if Download = TRUE) using the specified criteria (taxa, coordinates, time period, and boundaries), splits the data into small chunks (if SplitChunks = TRUE), and saves metadata. Returns invisible(NULL).
- GBIF_ReadChunk(): Filters chunk data (e.g., by spatial uncertainty, collection year, coordinate precision, and taxonomic rank), selects relevant columns, and saves the result as .RData (if SaveRData = TRUE) or returns it (if ReturnData = TRUE). Skips the chunk if its .RData file exists and Overwrite = FALSE.
- GBIF_SpData(): Converts species-specific data to sf and raster formats, generating distribution maps.