Extracts, processes, and visualises occurrence data from the Global Biodiversity Information Facility (GBIF) for the Invasive Alien Species Digital Twin (IASDT). Orchestrated by GBIF_process(), it requests, downloads, cleans, chunks, and maps species data using helper functions.

Usage

GBIF_process(
  env_file = ".env",
  r_environ = ".Renviron",
  n_cores = 6L,
  strategy = "multisession",
  request = TRUE,
  download = TRUE,
  split_chunks = TRUE,
  overwrite = FALSE,
  delete_chunks = TRUE,
  chunk_size = 50000L,
  boundaries = c(-30, 50, 25, 75),
  start_year = 1981L
)

GBIF_download(
  env_file = ".env",
  r_environ = ".Renviron",
  request = TRUE,
  download = TRUE,
  split_chunks = TRUE,
  chunk_size = 50000L,
  boundaries = c(-30, 50, 25, 75),
  start_year = 1981L
)

GBIF_read_chunk(
  chunk_file,
  env_file = ".env",
  max_uncertainty = 10L,
  start_year = 1981L,
  save_RData = TRUE,
  return_data = FALSE,
  overwrite = FALSE
)

GBIF_species_data(
  species = NULL,
  env_file = ".env",
  verbose = TRUE,
  plot_tag = NULL
)

Arguments

env_file

Character. Path to the environment file containing paths to data sources. Default: ".env".

r_environ

Character. Path to .Renviron file with GBIF credentials (GBIF_EMAIL, GBIF_USER, GBIF_PWD). Default: ".Renviron". The credentials must be in the format:

  • GBIF_EMAIL=your_email

  • GBIF_USER=your_username

  • GBIF_PWD=your_password
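
A quick way to confirm that the three variables are visible to R before requesting data is sketched below. It uses only base R (readRenviron() and Sys.getenv()); the helper name check_gbif_credentials() is hypothetical and not part of the package, and the values written to the temporary file are placeholders.

```r
# Hypothetical helper (not part of the package): load an .Renviron-style file
# and report which of the three GBIF variables are still unset or empty.
check_gbif_credentials <- function(r_environ = ".Renviron") {
  if (!file.exists(r_environ)) {
    stop("File not found: ", r_environ)
  }
  # readRenviron() sets the variables for the current R session
  readRenviron(r_environ)
  vars <- c("GBIF_EMAIL", "GBIF_USER", "GBIF_PWD")
  values <- Sys.getenv(vars, unset = "")
  # Names of the variables that are missing or empty
  vars[!nzchar(values)]
}

# Demonstration with a temporary file holding placeholder values
tmp <- tempfile(fileext = ".Renviron")
writeLines(
  c("GBIF_EMAIL=your_email", "GBIF_USER=your_username", "GBIF_PWD=your_password"),
  tmp
)
missing_vars <- check_gbif_credentials(tmp)
```

An empty result means all three variables were found; any names returned point to entries that still need to be added to the file.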

n_cores

Integer. Number of CPU cores to use for parallel processing. Default: 6.

strategy

Character. The parallel processing strategy to use. Valid options are "sequential", "multisession" (default), "multicore", and "cluster". See future::plan() and ecokit::set_parallel() for details.

request

Logical. If TRUE (default), requests GBIF data; otherwise, loads from disk.

download

Logical. If TRUE (default), downloads and saves GBIF data.

split_chunks

Logical. If TRUE (default), splits data into chunks for easier processing.

overwrite

Logical. If TRUE, reprocesses chunks that already have an .RData file. Default: FALSE, which skips chunks processed in an earlier run so that a failed attempt (e.g. one that ran out of memory) can be resumed without repeating finished work.
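
The skip-or-reprocess decision controlled by overwrite can be sketched roughly as follows. This is an illustration only: the function needs_processing(), the chunk file names, and the assumption that the .RData output sits next to the chunk file are all hypothetical, and the package's internal logic may differ.

```r
# Hypothetical sketch of the skip-or-reprocess decision; the package's
# internal logic and file naming may differ.
needs_processing <- function(chunk_file, overwrite = FALSE) {
  # Assumed naming: the output sits next to the chunk with an .RData extension
  out_file <- paste0(tools::file_path_sans_ext(chunk_file), ".RData")
  overwrite || !file.exists(out_file)
}

# Demonstration in a temporary directory
tmp_dir <- tempfile("chunks_")
dir.create(tmp_dir)
chunk_a <- file.path(tmp_dir, "chunk_001.txt")
chunk_b <- file.path(tmp_dir, "chunk_002.txt")
file.create(chunk_a, chunk_b)
# Pretend chunk_001 was already processed in an earlier run
file.create(file.path(tmp_dir, "chunk_001.RData"))

# Only chunks without an existing .RData file remain to be processed
todo <- Filter(needs_processing, c(chunk_a, chunk_b))
```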

delete_chunks

Logical. If TRUE (default), deletes chunk files.

chunk_size

Integer. Records per data chunk. Default: 50000.
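
To give a feel for what chunk_size implies, the sketch below assigns row indices to consecutive chunks in base R. It is an illustration only: chunk_indices() is a hypothetical helper, and the package may well split the downloaded file by other means.

```r
# Illustrative only: assign row indices to consecutive chunks of a fixed size
chunk_indices <- function(n_records, chunk_size = 50000L) {
  idx <- seq_len(n_records)
  # ceiling(i / chunk_size) gives the 1-based chunk number of row i
  split(idx, ceiling(idx / chunk_size))
}

# 120,000 records with the default chunk size yield three chunks,
# the last one being smaller than the others
chunks <- chunk_indices(n_records = 120000L, chunk_size = 50000L)
chunk_lengths <- lengths(chunks)
```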

boundaries

Numeric vector (length 4). GBIF data bounds (Left, Right, Bottom, Top). Default: c(-30, 50, 25, 75).
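
The order of the four values matters, so a small base R sketch of a bounding-box check may help. It assumes the bounds are longitude and latitude in decimal degrees (as the default values suggest) and uses GBIF's decimalLongitude and decimalLatitude column names; in_bounds() is a hypothetical helper, and the package may instead pass the bounds to the GBIF request rather than filter locally.

```r
# Illustrative only: keep records inside a (left, right, bottom, top) box.
# Column names follow GBIF's decimalLongitude / decimalLatitude fields.
in_bounds <- function(lon, lat, boundaries = c(-30, 50, 25, 75)) {
  stopifnot(length(boundaries) == 4L)
  !is.na(lon) & !is.na(lat) &
    lon >= boundaries[1] & lon <= boundaries[2] &
    lat >= boundaries[3] & lat <= boundaries[4]
}

# Made-up coordinates for demonstration
occ <- data.frame(
  decimalLongitude = c(10.5, -75.2, 49.9, NA),
  decimalLatitude  = c(51.3,  40.1, 24.0, 60.0)
)
# Records outside the box or with missing coordinates are dropped
occ_in <- occ[in_bounds(occ$decimalLongitude, occ$decimalLatitude), ]
```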

start_year

Integer. Earliest collection year to be included. Default: 1981.

chunk_file

Character. Path of chunk file for processing.

max_uncertainty

Numeric. Maximum spatial uncertainty in kilometres. Default: 10.
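
Because max_uncertainty is given in kilometres while GBIF reports uncertainty in metres (the coordinateUncertaintyInMeters field), a unit conversion is needed somewhere. The sketch below shows one plausible way to combine it with start_year; filter_chunk() is hypothetical, the data are made up, and the treatment of records with missing uncertainty is an assumption rather than the package's documented behaviour.

```r
# Illustrative only: apply max_uncertainty (km) and start_year to a small table.
# GBIF reports uncertainty in metres (coordinateUncertaintyInMeters), so the
# threshold is converted before comparison.
filter_chunk <- function(dat, max_uncertainty = 10L, start_year = 1981L) {
  max_m <- max_uncertainty * 1000
  # Assumption: records with unknown uncertainty are retained
  uncertainty_ok <- is.na(dat$coordinateUncertaintyInMeters) |
    dat$coordinateUncertaintyInMeters <= max_m
  year_ok <- !is.na(dat$year) & dat$year >= start_year
  dat[year_ok & uncertainty_ok, , drop = FALSE]
}

# Made-up records for demonstration
chunk <- data.frame(
  year = c(1975L, 1990L, 2005L, 2015L),
  coordinateUncertaintyInMeters = c(100, 25000, NA, 500)
)
kept <- filter_chunk(chunk)
```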

save_RData

Logical. If TRUE (default), saves chunk data as .RData.

return_data

Logical. If TRUE, returns chunk data; otherwise, returns invisible(NULL). Default: FALSE.

species

Character. Species name for processing.

verbose

Logical. If TRUE (default), prints progress messages.

plot_tag

Character. Tag for plot titles.

Note

Relies on a static RDS file listing invasive alien species (IAS), their GBIF keys, and metadata, standardized by Marina Golivets (Feb 2024).

Function details

  • GBIF_process(): Orchestrates GBIF data requests, downloads, processing, and mapping. Saves RData, Excel, and JPEG summary files.

  • GBIF_download(): Requests and downloads GBIF data (if download = TRUE), using the specified criteria (taxa, coordinates, time period, and boundaries), splits into small chunks (if split_chunks = TRUE), and saves metadata. Returns invisible(NULL).

  • GBIF_read_chunk(): Filters chunk data on criteria such as spatial uncertainty, collection year, coordinate precision, and taxonomic rank, selects relevant columns, and saves the result as .RData (if save_RData = TRUE) or returns it (if return_data = TRUE). Skips the chunk if its .RData file exists and overwrite = FALSE.

  • GBIF_species_data(): Converts species-specific data to sf and raster formats, generating distribution maps.
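
A call to the orchestrating function might look like the sketch below. Argument names and defaults are taken from the Usage section; the reduced n_cores value is only an example, and the call is guarded so that it runs only when GBIF_process() is available in the session (the package providing it must be attached first, and valid credentials and paths are assumed).

```r
# Arguments mirror the Usage section; values are the documented defaults
# except where noted.
process_args <- list(
  env_file = ".env",
  r_environ = ".Renviron",
  n_cores = 4L,            # fewer workers than the default of 6
  strategy = "multisession",
  request = TRUE,
  download = TRUE,
  split_chunks = TRUE,
  overwrite = FALSE,       # keep chunks already saved as .RData
  delete_chunks = TRUE,
  chunk_size = 50000L,
  boundaries = c(-30, 50, 25, 75),
  start_year = 1981L
)

# Run only when the function is available in the session
if (exists("GBIF_process", mode = "function")) {
  do.call(GBIF_process, process_args)
}
```

Collecting the arguments in a list keeps the configuration in one place, which makes it easy to rerun the same call after an interrupted attempt.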

Author

Ahmed El-Gabbas