
Downloads and processes GBIF sampling-effort data for vascular plants in Europe, supporting the Invasive Alien Species prototype Digital Twin (IAS-pDT). The workflow is orchestrated by Efforts_Process(), which calls helper functions to request, download, split, summarize, and visualize the data at the order level. Together, these functions produce raster maps of the number of vascular plant observations and the number of species per grid cell.
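Counting observations and species per grid cell first requires assigning each occurrence to a cell. The sketch below shows one simple way to do this on a regular longitude/latitude grid spanning the bounding box. The helper assign_cells() and its 1-degree resolution are hypothetical and purely illustrative; the grid, projection, and resolution actually used by the package are not described on this page.

```r
# Hypothetical sketch (not the package's code): assign points in decimal
# degrees to cells of a regular lon/lat grid covering
# Boundaries = c(Left, Right, Bottom, Top). The 1-degree resolution is an
# illustrative assumption only.
assign_cells <- function(lon, lat, Boundaries = c(-30, 50, 25, 75), res = 1) {
  n_col <- ceiling((Boundaries[2] - Boundaries[1]) / res)
  n_row <- ceiling((Boundaries[4] - Boundaries[3]) / res)
  col <- floor((lon - Boundaries[1]) / res) + 1
  row <- floor((Boundaries[4] - lat) / res) + 1  # row 1 is the northernmost
  # Points outside the bounding box (including its right/bottom edges) get NA
  inside <- col >= 1 & col <= n_col & row >= 1 & row <= n_row
  cell <- (row - 1) * n_col + col
  cell[!inside] <- NA
  cell
}

cells <- assign_cells(lon = c(-29.5, 10.2, 60), lat = c(74.5, 45.0, 50))
cells
```

Here the first point falls in the top-left cell, the second in an interior cell, and the third lies east of the bounding box and is therefore NA.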

Usage

Efforts_Process(
  EnvFile = ".env",
  Renviron = ".Renviron",
  Request = TRUE,
  Download = TRUE,
  NCores = 6L,
  StartYear = 1981L,
  Boundaries = c(-30, 50, 25, 75),
  ChunkSize = 100000L,
  DeleteChunks = TRUE,
  DeleteProcessed = TRUE
)

Efforts_Request(
  EnvFile = ".env",
  NCores = 3L,
  StartYear = 1981L,
  Renviron = ".Renviron",
  Boundaries = c(-30, 50, 25, 75)
)

Efforts_Download(NCores = 6L, EnvFile = ".env")

Efforts_Summarize(
  EnvFile = ".env",
  NCores = 6L,
  ChunkSize = 100000L,
  DeleteChunks = TRUE
)

Efforts_Split(Path_Zip = NULL, EnvFile = ".env", ChunkSize = 100000L)

Efforts_Plot(EnvFile = ".env")

Arguments

EnvFile

Character. Path to the environment file containing paths to data sources. Default: ".env".

Renviron

Character. Path to the .Renviron file containing GBIF credentials (GBIF_EMAIL, GBIF_USER, GBIF_PWD). Default: ".Renviron". The credentials must be in the following format:

  • GBIF_EMAIL=your_email

  • GBIF_USER=your_username

  • GBIF_PWD=your_password
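A minimal sketch, assuming the .Renviron layout above, of how the credentials can be loaded and checked with base R before any GBIF request is submitted. The helper check_gbif_credentials() is hypothetical and not part of this package, and the temporary file holds placeholder values only.

```r
# Hypothetical helper (not part of the package): load a .Renviron file and
# confirm that all three GBIF credentials are defined and non-empty.
check_gbif_credentials <- function(renviron = ".Renviron") {
  if (!file.exists(renviron)) {
    stop("File not found: ", renviron)
  }
  # readRenviron() sets the variables defined in the file for this R session
  readRenviron(renviron)
  vars <- c("GBIF_EMAIL", "GBIF_USER", "GBIF_PWD")
  values <- Sys.getenv(vars)
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing GBIF credentials: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Example with a temporary file holding placeholder credentials
tmp <- tempfile(fileext = ".Renviron")
writeLines(
  c("GBIF_EMAIL=your_email", "GBIF_USER=your_username", "GBIF_PWD=your_password"),
  tmp
)
check_gbif_credentials(tmp)
```

Failing early with a list of missing variable names is cheaper than discovering a credentials problem after a long parallel request has started.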

Request

Logical. If TRUE (default), requests GBIF data; otherwise, loads existing data.

Download

Logical. If TRUE (default), downloads and saves GBIF data; otherwise, skips the download.

NCores

Integer. Number of CPU cores to use for parallel processing. Default: 6, except for Efforts_Request(), which defaults to 3 and accepts at most 3.
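The cap of 3 cores in Efforts_Request() is consistent with GBIF allowing each user only a small number (three) of concurrent occurrence downloads, so extra workers would gain nothing. A minimal sketch of such a cap in base R; cap_request_cores() is a hypothetical name, not the package's own code.

```r
# Hypothetical sketch: validate NCores and cap parallel GBIF requests at 3
cap_request_cores <- function(NCores, max_cores = 3L) {
  NCores <- suppressWarnings(as.integer(NCores))
  if (length(NCores) != 1L || is.na(NCores) || NCores < 1L) {
    stop("NCores must be a single positive integer")
  }
  min(NCores, max_cores)
}

cap_request_cores(6L)
cap_request_cores(2L)
```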

StartYear

Integer. Earliest year for GBIF records (chosen to match the start of the CHELSA climate data). Default: 1981.

Boundaries

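Numeric vector of length 4. Bounding box for the GBIF data, given as (Left, Right, Bottom, Top) in decimal degrees, i.e., minimum longitude, maximum longitude, minimum latitude, and maximum latitude. Default: c(-30, 50, 25, 75).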
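GBIF occurrence downloads can be filtered spatially with a WKT geometry (for example, the string passed to rgbif::pred_within()), which expects polygon vertices in counter-clockwise order. The sketch below converts a Boundaries vector into such a polygon. Whether the package filters with a geometry or with latitude/longitude range predicates is not stated here, so treat this as an assumption; bounds_to_wkt() is a hypothetical helper.

```r
# Hypothetical helper: convert c(Left, Right, Bottom, Top) in decimal degrees
# into a closed WKT polygon with vertices listed counter-clockwise.
bounds_to_wkt <- function(Boundaries = c(-30, 50, 25, 75)) {
  stopifnot(is.numeric(Boundaries), length(Boundaries) == 4L)
  left <- Boundaries[1]
  right <- Boundaries[2]
  bottom <- Boundaries[3]
  top <- Boundaries[4]
  stopifnot(left < right, bottom < top)
  # Each vertex is "longitude latitude"; the first vertex is repeated
  # at the end to close the ring
  coords <- c(
    paste(left, bottom), paste(right, bottom),
    paste(right, top), paste(left, top), paste(left, bottom)
  )
  paste0("POLYGON((", paste(coords, collapse = ","), "))")
}

bounds_to_wkt()
# returns "POLYGON((-30 25,50 25,50 75,-30 75,-30 25))"
```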

ChunkSize

Integer. Rows per chunk file. Default: 100000.
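ChunkSize sets how many rows go into each chunk file, so a table with n rows yields ceiling(n / ChunkSize) files. A minimal base-R sketch of the idea on an in-memory data frame; write_chunks() and its file-naming scheme are hypothetical, whereas Efforts_Split() itself works on a zipped CSV file.

```r
# Hypothetical sketch: split a data frame into consecutive chunks of at most
# ChunkSize rows and write each chunk to its own CSV file.
write_chunks <- function(data, ChunkSize = 100000L, out_dir = tempdir(),
                         prefix = "chunk") {
  n <- nrow(data)
  n_chunks <- ceiling(n / ChunkSize)
  # Chunk index for each row: rows 1..ChunkSize -> 1, the next ChunkSize -> 2, ...
  chunk_id <- ceiling(seq_len(n) / ChunkSize)
  paths <- character(n_chunks)
  for (i in seq_len(n_chunks)) {
    paths[i] <- file.path(out_dir, sprintf("%s_%04d.csv", prefix, i))
    utils::write.csv(data[chunk_id == i, , drop = FALSE], paths[i],
                     row.names = FALSE)
  }
  paths
}

df <- data.frame(species = rep(c("a", "b"), 125), cell = 1:250)
files <- write_chunks(df, ChunkSize = 100L)
length(files)
```

With 250 rows and ChunkSize = 100, this writes three files holding 100, 100, and 50 rows.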

DeleteChunks

Logical. If TRUE (default), deletes the chunk files after processing.

DeleteProcessed

Logical. If TRUE (default), removes the raw GBIF files (more than 22 GB in total) after processing.

Path_Zip

Character. Path to the zip file containing the CSV data to split.

Note

  • Efforts_Process() is the main entry point for processing sampling effort data.

  • The workflow is time-intensive: it takes more than 9 hours on a 6-core Windows PC, of which the GBIF request alone takes about 5 hours.

  • Detects and processes only new or missing data, order by order.
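Processing only new or missing orders amounts to comparing the full list of orders against the outputs already on disk. A minimal sketch, assuming one output file per order named "<Order>.RData"; this naming scheme and find_missing_orders() are hypothetical and may not match the package's actual files.

```r
# Hypothetical sketch: return the orders that have no processed output yet,
# assuming one file per order named "<Order>.RData" in processed_dir.
find_missing_orders <- function(all_orders, processed_dir) {
  done_files <- list.files(processed_dir, pattern = "\\.RData$")
  done_orders <- sub("\\.RData$", "", done_files)
  setdiff(all_orders, done_orders)
}

demo_dir <- file.path(tempdir(), "efforts_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Poales.RData", "Rosales.RData"))))
find_missing_orders(c("Asterales", "Poales", "Rosales"), demo_dir)
# returns "Asterales"
```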

Function details

  • Efforts_Process(): Manages the workflow for requesting, downloading, processing, and plotting GBIF vascular plant data.

  • Efforts_Request(): Submits GBIF data requests for each order in parallel and saves the results to disk.

  • Efforts_Download(): Downloads GBIF data, validates the files, and loads existing data if available. Returns a data frame (Efforts_AllRequests) containing the file paths.

  • Efforts_Split(): Splits the zipped CSV data for each order into chunks and saves each chunk as a separate file.

  • Efforts_Summarize(): Processes and summarizes the data, saving the results as RData files and TIFF rasters.

  • Efforts_Plot(): Plots sampling efforts on both raw and log10 scales.
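The summary step produces, for each grid cell, the number of observations and the number of distinct species; log10 scales make the highly skewed observation counts easier to compare on a map. A minimal base-R sketch of that per-cell summary, assuming cell identifiers have already been assigned to each record; summarize_cells() is hypothetical and does not reproduce the package's raster code.

```r
# Hypothetical sketch: per-cell number of observations and number of
# distinct species, plus a log10-transformed observation count.
summarize_cells <- function(obs) {
  n_obs <- tapply(obs$species, obs$cell, length)
  n_sp <- tapply(obs$species, obs$cell, function(x) length(unique(x)))
  data.frame(
    cell = as.integer(names(n_obs)),
    n_obs = as.integer(n_obs),
    n_species = as.integer(n_sp),
    # Only cells with at least one record appear, so log10() is always finite
    log10_obs = log10(as.integer(n_obs))
  )
}

obs <- data.frame(
  cell = c(1, 1, 1, 2, 2, 3),
  species = c("A", "A", "B", "C", "C", "C")
)
res <- summarize_cells(obs)
res
```

Cell 1 has three records of two species, cell 2 has two records of one species, and cell 3 has a single record.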

References

Data source: https://www.gbif.org

Author

Ahmed El-Gabbas