
This function extracts and processes vascular plant data from the European Alien Species Information Network (EASIN). It downloads plant species data from the EASIN database, matches them against a pre-processed standardized list of taxa, and prepares species-specific and summary maps. It also supports downloading data in chunks, handling pagination, and retrying failed downloads.

Usage

EASIN_Process(
  ExtractTaxa = TRUE,
  ExtractData = TRUE,
  NDownTries = 10,
  NCores = 6,
  SleepTime = 10,
  NSearch = 1000,
  FromHPC = TRUE,
  EnvFile = ".env",
  DeleteChunks = TRUE,
  StartYear = 1981,
  Plot = TRUE
)
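As an illustration, a small local run could be set up as below. This is only a sketch: the argument names come from the Usage block, the values are illustrative, and the call is guarded because it assumes the package that provides EASIN_Process is already attached and that a valid `.env` file exists.

```r
# Illustrative settings for a small local run (values are assumptions,
# not recommendations from the package author).
easin_args <- list(
  ExtractTaxa  = TRUE,
  ExtractData  = TRUE,
  NDownTries   = 5,      # fewer retries than the default of 10
  NCores       = 2,      # must not exceed the documented maximum of 8
  SleepTime    = 5,
  NSearch      = 500,
  FromHPC      = FALSE,  # running on a local machine
  EnvFile      = ".env",
  DeleteChunks = FALSE,  # keep intermediate chunks for inspection
  StartYear    = 1981,
  Plot         = FALSE
)

# Only call the function if it is available in the current session.
if (exists("EASIN_Process", mode = "function")) {
  do.call(EASIN_Process, easin_args)
}
```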

Arguments

ExtractTaxa

Logical. If TRUE, the function will extract the EASIN taxonomy list using EASIN_Taxonomy. Default is TRUE.

ExtractData

Logical. If TRUE, the function will download EASIN species occurrence data using EASIN_Down. Default is TRUE.

NDownTries

Integer. Number of attempts to retry downloading data in case of failure. Default is 10.

NCores

Integer. Number of CPU cores to use for parallel processing. The maximum number of allowed cores is 8. Default is 6.

SleepTime

Numeric. Time in seconds to wait between download attempts and between chunks. Default is 10 seconds.

NSearch

Integer. Number of species or observations to download during EASIN taxonomy or data extraction, respectively. Default is 1000.

FromHPC

Logical. Whether the function is being run on an HPC system; file paths are adjusted accordingly. Default is TRUE.

EnvFile

Character. The path to the environment file containing variables required by the function. Default is ".env".

DeleteChunks

Logical. If TRUE, the function will delete intermediate files after processing. Default is TRUE.

StartYear

Integer. Minimum year for filtering species occurrence data. Records before this year will be excluded. Default is 1981, which matches the start of the year range of CHELSA current climate data.

Plot

Logical. If TRUE, the function will generate summary plots of the processed data using EASIN_Plot. Default is TRUE.
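The interplay of NSearch, NDownTries, and SleepTime can be sketched as a chunked download loop with retries. This is a minimal sketch under stated assumptions, not the package's implementation: `fetch_chunk()`, `download_chunk()`, and `download_all()` are hypothetical helpers, and the "server" is simulated so the example runs offline.

```r
# Hypothetical stand-in for a single paginated request. It fails on its
# first call to exercise the retry path, then serves 2500 fake records.
n_calls <- 0
fetch_chunk <- function(skip, take) {
  n_calls <<- n_calls + 1
  if (n_calls == 1) stop("simulated network failure")
  total <- 2500
  if (skip >= total) return(integer(0))
  seq(skip + 1, min(skip + take, total))
}

# Try one chunk up to `NDownTries` times, waiting `SleepTime` seconds
# after each failed attempt.
download_chunk <- function(skip, NSearch, NDownTries, SleepTime) {
  for (attempt in seq_len(NDownTries)) {
    result <- tryCatch(fetch_chunk(skip, NSearch), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(SleepTime)
  }
  stop("Chunk starting at ", skip, " failed after ", NDownTries, " attempts")
}

# Page through the records `NSearch` at a time until an empty chunk
# is returned, pausing `SleepTime` seconds between chunks.
download_all <- function(NSearch = 1000, NDownTries = 10, SleepTime = 0) {
  chunks <- list()
  skip <- 0
  repeat {
    chunk <- download_chunk(skip, NSearch, NDownTries, SleepTime)
    if (length(chunk) == 0) break
    chunks[[length(chunks) + 1]] <- chunk
    skip <- skip + NSearch
    Sys.sleep(SleepTime)
  }
  chunks
}

# SleepTime is set to 0 here only to keep the demonstration fast.
chunks <- download_all(NSearch = 1000, NDownTries = 3, SleepTime = 0)
```

A smaller NSearch means more requests but less data lost when a single request fails; a larger SleepTime is gentler on the server at the cost of run time.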

Value

The function returns NULL invisibly after completing the data extraction, processing, and optional plotting. It saves multiple outputs to disk, including the extracted and processed EASIN data, species-specific data files, and summary statistics. The main outputs are:

  • EASIN_Taxa.RData: A dataset containing the standardized EASIN taxonomy.

  • EASIN_Data.RData: A cleaned and merged dataset of species occurrence data.

  • EASIN_NObs.RData: A rasterized dataset showing the number of observations per grid cell.

  • EASIN_NObs_PerPartner.RData: A rasterized dataset showing the number of observations per data partner.

  • EASIN_NSp.RData: A rasterized dataset showing the number of species per grid cell.

  • EASIN_NSp_PerPartner.RData: A rasterized dataset showing the number of species per data partner.

  • Species-specific data files, saved as both sf and raster objects.
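The StartYear filter and the per-cell summaries behind the observation and species counts can be illustrated on a toy table. This is a sketch only: the column names (`species`, `year`, `cell`) are hypothetical, and a plain cell ID stands in for the raster grid used by the actual outputs.

```r
# Toy occurrence table; column names are hypothetical.
obs <- data.frame(
  species = c("A", "A", "B", "B", "C", "B"),
  year    = c(1975, 1990, 2001, 1981, 1960, 2010),
  cell    = c(1, 1, 1, 2, 2, 2)
)

# Exclude records before StartYear.
StartYear <- 1981
kept <- obs[obs$year >= StartYear, ]

# Number of observations per grid cell.
n_obs <- tapply(kept$species, kept$cell, length)

# Number of distinct species per grid cell.
n_sp <- tapply(kept$species, kept$cell, function(x) length(unique(x)))
```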

Note

  • The function assumes that the necessary environment variables are correctly set up in the specified .env file. Users should ensure that all required files and directories are accessible before running the function.

  • The function reuses species data or data chunks that already exist in the raw directory instead of processing them again. This avoids re-downloading data that are already available locally from the EASIN server. The function assumes that the contents of this directory are removed as part of the data workflow.

  • This function depends on the following functions: EASIN_Taxonomy for getting the most recent EASIN taxonomy; EASIN_Down for downloading EASIN occurrence data; and EASIN_Plot for plotting.
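The reuse behaviour described in the notes above follows a common "skip if the file exists" pattern, sketched below. The helper `get_or_reuse()`, the file name, and the use of `.rds` files are assumptions made for illustration; the package's own file layout may differ.

```r
# Hypothetical helper: call `download_fun` only when `path` does not
# exist yet; otherwise reuse the file already on disk.
get_or_reuse <- function(path, download_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- download_fun()
  saveRDS(result, path)
  result
}

# Temporary directory standing in for the raw data directory.
raw_dir <- file.path(tempdir(), "easin_raw_demo")
dir.create(raw_dir, showWarnings = FALSE)
chunk_path <- file.path(raw_dir, "chunk_0001.rds")
unlink(chunk_path)  # start from a clean state

# Fake download that counts how often it is actually called.
n_downloads <- 0
fake_download <- function() {
  n_downloads <<- n_downloads + 1
  data.frame(species = "A", year = 1990)
}

first  <- get_or_reuse(chunk_path, fake_download)  # downloads and saves
second <- get_or_reuse(chunk_path, fake_download)  # reads the saved file
```

Deleting the file (or the whole directory) forces a fresh download on the next run.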

Author

Ahmed El-Gabbas