Process EASIN data for the IASDT
Source: R/DWF_EASIN_process.R, R/DWF_EASIN_taxonomy.R, R/DWF_EASIN_download.R, and 1 more
EASIN_data.Rd
Extracts, processes, and visualises data from the European Alien Species Information Network (EASIN) for the Invasive Alien Species Digital Twin (IASDT). Manages taxonomy, occurrence data, and plots, handling API pagination and server limits. Orchestrated by EASIN_process() with helpers EASIN_taxonomy(), EASIN_download(), and EASIN_plot().
Usage
EASIN_process(
  extract_taxa = TRUE,
  extract_data = TRUE,
  n_download_attempts = 10L,
  n_cores = 6L,
  strategy = "multisession",
  sleep_time = 10L,
  n_search = 1000L,
  env_file = ".env",
  delete_chunks = TRUE,
  start_year = 1981L,
  plot = TRUE
)

EASIN_taxonomy(
  env_file = ".env",
  kingdom = "Plantae",
  phylum = "Tracheophyta",
  n_search = 100
)

EASIN_download(
  species_key,
  timeout = 200,
  verbose = FALSE,
  env_file = ".env",
  n_search = 1000,
  n_attempts = 10,
  sleep_time = 5,
  delete_chunks = TRUE,
  return_data = FALSE
)

EASIN_plot(env_file = ".env")
Arguments
- extract_taxa
Logical. If TRUE, extracts taxonomy using EASIN_taxonomy(). Default: TRUE.
- extract_data
Logical. If TRUE, downloads occurrence data with EASIN_download(). Default: TRUE.
- n_download_attempts
Integer. Number of retry attempts for downloads. Default: 10.
- n_cores
Integer. Number of CPU cores to use for parallel processing. Default: 6. The maximum number of cores allowed is 8.
- strategy
Character. The parallel processing strategy to use. Valid options are "sequential", "multisession" (default), "multicore", and "cluster". See future::plan() and ecokit::set_parallel() for details.
- sleep_time
Integer. Number of seconds to pause between data retrieval requests to avoid overloading the server. Default: 10 for EASIN_process() and 5 for EASIN_download().
- n_search
Integer. Number of records to request per API call. Default: 1000 for EASIN_process() and EASIN_download(), and 100 for EASIN_taxonomy(). The current maximum allowed by the API is 1000.
- env_file
Character. Path to the environment file containing paths to data sources. Defaults to .env.
- delete_chunks
Logical. Whether to delete temporary files for data chunks from the FileParts subdirectory. Defaults to TRUE.
- start_year
Integer. Earliest year for occurrence data (excludes earlier records). Default: 1981 (aligned with CHELSA climate data).
- plot
Logical. If TRUE, generates plots via EASIN_plot(). Default: TRUE.
- kingdom
Character. Taxonomic kingdom to query. Default: "Plantae".
- phylum
Character. Taxonomic phylum within kingdom. Default: "Tracheophyta".
- species_key
Character. EASIN taxon ID for which data is to be retrieved. This parameter cannot be NULL.
- timeout
Integer. Download timeout in seconds. Default: 200.
- verbose
Logical. If TRUE, prints progress messages. Default: FALSE.
- n_attempts
Integer. Maximum number of download attempts per chunk. Default: 10.
- return_data
Logical. If TRUE, returns data as a dataframe; otherwise, saves to disk and returns invisible(NULL). Default: FALSE.
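The n_search, n_attempts, and sleep_time arguments describe a chunked download with retries and pauses. The following is a minimal sketch of that pattern in base R, not the package's actual implementation: mock_fetch(), fetch_with_retry(), and download_paged() are hypothetical names, and the mock stands in for the EASIN API, whose endpoints and query parameters are not shown in this page.

```r
# Mock endpoint standing in for the EASIN API (hypothetical): returns up to
# `take` records starting at offset `skip` from a fixed pool of `total` rows.
mock_fetch <- function(skip, take, total = 2350L) {
  if (skip >= total) {
    return(data.frame(id = integer(0)))
  }
  data.frame(id = seq.int(skip + 1L, min(skip + take, total)))
}

# Call `fetch` up to `n_attempts` times, pausing `sleep_time` seconds after
# each failed attempt; stop with an error if every attempt fails.
fetch_with_retry <- function(fetch, skip, take, n_attempts = 10L, sleep_time = 5) {
  for (attempt in seq_len(n_attempts)) {
    chunk <- tryCatch(fetch(skip, take), error = function(e) NULL)
    if (!is.null(chunk)) {
      return(chunk)
    }
    Sys.sleep(sleep_time)
  }
  stop("Chunk at offset ", skip, " failed after ", n_attempts, " attempts")
}

# Page through the endpoint until a chunk comes back shorter than `n_search`.
download_paged <- function(fetch, n_search = 1000L, n_attempts = 10L, sleep_time = 5) {
  chunks <- list()
  skip <- 0L
  repeat {
    chunk <- fetch_with_retry(fetch, skip, n_search, n_attempts, sleep_time)
    chunks[[length(chunks) + 1L]] <- chunk
    if (nrow(chunk) < n_search) {
      break
    }
    skip <- skip + n_search
    Sys.sleep(sleep_time)  # pause between requests to avoid overloading the server
  }
  do.call(rbind, chunks)
}

# sleep_time = 0 only because the mock needs no rate limiting.
occurrences <- download_paged(mock_fetch, n_search = 1000L, sleep_time = 0)
nrow(occurrences)
```

Stopping on the first short chunk avoids needing a total record count up front, at the cost of one extra empty request when the total is an exact multiple of n_search.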
Note
Uses a static RDS file with EASIN-GBIF taxonomic standardization, prepared by Marina Golivets (Feb 2024).
Functions details
- EASIN_process(): Orchestrates taxonomy extraction, data downloads, and plotting for EASIN species data.
- EASIN_taxonomy(): Fetches taxonomy data in chunks via the EASIN API, filtered by kingdom and phylum. Returns a tibble.
- EASIN_download(): Downloads occurrence data for a given EASIN ID, handling pagination and pauses. Returns a dataframe if return_data = TRUE, else invisible(NULL).
- EASIN_plot(): Creates summary plots (observations count, species count, distribution by partner) as JPEGs. Returns invisible(NULL).
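A typical run might look like the sketch below. It assumes the package that provides these functions is already attached and that a valid .env file exists in the working directory; the argument values are illustrative choices within the documented limits, not recommendations from the authors. The guard makes the block a no-op when either assumption does not hold.

```r
# Arguments mirror the Usage section; the values are illustrative.
easin_args <- list(
  extract_taxa = TRUE,
  extract_data = TRUE,
  n_download_attempts = 10L,
  n_cores = 4L,               # documented maximum is 8
  strategy = "multisession",
  sleep_time = 10L,
  n_search = 1000L,           # documented API maximum per request
  env_file = ".env",
  delete_chunks = TRUE,
  start_year = 1981L,
  plot = FALSE                # plotting is done separately below
)

# Run only if EASIN_process() is available and the environment file exists.
if (exists("EASIN_process", mode = "function") &&
    file.exists(easin_args$env_file)) {
  do.call(EASIN_process, easin_args)
  EASIN_plot(env_file = easin_args$env_file)
}
```

Setting plot = FALSE and calling EASIN_plot() afterwards separates the slow download step from plotting, so the plots can be regenerated without repeating the downloads.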