This function scrapes a web page for all links (<a> tags) and extracts both the URLs and the link text.
Usage
ScrapLinks(URL, Arrange = c("Link", "Link_text"))
Arguments
- URL
Character. The URL of the web page to scrape. This URL is also used to resolve relative links to absolute URLs.
- Arrange
Character vector of length 1 or 2. The column(s) by which to sort the output, in ascending order. "Link" refers to the URL of the link and "Link_text" to the text of the link. The default is c("Link", "Link_text"), which sorts by URL first and then by link text.
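The way relative links can be resolved against the page URL is sketched below. This is an illustration only, not the package's actual implementation: it assumes the xml2 and rvest packages, and the inline HTML string and the `base_url` variable are hypothetical stand-ins for a downloaded page and the `URL` argument.

```r
library(xml2)
library(rvest)

# Hypothetical inline page standing in for a downloaded document
html <- '<html><body>
  <a href="/BioDT/IASDT.R/releases">1 release</a>
  <a href="https://biodt.eu/">BioDT</a>
</body></html>'

# Assumed to play the role of the URL argument
base_url <- "https://github.com/BioDT/IASDT.R"

page <- read_html(html)
anchors <- html_elements(page, "a")

# Extract the link text and resolve each href against the base URL;
# hrefs that are already absolute are returned unchanged
links <- data.frame(
  Link_text = html_text2(anchors),
  Link = url_absolute(html_attr(anchors, "href"), base_url),
  stringsAsFactors = FALSE
)
print(links)
```

Resolving against the page URL is what lets site-relative hrefs such as "/BioDT/IASDT.R/releases" end up as full URLs in the output.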
Value
A tibble with two columns: Link_text, containing the text of each link, and Link, containing the absolute URL of each link. By default, the tibble is sorted by Link and then by Link_text (see the Arrange argument), and only unique links are included.
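The de-duplication and sorting described here can be approximated in base R as follows. This is a sketch under stated assumptions, not the function's own code: the `links` data frame is illustrative, and `arrange_by` is a hypothetical variable standing in for the Arrange argument.

```r
# Illustrative data, including one duplicated row
links <- data.frame(
  Link_text = c("here", "BioDT", "BioDT", "link"),
  Link = c(
    "https://biodt.github.io/IASDT.R/reference/index.html",
    "https://biodt.eu/",
    "https://biodt.eu/",
    "https://biodt.eu/"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the Arrange argument (default order)
arrange_by <- c("Link", "Link_text")

# Keep only unique rows, then sort ascending by the requested column(s)
unique_links <- unique(links)
sort_keys <- unname(as.list(unique_links[arrange_by]))
sorted <- unique_links[do.call(order, sort_keys), ]
rownames(sorted) <- NULL
print(sorted)
```

Setting `arrange_by <- "Link_text"` in this sketch sorts by link text alone, mirroring the second call in the Examples section.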
Examples
head(
  ScrapLinks(URL = "https://github.com/BioDT/IASDT.R"))
#> # A tibble: 6 × 2
#> Link_text Link
#> <chr> <chr>
#> 1 https://biodt.eu https://biodt.eu
#> 2 BioDT https://biodt.eu/
#> 3 link https://biodt.eu/
#> 4 IASDT.R https://biodt.github.io/IASDT.R
#> 5 biodt.github.io/IASDT.R/ https://biodt.github.io/IASDT.R/
#> 6 here https://biodt.github.io/IASDT.R/reference/index.html
head(
  ScrapLinks(
    URL = "https://github.com/BioDT/IASDT.R", Arrange = "Link_text"))
#> # A tibble: 6 × 2
#> Link_text Link
#> <chr> <chr>
#> 1 + 1 release https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/releases
#> 2 .Rbuildignore https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/blob/main/.Rbuild…
#> 3 .github https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/tree/main/.github
#> 4 .gitignore https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/blob/main/.gitign…
#> 5 .lintr https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/blob/main/.lintr
#> 6 0 forks https:/github.com/BioDT/IASDT.R/BioDT/IASDT.R/forks
# This will give an "Invalid URL" error
if (FALSE) { # \dontrun{
  ScrapLinks(URL = "https://github50.com")
} # }