
This function scrapes a web page for all links (<a> tags) and extracts both the URLs and the link text.
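A minimal sketch of the extraction step follows. It assumes the xml2 package; this page does not say which packages ScrapLinks uses internally. The sketch parses a small, made-up inline HTML string instead of fetching a live page, selects every <a> element, and reads each link's text and href attribute. The HTML and variable names are hypothetical.

```r
# Assumption: xml2 is used for illustration only; ScrapLinks' internals may differ.
library(xml2)

# Hypothetical inline HTML standing in for a downloaded page
html <- '<html><body>
  <a href="/about">About</a>
  <a href="#hero"></a>
  <p>No link here</p>
  <a href="https://example.com/">Example</a>
</body></html>'

# read_html() treats a string containing "<" or ">" as literal markup
page <- read_html(html)

# Select every <a> element anywhere in the document
anchors <- xml_find_all(page, "//a")

links <- data.frame(
  link_text = xml_text(anchors, trim = TRUE),  # visible text of each link
  url = xml_attr(anchors, "href"),             # raw href, possibly relative
  stringsAsFactors = FALSE
)
links
```

At this stage the href values are still relative where the page wrote them that way. Empty anchors, such as the #hero link, yield an empty link_text.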

Usage

ScrapLinks(url)

Source

The source code of this function was taken from this gist.

Arguments

url

A character string specifying the URL of the web page to scrape. This URL is also used to resolve relative links to absolute URLs.
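One way to resolve relative links against the page URL is xml2::url_absolute(), shown below on a few hypothetical href values. This is an illustrative assumption, not necessarily what ScrapLinks does.

```r
# Assumption: xml2::url_absolute() is used for illustration only.
library(xml2)

# Hypothetical hrefs as they might appear in a page's <a> tags
hrefs <- c("/about", "about/diversity", "#hero", "https://example.com/")

# Resolve each href against the page URL, much as a browser would;
# hrefs that are already absolute are returned unchanged
absolute <- url_absolute(hrefs, base = "https://github.com/")
absolute
```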

Value

A tibble with two columns: link_text, containing the text of each link, and url, containing the absolute URL of each link. Rows are sorted by url and then by link_text, and duplicate rows (the same link_text and url) are removed.
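The deduplication and ordering can be illustrated in base R on a small, made-up data frame. The same steps also work on a tibble. The sketch assumes uniqueness is judged on the (link_text, url) pair, which is consistent with the example output, where the same URL appears with two different link texts.

```r
# Hypothetical raw link table with one duplicate row and unsorted entries
links <- data.frame(
  link_text = c("About", "Reload", "", "About"),
  url = c("https://github.com/about", "https://github.com/",
          "https://github.com/", "https://github.com/about"),
  stringsAsFactors = FALSE
)

# Keep one copy of each (link_text, url) pair
links <- unique(links)

# Order by URL first, then by link text within each URL
links <- links[order(links$url, links$link_text), ]
rownames(links) <- NULL
links
```

As in the example output, an empty link text sorts before "Reload" for the same URL.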

Examples

ScrapLinks("https://github.com/")
#> # A tibble: 123 × 2
#>    link_text            url                                
#>    <chr>                <chr>                              
#>  1 ""                   https:/github.com/                 
#>  2 "Reload"             https:/github.com/                 
#>  3 "Jump to footnote 1" https:/github.com/#footnote-1      
#>  4 "Jump to footnote 2" https:/github.com/#footnote-2      
#>  5 ""                   https:/github.com/#footnote-ref-1  
#>  6 ""                   https:/github.com/#footnote-ref-2  
#>  7 ""                   https:/github.com/#hero            
#>  8 "Skip to content"    https:/github.com/#start-of-content
#>  9 "About"              https:/github.com/about            
#> 10 "Inclusion"          https:/github.com/about/diversity  
#> # ℹ 113 more rows