My goal is to extract the URLs linked from specific CSS elements on a website using rvest. I have tried several approaches, including the html_attr function with the "href" argument, but my current script returns only NA values instead of the expected URLs.
Code snippet for setting up variables
library(rvest)
my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"
my_read_url <- read_html(my_url)
my_nodes <- html_nodes(my_read_url, ".fighter_result_data a span , .right_side a span , .left_side a span")
Verify that my_nodes corresponds to the athletes' names:
html_text(my_nodes)
The output shows that my_nodes selects the desired CSS elements:
[1] "Holly Holm" "Bethe Correia" "Marcin Tybura"
[4] "Andrei Arlovski" "Colby Covington" "Dong Hyun Kim"
[7] "Rafael dos Anjos" "Tarec Saffiedine" "Jon Tuck"
[10] "Takanori Gomi" "Walt Harris" "Cyril Asker"
[13] "Alex Caceres" "Rolando Dy" "Yuta Sasaki"
[16] "Justin Scoggins" "Jingliang Li" "Frank Camacho"
[19] "Russell Doane" "Kwan Ho Kwak" "Naoki Inoue"
[22] "Carls John de Tomas" "Lucie Pudilova" "Ji Yeon Kim"
Attempt to retrieve the URL of each athlete's individual page:
html_attr(my_nodes, "href")
The output is only a vector of NA values:
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
If anyone can help me obtain the URLs instead of these NA values, I would greatly appreciate it. Thank you!
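
A possible explanation, judging only from the selector: each selected node is a span nested inside an a element, and the href attribute belongs to the a, not the span. The sketch below reproduces this on a small inline HTML fragment rather than the live page. The markup and the "/fighter/example-..." paths are hypothetical stand-ins, assuming the real page nests the name's span inside the link in the same way.

```r
library(rvest)

# Hypothetical markup (an assumption): an <a href> wrapping a <span>,
# mirroring the structure implied by the ".left_side a span" selector
page <- read_html('<html><body>
  <div class="left_side"><a href="/fighter/example-one"><span>Fighter One</span></a></div>
  <div class="right_side"><a href="/fighter/example-two"><span>Fighter Two</span></a></div>
</body></html>')

# Selecting the <span> returns nodes that carry no href attribute
spans <- html_nodes(page, ".left_side a span, .right_side a span")
span_hrefs <- html_attr(spans, "href")   # NA for every node

# Selecting the enclosing <a> exposes the attribute
links <- html_nodes(page, ".left_side a, .right_side a")
hrefs <- html_attr(links, "href")
```

If the same structure holds on the real page, dropping the trailing span from each part of the original selector (for example ".fighter_result_data a, .right_side a, .left_side a") should make html_attr(my_nodes, "href") return the links. They may be relative paths that need the site's base URL prepended, and the broader selector could also match links other than the athletes' names, so the result is worth checking against html_text.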