"Using html_attr with the attribute "href" does not return any value in the rvest package

Question

I want to extract the URLs attached to specific elements on a web page, selected by CSS selector, using rvest. I have tried several approaches, including html_attr() with "href", but my script returns only NA values instead of the expected URLs.

Code snippet for setting up variables

library(rvest)

my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"

my_read_url <- read_html(my_url)

my_nodes <- html_nodes(my_read_url, ".fighter_result_data a span , .right_side a span , .left_side a span")

Check that my_nodes matches the athletes' names

html_text(my_nodes)

The output shows that my_nodes selects the desired elements

 [1] "Holly Holm"          "Bethe Correia"       "Marcin Tybura"      
 [4] "Andrei Arlovski"     "Colby Covington"     "Dong Hyun Kim"      
 [7] "Rafael dos Anjos"    "Tarec Saffiedine"    "Jon Tuck"           
[10] "Takanori Gomi"       "Walt Harris"         "Cyril Asker"        
[13] "Alex Caceres"        "Rolando Dy"          "Yuta Sasaki"        
[16] "Justin Scoggins"     "Jingliang Li"        "Frank Camacho"      
[19] "Russell Doane"       "Kwan Ho Kwak"        "Naoki Inoue"        
[22] "Carls John de Tomas" "Lucie Pudilova"      "Ji Yeon Kim"

Attempt to retrieve the URL of each athlete's page

html_attr(my_nodes, "href")

The result is a vector containing only NA values

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

How can I get the URLs instead of these NA values? Any help would be greatly appreciated. Thank you!

html css r web-scraping rvest

Answer 1

Your html_nodes() call selects the span elements, not the a elements. Only the a elements carry an href attribute; the span elements nested inside them do not, so html_attr() returns NA for each one. Select the a elements instead:

my_nodes <- html_nodes(my_read_url, ".fighter_result_data a, .right_side a, .left_side a")
html_text(my_nodes)
html_attr(my_nodes, "href")
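
To see the difference in isolation, here is a minimal sketch on an inline HTML fragment. The fragment, its href value, and the fighter name are made up for illustration; the real page's markup is only assumed to nest a span inside each a, as the original selector suggests.

```r
library(rvest)

# Hypothetical stand-in for one fighter entry (not copied from the real page)
snippet <- read_html(
  '<div class="left_side"><a href="/fighter/Example-Fighter-1"><span>Example Fighter</span></a></div>'
)

# Selector from the question: ends at the span inside the link
span_nodes <- html_nodes(snippet, ".left_side a span")

# Corrected selector: stops at the a element itself
a_nodes <- html_nodes(snippet, ".left_side a")

span_href <- html_attr(span_nodes, "href")  # NA: the span has no href attribute
a_href    <- html_attr(a_nodes, "href")     # the link target stored on the a element
a_text    <- html_text(a_nodes)             # text of the nested span is still included
```

Because html_text() collects the text of descendant nodes, selecting the a elements should still give the athlete names as well as the links.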

Answer 2

As @MrFlick mentioned, the hyperlinks are stored on the <a> tags, so those are the nodes you need to select.

my_url %>%
  read_html() %>%
  html_nodes('.fighter_result_data') %>%
  html_nodes('a') %>%
  html_attr('href')
[1] "/fighter/Marcin-Tybura-86928"        "/fighter/Andrei-Arlovski-270"
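
The href values returned here are relative paths. If full URLs are needed, one possible approach (a sketch, not part of the original answers) is xml2::url_absolute(), which resolves each path against the address of the page it came from. The two paths below are copied from the output above; my_url is the page address from the question.

```r
library(xml2)

# Relative links as returned by html_attr()
rel_links <- c("/fighter/Marcin-Tybura-86928", "/fighter/Andrei-Arlovski-270")

# Address of the page the links were scraped from
my_url <- "http://www.sherdog.com/events/UFC-Fight-Night-111-Holm-vs-Correia-58241"

# Resolve each relative path against the page URL
full_links <- url_absolute(rel_links, my_url)
full_links
```

Since the paths start with a slash, they should resolve against the site root rather than the events directory.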
