I've been grappling with this issue for a few weeks now without success. My ultimate goal is to extract every image from the website provided (link:). To start, I am trying to retrieve just one value: the 'alt' attribute of a single 'img' tag in the page's HTML.
The snippet of HTML code looks like this:
<div class="l-grid__item l-grid__item--3/12 l-grid__item--12/12@mobile--sm l-grid__item--4/12@desktop l-grid__item--6/12@tablet"><div tabindex="0" class="c-card u-flex u-flex--column u-height--100% u-cursor--pointer u-bxs--dark-lg:hover c-card--@print"><div class="u-height--100% u-width--100% u-p u-flex u-flex--centered u-mb--auto"><div aria-hidden="true" class="u-max-width--80% u-max-height--250px"><img alt="/photo/66c88d1d7401a93215e0b225.jpg" class="u-max-height--250px u-height--auto u-width--auto u-block" src="/photo/66c88d1d7401a93215e0b225.jpg"></div></div><div class="u-flex u-flex--column u-flex--no-shrink u-p u-bg--off-white u-fw--bold u-color--primary u-text--center u-bt--light-gray"><div class="u-cursor--pointer u-mb--xs">AANDAHL, Fred George</div><div class="u-fz--sm u-fw--semibold">1897 – 1966</div></div></div></div>
I have tried using the R code below, but I keep getting character(0):
library(httr)
library(rvest)
# Fetch the HTML content with a custom User-Agent
response <- GET("https://bioguide.congress.gov/search",
                user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"))
# Parse the content
page <- read_html(content(response, as = "text", encoding = "UTF-8"))
# Navigate to the div with class starting with 'l-grid__item' and extract img alt attributes
img_alt_values <- page %>%
  html_nodes(xpath = "//div[starts-with(@class, 'l-grid__item')]") %>%
  html_nodes(xpath = ".//img") %>%
  html_attr("alt")
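One sanity check is to run the same XPath chain against the card markup itself, parsed from a string rather than from the live response. The sketch below is only that: `snippet` is a trimmed copy of the HTML quoted above (the utility classes and inner text divs are shortened for readability), and `doc` and `alt_values` are names made up for this example. If the selectors return the path here but character(0) on `page`, the selectors are probably fine and the fetched HTML likely does not contain the cards at all. One possible cause, which is an assumption and not something verified here, is that the cards are added by JavaScript after the page loads.

```r
library(rvest)

# Trimmed copy of the card markup from the question (hypothetical shortening:
# most utility classes and the name/date divs are omitted)
snippet <- '<div class="l-grid__item l-grid__item--3/12"><div class="c-card"><img alt="/photo/66c88d1d7401a93215e0b225.jpg" class="u-block" src="/photo/66c88d1d7401a93215e0b225.jpg"></div></div>'

# Parse the literal string instead of the HTTP response
doc <- read_html(snippet)

# Same selector chain as in the original attempt
alt_values <- doc %>%
  html_nodes(xpath = "//div[starts-with(@class, 'l-grid__item')]") %>%
  html_nodes(xpath = ".//img") %>%
  html_attr("alt")

print(alt_values)
# [1] "/photo/66c88d1d7401a93215e0b225.jpg"
```

For comparison, length(html_nodes(page, xpath = "//img")) on the fetched page shows whether any img elements made it into the response at all.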
Does anyone have any suggestions on how to overcome this hurdle?