My goal is to automate web scraping with RSelenium in R. I've managed to find and click a button on a webpage, but I'm struggling to extract the href attributes from the page after the click.
My full list has 4,000 species, but here is a small example:
Species <- c("Abies balsamea", "Alchemilla glomerulans", "Antennaria dioica",
             "Atriplex glabriuscula", "Brachythecium salebrosum")
Here's my current code:
library(RSelenium)
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4445L,
  browserName = "firefox"
)
remDr$open()
remDr$navigate("https://ser-sid.org/")
webElem <- remDr$findElement(using = "class", "flex")
# Find the input field and button within webElem
input_element <- webElem$findChildElement(using = "css selector", value = "input[type='text']")
button_element <- webElem$findChildElement(using = "css selector", value = "button")
# Input species name into the input field
input_element$sendKeysToElement(list("Abies balsamea"))
# Click the button to submit the form
button_element$clickElement()
Sys.sleep(5)
# Locate all <a> elements with species information
species_links <- remDr$findElements(using = "css selector", value = "a[href^='/species/']")
# Extract href attributes from the species links
hrefs <- sapply(species_links, function(link) {
  link$getElementAttribute("href")
})
# Drop missing values (in case some links lack an href attribute)
hrefs <- hrefs[!is.na(hrefs)]
# Print the extracted hrefs
print(hrefs)
The code doesn't throw any errors, but species_links ends up empty, which tells me the elements containing the species information are not being found.
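One thing I could check (a sketch along these lines, assuming the remDr session above is still open after the click) is the page source as Selenium sees it, to confirm whether any "/species/" hrefs are present in the rendered DOM at all. If the links are written as absolute URLs in the HTML, the a[href^='/species/'] selector would not match them.
# Diagnostic sketch: inspect the rendered DOM after the click
page_source <- remDr$getPageSource()[[1]]
# Does any "/species/" href exist in the rendered HTML at all?
grepl("/species/", page_source, fixed = TRUE)
# Save the HTML for manual inspection (relative vs. absolute hrefs, iframes, etc.)
writeLines(page_source, "page_after_click.html")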
I tried waiting for the page to load after clicking the button, but the content still doesn't seem to be fully loaded, or at least not rendered the way I expect.
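As a next step I'm considering replacing the fixed Sys.sleep(5) with a small polling helper along these lines (a sketch; the 20-second timeout and 1-second interval are arbitrary choices), which keeps retrying findElements() until the links show up or the timeout is reached:
# Sketch: poll for the links instead of relying on a single fixed sleep
wait_for_links <- function(remDr, selector, timeout = 20, interval = 1) {
  elapsed <- 0
  repeat {
    links <- remDr$findElements(using = "css selector", value = selector)
    if (length(links) > 0 || elapsed >= timeout) return(links)
    Sys.sleep(interval)
    elapsed <- elapsed + interval
  }
}
species_links <- wait_for_links(remDr, "a[href^='/species/']")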
When I manually search for 'Abies balsamea' on the webpage, I find this:
https://i.sstatic.net/lFnLP.png
From there, I'd like to retrieve at least that one link. Inspecting it in the webpage brings me to the image below:
https://i.sstatic.net/ZcNXU.png
Any suggestions on how to troubleshoot this issue and ensure successful extraction of hrefs after button clicks?
My end goal is to iterate through a species list like Species and build a data.frame containing the link(s) for each species.
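Roughly, I'm aiming for something like this (a sketch; get_species_hrefs() is a hypothetical wrapper around the search/click/extract steps above for a single species name):
# Sketch of the end goal: one row per species/link pair
results <- lapply(Species, function(sp) {
  hrefs <- get_species_hrefs(remDr, sp)   # hypothetical per-species wrapper
  if (length(hrefs) == 0) hrefs <- NA_character_
  data.frame(species = sp, href = unlist(hrefs), stringsAsFactors = FALSE)
})
species_links_df <- do.call(rbind, results)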
Edit based on Brett Donald's answer
Brett's solution seems better, but I haven't located the API documentation yet.
This is what I've tried:
library(httr)
# Define the API endpoint URL
url <- "https://fyxheguykvewpdeysvoh.supabase.co/rest/v1/species_summary"
# Define query parameters
params <- list(
  select = "*",
  or = "(has_germination.eq.true,has_oil.eq.true,has_protein.eq.true,has_dispersal.eq.true,has_seed_weights.eq.true,has_storage_behaviour.eq.true,has_morphology.eq.true)",
  genus = "ilike.Abies%",
  epithet = "ilike.balsamea%",
  order = "genus.asc.nullslast,epithet.asc.nullslast"
)
# Set request headers with the correct API key
headers <- add_headers(
  `Content-Type` = "application/json",
  Authorization = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImZ5eGhlZ3V5a3Zld3BkZXlzdm9oIiwicm9sZSI6ImFub24iLCJpYXQiOjE2NDc0MTY1MzQsImV4cCI6MTk2Mjk5MjUzNH0.XhJKVijhMUidqeTbH62zQ6r8cS6j22TYAKfbbRHMTZ8"
)
# Make a GET request
response <- GET(url, query = params, headers = headers)
# Check if the request was successful
if (http_type(response) == "application/json") {
  # Parse JSON response
  data <- content(response, "parsed")
  print(data)
} else {
  print("Error: Failed to retrieve data")
}
But I receive:
$message
[1] "No API key found in request"
$hint
[1] "No `apikey` request header or url param was found."