RSelenium: Issue with extracting hyperlinks from a webpage after a button click

My goal is to automate web scraping with RSelenium in R. I've managed to find and click a button on a webpage using RSelenium, but I'm struggling to extract href attributes from the page after clicking the button.

Although I have a list of 4000 species, here is an example:

Species <- c("Abies balsamea", "Alchemilla glomerulans", "Antennaria dioica",
"Atriplex glabriuscula", "Brachythecium salebrosum")

Here's my current code:

library(RSelenium)
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4445L,
  browserName = "firefox"
)

remDr$open()

remDr$navigate("https://ser-sid.org/")

webElem <- remDr$findElement(using = "class", "flex")

# Find the input field and button within webElem
input_element <- webElem$findChildElement(using = "css selector", value = "input[type='text']")
button_element <- webElem$findChildElement(using = "css selector", value = "button")

# Input species name into the input field

input_element$sendKeysToElement(list("Abies balsamea"))

# Click the button to submit the form
button_element$clickElement()



Sys.sleep(5)

# Locate all <a> elements with species information
species_links <- remDr$findElements(using = "css selector", value = "a[href^='/species/']")

# Extract href attributes from the species links
hrefs <- sapply(species_links, function(link) {
  link$getElementAttribute("href")
})

# Remove NULL values (in case some links don't have href attributes)
hrefs <- hrefs[!is.na(hrefs)]

# Print the extracted hrefs
print(hrefs)

The code doesn't throw any errors but species_links ends up empty, indicating that the elements with species information are not being found.

I tried waiting for the page to load after clicking the button, but the content either isn't fully loading or isn't structured the way I expect.

When I manually search for 'Abies balsamea' on the webpage, I find this:

https://i.sstatic.net/lFnLP.png

From there, I aim to retrieve this link at least:

Inspecting that link in the browser shows the following:

https://i.sstatic.net/ZcNXU.png

Any suggestions on how to troubleshoot this issue and ensure successful extraction of hrefs after button clicks?

My end goal is to iterate through a species list like Species and create a data.frame containing the links for each species.

Edit based on Brett Donald's answer

Brett's solution seems better, but I haven't located the API documentation yet.

This is what I've tried:

library(httr)

# Define the API endpoint URL
url <- "https://fyxheguykvewpdeysvoh.supabase.co/rest/v1/species_summary"

# Define query parameters
params <- list(
  select = "*",
  or = "(has_germination.eq.true,has_oil.eq.true,has_protein.eq.true,has_dispersal.eq.true,has_seed_weights.eq.true,has_storage_behaviour.eq.true,has_morphology.eq.true)",
  genus = "ilike.Abies%",
  epithet = "ilike.balsamea%",
  order = "genus.asc.nullslast,epithet.asc.nullslast"
)

# Set request headers with the correct API key
headers <- add_headers(
  `Content-Type` = "application/json",
  Authorization = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImZ5eGhlZ3V5a3Zld3BkZXlzdm9oIiwicm9sZSI6ImFub24iLCJpYXQiOjE2NDc0MTY1MzQsImV4cCI6MTk2Mjk5MjUzNH0.XhJKVijhMUidqeTbH62zQ6r8cS6j22TYAKfbbRHMTZ8"
)

# Make a GET request
response <- GET(url, query = params, headers = headers)

# Check if the request was successful
if (http_type(response) == "application/json") {
  # Parse JSON response
  data <- content(response, "parsed")
  print(data)
} else {
  print("Error: Failed to retrieve data")
}

But I receive:

$message
[1] "No API key found in request"

$hint
[1] "No `apikey` request header or url param was found."

Answer №1

Upon reviewing your code, I see no issues with it, although I am not well-versed in RSelenium.

If I were in your shoes, I might consider obtaining the data differently by mimicking the website's API calls rather than scraping it with a robotic browser.

By analyzing the network tab of your browser inspector when conducting a search on ser-sid.org, you can uncover both the API endpoint URL being accessed and the API key.

API endpoint URL (with parameters included)

https://fyxheguykvewpdeysvoh.supabase.co/rest/v1/species_summary?select=*&or=%28has_germination.eq.true%2Chas_oil.eq.true%2Chas_protein.eq.true%2Chas_dispersal.eq.true%2Chas_seed_weights.eq.true%2Chas_storage_behaviour.eq.true%2Chas_morphology.eq.true%29&genus=ilike.Abies%25&epithet=ilike.balsamea%25&order=genus.asc.nullslast%2Cepithet.asc.nullslast

API key (found in request headers)

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImZ5eGhlZ3V5a3Zld3BkZXlzdm9oIiwicm9sZSI6ImFub24iLCJpYXQiOjE2NDc0MTY1MzQsImV4cCI6MTk2Mjk5MjUzNH0.XhJKVijhMUidqeTbH62zQ6r8cS6j22TYAKfbbRHMTZ8

After replicating these details in a new Postman GET request, I received a JSON response like this:

[
  {
    "genus": "Abies",
    "epithet": "balsamea",
    "id": "ef741ce8-6911-4286-b79e-3ff0804520fb",
    "infraspecies_rank": null,
    "infraspecies_epithet": null,
    "has_germination": false,
    "has_oil": true,
    "has_protein": false,
    "has_dispersal": true,
    "has_seed_weights": true,
    "has_storage_behaviour": true,
    "has_morphology": false
  },
  {
    "genus": "Abies",
    "epithet": "balsamea",
    "id": "024cde5f-7cc5-48b7-89fd-be95638c8f2a",
    "infraspecies_rank": "var.",
    "infraspecies_epithet": "balsamea",
    "has_germination": true,
    "has_oil": false,
    "has_protein": false,
    "has_dispersal": false,
    "has_seed_weights": true,
    "has_storage_behaviour": true,
    "has_morphology": false
  }
]
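For R users, the same request can be sketched with httr. This is only a rough sketch under a few assumptions: the function names build_species_query and fetch_species are made up for illustration, species names are assumed to be two words ("Genus epithet"), and sending the key both as an apikey header and as a Bearer token follows the usual Supabase convention rather than anything documented for this site. Note that add_headers() is passed as an unnamed argument; a named headers = argument is not treated as request configuration by httr.

```r
# Sketch only: assumes httr is installed and that the endpoint accepts the key
# in an `apikey` header, as the "No API key found" hint suggests.
base_url <- "https://fyxheguykvewpdeysvoh.supabase.co/rest/v1/species_summary"

# Build the query parameters for one "Genus epithet" name (no network needed).
build_species_query <- function(species_name) {
  parts <- strsplit(trimws(species_name), "\\s+")[[1]]
  list(
    select  = "*",
    or      = paste0(
      "(has_germination.eq.true,has_oil.eq.true,has_protein.eq.true,",
      "has_dispersal.eq.true,has_seed_weights.eq.true,",
      "has_storage_behaviour.eq.true,has_morphology.eq.true)"
    ),
    genus   = paste0("ilike.", parts[1], "%"),
    epithet = paste0("ilike.", parts[2], "%"),
    order   = "genus.asc.nullslast,epithet.asc.nullslast"
  )
}

# Send the request. `api_key` is the key copied from the browser's network tab.
fetch_species <- function(species_name, api_key) {
  response <- httr::GET(
    base_url,
    query = build_species_query(species_name),
    # Unnamed, so httr merges it into the request configuration
    httr::add_headers(
      apikey = api_key,
      Authorization = paste("Bearer", api_key)  # assumption: usual Supabase style
    )
  )
  httr::stop_for_status(response)
  httr::content(response, as = "parsed")
}

# Usage (performs a real request):
# records <- fetch_species("Abies balsamea", api_key = "<key from network tab>")
```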

You could easily automate these requests using any language of your choice. Personally, I would opt for Node.js. Wouldn't that be simpler than resorting to web scraping with a robotic browser?
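If you stay in R, the records can be turned into the data.frame of links the question asks for. The sketch below assumes that each species page lives at https://ser-sid.org/species/ followed by the record's id, which matches the ids in the JSON above; records_to_links and collect_links are hypothetical helper names, and fetch stands for whatever function you use to retrieve the parsed records for one species.

```r
# Turn parsed API records (a list of named lists, one per species entry)
# into a data.frame with one row per record.
records_to_links <- function(records, query_name) {
  if (length(records) == 0) {
    return(data.frame(query = character(0), id = character(0),
                      link = character(0), stringsAsFactors = FALSE))
  }
  ids <- vapply(records, function(rec) rec$id, character(1))
  data.frame(
    query = query_name,
    id    = ids,
    # Assumption: species pages follow this URL pattern
    link  = paste0("https://ser-sid.org/species/", ids),
    stringsAsFactors = FALSE
  )
}

# Loop over a species vector. `fetch` is a hypothetical function that takes
# one species name and returns the parsed records for it.
collect_links <- function(species, fetch) {
  pieces <- lapply(species, function(name) records_to_links(fetch(name), name))
  do.call(rbind, pieces)
}

# Example with the two records shown above, no network access needed
sample_records <- list(
  list(genus = "Abies", epithet = "balsamea",
       id = "ef741ce8-6911-4286-b79e-3ff0804520fb"),
  list(genus = "Abies", epithet = "balsamea",
       id = "024cde5f-7cc5-48b7-89fd-be95638c8f2a")
)
links <- collect_links("Abies balsamea", function(name) sample_records)
print(links)
```

With 4000 species it would be sensible to pause briefly between requests (for example with Sys.sleep inside fetch) so the server is not flooded.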

PS. Since the data in this database is reportedly under a Creative Commons License, you might have luck contacting the Society for Ecological Restoration to access the data directly instead of having to extract it species by species.

Answer №2

If you prefer to stay with RSelenium, launching the browser with rsDriver and searching by XPath with contains() works:

library(RSelenium)

port <- 4567
#to terminate port for reuse use system(paste0("sudo kill -9 $(lsof -t -i:",port," -sTCP:LISTEN)"))

#my standard launch function with enhanced privacy features and image blocking for quicker loading
eCaps = list(`moz:firefoxOptions` = list(
  args = list("--disable-gpu","--no-sandbox","--disable-application-cache","--disable-dev-shm-usage", "--disable-extensions"),
  prefs =list(
    "browser.cache.disk.enable" = FALSE,
    "browser.cache.memory.enable" = FALSE,
    "browser.cache.offline.enable" = FALSE,
    "browser.sessionstore.max_tabs_undo" = 0,
    "network.http.use-cache" = FALSE,
    "permissions.default.image"= 2,
    "privacy.clearOnShutdown.cache" = TRUE,
    "privacy.clearOnShutdown.cookies" = TRUE)
)
)

rD <- rsDriver(browser = "firefox", extraCapabilities = eCaps, port = as.integer(port), check = FALSE)
remDr <- rD$client

remDr$navigate("https://ser-sid.org/")

webElem <- remDr$findElement(using = "class", "flex")

# Locate the input field and button within webElem
input_element <- webElem$findChildElement(using = "css selector", value = "input[type='text']")
button_element <- webElem$findChildElement(using = "css selector", value = "button")

# Input the species name into the input field

input_element$sendKeysToElement(list("Abies balsamea"))

# Click the button to submit the form
button_element$clickElement()

Sys.sleep(5)

# Find all <a> elements with species information
species_links <- remDr$findElements(using = "xpath", "//a[contains(@href,'species')]")

# Extract the href attributes from the species links
hrefs <- sapply(species_links, function(link) {
  link$getElementAttribute("href")
})

# Filter out NULL values (in case some links don't have href attributes)
hrefs <- hrefs[!is.na(hrefs)]

# Display the extracted hrefs
print(hrefs)

[[1]]
[1] "https://ser-sid.org/species/ef741ce8-6911-4286-b79e-3ff0804520fb"

[[2]]
[1] "https://ser-sid.org/species/024cde5f-7cc5-48b7-89fd-be95638c8f2a"
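A fixed Sys.sleep(5) can be too short on a slow connection and wasteful on a fast one. One possible alternative is to poll until the links appear or a timeout expires. The helper below is a sketch, not part of RSelenium: wait_for_elements is a hypothetical name, and it only assumes the driver object has a findElements(using, value) method, as remDr does.

```r
# Poll for elements instead of sleeping for a fixed time.
# `driver` is any object with a findElements(using, value) method,
# for example an RSelenium remoteDriver.
wait_for_elements <- function(driver, using, value, timeout = 15, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat lookup errors as "nothing found yet" and keep polling
    found <- tryCatch(
      driver$findElements(using = using, value = value),
      error = function(e) list()
    )
    # Return as soon as something is found, or an empty list on timeout
    if (length(found) > 0 || Sys.time() >= deadline) {
      return(found)
    }
    Sys.sleep(interval)
  }
}

# Usage with the session above:
# species_links <- wait_for_elements(remDr, "xpath", "//a[contains(@href,'species')]")
# hrefs <- unlist(lapply(species_links, function(link) link$getElementAttribute("href")))
```

An empty result after the timeout then means the links really are absent, rather than simply not rendered yet.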
