Recently, I've been experimenting with scraping data from different websites in R using the SelectorGadget Chrome extension together with rvest. One successful example was extracting the match-up table from Dotabuff. My usual approach is to use SelectorGadget to pick out the table I need and then paste the resulting CSS selector into my code, like this:
library(rvest)  # provides read_html(), html_nodes(), html_text()

urlx <- "http://www.dotabuff.com/heroes/abaddon/matchups"
rawData <- html_text(html_nodes(read_html(urlx), "td:nth-child(4), td:nth-child(3), .cell-xlarge"))
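Since the Dotabuff data lives in ordinary table cells (the selector targets td elements), rvest's html_table() is an alternative worth noting: it parses each table on the page into a data frame, so the CSS selector can sometimes be skipped entirely. A minimal sketch (the object name tableList is just illustrative):

tableList <- html_table(read_html(urlx))  # list of data frames, one per table on the page
head(tableList[[1]])                      # inspect the first table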
When I tried the same thing on Dotapicker, my SelectorGadget query looked like this:
urlx <- "http://www.dotapicker.com/heroes/abaddon"
rawData <- html_text(html_nodes(read_html(urlx),".ng-scope:nth-child(1) .ng-scope .ng-binding"))
This time, however, html_nodes returned no nodes at all:
{xml_nodeset (0)}
I suspect the problem is that this table is nested inside a drop-down box, whereas in the previous case the table sat directly on the page. I'm still looking for a way around this.
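For reference, here is a minimal diagnostic along the lines I've been trying (just a sketch, assuming rvest is loaded as above; pageRaw is an illustrative name). The .ng-scope/.ng-binding classes in my selector look like AngularJS classes, so this checks whether those nodes, or any static table markup, exist in the raw HTML that read_html downloads, or whether the values are only filled in by JavaScript once the page runs in a browser:

pageRaw <- read_html("http://www.dotapicker.com/heroes/abaddon")
length(html_nodes(pageRaw, ".ng-binding"))  # 0 suggests the bindings are injected client-side
length(html_nodes(pageRaw, "table"))        # is there any static table markup to scrape at all?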
Your assistance is greatly appreciated!