Extracting values from child elements when web scraping with rvest

I want to extract the list of phase states for the oil and gas fields listed on a website. Here is my code so far:

library("rvest") 
library("magrittr")

url <- 'https://energybase.ru/en/oil-gas-field/index'

# Grandchildren of every .info element: alternating labels and values
read_html(url) %>%
  html_nodes(".info") %>%
  html_children() %>%
  html_children()

Running this code gives the following output (the label "Извлекаемые запасы A+B1+B2+C1" means "recoverable reserves, categories A+B1+B2+C1"):

 [1] <small>City</small>
 [2] <div class="value">Игарка</div>
 [3] <small>Phase state</small>
 [4] <div class="value">нефтегазовое</div>
 [5] <small>Извлекаемые запасы A+B1+B2+C1</small>
 [6] <div class="value">479.10 mln. tons</div>
 [7] <small>City</small>
 [8] <div class="value">Тазовский</div>
 [9] <small>Phase state</small>
[10] <div class="value">газонефтяное</div>
[11] <small>Извлекаемые запасы A+B1+B2+C1</small>
[12] <div class="value">422.00 mln. tons</div>
[13] <small>City</small>
[14] <div class="value">Лянтор</div>
[15] <small>Phase state</small>
[16] <div class="value">нефтегазоконденсатное</div>
[17] <small>Извлекаемые запасы A+B1+B2+C1</small>
[18] <div class="value">380.00 mln. tons</div>
[19] <small>City</small>
[20] <div class="value">Тобольск</div>

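I only want the phase states, that is, the text of each <div class="value"> that comes right after a "Phase state" label. The desired result is:

нефтегазовое
газонефтяное
нефтегазоконденсатное

(Russian for "oil and gas", "gas and oil", and "oil, gas and condensate".)

Which function would best help me solve this?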
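To test selectors offline, you can run the pipeline above against a small inline snippet. The markup below is a hypothetical reconstruction from the printed output. Each .info block holds columns, and each column contains a <small> label followed by a <div class="value">. This is not the site's actual HTML, and the live page's classes and nesting may differ.

```r
library(rvest)  # re-exports the %>% pipe

# Hypothetical markup reconstructed from the question's output;
# the live page's classes and nesting may differ.
page <- read_html('
<div class="info">
  <div class="col-md-8"><small>City</small><div class="value">Игарка</div></div>
  <div class="col-md-8"><small>Phase state</small><div class="value">нефтегазовое</div></div>
</div>
<div class="info">
  <div class="col-md-8"><small>City</small><div class="value">Тазовский</div></div>
  <div class="col-md-8"><small>Phase state</small><div class="value">газонефтяное</div></div>
</div>')

# Same pipeline as in the question: grandchildren of every .info element,
# converted to text so the label/value alternation is easy to see
children_text <- page %>%
  html_nodes(".info") %>%
  html_children() %>%
  html_children() %>%
  html_text()

print(children_text)
```

The resulting vector alternates label, value, label, value. Each phase state therefore sits right after a "Phase state" label.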

Answer №1

To get only the phase states, select the .value element inside the .col-md-8 element that is the second child of its parent (the "Phase state" column):

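library(rvest)  # also re-exports the %>% pipe

read_html(url) %>%  # url as defined in the question
  html_nodes(".col-md-8:nth-child(2) .value") %>%  # value in the 2nd column
  html_text()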

This returns the phase states as a character vector:

 [1] "нефтегазовое"          "газонефтяное"          "нефтегазоконденсатное" "нефтяное"             
 [5] "нефтяное"              "нефтегазовое"          "нефтяное"              "нефтяное"             
 [9] "нефтяное"              "нефтегазоконденсатное" "нефтегазоконденсатное" "нефтяное"             
[13] "нефтегазоконденсатное" "нефтегазоконденсатное" "нефтяное"              "нефтяное"             
[17] "газонефтяное"          "нефтегазоконденсатное" "нефтяное"              "нефтегазовое"  

To find the right CSS selector (.col-md-8:nth-child(2) .value), a browser tool for picking selectors helps. Here is an example screenshot:

https://i.sstatic.net/obASs.jpg
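A positional selector such as :nth-child(2) only works if the Phase state column is always second. The sketch below matches on the label text instead, using an XPath following-sibling step. It runs against hypothetical inline markup, not the real page. The second block deliberately puts Phase state first to show where the positional selector would go wrong.

```r
library(rvest)  # re-exports the %>% pipe

# Hypothetical markup; in the second block "Phase state" is deliberately the
# first column, so the positional selector would pick the wrong value there.
page <- read_html('
<div class="info">
  <div class="col-md-8"><small>City</small><div class="value">Игарка</div></div>
  <div class="col-md-8"><small>Phase state</small><div class="value">нефтегазовое</div></div>
</div>
<div class="info">
  <div class="col-md-8"><small>Phase state</small><div class="value">газонефтяное</div></div>
  <div class="col-md-8"><small>City</small><div class="value">Тазовский</div></div>
</div>')

# For every <small> label reading "Phase state", take the nearest following
# sibling <div class="value"> and return its text.
phase_xpath <- paste0(
  "//small[normalize-space(.) = 'Phase state']",
  "/following-sibling::div[@class = 'value'][1]"
)

phase_states <- page %>%
  html_nodes(xpath = phase_xpath) %>%
  html_text(trim = TRUE)

print(phase_states)
```

Because it keys on the label text, this approach does not depend on column order. It does require the label to read exactly "Phase state", so it would need adjusting for other language versions of the site.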

Answer №2

Reading the options of the page's dropdown filter instead gives a list of unique phase states with no repeats. Which approach fits depends on whether you want the complete per-field list, duplicates included, or only the distinct values.

In R with rvest (and the magrittr pipe), that means reading https://energybase.ru/en/oil-gas-field/index with read_html(), selecting the dropdown's option nodes instead of the per-field values, and taking their text with html_text().
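A minimal sketch of both options follows. The dropdown markup is hypothetical, including the select name "phase" and the "Any" placeholder option, because the page's filter HTML is not shown here. The second part de-duplicates the first few values returned by Answer 1 with base R's unique(), which keeps the first occurrence of each value.

```r
library(rvest)  # re-exports the %>% pipe

# Hypothetical filter dropdown; name="phase" and the "Any" placeholder are assumptions.
page <- read_html('
<select name="phase">
  <option value="">Any</option>
  <option value="1">нефтяное</option>
  <option value="2">нефтегазовое</option>
  <option value="3">газонефтяное</option>
</select>')

# Option labels, skipping the placeholder whose value attribute is empty
dropdown_states <- page %>%
  html_nodes(xpath = "//select[@name = 'phase']/option[@value != '']") %>%
  html_text(trim = TRUE)

# Alternatively, de-duplicate the full per-field list (first values from Answer 1)
all_states <- c("нефтегазовое", "газонефтяное", "нефтегазоконденсатное",
                "нефтяное", "нефтяное", "нефтегазовое")
distinct_states <- unique(all_states)

print(dropdown_states)
print(distinct_states)
```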
