Omit a specific tag during the web scraping process using Scrapy

I'm currently working on web crawling and analyzing the source code of a webpage. Here is a snippet from the page:

<div class="accordion-row">
  <h4 class="accordion-title down-arrow">The Problem</h4>
  ...
</div>
<div class="accordion-row">
  <h4 class="accordion-title down-arrow">The Strategy</h4>
  ...
</div>

In my crawling process, I have used the following line in my code to extract data:

introduction = response.css('.accordion-content').extract()

This extracts the data, but it returns all the accordion sections together, and I would like to extract them separately. For example, I want the section that starts with:

<h4 class="accordion-title down-arrow">The Problem</h4>

and also the section that begins with

<h4 class="accordion-title down-arrow">The Strategy</h4>

I only need the "Strategy" section and not all the other sections. As I am not very familiar with CSS, I am unsure how to specify the selector to achieve this selective crawling. Can anyone provide some guidance or suggestions?

Answer №1

extract() returns a list with one entry per matched element. Assuming the sections appear in the order shown, the first one, "The Problem," is introduction[0], and the second, "The Strategy," is introduction[1].

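If you want to scrape these sections individually, you can select each row by position. Note that :nth-child counts positions among all children of the parent element, so this assumes the two accordion rows are its first and second children: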

problem_paragraph = response.css('div.accordion-row:nth-child(1) > div').get()
strategy_paragraph = response.css('div.accordion-row:nth-child(2) > div').get()

Each call returns the HTML of the matched div, so the result includes markup such as <br> tags rather than plain text.

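To extract only the text, without any tags, you can use XPath with string(). Note that string() converts only the first node of the set it is given, so each line below returns the text of the first <p> in that section: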

problem_paragraph = response.xpath('string((//div[@class="accordion-content"])[1]/p)').get()
strategy_paragraph = response.xpath('string((//div[@class="accordion-content"])[2]/p)').get()
