Triggering the retrieval of complete HTML content by clicking a button in order to load extra elements using Selenium

Question

Triggering the retrieval of complete HTML content by clicking a button in order to load extra elements using Selenium

My goal is to scrape a webpage and gather all the links present on it. The page displays 30 entries and to access the complete list, one must click on a "load all" button.

Below is the Python code snippet I'm currently using for this task:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS()
driver.get('http://www.christies.com/lotfinder/searchresults.aspx?&searchfrom=header&lid=1&entry=edgar%20degas&searchtype=p&action=paging&pg=all')

load_all_button = driver.find_element_by_css_selector('a.load-all')
load_all_button.click()

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
soup = BeautifulSoup(source_code, 'lxml')

url_list = []
for div in soup.find_all(class_='image-container'):
    for childdiv in div.find_all('a'):
        url_list.append(childdiv['href'])
print(url_list)

Provided below is the HTML code snippet of the "load all" button:

<div class="loadAllbtn">
    <a class="load-all" id="loadAllUpcomingPast" href="javascript:void(0);">Load all</a> 
</div>

Despite implementing the above code, I am still only able to extract the initial 30 links instead of the complete list. It appears that I may not be utilizing Selenium correctly and would appreciate any insights on what might be going wrong.

Up until now, I've been successful in setting up Selenium, installing Node JS, capturing and saving a screenshot to a file.

css python-3.x selenium web-scraping beautifulsoup

Answer 1

Answer №1

By selecting "Load all," you trigger an additional request to fetch all the items available. It is important to allow some time for the server to respond:

from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.PhantomJS()
driver.get('http://www.christies.com/lotfinder/searchresults.aspx?&searchfrom=header&lid=1&entry=edgar%20degas&searchtype=p&action=paging&pg=all')

labtn = driver.find_element_by_css_selector('a.load-all')
labtn.click()

wait(driver, 15).until(lambda x: len(driver.find_elements_by_css_selector("div.detailscontainer")) > 30)

This piece of code enables you to wait for a maximum of 15 seconds until the number of items exceeds 30. At that point, you can extract the page source containing the complete list of items.

Just a reminder that you can simplify the code by using:

source_code = driver.page_source

Additionally, there's no need to resort to BeautifulSoup for extracting links to each item. You can achieve this with:

links = [link.get_attribute('href') for link in driver.find_elements_by_css_selector('div.image-container>a')]

Answer 2

By selecting "Load all," you trigger an additional request to fetch all the items available. It is important to allow some time for the server to respond:

from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.PhantomJS()
driver.get('http://www.christies.com/lotfinder/searchresults.aspx?&searchfrom=header&lid=1&entry=edgar%20degas&searchtype=p&action=paging&pg=all')

labtn = driver.find_element_by_css_selector('a.load-all')
labtn.click()

wait(driver, 15).until(lambda x: len(driver.find_elements_by_css_selector("div.detailscontainer")) > 30)

This piece of code enables you to wait for a maximum of 15 seconds until the number of items exceeds 30. At that point, you can extract the page source containing the complete list of items.

Just a reminder that you can simplify the code by using:

source_code = driver.page_source

Additionally, there's no need to resort to BeautifulSoup for extracting links to each item. You can achieve this with:

links = [link.get_attribute('href') for link in driver.find_elements_by_css_selector('div.image-container>a')]

Triggering the retrieval of complete HTML content by clicking a button in order to load extra elements using Selenium

Answer №1

Similar questions

Discovering changing text on a website using Python's Selenium WebDriver

Trouble triggering hidden radio button in Angular 9 when clicked

Preventing Div items from rearranging during size transitions using toggleClass

Unable to establish a hyperlink to specific section of page using MUI 5 Drawer

What is the process for executing a right-click with a mouse in Robot Framework?

Issues with the CSS code causing the dropdown menu to malfunction

Center the popover over the element

Platform designed to simplify integration of high-definition imagery and scalable vector illustrations on websites

Having trouble incorporating custom CSS into my Rails/Bootstrap project

Is there a corresponding Python Selenium WebDriver command that mirrors the functionality of the Selenium IDE command 'openWindow'?

What is the importance of using time.sleep in Python Selenium Webdriver when locating elements by XPath?

What is the method for updating the value of the Sass variable in Angular 4?

What is the best way to align the text in the center without centering the numbers in an HTML ordered list?

"Creating a new element caused the inline-block display to malfunction

What methods can I use to customize the appearance of the acceptance label in a contact form created with

Use CSS to manipulate the dimensions of an overlay

The appearance of my website appears differently depending on the resolution

Choosing pseudo-elements with CSS styling rules

Can object-fit be preserved while applying a CSS transform?

Unable to toggle class feature