My goal is to scrape a web page and gather all the links present on it. The page initially displays 30 entries, and to access the complete list one must click a "load all" button.
Below is the Python code snippet I'm currently using for this task:
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS()
driver.get('http://www.christies.com/lotfinder/searchresults.aspx?&searchfrom=header&lid=1&entry=edgar%20degas&searchtype=p&action=paging&pg=all')

# Click the "load all" button to reveal the remaining entries
load_all_button = driver.find_element_by_css_selector('a.load-all')
load_all_button.click()

# Grab the rendered page source and parse it with BeautifulSoup
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
soup = BeautifulSoup(source_code, 'lxml')

# Collect the href of every link inside an image container
url_list = []
for div in soup.find_all(class_='image-container'):
    for childdiv in div.find_all('a'):
        url_list.append(childdiv['href'])

print(url_list)
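As an aside, I believe the two lines that read the outerHTML of the root element could be replaced with Selenium's built-in page_source property, though I haven't confirmed it makes any difference here:

# Should be equivalent to reading outerHTML of //*
source_code = driver.page_source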
Here is the HTML snippet for the "load all" button:
<div class="loadAllbtn">
    <a class="load-all" id="loadAllUpcomingPast" href="javascript:void(0);">Load all</a>
</div>
Despite running the code above, I still extract only the initial 30 links rather than the complete list. It appears I may not be using Selenium correctly, and I would appreciate any insight into what might be going wrong.
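My current suspicion is a timing problem: the button presumably fires an AJAX request, so the extra entries may not be in the DOM yet when I read the page source. Below is a sketch of the explicit-wait approach I have been experimenting with, continuing from the driver set up above; the 15-second timeout and the idea that more than 30 div.image-container elements signal success are my own guesses, not values taken from the site:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Wait until the "load all" button is clickable, then click it
wait = WebDriverWait(driver, 15)  # 15 s timeout is an arbitrary guess
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a.load-all')))
button.click()

# Wait until more than the initial 30 entries have been added to the DOM
# (assumes each result lives in its own div.image-container)
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, 'div.image-container')) > 30)

# Only now read the fully rendered page
soup = BeautifulSoup(driver.page_source, 'lxml')

Even with this in place I am not certain the wait condition is the right one, so corrections are welcome.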
So far, I have successfully set up Selenium, installed Node.js, and captured and saved a screenshot to a file.