Is there a way to extract and store an image from a webpage using selenium, beautifulsoup, and Python 3?

Question

My goal is to extract and save a single image from a website after logging in. Inspecting the image shows that it has the full XPath

/html/body/form/main/div/section/div[1]/div/div[2]/div/img

My plan is to use Beautiful Soup or an image crawler to load the image into a variable and then use tesseract to extract the text from it. So far I have had trouble with urllib, urllib.requests, and Selenium's way of reading images by XPath; my first attempt to save the image with Selenium did not work. I am looking for help with the code: is it feasible to store the image in a variable, and can tesseract read the image from that variable? The sample image and its inspection view are provided below (the highlighted image shows the inspected text). Note that the form shown is only a mock-up and, to my knowledge, does not actually exist. Any guidance would be greatly appreciated. Thank you.

Image 1:

https://i.stack.imgur.com/kpJ55.png

Image 2:

https://i.stack.imgur.com/DEygr.png

html css selenium-webdriver beautifulsoup tesseract

Answer 1

To save the image, you can utilize urllib.

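import urllib.request  # "import urllib" alone does not load the request submodule

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(WEBSITE_URL)  # WEBSITE_URL is a placeholder for the page that shows the image

# locate the image and read its source URL
# (recent Selenium 4 releases removed find_element_by_xpath; use find_element(By.XPATH, ...))
img = driver.find_element(By.XPATH, '/html/body/form/main/div/section/div[1]/div/div[2]/div/img')
src = img.get_attribute('src')

# download the image to img.png
urllib.request.urlretrieve(src, "img.png")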

This stores the image in a file named img.png in your current working directory. You can then apply image processing and tesseract to extract the text from it. It is better not to rely on an absolute XPath to find the image, because any change the site owner makes to the page layout can break it. Instead, consider locating the element by its id (with By imported from selenium.webdriver.common.by):

img = driver.find_element(By.ID, "ContentPlaceHolder1_Imgquestions")

That way, even if the page layout changes, you can still locate the image by its unique id.
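
The question also asks whether the image can be held in a variable and handed to tesseract without going through a file. What follows is a minimal sketch of one way to do that; it is not part of the original answer. It assumes Pillow and pytesseract are installed and the tesseract binary is available, and the helper names cookie_header, fetch_image_bytes, and ocr_image_bytes are made up for illustration. Because the image sits behind a login, a plain urllib request may be rejected since it does not share the browser's session, so the sketch forwards the cookies returned by driver.get_cookies().

```python
import io
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the dicts returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_image_bytes(src, selenium_cookies):
    """Download src with the browser's session cookies and return the raw bytes."""
    request = urllib.request.Request(
        src, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def ocr_image_bytes(image_bytes):
    """Run tesseract on in-memory image bytes (assumes Pillow and pytesseract)."""
    # Imported lazily so the download helpers work without the OCR dependencies.
    from PIL import Image
    import pytesseract

    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image)
```

With driver and src from the code above, usage might look like text = ocr_image_bytes(fetch_image_bytes(src, driver.get_cookies())). Alternatively, img.screenshot_as_png returns the rendered element as PNG bytes straight from the browser, which avoids the second request and can be passed to ocr_image_bytes in the same way.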
