Red-colored text (as displayed in Firefox) is not recognized by JSoup when parsing HTML <img> tags

Question

UPDATE: I was able to answer my own question. JSoup can indeed find all image tags.

I've been attempting to scrape content from a web page but encountered a problem.

In the website's source code, the main images are displayed in red font, which seems to be causing issues with my JSoup select method and getElementsByTag method. It would be helpful if you could check the source code yourself due to formatting differences, but I'll provide the essential information here.

UPDATE: After inspecting the code on Chrome and IE, I noticed that the image tags are not displayed in red, indicating it might be an issue specific to Firefox. However, JSoup still fails to detect these images. (Further updates at the end of the post)

UPDATE 3: Instead of pasting my code, I have shared a screenshot. In it, the red blocks are the user-uploaded images (the ones I need), while the other img tags appear in different colors (mostly logos). When I execute the following code:

Elements imageElements = doc.select("img");

and print the results, I only get non-red image tags.

Since I don't have much experience with HTML or CSS, is there something crucial I'm missing? Could this be an issue with my code or some specific quirk of the webpage design? Is there a way to retrieve those "red font" images as well?

UPDATE 2: Upon further investigation, I found that the red HTML font in Firefox displays an error message stating: No space between attributes.

This has left me a bit perplexed considering Flickr is a renowned platform where everything seems to be functioning correctly. Could this behavior be intentional to prevent scraping? Is there still a way for me to access and download those images?
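For reference, here is a minimal sketch of how jsoup treats an img tag with no space between its attributes, which is the condition Firefox's source view flags. The markup, file names, and the class name MalformedImgDemo are hypothetical and not taken from the page in question, and the sketch assumes the jsoup library is on the classpath. As far as I understand, jsoup follows the HTML5 parsing rules, under which a missing space between attributes is a recoverable error, so both tags should be found.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MalformedImgDemo {

    // Hypothetical markup: the first <img> has no space between its
    // attributes; the second one is well-formed.
    static final String SAMPLE_HTML =
        "<div>"
        + "<img src=\"photo.jpg\"alt=\"user photo\">"
        + "<img src=\"logo.png\" alt=\"logo\">"
        + "</div>";

    // Returns how many <img> elements jsoup finds in the given HTML.
    static int countImages(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("img").size();
    }

    // Returns the src of the first <img>, or an empty string if there is none.
    static String firstImageSrc(String html) {
        Element img = Jsoup.parse(html).select("img").first();
        return img == null ? "" : img.attr("src");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Elements imageElements = doc.select("img");
        // Print every image found, including the malformed one.
        for (Element img : imageElements) {
            System.out.println(img.attr("src") + " | " + img.attr("alt"));
        }
    }
}
```

If a check like this finds every image but the real scrape does not, the malformed attributes are probably not the cause, and the selector or the fetched document is the next thing to look at.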

html css firefox jsoup

Answer 1

Responding to my own inquiry.

I admit that I was wrong; JSoup actually does locate all the img tags. However, I can't pinpoint exactly where I went wrong, since I saw it working yesterday and have changed my code since then. I suspect my mistake had something to do with how I used .select, which might have caused those images to be excluded (the code in this post was simplified for clarity).
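To illustrate the kind of mistake I mean, here is a minimal sketch of how a narrower query passed to .select can silently drop images. This is not my original code; the markup, the ids, and the class name SelectorScopeDemo are hypothetical, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;

public class SelectorScopeDemo {

    // Hypothetical page: one logo in a header and two user photos,
    // one of which has no space between its attributes.
    static final String SAMPLE_HTML =
        "<div id=\"header\"><img src=\"logo.png\"></div>"
        + "<div id=\"photos\">"
        + "<img src=\"photo1.jpg\"alt=\"\">"
        + "<img src=\"photo2.jpg\">"
        + "</div>";

    // Counts the elements matched by a CSS query in the given HTML.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Broad query: matches every <img> in the document.
        System.out.println("img: " + count(SAMPLE_HTML, "img"));
        // Scoped query: only images inside the header, so the photos are excluded.
        System.out.println("#header img: " + count(SAMPLE_HTML, "#header img"));
        // Attribute filter: only images whose src ends in .png.
        System.out.println("img[src$=.png]: " + count(SAMPLE_HTML, "img[src$=.png]"));
    }
}
```

Comparing the count for the plain "img" query against the count for the query actually in use is a quick way to tell whether the selector, rather than the parser, is excluding the images.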

I'll keep this question posted because it may help others grappling with malformed HTML in their source code; there are several useful pointers in the comments section.
