JSoup does not seem to find HTML <img> tags that Firefox's source view highlights in red

UPDATE: I was able to answer my own question. JSoup can indeed find all image tags.

I've been trying to scrape content from a web page, but I've run into a problem.

In the website's source code, the main images are displayed in red font, and both my JSoup select call and my getElementsByTag call seem to miss them. It would help if you could check the source code yourself, since the formatting doesn't carry over here, but I'll provide the essential information below.

UPDATE: After inspecting the source in Chrome and IE, I noticed that the image tags are not displayed in red there, so the highlighting may be specific to Firefox. However, JSoup still fails to detect these images. (Further updates at the end of the post)

UPDATE 3: Instead of pasting my code, I have shared a screenshot. In it, the red blocks are the user-uploaded images (the ones I need), while the other img tags appear in different colors (mostly logos). When I execute the following code:

Elements imageElements ="img");

and print the results, I only get non-red image tags.

Since I don't have much experience with HTML or CSS, is there something crucial I'm missing? Could this be an issue with my code or some specific quirk of the webpage design? Is there a way to retrieve those "red font" images as well?

UPDATE 2: Upon further investigation, I found that Firefox shows an error message for the markup it highlights in red: "No space between attributes".
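
For reference, that message describes markup where one attribute runs straight into the next with no whitespace between them. Below is a minimal sketch of how JSoup can be checked against such a tag. It assumes the jsoup library is on the classpath; the HTML string, the file names, and the class and method names are made up for illustration and are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MalformedImgDemo {
    // Hypothetical markup: the second <img> has no space between
    // src="..." and alt="...", which is the case Firefox flags.
    static final String HTML =
        "<html><body>"
        + "<img src=\"logo.png\" alt=\"logo\">"
        + "<img src=\"photo.jpg\"alt=\"photo\">"
        + "</body></html>";

    // Parse the HTML and return every <img> element JSoup finds.
    static Elements findImages(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("img");
    }

    public static void main(String[] args) {
        // Print the src and alt of each image so missing tags are easy to spot.
        for (Element img : findImages(HTML)) {
            System.out.println(img.attr("src") + " | " + img.attr("alt"));
        }
    }
}
```

If the well-formed and the malformed tag both show up in the output, the missing whitespace alone is unlikely to be what hides the images, and the selector or the surrounding code is the next place to look.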

This has left me a bit perplexed considering Flickr is a renowned platform where everything seems to be functioning correctly. Could this behavior be intentional to prevent scraping? Is there still a way for me to access and download those images?

Answer №1

Responding to my own inquiry.

I admit that I was wrong: JSoup does locate all the img tags. However, I can't pinpoint exactly where I went wrong, since I observed it working yesterday and have changed my code since then. I suspect my mistake had something to do with how I used .select, which might have caused those images to be excluded (the code in this post was simplified for clarity).
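
To illustrate how a .select call can leave images out, here is a small sketch that compares several CSS queries on the same document. It is not my original code, which I can no longer reproduce; the markup, ids, class names, and the count helper are all hypothetical, and it assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;

public class SelectorScopeDemo {
    // Hypothetical page: one logo in a header block, two photos in another block.
    static final String HTML =
        "<div id=\"header\"><img src=\"logo.png\" class=\"logo\"></div>"
        + "<div id=\"photos\"><img src=\"photo1.jpg\"><img src=\"photo2.jpg\"></div>";

    // Count how many elements a given CSS query matches in the parsed HTML.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        // A bare tag selector matches every image; the narrower queries
        // match only images with a given class or under a given ancestor.
        String[] queries = {"img", "img.logo", "#header img", "#photos img"};
        for (String query : queries) {
            System.out.println(query + " -> " + count(HTML, query));
        }
    }
}
```

Printing the match count for each query makes it easy to see when a selector is narrower than intended, for example when it only reaches the logos and skips the user-uploaded images.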

I'll keep this question posted, since it may help others grappling with malformed HTML in their source code, and there are several useful pointers in the comments section.

