I've been struggling with this seemingly basic task for hours. I can't find a library that works, and none of the existing questions here addresses my specific case.
Here are my requirements:
- The entire page's markup is available as a string.
- I must use CSS selectors to target the elements I want to extract data from.
- I don't want to create actual HTML DOM elements; I only want to scrape data. The page may contain images, audio, video, and other elements that I have no interest in creating.
- The parser needs to tolerate markup errors and accept HTML5-style tagging. Trying to parse the markup as XML throws an "Invalid XML" error.
- All of this must happen in the browser, without using Node.js modules.
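A minimal sketch of the XML problem from the list above, written in Java and using only the JDK's built-in XML parser. The class name, method name, and sample markup are assumptions made up for illustration. A strict XML parser rejects HTML5-style markup such as an unclosed <br>, while the self-closed form is accepted:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlVsHtml5 {
    /** Returns true if the markup is well-formed XML, false if the parser rejects it. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler so parse errors are not printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for markup that is not well-formed XML.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: valid HTML5, but <br> is never closed.
        System.out.println(parsesAsXml("<p>Hello<br>world</p>"));  // false
        // The same content with an XML-style self-closing tag.
        System.out.println(parsesAsXml("<p>Hello<br/>world</p>")); // true
    }
}
```

This is why an XML parser is not an option here: real pages are full of unclosed void tags and other constructs that HTML5 allows and XML does not.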
In Java, I achieved this with JSoup, but I haven't found a comparable library for JavaScript that runs in the browser.
Thank you for your assistance.