Obtaining data from a website with Java: A step-by-step guide

I am trying to retrieve the SIC code from the page requested in the code below, but running the code throws a NullPointerException:

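import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class GoogleScraperDemo {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.manta.com/c/mx4s4sw/bowflex-academy").ignoreHttpErrors(true).get();
            // This line throws the NullPointerException shown below
            String textContents = doc.select("itemprop").first().text();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}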

Exception in thread "main" java.lang.NullPointerException
    at com.inndata.connection.GoogleScraperDemo.main(GoogleScraperDemo.java:22)

Answer №1

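The selector "itemprop" needs to be adjusted. Jsoup reads a bare word as a tag name, so it looks for elements named <itemprop>. The page has none, so select() returns an empty list, first() returns null, and calling text() on null throws the NullPointerException.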

On the page, the SIC code sits inside this HTML:

  <tr>
      <th class="text-left" style="width:30%;">SIC Code</th>
      <td rel="sicDetails"><span itemprop="isicV4">7991</span>, Physical Fitness Facilities</td>
  </tr>

A more suitable selector would be something like

"span[itemprop='isicV4']"

I have not tested this against the live site. Any change the site owners make to their markup may break this selector, so be prepared to adjust it. An alternative is to search for the text "SIC Code" and read the cell next to it, but that approach is just as vulnerable to changes in the page.
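Both suggestions can be sketched with jsoup as follows. This is a minimal sketch, assuming jsoup 1.11.1 or later (for selectFirst) is on the classpath; the class SicSelectorDemo and the methods extractSicCode and sicCellText are hypothetical names. It runs against the HTML snippet quoted in this answer, wrapped in a <table> so the parser keeps the row, and it returns null on a missing match instead of calling text() on null as the question's code does.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SicSelectorDemo {
    // Approach 1: attribute selector for the <span> whose itemprop attribute is "isicV4".
    // Returns null instead of throwing when the span is missing.
    static String extractSicCode(String html) {
        Document doc = Jsoup.parse(html);
        Element span = doc.selectFirst("span[itemprop='isicV4']");
        return span == null ? null : span.text();
    }

    // Approach 2: find the <th> whose own text contains "SIC Code"
    // (jsoup's :containsOwn is case-insensitive) and return the text of the next cell.
    static String sicCellText(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.selectFirst("th:containsOwn(SIC Code)");
        if (label == null) {
            return null;
        }
        Element cell = label.nextElementSibling();
        return cell == null ? null : cell.text();
    }

    public static void main(String[] args) {
        // The answer's snippet, wrapped in <table> so the HTML parser keeps the row and cells
        String html = "<table><tr>"
                + "<th class=\"text-left\" style=\"width:30%;\">SIC Code</th>"
                + "<td rel=\"sicDetails\"><span itemprop=\"isicV4\">7991</span>, Physical Fitness Facilities</td>"
                + "</tr></table>";
        System.out.println(extractSicCode(html)); // prints 7991
        System.out.println(sicCellText(html));    // prints 7991, Physical Fitness Facilities
    }
}
```

Against the live page, you would call selectFirst on the Document returned by Jsoup.connect(url).get() rather than parsing a string. The attribute selector is more precise; the label search survives a change to the itemprop value but not a change to the "SIC Code" heading or the table layout.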

Answer №2

Scraping a website that prohibits scraping can trigger its bot detection, particularly when third-party tools such as Jsoup or HtmlUnit are used.

To avoid detection, use Java's built-in java.net package to fetch and scrape the page.

The steps are:

  1. Create a URL object from the target URL string:

    URL url = new URL(targetPageURLString);

  2. Open an HTTP connection through the URL:

    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();

  3. Get the response from the connection's input stream:

    InputStream urlStream = urlConnection.getInputStream();

  4. Read the response bytes from the stream, then convert the byte array into a String using the page's character encoding.

  5. Use a regular expression to extract the information you need.
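The five steps above can be combined into one class. This is a sketch, not a tested scraper: SicScraper, fetchPage and extractSicCode are hypothetical names, the response body is assumed to be UTF-8, and the regular expression assumes the itemprop="isicV4" markup shown in the first answer. It reads the stream in 8 KB chunks rather than one byte at a time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SicScraper {
    // Hypothetical pattern: captures the text right after an element carrying itemprop="isicV4".
    // It depends on the page's exact markup and breaks if that markup changes.
    private static final Pattern SIC_PATTERN =
            Pattern.compile("itemprop=[\"']isicV4[\"'][^>]*>\\s*([^<]+?)\\s*<");

    // Steps 1-4: create the URL, open the connection, read the body, decode it (UTF-8 assumed).
    static String fetchPage(String targetPageURLString) throws IOException {
        URL url = new URL(targetPageURLString);
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        try (InputStream urlStream = urlConnection.getInputStream()) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = urlStream.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            urlConnection.disconnect();
        }
    }

    // Step 5: extract the SIC code with a regular expression; returns null if nothing matches.
    static String extractSicCode(String html) {
        Matcher m = SIC_PATTERN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        String html = fetchPage("http://www.manta.com/c/mx4s4sw/bowflex-academy");
        System.out.println(extractSicCode(html));
    }
}
```

Keeping the regex step in its own method lets you check it against a saved copy of the page without making a network request.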
