Obtaining data from a website with Java: A step-by-step guide

I am trying to retrieve the SIC Code from the URL used in the code below. However, when I run the code, I get the NullPointerException shown after it:

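import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class GoogleScraperDemo {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.manta.com/c/mx4s4sw/bowflex-academy").ignoreHttpErrors(true).get();
            // The NullPointerException below is thrown on this line
            String textContents = doc.select("itemprop").first().text();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}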

Exception in thread "main" java.lang.NullPointerException
    at com.inndata.connection.GoogleScraperDemo.main(GoogleScraperDemo.java:22)

Answer №1

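The selector "itemprop" is the problem. As written, it matches elements whose tag name is itemprop, not elements that carry an itemprop attribute. Nothing on the page matches, so select() returns an empty list, first() returns null, and calling text() on that null throws the NullPointerException.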

The SIC code appears in the page's HTML like this:

  <tr>
      <th class="text-left" style="width:30%;">SIC Code</th>
      <td rel="sicDetails"><span itemprop="isicV4">7991</span>, Physical Fitness Facilities</td>
  </tr>

A more suitable selector would be something like

"span[itemprop='isicV4']"

This selector has not been tested. If the site owners change the markup, it will break and need adjusting. An alternative is to search for the text "SIC Code" and read the value that follows it, but that approach is just as vulnerable to changes in the page.
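
As a rough sketch of the suggestion above, the following parses the HTML fragment quoted in this answer from a string rather than fetching the live page, so it says nothing about what the site currently serves. The class name SicSelectorSketch and the helper extractSic are made up for illustration. The null check covers the case that caused the original exception.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SicSelectorSketch {

    // Hypothetical helper: returns the text of the first
    // <span itemprop="isicV4"> in the given HTML, or null if there is none.
    public static String extractSic(String html) {
        Document doc = Jsoup.parse(html);
        // Attribute selector: a span element whose itemprop attribute equals isicV4
        Element span = doc.select("span[itemprop=isicV4]").first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        // Fragment copied from the answer, wrapped in <table> so it parses as a table row
        String html = "<table><tr>"
                + "<th class=\"text-left\" style=\"width:30%;\">SIC Code</th>"
                + "<td rel=\"sicDetails\"><span itemprop=\"isicV4\">7991</span>, Physical Fitness Facilities</td>"
                + "</tr></table>";
        System.out.println(extractSic(html));
    }
}
```

In the original program, the same select call would be applied to the Document returned by Jsoup.connect(...).get(), with the same null check before calling text().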

Answer №2

Scraping a website that prohibits scraping can trigger its bot detection when third-party tools such as Jsoup or HtmlUnit are used.

To avoid this, you can retrieve the page with Java's built-in "java.net" package instead and extract the data yourself.

Follow these steps to proceed smoothly:

  1. Create a URL object from the target URL string:

    URL url = new URL(targetPageURLString);

  2. Open an HTTP connection through the URL:

    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();

  3. Get the response as an input stream:

    InputStream urlStream = urlConnection.getInputStream();

  4. Read the response bytes from the stream and convert the byte array into a String.

  5. Use a regex to extract the content you need.
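
The steps above can be put together roughly as follows. This is a minimal sketch, not a tested scraper: the class name PlainJavaFetchSketch, the helper names, the UTF-8 decoding, and the regex are all assumptions, and the pattern is modelled only on the span markup quoted in Answer №1. A regex over HTML is brittle and will stop matching if the markup changes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainJavaFetchSketch {

    // Assumed pattern, based on <span itemprop="isicV4">7991</span> from Answer 1
    private static final Pattern SIC =
            Pattern.compile("<span\\s+itemprop=[\"']isicV4[\"']\\s*>\\s*(\\d+)\\s*</span>");

    // Steps 1-3: build the URL, open the connection, obtain the response stream
    public static String fetch(String targetPageURLString) throws IOException {
        URL url = new URL(targetPageURLString);
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        try (InputStream urlStream = urlConnection.getInputStream()) {
            return readAll(urlStream);
        } finally {
            urlConnection.disconnect();
        }
    }

    // Step 4: drain the stream into a byte array, then decode it (UTF-8 is assumed)
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Step 5: extract the SIC code with the regex; returns null when nothing matches
    public static String extractSic(String html) {
        Matcher m = SIC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument, fetch that page; otherwise use a sample snippet
        String html = args.length > 0
                ? fetch(args[0])
                : "<td rel=\"sicDetails\"><span itemprop=\"isicV4\">7991</span>, Physical Fitness Facilities</td>";
        System.out.println(extractSic(html));
    }
}
```

Run without arguments, the program works on the sample snippet only. Pass a URL as the first argument to exercise steps 1 to 3 against a real page.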
