Steps to obtain the precise source code of a webpage

Is there a way to download the exact source code of a webpage? I have tried using the URL method and Jsoup method, but I am not getting the precise data as seen in the actual source code. For example:

<input type="image"
       name="ctl00$dtlAlbums$ctl00$imbAlbumImage"    
       id="ctl00_dtlAlbums_ctl00_imbAlbumImage"
       title="Independence Day Celebr..."
       border="0"         
       onmouseover="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','0','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');"
       onmouseout="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','1','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" 
       src="Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG"     
       alt="Independence Day Celebr..." 
       style="height:79px;width:148px;border-width:0px;"
/>

Jsoup does not detect the 'style' attribute of this tag. In addition, when I download the page using the URL method, the style attribute is missing and an empty border attribute (border=""/>) appears in its place.

I have tried the following code:

URL url=new URL("http://www.apcob.org/");
InputStream is = url.openStream();  // throws an IOException
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
File fileDir = new File(contextpath+"\\extractedtxt.txt");
Writer fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileDir), "UTF8"));
while ((line = br.readLine()) != null)
{
  fw.write("\n"+line);
}
InputStream in = new FileInputStream(new File(contextpath+"extractedtxt.txt"));
String baseUrl="http://www.apcob.org/";
Document doc=Jsoup.parse(in,"UTF-8",baseUrl);
System.out.println(doc);

Another method I attempted is:

Document doc = Jsoup.connect(url_of_currentpage).get();

I am trying to achieve this in Java for the website used in the code above, where this issue occurs.

Answer №1

The difference is probably caused by the user agent string. When you open the page in your browser, the browser sends a user agent string that identifies the browser type. Some websites serve different pages depending on that string (for example, a separate layout for mobile devices), and a Java client sends a different user agent by default.

Try sending the same user agent string as your browser to see if that resolves the issue.
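
As a minimal sketch of this idea, the example below sets the User-Agent header with java.net.http.HttpClient (Java 11+), which is a different API from the URL and Jsoup approaches in the question. The class name BrowserLikeFetch and its helper methods are hypothetical, and the user agent value is only an example; copy the real string from your own browser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class BrowserLikeFetch {
    // Example value only (assumption): replace it with the string your browser sends
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/46.0.2490.86 Safari/537.36";

    // Builds a GET request that carries the browser-like User-Agent header
    static HttpRequest buildRequest(String pageUrl) {
        return HttpRequest.newBuilder(URI.create(pageUrl))
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    // Sends the request and returns the response body as text
    static String fetch(String pageUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(pageUrl), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: only show the headers that would be sent
            System.out.println(buildRequest("http://www.apcob.org/").headers().map());
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Comparing the output of fetch with the browser's "view source" shows whether the user agent was the cause of the difference.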

Answer №2

The downloaded page has been altered by JavaScript code, which Jsoup, being only an HTML parser, cannot execute.

If you want to view the source code as it appears in Chrome, you can use one of these tools:

All three tools are capable of parsing and executing Javascript code within the page.
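
One way to check whether an attribute is delivered by the server or added later by a script is to look for it in the raw response text, before any parser touches it. The sketch below does that with a simple regular expression; the class RawAttributeCheck and its method are hypothetical names, and the regex is an approximation rather than a full HTML parser (it assumes double-quoted ids and no '>' inside attribute values).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawAttributeCheck {
    // Returns true if the opening tag with the given id carries the attribute
    // in the raw, unparsed HTML text
    static boolean hasAttributeInRawSource(String rawHtml, String elementId, String attribute) {
        // Locate the opening tag that contains id="elementId"
        Pattern tagPattern = Pattern.compile(
                "<[^>]*\\bid\\s*=\\s*\"" + Pattern.quote(elementId) + "\"[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher tagMatcher = tagPattern.matcher(rawHtml);
        if (!tagMatcher.find()) {
            return false; // no such element in the raw text
        }
        // Look for the attribute name followed by '=' inside that tag only
        Pattern attributePattern = Pattern.compile(
                "\\s" + Pattern.quote(attribute) + "\\s*=", Pattern.CASE_INSENSITIVE);
        return attributePattern.matcher(tagMatcher.group()).find();
    }
}
```

If this returns false for 'style' on the raw download but the browser's element inspector shows the attribute, a script is the likely source. If it returns true, the attribute was sent by the server and the loss happens during parsing or saving.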

Answer №3

This should do the trick:

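import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FetchSource {
    public static void main(String[] args) throws Exception {
        // Only use this if you are working with a proxy
        // System.setProperty("java.net.useSystemProxies", "true");

        URL url = new URL("http://www.apcob.org/");

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Send a desktop browser User-Agent so the server answers as it would to a browser
        connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36");

        // UTF-8 is an assumption; use the charset the page declares if it differs
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                System.out.println(inputLine);
            }
        } // the reader is closed here, even if reading fails
    }
}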

Answer №4

Here is a helper function for fetching web pages. Use it to get the HTML as a String, then convert that String to a Document with Jsoup.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String fetchPage(String urlFullAddress) throws IOException {
    // Optional proxy settings
    // String proxy = "10.3.100.207";
    // int port = 8080;
    // Proxy proxyConnect = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxy, port));
    URL url = new URL(urlFullAddress);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection(); // or openConnection(proxyConnect)

    // Note: this is an iPad User-Agent, so a site may answer with its mobile page
    connection.addRequestProperty("User-Agent",
            "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10");
    connection.setReadTimeout(5000); // read timeout in milliseconds

    connection.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    connection.addRequestProperty("Accept-Language", "en-US,en;q=0.5");
    // "Accept-Encoding: gzip, deflate" is left out: HttpURLConnection does not decompress
    // the response automatically, so the reader below would receive compressed bytes
    connection.addRequestProperty("Connection", "keep-alive");

    // UTF-8 is an assumption; adjust it if the page declares another charset
    StringBuilder page = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
        String current;
        while ((current = in.readLine()) != null) {
            page.append(current).append('\n'); // keep line breaks so the source stays readable
        }
    }
    return page.toString();
}
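
A request that sends "Accept-Encoding: gzip" may receive a compressed body, and HttpURLConnection does not decompress it on its own. The sketch below shows one way to handle that with the standard GZIPInputStream. The class EncodedBody and its methods are hypothetical names, only gzip is covered, and UTF-8 is assumed as the page charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class EncodedBody {
    // Wraps the raw response stream in a gzip decoder when the server says it compressed the body
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        return raw; // no (or unknown) encoding: pass the bytes through unchanged
    }

    // Reads the whole stream as UTF-8 text (the charset is an assumption)
    static String readAll(InputStream in) throws IOException {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With an HttpURLConnection named connection, this would be used as readAll(decode(connection.getInputStream(), connection.getContentEncoding())), and the request should then advertise only "gzip" in Accept-Encoding.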

If you encounter issues with the Jsoup parser, consider a parser that processes the HTML as-is, without correcting errors.

A couple of other things I observed: you forgot to close fw, so buffered output may never reach the file. Replace "UTF8" with "UTF-8". For extensive CSS parsing, try a CSS parser.
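
A minimal sketch of the two fixes, assuming the downloaded lines are already in memory: try-with-resources closes (and therefore flushes) the writer automatically, and StandardCharsets.UTF_8 avoids spelling the charset name as a string. The class SaveLines and its save method are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SaveLines {
    // Writes each line to the target file as UTF-8 text
    static void save(Path target, List<String> lines) throws IOException {
        try (BufferedWriter fw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                fw.write(line);
                fw.newLine();
            }
        } // fw is flushed and closed here, even if write() throws
    }
}
```

Because the writer is closed before the file is read again, a later Jsoup.parse on that file sees the complete content instead of a partially flushed buffer.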

Answer №5

When you retrieve a web page over HTTP, the web server returns the output generated by the server-side code, not the server-side source itself, so the exact source of a PHP file cannot be obtained via HTTP. From what I understand, the only way to get it is direct access to the server's files, for example via FTP.
