Parsing HTML files with Perl to extract XML data

Can XML output be generated from a web page using Web::Scraper in Perl? Here is a sample of the HTML from the URL:

    <table class="reference">
    <tr>
    <th width="23%" align="left">Property</th>
    <th width="71%" align="left">Description</th>
    <th style="text-align:center;">DOM</th>
    </tr>
    <tr>
    // more HTML code

The Perl code snippet used to scrape this data is as follows:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use URI;
    use Web::Scraper;

    my $urlToScrape = "http://www.w3schools.com/jsref/dom_obj_node.asp";

    my $rennersdata = scraper {
        process "table.reference > tr > td > a", 'renners[]' => 'TEXT';
        # more processing code
    };

    my $res = $teamsdata->scrape(URI->new($urlToScrape));

// more code-snippet

The current output obtained is structured like this:

<PropertyList>
<Property>
<Name>attributes</Name>
// more output
<ReturnValue>
Returns a collection of a node's attributes
</ReturnValue>
....

Desired output format:

<PropertyList>
<Property>
<Name>attributes</Name>
<ReturnValue>Returns a collection of a node's attributes</ReturnValue>
<DOMVersion>1</DOMVersion>
</Property> 
</PropertyList>

Please suggest how the for loops can be combined to achieve the desired output structure.

Thank you!

Answer №1

To print the matching items from each of the three keys in $res together, move all of the output into the first for loop. The loop variable $i is the index of the current item, so on each iteration the same index retrieves the corresponding value from each array. This relies on the three arrays having the same length and order.

# Print the wrapper once, outside the loop, so there is a single <PropertyList>.
print "<PropertyList>\n";
for my $i (0 .. $#{$res->{renners}}) {
  print <<"XML";
  <Property>
    <Name>$res->{renners}[$i]</Name>
    <ReturnValue>$res->{landrenner}[$i]</ReturnValue>
    <DOMVersion>$res->{dom}[$i]</DOMVersion>
  </Property>
XML
}
print "</PropertyList>\n";

To improve readability, I replaced the individual print statements with a here-document. I also fixed a bug by changing the line

my $res = $teamsdata->scrape(URI->new($urlToScrape));
to
my $res = $rennersdata->scrape(URI->new($urlToScrape));
because $teamsdata is never declared, so the script does not compile under use strict.
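
Scraped text can contain characters such as & or <, which would make the printed XML ill-formed. The following is a minimal sketch of the same single-loop idea with escaping added, not a definitive implementation. It runs without network access: the hash is a hard-coded stand-in for $res, its second row is made-up sample data, and xml_escape and build_property_list are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the hash returned by $rennersdata->scrape(...).
# The key names come from the question; the second row is made-up sample data.
my $res = {
    renners    => [ 'attributes', 'sampleProperty' ],
    landrenner => [ "Returns a collection of a node's attributes",
                    'Sample text with <b> & more' ],
    dom        => [ 1, 2 ],
};

# Hypothetical helper: escape characters that are unsafe in XML element content.
sub xml_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build the whole document as a string so it can be printed or inspected.
sub build_property_list {
    my ($data) = @_;
    my $xml = "<PropertyList>\n";
    for my $i (0 .. $#{ $data->{renners} }) {
        # Use the same index for all three arrays.
        my ($name, $ret, $dom) =
            map { xml_escape($data->{$_}[$i]) } qw(renners landrenner dom);
        $xml .= "<Property>\n"
              . "<Name>$name</Name>\n"
              . "<ReturnValue>$ret</ReturnValue>\n"
              . "<DOMVersion>$dom</DOMVersion>\n"
              . "</Property>\n";
    }
    return $xml . "</PropertyList>\n";
}

print build_property_list($res);
```

Returning a string instead of printing inside the loop makes it easy to write the result to a file or check it before output.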

Answer №2

This may not be exactly what you had in mind, but take a look at HTML::Element. Its as_XML method serializes an element and its descendants as XML, so it converts the HTML markup itself into XML rather than into a custom structure such as PropertyList. Hope this helps.
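
As a rough illustration of that suggestion, the sketch below parses the table fragment from the question with HTML::TreeBuilder (part of the HTML-Tree distribution, which also provides HTML::Element) and serializes the matching element with as_XML. It assumes HTML-Tree is installed, and the closing tags in the fragment were added here so that the sample is complete.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Fragment taken from the question; the closing tags are added for this sketch.
my $html = <<'HTML';
<table class="reference">
<tr>
<th width="23%" align="left">Property</th>
<th width="71%" align="left">Description</th>
<th style="text-align:center;">DOM</th>
</tr>
</table>
HTML

# Parse the markup into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Find the first <table> whose class attribute is exactly "reference".
my $table = $tree->look_down(_tag => 'table', class => 'reference');

# Serialize that element and its descendants as XML.
my $xml = $table ? $table->as_XML : '';
print $xml;

# Free the tree explicitly once it is no longer needed.
$tree->delete;
```

The result keeps the original HTML element names (table, tr, th), so producing the PropertyList layout would still need a separate step.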

Answer №3

Give this a try: print the PropertyList tags once, outside the loop, and use a single loop with the same index for all three arrays:

print "<PropertyList>\n";
for my $i (0 .. $#{$res->{renners}}) {
    # One <Property> per row; $i selects the matching item from each array.
    print "<Property>\n";
    print "<Name>$res->{renners}[$i]</Name>\n";
    print "<ReturnValue>$res->{landrenner}[$i]</ReturnValue>\n";
    print "<DOMVersion>$res->{dom}[$i]</DOMVersion>\n";
    print "</Property>\n";
}
print "</PropertyList>\n";
