Extracting information from CSS code

I am struggling to parse an RSS feed because all the information is contained within the description element. The inline CSS formatting makes it hard to extract the actual strings. For example, below is a snippet of the description element:

<table style="border-collapse: collapse; border-spacing: 0; color:#493800; font-size: 11px; border:solid 1px #bababa;    margin: 10px;"><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Start Time</th><td style="padding:5px; margin:0; background:#fff;">21/11/2013 19:30 UTC</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Backup Job</th><td style="padding:5px; margin:0; background:#fff;">Backup</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Computer</th><td style="padding:5px; margin:0; background:#fff;">theComputer</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Disk</th><td style="padding:5px; margin:0; background:#fff;">theDisk</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Username</th><td style="padding:5px; margin:0; background:#fff;">theUsername</td></tr><tr><th style="padding:5px; background:#ddd; border:solid 1px #bababa; color:#493800; font-size: 10px;">Searched</th><td style="padding:5px; margin:0; background:#fff;">112306 (52.5 GB)</td></tr><tr><th style="padding:5px; background:#ddd; borde...

The table contains key-value pairs such as Computer: Computername and Uploaded: Amountuploaded that I need to extract. I tried the HTML Agility Pack but ran into difficulties because of my limited experience with it.

Any assistance on how to efficiently extract this data would be greatly appreciated. Thank you.

Answer №1

http://www.example.com/XML-Parsing-CSharp covers parsing XML content in C# and is a good starting point for getting familiar with .NET's XML document classes. Since the description is a plain table, .NET's XmlDocument is a straightforward way to parse it, provided the fragment is well-formed XML (LoadXml throws an XmlException otherwise).

Converting the string into an XmlDocument can be achieved simply by executing:

// Requires: using System; using System.Xml;

// Load the string into an XmlDocument, wrapped in a single root element
string xTxt = "<table><tr><th>...</th><td>...</td></tr></table>";
XmlDocument doc = new XmlDocument();
doc.LoadXml("<?xml version=\"1.0\"?><root>" + xTxt + "</root>");

// Walk every table row and build one "name:value" line per row
string extractedData = null;
XmlNodeList trNodes = doc.SelectNodes("//tr");
foreach (XmlNode node in trNodes)
{
    XmlNode thNode = node.SelectSingleNode("th");   // item name
    XmlNode tdNode = node.SelectSingleNode("td");   // item value
    extractedData += thNode.InnerText + ':';
    extractedData += tdNode.InnerText + Environment.NewLine;
}

// Show the row count and the extracted pairs in the txtInfo TextBox
txtInfo.AppendText("trNodes.Count = " + trNodes.Count + '\n');
txtInfo.AppendText(extractedData);

Note that each data item you need sits in its own tr element, with the item name in a th element and its value in a td element, which makes them easy to locate. The SelectNodes("//tr") call above collects all 10 tr elements into trNodes.

The example uses a TextBox named txtInfo to display the results. In a real implementation, consider not accumulating the results in a string; the extractedData variable is there only to illustrate how to read the items. The thNode.InnerText and tdNode.InnerText properties return the name and the value of each item, respectively.

You may opt to create a List of items or design a dedicated class with properties based on the data structure. Alternatively, consider creating a specialized class that handles this processing logic and integrate it into your project. Choose the approach that best suits your requirements. :)
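
As a rough illustration of those options, here is a minimal sketch that collects the rows into a Dictionary and then maps a few of them onto a small class. The names DescriptionParser, BackupReport, and FromDescription are hypothetical and not part of any library. The property names are taken from the row labels shown in the question. The sketch assumes the description fragment is well-formed XML.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (illustrative only): turns the table rows into name/value pairs.
public static class DescriptionParser
{
    // Assumes htmlFragment is well-formed XML; LoadXml throws XmlException otherwise.
    public static Dictionary<string, string> Parse(string htmlFragment)
    {
        var doc = new XmlDocument();
        // Wrap the fragment so the document always has exactly one root element.
        doc.LoadXml("<root>" + htmlFragment + "</root>");

        var result = new Dictionary<string, string>();
        foreach (XmlNode row in doc.SelectNodes("//tr"))
        {
            XmlNode th = row.SelectSingleNode("th");
            XmlNode td = row.SelectSingleNode("td");
            if (th == null || td == null)
            {
                continue; // skip rows that lack a name or a value cell
            }
            // Later rows with the same name overwrite earlier ones.
            result[th.InnerText.Trim()] = td.InnerText.Trim();
        }
        return result;
    }
}

// Hypothetical typed view over a few of the fields shown in the question.
public class BackupReport
{
    public string Computer { get; set; }
    public string Disk { get; set; }
    public string Username { get; set; }

    public static BackupReport FromDescription(string htmlFragment)
    {
        Dictionary<string, string> fields = DescriptionParser.Parse(htmlFragment);
        var report = new BackupReport();
        string value;
        // Properties stay null when the corresponding row is missing.
        if (fields.TryGetValue("Computer", out value)) report.Computer = value;
        if (fields.TryGetValue("Disk", out value)) report.Disk = value;
        if (fields.TryGetValue("Username", out value)) report.Username = value;
        return report;
    }
}
```

With this in place, BackupReport.FromDescription(description).Computer would hold the text of the td cell that follows the Computer header, and the dictionary remains available for any row that has no dedicated property.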

Enjoy coding!

