Is there a sophisticated method to retrieve the computed style for individual DOM nodes on multiple web pages, allowing for comparison of styling data between similar nodes across those pages?
I am currently dealing with over 500 HTML files that were created using MS FrontPage and contain poorly structured HTML. My goal is to extract the styling information and convert it into meaningful markup. Initially, I attempted to accomplish this using regex; however, as the complexity increased, I realized that parsing HTML with regex was not ideal. Now, I am exploring ways to have the browser parse the HTML and provide me with the computed styles for each node.
I know this can be done with JavaScript against the DOM, but a script running in a page only sees that one file, which makes it hard to compare data across multiple files. Getting the data out of JavaScript and into a file also seems to be limited. Are there any alternative approaches for this scenario?
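To make the comparison step concrete, here is a rough sketch of what I would like to do once the styles have been extracted. It assumes the extraction (the part I am asking about) has already produced, for each page, a mapping from a node path to a dictionary of computed properties. The data shape, the node paths, the file names, and the function names are all hypothetical and only there to illustrate the goal.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical data shape: page name -> node path -> computed CSS properties.
PageStyles = Dict[str, Dict[str, Dict[str, str]]]


def style_signature(props: Dict[str, str], keys: Sequence[str]) -> Tuple[str, ...]:
    """Reduce a node's computed style to the properties we care about."""
    return tuple(props.get(key, "") for key in keys)


def group_by_signature(
    pages: PageStyles, keys: Sequence[str]
) -> Dict[Tuple[str, ...], List[Tuple[str, str]]]:
    """Group (page, node path) pairs that share the same style signature."""
    groups: Dict[Tuple[str, ...], List[Tuple[str, str]]] = defaultdict(list)
    for page, nodes in pages.items():
        for path, props in nodes.items():
            groups[style_signature(props, keys)].append((page, path))
    return dict(groups)


# Made-up sample data standing in for two extracted pages.
pages: PageStyles = {
    "a.htm": {
        "body/p[1]": {"font-size": "24px", "font-weight": "700"},
        "body/p[2]": {"font-size": "16px", "font-weight": "400"},
    },
    "b.htm": {
        "body/font[1]": {"font-size": "24px", "font-weight": "700"},
    },
}

groups = group_by_signature(pages, ["font-size", "font-weight"])
for signature, members in groups.items():
    print(signature, members)
```

Nodes that land in the same group look alike regardless of the tags FrontPage used, so each group could then be mapped to one meaningful element (a heading, for example).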
(As a side note, HTML Tidy did not help: the HTML in these files is too badly broken for it to repair.)