Creating an interactive and dynamic Table of Contents with Puppeteer for PDF documents

After extensive research, most responses I found suggest that this is not possible. However, I stumbled upon Paged.js, which appears to use the CSS rule

a::after { content: target-counter(attr(href), page, decimal); }

to fill in the correct page numbers for the Table of Contents. This makes me wonder whether the library includes a CSS polyfill for this. Browsers support ::after, but it seems unlikely that they support target-counter directly. Hence, I considered finding a CSS polyfill that enables target-counter and page-break-after: avoid, as demonstrated by Paged.js.

An alternative approach I considered is to run the generated PDF through a PDF parser and use regex patterns and loops to identify the page numbers of specific elements. The parser's output could then be stored in a JSON file that the Table of Contents reads from. Nonetheless, this method seems time-consuming, especially since another merge operation would be required on the PDF (one is already done for the front page).

To sum up, I would like to know whether either or both of these solutions are feasible. If so, any guidance on A. obtaining a polyfill for the required CSS features, or B. structuring and organizing the data retrieved from the PDF parser, would be highly appreciated.

Answer №1

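If you're looking for a starting point for option B, the script below could help. It uses pdf-parse to extract the text of the generated PDF and maps each title line to the most recent page marker seen before it. It assumes the PDF prints page markers in the form "Page 3/10" and headings in the form "Title: ...".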

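const fs = require('fs');
const pdf = require('pdf-parse');

// Read the PDF that was generated earlier.
const dataBuffer = fs.readFileSync('./generated.pdf');

pdf(dataBuffer).then(function (data) {
    const toc = {};
    let page; // most recent "Page N/M" marker seen so far
    const pagePattern = /Page [0-9]+\/[0-9]+/;
    const topicPattern = /Title: [A-Za-z 0-9]+/;

    // data.text holds the extracted text of the whole document.
    // A title gets the last marker seen before it, so this assumes the
    // marker appears before the titles of its page in the extracted text.
    data.text.split('\n').forEach((chunk) => {
        if (pagePattern.test(chunk)) {
            page = chunk;
        }
        // Do not overwrite a title that already has a page recorded.
        if (topicPattern.test(chunk) && !toc[chunk]) {
            toc[chunk] = page;
        }
    });
    console.log(toc); // Use this object to populate your table of contents
}).catch(console.error);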
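
As a follow-up to the script above, here is a minimal sketch of how the logged object could be turned into table-of-contents entries. The helper names tocToEntries and renderTocHtml, the "toc" and "toc-page" class names, and the sample data are assumptions made for illustration; they are not part of pdf-parse or Puppeteer. The sketch assumes keys shaped like "Title: ..." and values shaped like "Page N/M", as produced by the patterns above.

```javascript
// Hypothetical helper: convert { "Title: X": "Page 2/10", ... } into
// [{ title: "X", page: 2 }, ...], sorted by page number.
function tocToEntries(toc) {
  // Entries without a recognizable page marker are sorted last.
  const rank = (entry) => (entry.page === null ? Number.MAX_SAFE_INTEGER : entry.page);
  return Object.entries(toc)
    .map(([rawTitle, rawPage]) => {
      const pageMatch = /Page (\d+)\/(\d+)/.exec(rawPage || '');
      return {
        // Strip everything up to and including the "Title:" prefix.
        title: rawTitle.replace(/^.*?Title:\s*/, '').trim(),
        page: pageMatch ? Number(pageMatch[1]) : null,
      };
    })
    .sort((a, b) => rank(a) - rank(b));
}

// Hypothetical helper: render the entries as an HTML ordered list that could
// be placed in the page template before the PDF is generated again.
function renderTocHtml(entries) {
  const escapes = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  const escapeHtml = (text) => text.replace(/[&<>"]/g, (ch) => escapes[ch]);
  const items = entries.map(
    (e) => `  <li>${escapeHtml(e.title)} <span class="toc-page">${e.page ?? ''}</span></li>`
  );
  return ['<ol class="toc">', ...items, '</ol>'].join('\n');
}

// Example with made-up data in the same shape as the logged object.
const sampleToc = {
  'Title: Setup 1': 'Page 5/10',
  'Title: Introduction': 'Page 2/10',
};
console.log(renderTocHtml(tocToEntries(sampleToc)));
```

The entries array is plain data, so it could also be written out with JSON.stringify if the table of contents is built in a separate step, as the question suggests.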

This information may prove valuable to someone in need.
