Replacing text in HTML, a supposedly declarative markup language, in a way that is both performant and correct turned out to be surprisingly difficult. Below is what I found after a month of testing and experimentation.
To replace existing text efficiently, we can use a TreeWalker to iterate over every Text node in the document and process its content. In this demonstration, we will censor "heck" as "h*ck".
    // Replacement applied to the content of every accepted text node
    const callback = text => text.replaceAll(/heck/gi, 'h*ck');

    function processNodes(root) {
        // Walk only the Text nodes under root, skipping any that fail valid()
        const nodes = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
            acceptNode: node =>
                valid(node) ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT
        });
        while (nodes.nextNode()) {
            nodes.currentNode.nodeValue = callback(nodes.currentNode.nodeValue);
        }
    }

    function valid(node) {
        return (
            node.parentNode !== null
            && node.parentNode.tagName !== 'SCRIPT'
            && node.parentNode.tagName !== 'STYLE'
            && !node.parentNode.isContentEditable
        );
    }

    processNodes(document.body);
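One detail of the callback above: every match is replaced with a lowercase "h*ck", so "Heck" loses its capital letter. If that matters, a possible variant (the name censor is made up for this sketch and is not part of the snippet above) captures the surrounding letters and puts them back unchanged:

```javascript
// Capture the letters around the "e" and reinsert them as-is,
// so only the vowel is replaced and the original capitalization survives.
const censor = text => text.replace(/(h)e(ck)/gi, '$1*$2');

console.log(censor('Heck, what the HECK is this heckling?'));
// prints "H*ck, what the H*CK is this h*ckling?"
```

It can be swapped in for callback without changing anything else.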
Take note of the valid function. It handles three specific cases:
- The parent node must exist, because the node may have been removed from the document before it is processed
- Modifying the contents of <script> and <style> tags could break functionality or styling
- Editing a contenteditable element resets the cursor position, leading to a poor user experience
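To see the three cases in isolation, here is a small sketch that runs the same check against plain objects instead of real DOM nodes. The objects are hypothetical stand-ins that only carry the properties valid reads, so the snippet also runs outside a browser:

```javascript
// Same check as valid() above, repeated so this snippet runs on its own.
function valid(node) {
  return (
    node.parentNode !== null
    && node.parentNode.tagName !== 'SCRIPT'
    && node.parentNode.tagName !== 'STYLE'
    && !node.parentNode.isContentEditable
  );
}

// Hypothetical stand-ins for Text nodes (not real DOM objects).
const detached = { parentNode: null };
const inScript = { parentNode: { tagName: 'SCRIPT', isContentEditable: false } };
const inStyle = { parentNode: { tagName: 'STYLE', isContentEditable: false } };
const inEditor = { parentNode: { tagName: 'DIV', isContentEditable: true } };
const inParagraph = { parentNode: { tagName: 'P', isContentEditable: false } };

// The first four are rejected; only the ordinary paragraph text is accepted.
const results = [detached, inScript, inStyle, inEditor, inParagraph].map(valid);
console.log(results);
```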
However, the above only covers text that is already in the document. To handle future changes, we can use a MutationObserver to detect newly added or modified text nodes.
    // Node types that can carry character data but are never shown to the user
    const IGNORED = [
        Node.CDATA_SECTION_NODE,
        Node.PROCESSING_INSTRUCTION_NODE,
        Node.COMMENT_NODE,
    ];
    const CONFIG = {subtree: true, childList: true, characterData: true};

    const observer = new MutationObserver((mutations, observer) => {
        // Stop observing while we edit, so our own changes do not retrigger this callback
        observer.disconnect();
        for (const mutation of mutations) {
            const target = mutation.target;
            switch (mutation.type) {
                case 'childList':
                    // Newly added nodes: either a single text node or a whole subtree
                    for (const node of mutation.addedNodes) {
                        if (node.nodeType === Node.TEXT_NODE) {
                            if (valid(node)) {
                                node.nodeValue = callback(node.nodeValue);
                            }
                        } else if (!IGNORED.includes(node.nodeType)) {
                            processNodes(node);
                        }
                    }
                    break;
                case 'characterData':
                    // The content of an existing node changed
                    if (!IGNORED.includes(target.nodeType) && valid(target)) {
                        target.nodeValue = callback(target.nodeValue);
                    }
                    break;
            }
        }
        // Resume observing once all edits are done
        observer.observe(document.body, CONFIG);
    });
    observer.observe(document.body, CONFIG);
The observer's callback has two main branches: childList, which handles newly added subtrees and text nodes, and characterData, which handles modified text nodes. The observer must be disconnected before making any edits; otherwise each edit would trigger the callback again, causing an infinite loop. Also note the IGNORED array, which excludes node types that are never visible to the user but can still show up here because, like Text, they implement the CharacterData interface.
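One thing the disconnect/observe pair does not guard against is an exception thrown while editing, which would leave the observer switched off for good. A possible way to make that more robust is a small wrapper with try/finally. The helper name withObserverPaused is made up for this sketch, and the observer below is a stand-in that only records calls, so the snippet can run without a DOM:

```javascript
// Hypothetical helper: run `edit` with the observer switched off, then
// always switch it back on, even if `edit` throws.
function withObserverPaused(observer, target, config, edit) {
  observer.disconnect();
  try {
    return edit();
  } finally {
    observer.observe(target, config);
  }
}

// Stand-in observer that only records calls (not a real MutationObserver).
const calls = [];
const fakeObserver = {
  disconnect: () => calls.push('disconnect'),
  observe: () => calls.push('observe'),
};

// Normal case: the edit runs between disconnect and observe.
withObserverPaused(fakeObserver, null, {}, () => calls.push('edit'));

// Failing case: observe is still called before the error reaches the caller.
try {
  withObserverPaused(fakeObserver, null, {}, () => { throw new Error('boom'); });
} catch (err) {
  calls.push('caught');
}

console.log(calls);
```

In the observer callback above, the for loop over mutations would become the edit argument, with document.body and CONFIG passed as target and config.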
Combining these methods should cover most scenarios effectively. However, several special cases remain unaddressed:
An in-depth explanation of workarounds for these issues is beyond the scope of this StackOverflow response. However, I have developed a free library called TextObserver designed to address them.