The HTML content to extract from.
Optional
opts: The options for content extraction.
default=true - Remove elements that match non-article-like criteria first (e.g., elements with a class name of "comment").
default=true - Modify an element's score based on certain classNames or IDs (e.g., subtract if a node has a className of 'comment', add if a node has an ID of 'entry-content').
default=true - Clean the node to remove superfluous content like forms, ads, etc.
Initially, pass in the most restrictive options, which return the highest-quality content. On each failure, retry with slightly more lax options.
The extracted content as an HTML string, or null if extraction fails.
Based on Postlight Mercury Parser (2017-)
var url = "https://en.wikipedia.org/wiki/David_Hilbert";
var html = await (await fetch(url)).text();
var content = extractMainContentFromHTML(html);
console.log(content); // HTML content of main article body
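The retry strategy described in the options (start with the most restrictive settings, then relax them on each failure) could be sketched roughly as follows. This is only an illustration, not the library's actual implementation: the helper `extractWithFallback` is hypothetical, and the option keys `stripUnlikelyCandidates`, `weightNodes`, and `cleanConditionally` are assumed from Mercury Parser's naming and may differ from the keys this function accepts. The extractor is passed in as a parameter and is assumed to return `null` on failure.

```javascript
// Hypothetical option keys, assumed from Mercury Parser's naming;
// the real keys accepted by the extractor may differ.
const STRICT_OPTIONS = {
  stripUnlikelyCandidates: true,
  weightNodes: true,
  cleanConditionally: true,
};

// Call `extract(html, opts)` with the strictest options first. While the
// result is null, disable one more flag and try again. Returns the first
// non-null result, or null if every attempt fails.
function extractWithFallback(extract, html, options = STRICT_OPTIONS) {
  const current = { ...options };
  let content = extract(html, { ...current });
  for (const key of Object.keys(current)) {
    if (content !== null) break; // a stricter attempt already succeeded
    if (!current[key]) continue; // flag already off, nothing to relax
    current[key] = false;
    content = extract(html, { ...current });
  }
  return content;
}
```

Assuming the extractor accepts the options object as its second argument, it could be called as `extractWithFallback(extractMainContentFromHTML, html)`. Relaxing one flag at a time keeps each retry as strict as possible, so the first attempt that succeeds is also the cleanest one available.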
HTML-to-Main-Content Extractor #2
Article Extraction Benchmark