html-to-content

ExtractedContent

Defined in: extractor/html-to-content/html-to-content.js:83

Properties

| Property | Type | Description | Defined in |
| --- | --- | --- | --- |
| author | string | The author's name | extractor/html-to-content/html-to-content.js:87 |
| author_cite | string | The full citation for the author | extractor/html-to-content/html-to-content.js:85 |
| author_short | string | A shortened version of the author's name | extractor/html-to-content/html-to-content.js:86 |
| date | string | The publication date | extractor/html-to-content/html-to-content.js:88 |
| html | string | The extracted main content in HTML format | extractor/html-to-content/html-to-content.js:90 |
| source | string | The source of the content | extractor/html-to-content/html-to-content.js:89 |
| title | string | The title of the content | extractor/html-to-content/html-to-content.js:84 |
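The property list above can be written as a JSDoc typedef, which is handy when checking a result before using it. The sketch below is illustrative only: `EXTRACTED_CONTENT_FIELDS` and `findInvalidFields` are hypothetical names that are not part of the library, and the check assumes every field is present as a string, as the table suggests.

```javascript
/**
 * Shape of the extraction result, taken from the properties listed above.
 * @typedef {Object} ExtractedContent
 * @property {string} title
 * @property {string} author_cite
 * @property {string} author_short
 * @property {string} author
 * @property {string} date
 * @property {string} source
 * @property {string} html
 */

// Field names copied from the properties list (hypothetical constant).
const EXTRACTED_CONTENT_FIELDS = [
  "title",
  "author_cite",
  "author_short",
  "author",
  "date",
  "source",
  "html",
];

// Hypothetical helper, not part of the library: returns the names of the
// documented fields that are missing or are not strings.
function findInvalidFields(value) {
  if (value === null || typeof value !== "object") {
    return [...EXTRACTED_CONTENT_FIELDS];
  }
  return EXTRACTED_CONTENT_FIELDS.filter(
    (name) => typeof value[name] !== "string"
  );
}
```

An empty array from `findInvalidFields` means the object has all seven documented string fields.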


extractContentAndCite()

function extractContentAndCite(documentOrHTML: any, options: object): any;

Defined in: extractor/html-to-content/html-to-content.js:30

Extracts the main content and citation information from a document or HTML string

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| documentOrHTML | any | The document or HTML string to extract content from |
| options | { formatting: boolean; images: boolean; links: boolean; url: string; useExtractor2: boolean; } | Optional configuration |
| options.formatting | boolean | Whether to preserve formatting in the extracted content (default: true) |
| options.images | boolean | Whether to include images in the extracted content (default: true) |
| options.links | boolean | Whether to include links in the extracted content (default: true) |
| options.url | string | The URL of the original document, if available, used to resolve relative URLs to absolute ones |
| options.useExtractor2 | boolean | Selects the primary extractor (default: false): false uses Mozilla Readability, true uses Postlight Mercury. If the primary extractor returns fewer than 200 characters, the alternate extractor is used instead |
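The documented defaults and the `useExtractor2` fallback rule can be sketched as follows. This is a minimal illustration, not the library's implementation: `resolveOptions` and `extractWithFallback` are hypothetical names, and `readability` and `mercury` are synchronous stand-in functions supplied by the caller rather than the real Mozilla Readability or Postlight Mercury APIs. What the library does when both extractors return short output is not documented; the sketch simply returns the alternate's result.

```javascript
// Defaults as documented in the parameter list above.
const DEFAULT_OPTIONS = {
  formatting: true,
  images: true,
  links: true,
  useExtractor2: false,
};

// Documented threshold: below this many characters, try the other extractor.
const MIN_CONTENT_LENGTH = 200;

// Hypothetical helper: caller-supplied options override the defaults.
function resolveOptions(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Hypothetical sketch of the fallback rule. `extractors.readability` and
// `extractors.mercury` are stand-ins that take an HTML string and return
// the extracted HTML as a string (possibly empty).
function extractWithFallback(html, options, extractors) {
  const { useExtractor2 } = resolveOptions(options);
  const [primary, alternate] = useExtractor2
    ? [extractors.mercury, extractors.readability]
    : [extractors.readability, extractors.mercury];

  const first = primary(html) || "";
  if (first.length >= MIN_CONTENT_LENGTH) {
    return first;
  }
  // Primary result was shorter than 200 characters: use the alternate.
  return alternate(html) || "";
}
```

Passing the extractors in as arguments keeps the sketch independent of either parser's real interface, which this page does not describe.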

Returns

any

The extracted content and citation information
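A call might look like the sketch below. Because this page does not give the package's import path, `extractContentAndCite` is passed in as an argument instead of being imported; `extractArticle` is a hypothetical wrapper. The return type is documented only as `any`, so the sketch uses `await`, which works whether the function returns a value or a promise, and assumes the result carries the `ExtractedContent` properties.

```javascript
// Hypothetical wrapper: calls the documented function with an explicit
// options object and picks out a few of the documented result fields.
async function extractArticle(extractContentAndCite, html, url) {
  const result = await extractContentAndCite(html, {
    url,               // lets relative URLs be resolved against the page URL
    images: false,     // drop images from the extracted HTML
    links: true,       // keep links
    formatting: true,  // keep formatting
  });

  return {
    title: result.title,
    citation: result.author_cite,
    html: result.html,
  };
}
```

Callers that already have an import for `extractContentAndCite` can pass it straight through as the first argument.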

Author

ai-research-agent (2024)