html-to-content
Functions
extractContentAndCite()
function extractContentAndCite(documentOrHTML, options): Object
Extracts the main content and citation information from a document or HTML string
Parameters
| Parameter | Type | Description |
|---|---|---|
| documentOrHTML | | The document or HTML string to extract content from |
| options | Object | Optional configuration options |
| | | default=true - Whether to preserve formatting in the extracted content |
| | | default=true - Whether to include images in the extracted content |
| | | default=true - Whether to include links in the extracted content |
| | | The URL of the original document, if available, used to make relative URLs absolute |
| | | default=false - Selects which extractor is tried first: false uses Mozilla Readability, true uses Postlight Mercury. If the first extractor returns fewer than 200 characters, the alternate is used instead |
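The fallback rule in the last option can be sketched as follows. This is a minimal illustration, not the library's implementation: the `Extractor` type, `extractWithFallback`, and the two stand-in extractors are hypothetical names, and the sketch assumes the alternate result is returned as-is even when it is also short.

```typescript
// Hypothetical extractor signature: takes HTML and returns the extracted content.
type Extractor = (html: string) => string;

// Threshold taken from the option description above.
const MIN_LENGTH = 200;

function extractWithFallback(
  html: string,
  primary: Extractor,
  alternate: Extractor,
  minLength: number = MIN_LENGTH
): string {
  const first = primary(html);
  // Keep the primary result when it meets the length threshold.
  if (first.length >= minLength) return first;
  // Otherwise hand the same input to the alternate extractor (assumption:
  // its result is returned without a further length check).
  return alternate(html);
}

// Stand-in extractors for demonstration only (not Readability or Mercury).
const shortResult: Extractor = () => "too short";
const longResult: Extractor = () => "x".repeat(500);

const chosen = extractWithFallback("<html></html>", shortResult, longResult);
console.log(chosen.length); // 500, because the primary result was under 200 characters
```

Passing the extractors as arguments keeps the ordering decision (Readability first or Mercury first) separate from the threshold check.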
Returns
Object
The extracted content and citation information
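The URL option listed in the parameters is what allows relative links and image paths in the returned content to be made absolute. A minimal sketch of that resolution step, using the standard `URL` constructor; the helper name `absolutify` is hypothetical and the library may resolve URLs differently.

```typescript
// Resolve a possibly relative link against the URL of the original document.
function absolutify(href: string, baseUrl: string): string {
  try {
    return new URL(href, baseUrl).href;
  } catch {
    // Leave the value unchanged when it cannot be resolved (e.g. no usable base).
    return href;
  }
}

const base = "https://example.com/posts/intro"; // illustrative URL only

console.log(absolutify("/images/fig1.png", base)); // https://example.com/images/fig1.png
console.log(absolutify("notes.html", base)); // https://example.com/posts/notes.html
console.log(absolutify("https://other.org/a", base)); // https://other.org/a
```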
Interfaces
ExtractedContent
Properties
author
author: string;
The author's name
author_cite
author_cite: string;
The full citation for the author
author_short
author_short: string;
A shortened version of the author's name
date
date: string;
The publication date
html
html: string;
The extracted main content in HTML format
source
source: string;
The source of the content
title
title: string;
The title of the content
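For reference, the properties above can be written as a single TypeScript declaration. The sketch below also shows one way a caller might turn the citation fields into a one-line reference; `formatReference` and the sample values are hypothetical, and the format of `author_cite` shown here is an assumption.

```typescript
// Field names and types mirror the documented ExtractedContent properties.
interface ExtractedContent {
  author: string;
  author_cite: string;
  author_short: string;
  date: string;
  html: string;
  source: string;
  title: string;
}

// Hypothetical helper: joins the non-empty citation fields into one line.
function formatReference(c: ExtractedContent): string {
  const parts = [c.author_cite || c.author, c.date, c.title, c.source];
  return parts.filter((p) => p.length > 0).join(". ");
}

// Made-up sample values for illustration only.
const sample: ExtractedContent = {
  author: "Jane Doe",
  author_cite: "Doe, Jane",
  author_short: "Doe",
  date: "2024-01-15",
  html: "<p>Example body</p>",
  source: "example.com",
  title: "An Example Article",
};

console.log(formatReference(sample));
// Doe, Jane. 2024-01-15. An Example Article. example.com
```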