pdf-to-html
Extract
convertPDFToHTML()
function convertPDFToHTML(pdfURLOrBuffer: string, options?: object): any;
Defined in: extractor/pdf-to-html/pdf-to-html.js:46
Convert PDF to HTML

Extracts formatted text from a PDF, parsing line breaks, page headers, footnotes, and section headings. Supports fonts, links, bold, italics, lists, headings, headers, footnotes, table of contents, quotes, and code blocks. Removes repeated headers, links each footnote anchor to its footnote, and preserves the PDF page number with an invisible I element.
This function uses pdfjs-serverless, so it works in more environments than PDF.js-based tools: Cloudflare Workers, serverless platforms, Node.js, and front-end-only apps.
Parameters
| Parameter | Type | Description |
|---|---|---|
| pdfURLOrBuffer | string | URL to a PDF file or buffer from fs.readFile |
| options? | object | |
| | | default=false - Adds # to end of each page |
| | | default=true - Removes repeated headers found on each page |
Returns
any
HTML formatted text
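Example
The parameter description above allows either a URL or a buffer from fs.readFile, so a small wrapper can accept both kinds of input. The following is a minimal sketch, not part of the library: `pdfSourceToHTML` is a hypothetical helper name, and the converter is passed in as an argument so the sketch assumes nothing about the package's import path or option names.

```javascript
// Hypothetical helper (not part of the library). `convert` is expected to be
// convertPDFToHTML, or any function with the same call shape, supplied by the
// caller so no import path is assumed here.
async function pdfSourceToHTML(convert, source) {
  // Strings starting with http:// or https:// are treated as URLs and are
  // handed to the converter unchanged.
  const isURL = /^https?:\/\//i.test(source);
  let input = source;

  if (!isURL) {
    // Anything else is assumed to be a local file path and is read into a
    // buffer, matching the "buffer from fs.readFile" case.
    const { readFile } = await import("node:fs/promises");
    input = await readFile(source);
  }

  // The documented return type is `any`; awaiting handles both a Promise
  // and a plain value.
  return await convert(input);
}

// Usage (assuming convertPDFToHTML is already in scope):
//   const html = await pdfSourceToHTML(convertPDFToHTML, "./paper.pdf");
```

Passing the converter as an argument also makes the wrapper easy to exercise with a stub in tests, without fetching or parsing a real PDF.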
Author
ai-research-agent (2024), pdf-to-markdown (2017), pdf.js (2012-)