pdf-to-html
Extract
convertPDFToHTML()
function convertPDFToHTML(pdfURLOrBuffer, options?): string | Object
Convert PDF to HTML
Extracts formatted text from a PDF, parsing line breaks, page headers, footnotes, and section headings. Supports fonts, links, bold, italics, lists, headings, headers, footnotes, tables of contents, quotes, and code blocks. Removes repeated headers, links footnote anchors to their footnotes, and preserves the PDF page number with an invisible I element.
This function uses pdfjs-serverless, so it runs in more environments than typical PDF.js-based tools: Cloudflare Workers, serverless platforms, Node.js, and front-end-only apps.
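The repeated-header removal mentioned above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `removeRepeatedHeaders`, the `minShare` threshold, and the rule of comparing only the first line of each page are all assumptions made for illustration.

```javascript
/**
 * Hypothetical sketch: drop a page's first line when the same line
 * (ignoring digits such as page numbers) opens most of the pages.
 * @param {string[][]} pages - each page as an array of text lines
 * @param {number} minShare - assumed fraction of pages that must share the line
 * @returns {string[][]} pages with repeated first lines removed
 */
function removeRepeatedHeaders(pages, minShare = 0.6) {
  // Replace digit runs so "Annual Report 1" and "Annual Report 2" compare equal.
  const normalize = (line) => line.replace(/\d+/g, "#").trim().toLowerCase();

  // Count how many pages start with each normalized line.
  const counts = new Map();
  for (const lines of pages) {
    if (lines.length === 0) continue;
    const key = normalize(lines[0]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Treat a first line as a header only if it appears on at least two pages
  // and on at least minShare of all pages.
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));
  const headers = new Set(
    [...counts].filter(([, n]) => n >= threshold).map(([key]) => key)
  );

  // Return new arrays; the input pages are not modified.
  return pages.map((lines) =>
    lines.length > 0 && headers.has(normalize(lines[0])) ? lines.slice(1) : lines
  );
}
```

Normalizing digits before comparing is a design choice that lets running headers containing a page number still match across pages; a real implementation would likely also consider text position and font size.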
Parameters
Parameter | Type | Description |
---|---|---|
pdfURLOrBuffer | | URL to a PDF file or buffer from fs.readFile |
options? | { | |
| | | default=false - Adds # to end of each page |
| | | default=true - Removes repeated headers found on each page |
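Because the first parameter accepts either a URL or a buffer, a caller-side helper may be useful for validating the input before conversion. The sketch below is an assumption about how such dispatch could look, not the library's code: `toPdfSource` is a hypothetical name, and the returned shape follows the `url` / `data` source object that PDF.js's `getDocument` accepts.

```javascript
/**
 * Hypothetical helper: turn a URL string or binary buffer into a
 * PDF.js-style source object ({ url } or { data }).
 * @param {string|Uint8Array|ArrayBuffer} pdfURLOrBuffer
 * @returns {{url: string}|{data: Uint8Array}}
 */
function toPdfSource(pdfURLOrBuffer) {
  if (typeof pdfURLOrBuffer === "string") {
    // Strings are treated as a URL to fetch.
    return { url: pdfURLOrBuffer };
  }
  if (pdfURLOrBuffer instanceof Uint8Array) {
    // Node's Buffer (e.g. from fs.readFile) is a Uint8Array subclass;
    // copy it into a plain Uint8Array.
    return { data: new Uint8Array(pdfURLOrBuffer) };
  }
  if (pdfURLOrBuffer instanceof ArrayBuffer) {
    // Wrap a raw ArrayBuffer (e.g. from fetch) without copying.
    return { data: new Uint8Array(pdfURLOrBuffer) };
  }
  throw new TypeError("Expected a URL string, Buffer, Uint8Array, or ArrayBuffer");
}
```

Failing early with a `TypeError` keeps bad input from surfacing later as a less readable parsing error.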
Returns
string | Object
HTML formatted text
Author
ai-research-agent (2024), pdf-to-markdown (2017), pdf.js (2012-)