Function extract

extract(urlOrDoc, options?): Article
🚜📜 Tractor the Text Extractor
1. Extracts the main content from a URL or HTML document, using an improved version of Readability with 100+ custom adapters for major websites.
2. Strips content down to basic HTML for reading mode or for saving research notes.
3. YouTube - if a YouTube video is detected, gets the video's full transcript.
4. PDF - extracts formatted text from PDFs, parsing headings, page headers, and footnotes, and adds line breaks based on the standard deviation of text-range heights.
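Item 4 describes line breaks driven by the spread of text heights. The sketch below is one plausible reading of that idea, not the library's actual code: the line shape `{ str, y, height }`, the function names, and the threshold rule (a paragraph break when the blank space above a line exceeds the mean height plus `k` standard deviations) are all assumptions.

```javascript
// Sketch only: { str, y, height } is a hypothetical simplification of PDF
// text items, one item per visual line. y is the baseline position in PDF
// user space (larger y = higher on the page).

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function stdDev(values) {
  const m = mean(values);
  return Math.sqrt(values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length);
}

/**
 * Joins lines of PDF text, inserting a paragraph break ("\n\n") when the
 * blank space above a line exceeds mean(height) + k * stdDev(height).
 * The threshold rule and the default k = 1 are assumptions for illustration.
 */
function joinLinesWithBreaks(lines, k = 1) {
  if (lines.length === 0) return "";
  const heights = lines.map((line) => line.height);
  const threshold = mean(heights) + k * stdDev(heights);

  let text = lines[0].str;
  for (let i = 1; i < lines.length; i++) {
    // Distance between baselines minus the current line's own height
    // approximates the blank space between the two lines.
    const blankSpace = lines[i - 1].y - lines[i].y - lines[i].height;
    text += (blankSpace > threshold ? "\n\n" : "\n") + lines[i].str;
  }
  return text;
}

const sample = [
  { str: "Title", y: 700, height: 18 },
  { str: "First line.", y: 670, height: 12 },
  { str: "Second line.", y: 656, height: 12 },
  { str: "New paragraph.", y: 620, height: 12 },
];
console.log(joinLinesWithBreaks(sample));
```

With this sample, the lines after the heading and after the large gap start new paragraphs, while the tightly spaced "Second line." is joined with a single newline.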
Parameters
- urlOrDoc: string | Document
  URL, or DOM Document containing the article content
- options (optional): {
      images: boolean;
      links: boolean;
      formatting: boolean;
      absoluteURLs: boolean;
      timeout: number;
  } = {}
  - images: boolean
    default=true - include images
  - links: boolean
    default=true - include links
  - formatting: boolean
    default=true - preserve formatting
  - absoluteURLs: boolean
    default=true - convert URLs to absolute
  - timeout: number
    default=5 - HTTP request timeout
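A minimal sketch of how the documented defaults might combine with caller-supplied options, assuming a shallow merge. `DEFAULT_OPTIONS`, `resolveOptions`, and `toAbsoluteURL` are hypothetical names, and resolving links with the WHATWG `URL` constructor is only one plausible way `absoluteURLs` could work. The timeout unit is not documented, so it is left unitless here.

```javascript
// Documented defaults for extract() options (timeout unit not documented).
const DEFAULT_OPTIONS = Object.freeze({
  images: true,
  links: true,
  formatting: true,
  absoluteURLs: true,
  timeout: 5,
});

// Hypothetical helper: any option the caller omits falls back to its default.
function resolveOptions(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Hypothetical helper illustrating absoluteURLs: true, resolving a relative
// href against the page URL with the WHATWG URL API.
function toAbsoluteURL(href, pageURL) {
  try {
    return new URL(href, pageURL).href;
  } catch {
    return href; // leave hrefs that cannot be resolved unchanged
  }
}

// images is false; every other option keeps its default
console.log(resolveOptions({ images: false }));
// resolves to https://example.com/blog/img/photo.jpg
console.log(toAbsoluteURL("../img/photo.jpg", "https://example.com/blog/post/"));
```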
Returns Article
- object containing url, html, author, date, title, and source
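The returned object's shape can be written down as a JSDoc typedef. This is a sketch: the field names come from the list above, but the `string` types and the `isArticle` guard are assumptions, not part of the documented API.

```javascript
/**
 * Assumed shape of the Article returned by extract(). Field names are from
 * the docs; the string types are an assumption.
 * @typedef {Object} Article
 * @property {string} url
 * @property {string} html
 * @property {string} author
 * @property {string} date
 * @property {string} title
 * @property {string} source
 */

const ARTICLE_FIELDS = ["url", "html", "author", "date", "title", "source"];

// Hypothetical guard: true if value is an object with every documented field.
function isArticle(value) {
  return (
    typeof value === "object" &&
    value !== null &&
    ARTICLE_FIELDS.every((field) => field in value)
  );
}

console.log(isArticle({ url: "", html: "", author: "", date: "", title: "", source: "" }));
console.log(isArticle({ url: "https://example.com" }));
```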
Author
Gulakov, A. (2024)
- Defined in extractor/url-to-content/url-to-content.js:43