Function extract

    1. Extracts the main content from a URL or HTML document, using an improved version of Readability with 100+ custom adapters for major websites.
    2. Strips the content down to basic HTML for reading mode or for saving research notes.
    3. YouTube - gets the full transcript of the video if a YouTube video is detected.
    4. PDF - extracts formatted text from a PDF, parsing headings, page headers, and footnotes, and adds line breaks based on the standard deviation of text heights across a range of text.
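
    The YouTube step depends on first recognizing a YouTube video URL. The sketch below is one way to do that detection, not the library's own code: the helper name getYouTubeVideoId is hypothetical, and it assumes the common URL forms (youtube.com/watch?v=, youtu.be/, /embed/, /shorts/, /live/) and 11-character video IDs.

```typescript
// Hypothetical helper: returns the 11-character YouTube video ID,
// or null if the input does not look like a YouTube video URL.
function getYouTubeVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  // Treat www. and mobile (m.) hosts the same as the bare domain.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  const idPattern = /^[A-Za-z0-9_-]{11}$/;
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.slice(1).split("/")[0];
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
      const match = url.pathname.match(/^\/(?:embed|shorts|live)\/([^/]+)/);
      candidate = match ? match[1] : null;
    }
  }
  return candidate !== null && idPattern.test(candidate) ? candidate : null;
}
```

    A non-null ID is the signal to take the transcript path instead of the Readability path; any other URL returns null and falls through to normal article extraction.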

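    The PDF step's line-break rule can be sketched as an outlier test on text height. This is one possible reading of "standard deviation of text heights", not the library's implementation: the PdfLine shape, the function names, and the threshold k are all hypothetical. Lines whose height differs from the mean by more than k standard deviations (for example, a heading in a larger font) get a blank line before and after them.

```typescript
// Hypothetical shape of one extracted PDF line: its text and rendered height.
interface PdfLine {
  text: string;
  height: number;
}

// Mean and population standard deviation of a non-empty list of numbers.
function meanAndStdDev(values: number[]): { mean: number; std: number } {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return { mean, std: Math.sqrt(variance) };
}

// Joins lines with "\n", adding a blank line before and after any line whose
// height deviates from the mean by more than k standard deviations.
// k is an assumed tuning parameter.
function joinWithBreaks(lines: PdfLine[], k = 1): string {
  if (lines.length === 0) return "";
  const { mean, std } = meanAndStdDev(lines.map((l) => l.height));
  // With uniform heights (std === 0) no line counts as an outlier.
  const isOutlier = (l: PdfLine) =>
    std > 0 && Math.abs(l.height - mean) > k * std;

  const parts: string[] = [];
  lines.forEach((line, i) => {
    // Start a new block when entering or leaving an outlier line.
    if (i > 0 && (isOutlier(line) || isOutlier(lines[i - 1]))) parts.push("");
    parts.push(line.text);
  });
  return parts.join("\n");
}
```

    For example, a 24-unit title followed by three 12-unit body lines gives a mean of 15 and a standard deviation of about 5.2, so only the title is separated from the body.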

    Parameters

    • urlOrDoc: string | Document

      URL or DOM object containing the article content

    • Optional options: {
          images: boolean;
          links: boolean;
          formatting: boolean;
          absoluteURLs: boolean;
          timeout: number;
      } = {}
      • images: boolean

        default=true - include images

      • links: boolean

        default=true - include links

      • formatting: boolean

        default=true - preserve formatting

      • absoluteURLs: boolean

        default=true - convert relative URLs to absolute URLs

      • timeout: number

        default=5 - HTTP request timeout
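
    A minimal sketch of how the documented defaults typically combine with caller-supplied options, plus the kind of conversion absoluteURLs implies. The names ExtractOptions, DEFAULT_OPTIONS, withDefaults, and toAbsoluteURL are hypothetical; the option names and default values come from the list above, and the URL resolution uses the standard WHATWG URL constructor rather than any confirmed library internals.

```typescript
// Option names and default values as documented above.
interface ExtractOptions {
  images: boolean;
  links: boolean;
  formatting: boolean;
  absoluteURLs: boolean;
  timeout: number;
}

const DEFAULT_OPTIONS: ExtractOptions = {
  images: true,
  links: true,
  formatting: true,
  absoluteURLs: true,
  timeout: 5,
};

// Hypothetical helper: fills in every option the caller left out.
// Note: a key passed explicitly as undefined would override its default.
function withDefaults(options: Partial<ExtractOptions> = {}): ExtractOptions {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Hypothetical helper for the absoluteURLs conversion: resolves an href
// against the page URL. Values that cannot be parsed are returned unchanged.
function toAbsoluteURL(href: string, pageUrl: string): string {
  try {
    return new URL(href, pageUrl).href;
  } catch {
    return href;
  }
}
```

    Resolving against the page URL handles root-relative (/img/a.png), path-relative (b.html), and protocol-relative (//cdn.example.org/d.js) links the same way a browser would.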

    Returns Article

    • object containing url, html, author, date, title, and source
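
    The returned Article can be sketched as a TypeScript interface, with a runtime type guard for cases such as receiving the result as JSON. The field names come from the line above; typing every field as a string, and the names ARTICLE_FIELDS and isArticle, are assumptions for illustration only.

```typescript
// Assumed shape of the returned Article: field names are from the docs,
// but typing every field as a string is an assumption.
interface Article {
  url: string;
  html: string;
  author: string;
  date: string;
  title: string;
  source: string;
}

const ARTICLE_FIELDS = ["url", "html", "author", "date", "title", "source"] as const;

// Hypothetical runtime type guard: true only if every documented field
// is present and is a string.
function isArticle(value: unknown): value is Article {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return ARTICLE_FIELDS.every((field) => typeof record[field] === "string");
}
```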