Function extractCite

  • 📚💎 Extract Expert Excerpt

    Extract author, date, source, and title from HTML using meta tags and common class names. The author string is checked against a list of about 90k common first names, last names, and organizations to infer whether it is a personal name. If it is, it is reversed to start with the last name (accounting for affixes and titles); organization names are not reversed.
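    The name check and reversal described above can be sketched roughly as follows. This is a minimal illustration, not the function's actual implementation. The sets `FIRST_NAMES`, `ORGANIZATIONS`, `TITLES`, and `SUFFIXES` and the helper `reverseAuthorName` are hypothetical: tiny stand-ins for the ~90k-entry lists, with assumed rules for titles and suffixes.

```typescript
// Hypothetical stand-ins for the ~90k-entry name and organization lists.
const FIRST_NAMES = new Set(["john", "jane", "maria", "david"]);
const ORGANIZATIONS = new Set(["reuters", "associated press", "world health organization"]);
// Assumed affix lists: titles are dropped from the front, suffixes are kept after the first name.
const TITLES = new Set(["dr", "prof", "mr", "mrs", "ms"]);
const SUFFIXES = new Set(["jr", "sr", "ii", "iii", "iv", "phd"]);

// Normalize a word for lookup: lowercase, no periods or commas.
const strip = (w: string): string => w.toLowerCase().replace(/[.,]/g, "");

/** Returns "Last, First" for a recognized personal name, otherwise the input unchanged. */
function reverseAuthorName(author: string): string {
  const trimmed = author.trim();
  // Organizations are never reversed.
  if (ORGANIZATIONS.has(trimmed.toLowerCase())) return trimmed;

  let words = trimmed.split(/\s+/);
  // Drop leading titles such as "Dr.".
  while (words.length > 1 && TITLES.has(strip(words[0]))) words = words.slice(1);

  // Peel off trailing suffixes such as "Jr." so they can be re-appended at the end.
  const suffixes: string[] = [];
  while (words.length > 2 && SUFFIXES.has(strip(words[words.length - 1]))) {
    suffixes.unshift(words.pop()!.replace(/,$/, ""));
  }

  // Only reverse when the first word is a known first name.
  if (words.length < 2 || !FIRST_NAMES.has(strip(words[0]))) return trimmed;

  const last = words[words.length - 1].replace(/,$/, "");
  const rest = words.slice(0, -1).join(" ");
  return [`${last}, ${rest}`, ...suffixes].join(", ");
}
```

    With these sample sets, `reverseAuthorName("Dr. John Smith Jr.")` returns `"Smith, John, Jr."`, `"Maria Elena Garcia"` becomes `"Garcia, Maria Elena"`, and `"Reuters"` or an unrecognized string such as `"Acme Widgets"` comes back unchanged. Keying the decision on a first-name lookup (rather than on word count alone) is what keeps multi-word organization names from being reversed.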

    Article Extraction Benchmark

    Parameters

    • document: Document

      DOM object or HTML string containing the article content

    Returns {
        author: string;
        author_cite: string;
        author_short: string;
        date: string;
        title: string;
        source: string;
    }

    An object containing extracted citation information.

    • author: string
    • author_cite: string
    • author_short: string
    • date: string
    • title: string
    • source: string
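    The meta-tag half of the extraction might look something like the sketch below. It assumes an HTML string as input and uses commonly seen tag names (Open Graph `og:*`, `article:published_time`, and `citation_*`); the tags the real function consults are not documented here. `extractMetaCite`, `metaContent`, and `MetaCite` are hypothetical names. The sketch skips class-name fallbacks, `author_cite`, and `author_short`, and uses regular expressions instead of a DOM parser so it runs without a browser.

```typescript
/** Hypothetical subset of the documented return shape that can be filled from meta tags. */
interface MetaCite {
  author: string;
  date: string;
  title: string;
  source: string;
}

// Return the content of the first <meta> whose name/property/itemprop equals one of `keys` (in key order).
function metaContent(html: string, keys: string[]): string {
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const key of keys) {
    for (const tag of metaTags) {
      const nameMatch = tag.match(/\s(?:name|property|itemprop)\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
      const contentMatch = tag.match(/\scontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
      if (!nameMatch || !contentMatch) continue;
      const name = (nameMatch[1] ?? nameMatch[2]).toLowerCase();
      if (name === key) return (contentMatch[1] ?? contentMatch[2]).trim();
    }
  }
  return "";
}

function extractMetaCite(html: string): MetaCite {
  // Fall back to the <title> element when no title meta tag is present.
  const titleTag = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return {
    author: metaContent(html, ["author", "article:author", "citation_author"]),
    date: metaContent(html, ["article:published_time", "citation_publication_date", "date"]),
    title: metaContent(html, ["og:title", "citation_title"]) || (titleTag ? titleTag[1].trim() : ""),
    source: metaContent(html, ["og:site_name", "citation_journal_title"]),
  };
}
```

    Checking keys in priority order lets more specific tags (such as `og:title`) win over generic fallbacks like `<title>`, which often carries site-name decorations.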