Function extractSEEKTOPIC

extractSEEKTOPIC(docText, options?): {
    topSentences: any[];
    keyphrases: any[];
    sentences: string[];
}
🔤📊 SEEKTOPIC: Summarization by Extracting Entities, Keyword Tokens, and Outline Phrases Important to Context
Extracts unique, domain-specific key phrases from a document using noun n-grams, then ranks sentences by how central they are to the most frequently referenced key phrase concepts. The resulting key sentences and topics can serve as a first step for vectorizing documents, fitting more documents into a context limit, or visualizing them in vector space.
1. Sentence Segmentation: Split the text into sentences, accounting for common abbreviations, numbers, URLs, and other exceptions.
2. Tokenization and Phrase Extraction: Employ a Wiki Phrases tokenizer to identify wiki topics, phrases, and nouns. This includes spell-checking and root word verification using the Porter stemmer.
3. Noun N-gram Extraction: Generate noun edge-grams, allowing for stop words in the middle (e.g., "state of the art").
4. Key Phrase Consolidation: Merge smaller n-grams that are subsets of larger ones by comparing weights.
5. Domain Specificity Calculation: Determine named entities and phrase domain specificity using WikiIDF. This rewards unique key phrases specific to the document's field (e.g., "endocrinology" in medical texts or "thou shall" in religious texts).
6. Key Phrase Filtering: Select top key phrases based on a combination of frequency and word count.
7. Graph Construction: Create a double-ring weighted graph with key phrases in the central ring and sentences in the outer ring. Assign weights to links based on concept usage probability.
8. Sentence Weighting: Apply the TextRank algorithm to weight sentences, identifying those that are most central and that connect the key phrase concepts most referenced by other sentences. Like PageRank, on which TextRank is based, this process includes random surfing and jumping to avoid loops.
9. Top Results Selection: Select top sentences and key phrases based on overall weight and graph centrality, using either a fixed number or percentage for larger documents.
10. Output Generation: Return top sentences (with associated key phrases) and top key phrases (with associated sentences).
11. Dynamic Reranking: If a user interacts with a key phrase or if there's a search query leading to the document, compare query similarity to key phrases, heavily weight the most similar key phrase, and reapply TextRank from step 8.
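Steps 7, 8, and 11 can be pictured with a small sketch. This is not the library's implementation: the function name `rankDoubleRing`, the `damping`, `iterations`, and `boost` settings, the use of raw occurrence counts as link weights, and the word-overlap similarity for the query are all assumptions made for illustration. It builds a two-ring graph (key phrases linked to the sentences that contain them), runs a PageRank-style iteration with random jumps, and gives the key phrase closest to the query a larger share of the jump probability.

```javascript
// Hypothetical sketch of steps 7, 8, and 11; not the library's actual code.
function rankDoubleRing(sentences, keyphrases, opts = {}) {
  const { damping = 0.85, iterations = 50, heavyWeightQuery = "", boost = 10 } = opts;
  const K = keyphrases.length;
  const S = sentences.length;
  const lower = sentences.map((s) => s.toLowerCase());

  // weight[k][s]: how often key phrase k occurs in sentence s
  // (an assumed stand-in for "concept usage probability").
  const weight = keyphrases.map((p) =>
    lower.map((s) => s.split(p.toLowerCase()).length - 1)
  );
  const phraseOut = weight.map((row) => row.reduce((a, b) => a + b, 0));
  const sentenceOut = lower.map((_, s) => weight.reduce((a, row) => a + row[s], 0));

  // Random-jump distribution: uniform, except that the key phrase sharing
  // the most words with the query receives `boost` times the base weight.
  const queryWords = new Set(heavyWeightQuery.toLowerCase().split(/\s+/).filter(Boolean));
  const overlap = keyphrases.map(
    (p) => p.toLowerCase().split(/\s+/).filter((w) => queryWords.has(w)).length
  );
  const best = overlap.indexOf(Math.max(...overlap));
  const jump = new Array(K + S).fill(1);
  if (best >= 0 && overlap[best] > 0) jump[best] = boost;
  const jumpTotal = jump.reduce((a, b) => a + b, 0);

  // Nodes 0..K-1 are key phrases (inner ring); K..K+S-1 are sentences (outer ring).
  let score = jump.map((j) => j / jumpTotal);
  for (let i = 0; i < iterations; i++) {
    const next = jump.map((j) => ((1 - damping) * j) / jumpTotal);
    for (let k = 0; k < K; k++) {
      for (let s = 0; s < S; s++) {
        const w = weight[k][s];
        if (w === 0) continue;
        next[K + s] += (damping * score[k] * w) / phraseOut[k]; // phrase -> sentence
        next[k] += (damping * score[K + s] * w) / sentenceOut[s]; // sentence -> phrase
      }
    }
    score = next;
  }
  return { keyphraseScores: score.slice(0, K), sentenceScores: score.slice(K) };
}

const sentences = [
  "Self attention relates tokens in a sequence.",
  "The weather was pleasant.",
  "Transformers stack self attention and feed forward layers.",
];
const phrases = ["self attention", "feed forward"];
const { sentenceScores } = rankDoubleRing(sentences, phrases, {
  heavyWeightQuery: "attention",
});
console.log(sentenceScores); // the sentence without key phrases scores lowest
```

In this sketch a sentence that mentions no key phrase only ever receives the random-jump share, so it sinks to the bottom, while a sentence linked to several key phrases accumulates weight from each of them.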
Parameters
- docText: string
  input text to analyze
- Optional options: {
      phrasesModel: any;
      maxWords: number;
      minWords: number;
      minWordLength: number;
      topKeyphrasesPercent: number;
      limitTopSentences: number;
      limitTopKeyphrases: number;
      minKeyPhraseLength: number;
      heavyWeightQuery: string;
  } = {}
  - phrasesModel: any
    phrases model
  - maxWords: number
    default=5 - maximum words in a keyphrase
  - minWords: number
    default=1 - minimum words in a keyphrase
  - minWordLength: number
    default=3 - minimum length of a word
  - topKeyphrasesPercent: number
    default=0.2 - fraction of top keyphrases to consider (0.2 = 20%)
  - limitTopSentences: number
    default=5 - maximum number of top sentences to return
  - limitTopKeyphrases: number
    default=10 - maximum number of top keyphrases to return
  - minKeyPhraseLength: number
    default=6 - minimum length of a keyphrase
  - heavyWeightQuery: string
    query to give heavy weight to
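As a rough illustration of how the documented defaults combine with caller-supplied options, the sketch below merges a partial options object over the default values listed above. The names `SEEKTOPIC_DEFAULTS` and `resolveOptions` are hypothetical and are not part of the library; how the function actually resolves its options internally is not stated on this page.

```javascript
// Default values copied from the parameter list above.
const SEEKTOPIC_DEFAULTS = {
  maxWords: 5,
  minWords: 1,
  minWordLength: 3,
  topKeyphrasesPercent: 0.2,
  limitTopSentences: 5,
  limitTopKeyphrases: 10,
  minKeyPhraseLength: 6,
};

// Hypothetical helper: caller options override the documented defaults.
function resolveOptions(options = {}) {
  return { ...SEEKTOPIC_DEFAULTS, ...options };
}

const resolved = resolveOptions({
  limitTopSentences: 10,
  heavyWeightQuery: "self attention",
});
console.log(resolved.limitTopSentences); // 10 (overridden)
console.log(resolved.maxWords); // 5 (documented default)
```

Only the keys a caller passes are overridden, which matches the call style in the example on this page, where just a few options are supplied.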
Returns {
    topSentences: any[];
    keyphrases: any[];
    sentences: string[];
}
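- topSentences: any[]
  top sentences with their keyphrases and weights
- keyphrases: any[]
  top keyphrases with their weights and associated sentence indices
- sentences: string[]
  all sentences in the input text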
Example
```
// Assumes testDoc (the input text) and phrasesModel (a loaded phrases model) are already defined.
const result = extractSEEKTOPIC(testDoc, { phrasesModel, heavyWeightQuery: "self attention", limitTopSentences: 10 });
console.log(result.topSentences); // Array of top sentences with their keyphrases and weights
console.log(result.keyphrases); // Array of top keyphrases with their weights and associated sentence indices
console.log(result.sentences); // Array of all sentences in the input text
```
Author
ai-research-agent (2024)
- Defined in topics/seektopic-keyphrases.js:73
