ngrams
extractNounEdgeGrams()
function extractNounEdgeGrams(
  nGramSize: number,
  terms: (string | number)[][],
  index: number,
  nGrams: object,
  minWordLength: number,
  sentenceNumber: number
): object;
Defined in: topics/ngrams.js:27
Extracts noun-based edge grams from a given set of part-of-speech-tagged terms. These n-grams are used to identify multi-word concepts in the text.
The function looks for sequences of words (n-grams) that:
- Start and end with a noun
- Contain words that are either nouns or common ignored words (like articles or prepositions)
- Meet the minimum word length requirement
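The three criteria above can be read as a predicate over a window of terms. The following is a minimal sketch, not the library's implementation: the tag values `NOUN_TAG` and `IGNORED_TAG` and the helper name `isNounEdgeWindow` are hypothetical, since this page does not document the numeric part-of-speech codes, and it assumes the minimum word length applies to the nouns rather than to the ignored words.

```javascript
// Hypothetical tag values: the real numeric part-of-speech codes are not documented on this page.
const NOUN_TAG = 3;
const IGNORED_TAG = 1;

// Returns true when the `size` terms starting at `start` satisfy the listed criteria.
// Assumption: the minimum word length is checked on nouns only, not on ignored words.
function isNounEdgeWindow(terms, start, size, minWordLength) {
  if (size < 1) return false;
  const span = terms.slice(start, start + size);
  if (span.length < size) return false; // not enough terms left for a full window

  // The window must start and end with a noun.
  const firstTag = span[0][1];
  const lastTag = span[span.length - 1][1];
  if (firstTag !== NOUN_TAG || lastTag !== NOUN_TAG) return false;

  // Every word must be an ignored word or a sufficiently long noun.
  return span.every(([word, tag]) =>
    tag === IGNORED_TAG ||
    (tag === NOUN_TAG && String(word).length >= minWordLength)
  );
}
```

Under these assumed tags, a window such as `[["state", 3], ["of", 1], ["art", 3]]` would qualify with a minimum word length of 3, while any window that starts or ends on a non-noun would not.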
Parameters
Parameter | Type | Description |
---|---|---|
`nGramSize` | `number` | The size of the n-grams to extract. For example, 2 for bigrams, 3 for trigrams, etc. |
`terms` | `(string \| number)[][]` | Array of terms, where each term is an array containing the word and its part of speech tag. Example: `[["The", 1], ["quick", 2], ["brown", 2], ["fox", 3]]` |
`index` | `number` | The starting index in the terms array to begin extraction. This allows for sliding window extraction. |
`nGrams` | `object` | Object to store the extracted n-grams. |
`minWordLength` | `number` | The minimum length a word should have to be considered in the n-gram. |
`sentenceNumber` | `number` | The current sentence number being processed. Used to track which sentences contain the n-gram. |
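The description of `index` mentions sliding window extraction. One way a caller might drive that window is sketched below. The helper `extractAcrossSentence` is hypothetical and not part of this module; it takes the extractor as its first argument (in practice, presumably `extractNounEdgeGrams`) and assumes only the parameter order shown in the signature.

```javascript
// Hypothetical driver, not part of the documented module.
// Calls `extract` once per start index, using the documented parameter order:
// (nGramSize, terms, index, nGrams, minWordLength, sentenceNumber).
function extractAcrossSentence(extract, nGramSize, terms, minWordLength, sentenceNumber, nGrams = {}) {
  // Stop once fewer than nGramSize terms remain after the start index.
  for (let index = 0; index + nGramSize <= terms.length; index++) {
    // The documented return value is the updated nGrams object, so carry it forward.
    nGrams = extract(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber);
  }
  return nGrams;
}
```

Passing the accumulated object back in on each call lets results from several start indices, or several sentences, collect in a single structure.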
Returns
object
The updated nGrams object with newly extracted n-grams.
Example
let terms = [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3], ["jumps", 4]];
let nGrams = {};
extractNounEdgeGrams(3, terms, 0, nGrams, 3, 1);
// nGrams might now contain: {3: {"brown fox jumps": [1]}}
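The comment in the example suggests that the result is keyed first by n-gram size and then by the n-gram text, with each entry holding the sentence numbers in which it appeared. A minimal sketch of recording into that shape follows. The helper `recordNGram` is hypothetical, and both the exact structure and the de-duplication of sentence numbers are assumptions inferred from that comment rather than confirmed behavior.

```javascript
// Hypothetical helper illustrating the assumed shape: { [size]: { [text]: [sentenceNumbers] } }.
function recordNGram(nGrams, size, text, sentenceNumber) {
  // Create the per-size bucket on first use.
  if (!nGrams[size]) nGrams[size] = {};
  const bySize = nGrams[size];

  // Create the sentence list for this n-gram on first use.
  if (!bySize[text]) bySize[text] = [];

  // Assumption: each sentence number is stored at most once per n-gram.
  if (!bySize[text].includes(sentenceNumber)) bySize[text].push(sentenceNumber);
  return nGrams;
}
```

With this layout, looking up `nGrams[3]["some three words"]` would give the list of sentences containing that trigram.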