ngrams
Functions
extractNounEdgeGrams()
function extractNounEdgeGrams(
nGramSize,
terms,
index,
nGrams,
minWordLength,
sentenceNumber): object
Extracts noun-based edge grams from a given set of terms, which helps identify important multi-word concepts in the text.
The function looks for sequences of words (n-grams) that:
- Start and end with a noun
- Contain only nouns or commonly ignored words (such as articles or prepositions)
- Meet the minimum word length requirement
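
The three criteria above can be sketched as a single predicate over a window of terms. This is only an illustration, not the library's implementation: `isNounEdgeGram`, `NOUN_TAG`, and `IGNORED_WORDS` are hypothetical names, the tag value 3 for nouns is guessed from the example on this page, and applying the minimum length to nouns only (not to ignored words) is an assumption.

```javascript
// Hypothetical values: the real tag scheme and ignored-word list are not
// documented on this page.
const NOUN_TAG = 3;
const IGNORED_WORDS = new Set(["the", "a", "an", "of", "in", "on", "for"]);

// Returns true when a window of [word, tag] terms satisfies the three rules.
function isNounEdgeGram(window, minWordLength) {
  if (window.length === 0) return false;

  const isNoun = ([, tag]) => tag === NOUN_TAG;
  const isIgnored = ([word]) => IGNORED_WORDS.has(word.toLowerCase());

  // Rule 1: the first and last terms must be nouns.
  if (!isNoun(window[0]) || !isNoun(window[window.length - 1])) return false;

  // Rules 2 and 3: every term is either an ignored word or a noun that
  // meets the minimum length (assumed to apply to nouns only).
  return window.every(
    (term) => isIgnored(term) || (isNoun(term) && term[0].length >= minWordLength)
  );
}

console.log(isNounEdgeGram([["theory", 3], ["of", 1], ["relativity", 3]], 3)); // true
console.log(isNounEdgeGram([["quick", 2], ["brown", 2], ["fox", 3]], 3)); // false
```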
Parameters
Parameter | Type | Description |
---|---|---|
nGramSize | | The size of the n-grams to extract. For example, 2 for bigrams, 3 for trigrams, etc. |
terms | | Array of terms, where each term is an array containing the word and its part of speech tag. Example: [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3]] |
index | | The starting index in the terms array to begin extraction. This allows for sliding window extraction. |
nGrams | {} | Object to store the extracted n-grams. |
minWordLength | | The minimum length a word should have to be considered in the n-gram. |
sentenceNumber | | The current sentence number being processed. Used to track which sentences contain the n-gram. |
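
Because the starting index is a parameter, a caller can slide the window across a sentence by invoking the function once per index. The driver below is a rough sketch under that assumption: `extractAllWindows` is a hypothetical helper, and `extract` stands for any function with the signature documented here. Whether the real function guards against an out-of-range index is not stated, so the loop stops at the last index that leaves room for a full window.

```javascript
// Hypothetical driver: calls `extract` once per window start position.
// `extract` is assumed to follow the extractNounEdgeGrams signature and to
// update `nGrams` in place.
function extractAllWindows(extract, terms, nGramSize, minWordLength, sentenceNumber, nGrams = {}) {
  // Stop when fewer than nGramSize terms remain after `index`.
  for (let index = 0; index + nGramSize <= terms.length; index++) {
    extract(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber);
  }
  return nGrams;
}
```

With five terms and a window size of 3, the extractor would be called for start indices 0, 1, and 2.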
Returns
object
The updated nGrams object with newly extracted n-grams.
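
Judging from the example on this page, the returned object appears to be keyed by n-gram size, then by phrase, with each phrase mapping to the sentence numbers where it occurs. The sketch below shows one way such a structure could be updated; `recordNGram` is a hypothetical helper, and skipping duplicate sentence numbers is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: records that `phrase` (an n-gram of `size` words)
// occurs in sentence `sentenceNumber`.
function recordNGram(nGrams, size, phrase, sentenceNumber) {
  // Create the per-size bucket and the per-phrase list on first use.
  if (!nGrams[size]) nGrams[size] = {};
  if (!nGrams[size][phrase]) nGrams[size][phrase] = [];

  // Assumption: each sentence number is stored at most once per phrase.
  if (!nGrams[size][phrase].includes(sentenceNumber)) {
    nGrams[size][phrase].push(sentenceNumber);
  }
  return nGrams;
}

const store = {};
recordNGram(store, 2, "noun phrase", 1);
recordNGram(store, 2, "noun phrase", 4);
console.log(JSON.stringify(store)); // {"2":{"noun phrase":[1,4]}}
```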
Example
let terms = [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3], ["jumps", 4]];
let nGrams = {};
extractNounEdgeGrams(3, terms, 0, nGrams, 3, 1);
// nGrams might now contain: {3: {"brown fox jumps": [1]}}