ngrams

Functions

extractNounEdgeGrams()

function extractNounEdgeGrams(
  nGramSize,
  terms,
  index,
  nGrams,
  minWordLength,
  sentenceNumber): object

Extracts noun-based edge grams from a list of tagged terms. These n-grams are used to identify multi-word concepts in the text.

The function looks for sequences of words (n-grams) that:

  1. Start and end with a noun
  2. Contain only nouns or commonly ignored words (such as articles or prepositions)
  3. Meet the minimum word length requirement
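
The three rules above can be sketched as a single predicate over one window of terms. This is a minimal illustration, not the library's implementation: the noun tag value (`NOUN_TAG = 3`), the `IGNORED_WORDS` list, and the helper name `isNounEdgeWindow` are assumptions made for the example, as is the choice to apply the minimum length to nouns only.

```javascript
// Hypothetical tag value and ignored-word list, chosen only for this sketch.
const NOUN_TAG = 3;
const IGNORED_WORDS = new Set(["the", "a", "an", "of", "in", "on", "for", "to"]);

const isNoun = ([, tag]) => tag === NOUN_TAG;
const isIgnored = ([word]) => IGNORED_WORDS.has(String(word).toLowerCase());

// Returns true when terms[index .. index + nGramSize) satisfies the three rules.
function isNounEdgeWindow(terms, index, nGramSize, minWordLength) {
  const span = terms.slice(index, index + nGramSize);
  if (span.length < nGramSize) return false; // window runs past the end

  // Rule 1: the first and last terms are nouns.
  if (!isNoun(span[0]) || !isNoun(span[span.length - 1])) return false;

  // Rule 2: every term is a noun or an ignored word.
  // Rule 3: every noun meets the minimum length (ignored words are exempt, an assumption).
  return span.every((term) =>
    isNoun(term) ? String(term[0]).length >= minWordLength : isIgnored(term)
  );
}

const sampleTerms = [["theory", 3], ["of", 1], ["relativity", 3], ["is", 4]];
isNounEdgeWindow(sampleTerms, 0, 3, 3); // true: "theory of relativity"
isNounEdgeWindow(sampleTerms, 1, 3, 3); // false: the window starts with "of"
```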

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nGramSize | number | The size of the n-grams to extract. For example, 2 for bigrams, 3 for trigrams, etc. |
| terms | (string \| number)[][] | Array of terms, where each term is an array containing the word and its part of speech tag. Example: [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3]] |
| index | number | The starting index in the terms array to begin extraction. This allows for sliding window extraction. |
| nGrams | {} | Object to store the extracted n-grams. |
| minWordLength | number | The minimum length a word should have to be considered in the n-gram. |
| sentenceNumber | number | The current sentence number being processed. Used to track which sentences contain the n-gram. |

Returns

object

The updated nGrams object with newly extracted n-grams.
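
Because the function takes a starting `index` and returns the updated `nGrams` object, a caller can slide a window across each sentence and accumulate the results. The driver below is a hedged sketch of that pattern: `collectNGrams` is a hypothetical helper, `extract` stands for any function with the same signature as `extractNounEdgeGrams`, and using the array position as the sentence number is an assumption.

```javascript
// Hypothetical driver; `extract` is expected to match the
// extractNounEdgeGrams(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber) signature.
function collectNGrams(sentences, sizes, extract, minWordLength = 3) {
  let nGrams = {};
  sentences.forEach((terms, sentenceNumber) => {
    for (const nGramSize of sizes) {
      // Slide the window: the last valid start is terms.length - nGramSize.
      for (let index = 0; index + nGramSize <= terms.length; index++) {
        // Reassign so the loop works whether the object is mutated or replaced.
        nGrams = extract(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber);
      }
    }
  });
  return nGrams;
}
```

A call might look like `collectNGrams(taggedSentences, [2, 3], extractNounEdgeGrams)`, where `taggedSentences` is an array of `terms` arrays, one per sentence.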

Example

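const terms = [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3], ["jumps", 4]];
const nGrams = {};
// Extract trigrams starting at index 0, with minWordLength 3, for sentence 1.
extractNounEdgeGrams(3, terms, 0, nGrams, 3, 1);
// Results are keyed by n-gram size, then phrase, with the sentence numbers containing it.
// nGrams might now contain: {3: {"brown fox jumps": [1]}}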
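
The result shape shown in the example (size, then phrase, then a list of sentence numbers) lends itself to simple post-processing. As a rough sketch, the hypothetical helper `rankNGrams` below flattens such an object and sorts phrases by how many sentence numbers they carry; the `"quick fox"` entry is made-up data for illustration, and treating the array length as a frequency assumes each sentence number is recorded once.

```javascript
// Flatten {size: {phrase: [sentenceNumbers]}} into rows,
// sorted by sentence count and then by n-gram size (both descending).
function rankNGrams(nGrams) {
  const rows = [];
  for (const [size, phrases] of Object.entries(nGrams)) {
    for (const [phrase, sentences] of Object.entries(phrases)) {
      rows.push({ phrase, size: Number(size), count: sentences.length, sentences });
    }
  }
  return rows.sort((a, b) => b.count - a.count || b.size - a.size);
}

// Illustrative input only; the second entry is invented for this sketch.
const ranked = rankNGrams({
  3: { "brown fox jumps": [1] },
  2: { "quick fox": [1, 4] },
});
// ranked[0].phrase === "quick fox" (it appears in two sentences)
```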