ngrams

extractNounEdgeGrams()

function extractNounEdgeGrams(
   nGramSize: number,
   terms: (string | number)[][],
   index: number,
   nGrams: object,
   minWordLength: number,
   sentenceNumber: number): object;

Defined in: topics/ngrams.js:27

Extracts noun-based edge grams from a set of part-of-speech-tagged terms. These n-grams are used to identify multi-word concepts in the text.

The function looks for sequences of words (n-grams) that:

  1. Start and end with a noun
  2. Contain words that are either nouns or common ignored words (like articles or prepositions)
  3. Meet the minimum word length requirement
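The three criteria above can be sketched as a single check on one window of terms. This is an illustration only, not the code in topics/ngrams.js. The helper name `isNounEdgeWindow` and the tag values `NOUN_TAG` and `IGNORED_TAG` are hypothetical, and the sketch assumes that the minimum word length applies to nouns but not to ignored words.

```javascript
// Hypothetical tag values for this sketch only; the real tag set is not documented here.
const NOUN_TAG = 3;
const IGNORED_TAG = 1;

// Returns true when a window of [word, tag] pairs satisfies the three criteria.
function isNounEdgeWindow(windowTerms, minWordLength) {
  if (windowTerms.length === 0) return false;

  // Criterion 1: the first and last words must be nouns.
  const firstTag = windowTerms[0][1];
  const lastTag = windowTerms[windowTerms.length - 1][1];
  if (firstTag !== NOUN_TAG || lastTag !== NOUN_TAG) return false;

  return windowTerms.every(([word, tag]) => {
    // Criterion 2: ignored words (articles, prepositions) are allowed as-is.
    if (tag === IGNORED_TAG) return true;
    // Criterion 2: any other non-noun word disqualifies the window.
    if (tag !== NOUN_TAG) return false;
    // Criterion 3 (assumed to apply to nouns only): minimum word length.
    return String(word).length >= minWordLength;
  });
}

// Usage: a noun, an ignored word, and a noun form a valid window.
const accepted = isNounEdgeWindow([["state", 3], ["of", 1], ["mind", 3]], 3);
```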

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nGramSize | number | The size of the n-grams to extract. For example, 2 for bigrams, 3 for trigrams, etc. |
| terms | (string \| number)[][] | Array of terms, where each term is an array containing the word and its part of speech tag. Example: [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3]] |
| index | number | The starting index in the terms array to begin extraction. This allows for sliding window extraction. |
| nGrams | object | Object to store the extracted n-grams. |
| minWordLength | number | The minimum length a word should have to be considered in the n-gram. |
| sentenceNumber | number | The current sentence number being processed. Used to track which sentences contain the n-gram. |
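The `index` parameter suggests that a caller invokes the function once per starting position in a sentence. A minimal sketch of such a sliding-window loop follows. Both `slideOverSentence` and `recordWindow` are hypothetical names. `recordWindow` is a stand-in that takes the documented parameter order but records every window without any noun checks, and the nested result shape (n-gram size, then n-gram text, then sentence numbers) is assumed from the example in this page.

```javascript
// Hypothetical stand-in with the documented parameter order.
// It records every full window and performs no noun or length checks.
function recordWindow(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber) {
  const words = terms.slice(index, index + nGramSize).map(([word]) => word);
  if (words.length < nGramSize) return nGrams; // window runs past the end
  const key = words.join(" ");
  nGrams[nGramSize] = nGrams[nGramSize] || {};
  nGrams[nGramSize][key] = nGrams[nGramSize][key] || [];
  nGrams[nGramSize][key].push(sentenceNumber);
  return nGrams;
}

// Calls extractFn once for every starting index that leaves room for a full window.
function slideOverSentence(extractFn, nGramSize, terms, minWordLength, sentenceNumber, nGrams = {}) {
  for (let index = 0; index + nGramSize <= terms.length; index++) {
    extractFn(nGramSize, terms, index, nGrams, minWordLength, sentenceNumber);
  }
  return nGrams;
}

const terms = [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3], ["jumps", 4]];
const result = slideOverSentence(recordWindow, 3, terms, 3, 1);
// With the stand-in, result[3] has the keys
// "The quick brown", "quick brown fox", and "brown fox jumps", each mapped to [1].
```

Passing the extraction function as an argument keeps the loop independent of the stand-in, so the real extractNounEdgeGrams could be supplied in its place if it accepts the same arguments.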

Returns

object

The updated nGrams object with newly extracted n-grams.

Example

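// Each term is a [word, partOfSpeechTag] pair.
const terms = [["The", 1], ["quick", 2], ["brown", 2], ["fox", 3], ["jumps", 4]];
const nGrams = {};
// Extract trigrams starting at index 0, with minWordLength 3, for sentence 1.
extractNounEdgeGrams(3, terms, 0, nGrams, 3, 1);
// nGrams might now contain: {3: {"brown fox jumps": [1]}}
// (n-gram size -> n-gram text -> sentence numbers that contain it)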