Function weighRelevanceTermFrequency

  • Calculate term specificity for a single doc with BM25 formula by using Wikipedia term frequencies as the baseline IDF.
    ritvikmath (2023). "BM25 : The Most Important Text Metric in Data Science". https://www.youtube.com/watch?v=ruBm9WywevM

    Parameters

    • document: string

      a single document to calculate the score for

    • query: string

      phrase to search tf and idf for each word

    • Optionaloptions: {
          saturationWeight: number;
          normalizeLength: number;
          avgDocWordCount: number;
          totalWikiPages: number;
      } = {}
      • saturationWeight: number

        saturationWeight controls the impact of term frequency saturation. It typically ranges from 1.2 to 2.0, with 1.5 being a common default value. As saturationWeight increases: The impact of term frequency increases (i.e., multiple occurrences of a term in a document become more significant).

      • normalizeLength: number

        normalizeLengthcontrols the document length normalization. It ranges from 0 to 1, with 0.75 being a common default value. When normalizeLength=1: Full length normalization is applied. Longer documents are penalized more heavily.

      • avgDocWordCount: number

        Estimated average word count of all documents

      • totalWikiPages: number

        Total number of Wikipedia pages used to calculate IDF

    Returns number

    score for term specificity