weighSimilarityByCharacter()

function weighSimilarityByCharacter(s1, s2): number

Jaro-Winkler String Similarity Comparison

Measures the similarity between two strings from the characters they share and the positions of those characters. Jaro-Winkler is often used in record linkage and data cleansing, particularly for names and addresses, because it gives extra weight to a shared prefix and scores strings lower the further they diverge. It is often better suited than Levenshtein distance to short strings such as words, and differs from it in four ways:

  1. Edit operations: Levenshtein counts insertions, deletions, and substitutions, while Jaro counts matching characters within a limited window and the transpositions among them.
  2. Sensitivity to string length: Levenshtein is more sensitive to overall string length, while Jaro normalizes for length in its formula.
  3. Prefix matching: The Jaro-Winkler variant explicitly rewards matching prefixes, which Levenshtein does not.
  4. Scale of results: Levenshtein produces an edit distance (usually converted to a similarity score), while Jaro directly produces a similarity score.
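
The points above can be made concrete with a minimal sketch of the standard Jaro-Winkler formula. This is an illustration only, not the source of `weighSimilarityByCharacter()`: the names `jaroSimilarity` and `jaroWinklerSimilarity`, the prefix scale of 0.1, and the 4-character prefix cap are assumptions taken from the commonly published definition, and the library's own constants may differ.

```typescript
// Illustrative sketch only. Function names and default constants are
// assumptions, not the library's actual implementation.
function jaroSimilarity(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  const len1 = s1.length;
  const len2 = s2.length;
  if (len1 === 0 || len2 === 0) return 0;

  // Two equal characters count as a match only if their positions
  // differ by at most this many places.
  const matchWindow = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1: boolean[] = new Array(len1).fill(false);
  const matched2: boolean[] = new Array(len2).fill(false);

  let matches = 0;
  for (let i = 0; i < len1; i++) {
    const start = Math.max(0, i - matchWindow);
    const end = Math.min(len2 - 1, i + matchWindow);
    for (let j = start; j <= end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order; each pair that
  // disagrees is half of a transposition.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  // Dividing by each string's length is what normalizes the score.
  return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3;
}

function jaroWinklerSimilarity(
  s1: string,
  s2: string,
  prefixScale = 0.1, // assumed default
  maxPrefix = 4 // assumed default
): number {
  const jaro = jaroSimilarity(s1, s2);
  // Length of the shared prefix, capped at maxPrefix characters.
  const limit = Math.min(maxPrefix, s1.length, s2.length);
  let prefix = 0;
  while (prefix < limit && s1[prefix] === s2[prefix]) prefix++;
  // Winkler's adjustment moves the score toward 1 for a shared prefix.
  return jaro + prefix * prefixScale * (1 - jaro);
}
```

With these assumed defaults, `jaroWinklerSimilarity("MARTHA", "MARHTA")` scores higher than `jaroSimilarity("MARTHA", "MARHTA")` because the two strings share the prefix "MAR", while their Levenshtein distance would be 2 regardless of where the swap occurs.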

A Comprehensive List of Similarity Search Algorithms

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| s1 | string | First string |
| s2 | string | Second string |

Returns

number

String similarity score between 0 and 1, where 1 means the strings are identical

Author

Jaro, M., Winkler, W. (1990)