
compare-letters


Match​

weighSimilarityByCharacter()​

function weighSimilarityByCharacter(s1: string, s2: string): number;

Defined in: match/compare-letters.js:28

Jaro-Winkler String Similarity Comparison​

Measures the similarity between two strings based on the characters they share and where those characters appear. Jaro-Winkler is often used in record linkage and data cleansing, particularly for names and addresses, because it gives extra weight to a shared prefix and lowers the score as more characters differ or fall out of position. It is often a better fit for words and other short strings than Levenshtein distance, and differs from it in four ways:

  1. Edit operations: Levenshtein counts insertions, deletions, and substitutions, while Jaro counts matching characters and the transpositions among them.
  2. Sensitivity to string length: Levenshtein is more sensitive to overall string length, while Jaro normalizes for length in its formula.
  3. Prefix matching: The Jaro-Winkler variant explicitly rewards matching prefixes, which Levenshtein does not.
  4. Scale of results: Levenshtein produces an edit distance (usually converted to a similarity score), while Jaro directly produces a similarity score.
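
The points above can be made concrete with a minimal sketch of the textbook Jaro-Winkler calculation. This is not the code in match/compare-letters.js: the names `jaroSimilarity` and `jaroWinklerSketch`, the prefix cap of 4 characters, and the 0.1 prefix scale are assumptions taken from the commonly published form of the algorithm, and the library may differ in details such as case handling.

```typescript
// Illustrative sketch only; not the library's implementation.
function jaroSimilarity(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  if (s1.length === 0 || s2.length === 0) return 0;

  // Two equal characters count as a match only if their positions
  // are at most this far apart.
  const matchWindow = Math.max(
    0,
    Math.floor(Math.max(s1.length, s2.length) / 2) - 1
  );
  const matched1 = new Array<boolean>(s1.length).fill(false);
  const matched2 = new Array<boolean>(s2.length).fill(false);

  let matches = 0;
  for (let i = 0; i < s1.length; i++) {
    const lo = Math.max(0, i - matchWindow);
    const hi = Math.min(s2.length - 1, i + matchWindow);
    for (let j = lo; j <= hi; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order and count the
  // positions where they disagree; each transposition is two of these.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  // Dividing by each string's length is what normalizes for length.
  return (
    (matches / s1.length +
      matches / s2.length +
      (matches - transpositions) / matches) /
    3
  );
}

function jaroWinklerSketch(s1: string, s2: string, prefixScale = 0.1): number {
  const jaro = jaroSimilarity(s1, s2);

  // Winkler's adjustment: reward a shared prefix of up to 4 characters.
  const maxPrefix = Math.min(4, s1.length, s2.length);
  let prefix = 0;
  while (prefix < maxPrefix && s1[prefix] === s2[prefix]) prefix++;

  return jaro + prefix * prefixScale * (1 - jaro);
}
```

With this sketch, `jaroWinklerSketch("MARTHA", "MARHTA")` evaluates to roughly 0.961: all six characters match, one pair is transposed, and the shared prefix "MAR" lifts the plain Jaro score of about 0.944.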

See also: A Comprehensive List of Similarity Search Algorithms

Parameters​

| Parameter | Type | Description |
| --- | --- | --- |
| s1 | string | First string |
| s2 | string | Second string |

Returns​

number

String similarity score from 0 (no similarity) to 1 (identical strings)
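
Because the result is a score between 0 and 1, a caller would typically compare it against a cutoff or use it to rank candidates. The sketch below shows one way that might look. `pickBestMatch`, the `Scorer` type, the 0.85 default threshold, and the toy `prefixRatio` scorer are all hypothetical and not part of this module; in practice a function with the same signature as `weighSimilarityByCharacter` would be passed as `score`.

```typescript
// Hypothetical helper; names and the 0.85 default are illustrative only.
type Scorer = (a: string, b: string) => number;

function pickBestMatch(
  query: string,
  candidates: string[],
  score: Scorer,
  threshold = 0.85
): string | null {
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    const s = score(query, candidate);
    // Keep the highest-scoring candidate that clears the threshold.
    if (s >= threshold && s > bestScore) {
      best = candidate;
      bestScore = s;
    }
  }
  return best;
}

// Toy stand-in scorer so the example runs on its own: the length of the
// shared prefix divided by the longer string's length.
function prefixRatio(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let shared = 0;
  while (shared < a.length && shared < b.length && a[shared] === b[shared]) {
    shared++;
  }
  return shared / longest;
}

const match = pickBestMatch("martha", ["marhta", "martha", "bob"], prefixRatio);
console.log(match); // "martha"
```

A suitable threshold depends on the data; 0.85 here is only a placeholder.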

Author​

Jaro, M., Winkler, W. (1990)