compare-letters
ai-research-agent / match/compare-letters
Match
weighSimilarityByCharacter()
function weighSimilarityByCharacter(s1, s2): number
Jaro-Winkler String Similarity Comparison
Measures the similarity between two strings based on the characters they have in common and the positions of those characters. Jaro-Winkler is often used in record linkage and data cleansing to improve the accuracy of string matching, particularly for names and addresses. It gives extra weight to a shared prefix, and because it normalizes by the length of each string, characters left unmatched in either string lower the score. It is often better suited than Levenshtein distance for comparing individual words:
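- Edit operations: Levenshtein counts insertions, deletions, and substitutions, while Jaro counts matching characters (equal characters that sit close to each other in position) and the transpositions among them.
- Sensitivity to string length: Levenshtein distance is an absolute count that tends to grow with string length, while Jaro normalizes by the length of each string in its formula.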
- Prefix matching: The Jaro-Winkler variant explicitly rewards matching prefixes, which Levenshtein does not.
- Scale of results: Levenshtein produces an edit distance (usually converted to a similarity score), while Jaro directly produces a similarity score.
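The contrast above can be made concrete with a minimal TypeScript sketch of the textbook Jaro and Jaro-Winkler formulas alongside a Levenshtein distance. This is not the library's implementation of weighSimilarityByCharacter: the helper names `jaro`, `jaroWinkler`, and `levenshtein` are hypothetical, and the prefix scale `p = 0.1` with a 4-character prefix cap are the commonly cited defaults, assumed here.

```typescript
// Hypothetical sketch of the textbook Jaro similarity (not the library's code).
function jaro(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  const len1 = s1.length;
  const len2 = s2.length;
  if (len1 === 0 || len2 === 0) return 0;

  // Equal characters count as matching only within this positional distance.
  const window = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1 = new Array<boolean>(len1).fill(false);
  const matched2 = new Array<boolean>(len2).fill(false);

  let matches = 0;
  for (let i = 0; i < len1; i++) {
    const start = Math.max(0, i - window);
    const end = Math.min(i + window + 1, len2);
    for (let j = start; j < end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both sets of matched characters in order; mismatches are out-of-order pairs.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3;
}

// Winkler adjustment: boost the Jaro score by the length of the common prefix.
// p = 0.1 and maxPrefix = 4 are assumed defaults.
function jaroWinkler(s1: string, s2: string, p = 0.1, maxPrefix = 4): number {
  const sim = jaro(s1, s2);
  let prefix = 0;
  const limit = Math.min(maxPrefix, s1.length, s2.length);
  while (prefix < limit && s1[prefix] === s2[prefix]) prefix++;
  return sim + prefix * p * (1 - sim);
}

// Levenshtein edit distance (insertions, deletions, substitutions), one DP row at a time.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}
```

With these defaults, `jaroWinkler("MARTHA", "MARHTA")` is about 0.961: Jaro alone gives 17/18 (about 0.944), and the shared prefix "MAR" adds the rest. By contrast, `levenshtein("kitten", "sitting")` returns 3, an edit count that still has to be normalized before it can be compared with a 0-1 score.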
See also: A Comprehensive List of Similarity Search Algorithms
Parameters
Parameter | Type | Description |
---|---|---|
s1 | string | First string |
s2 | string | Second string |
Returns
number
String similarity score from 0 to 1, where 1 means the strings are identical
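The 0 to 1 range follows from the standard Jaro-Winkler definition, sketched here under the usual assumptions: $m$ is the number of matching characters, $t$ is half the number of matched characters that appear out of order, $\ell$ is the length of the common prefix (capped at 4), and $p$ is a prefix scale no larger than 0.25 (0.1 is typical). Whether weighSimilarityByCharacter uses exactly these constants is an assumption.

$$
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell\, p\,\bigl(1 - \mathrm{sim}_j\bigr)
$$

Each of the three fractions in $\mathrm{sim}_j$ lies in $[0, 1]$, so $\mathrm{sim}_j$ does too. Because $\ell p \le 1$, the Winkler adjustment moves the score toward 1 without exceeding it.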