This site is supported by donations to The OEIS Foundation.
Levenshtein distance
In information theory and computer science, the Levenshtein distance is either
- a string metric for measuring the amount of difference between two strings, or
- a tree metric for measuring the amount of difference between two trees.
It is named after Vladimir Levenshtein, who considered this distance in 1965.[1] The term edit distance is often used to refer specifically to Levenshtein distance.
Contents
Levenshtein distance between two strings
The Levenshtein distance between two strings (or string edit distance) is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being
- insertion (of a single character),
- deletion (of a single character), or
- substitution (of a single character).
Examples (strings)
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of 's' for 'k')
- sitten → sittin (substitution of 'i' for 'e')
- sittin → sitting (insertion of 'g' at the end).
For example, the Levenshtein distance between "{2, 3, 5, 8, 13}" and "{2, 3, 5, 7, 11}" is 2, since the following two edits change one into the other, and there is no way to do it with fewer than two edits:
- {2, 3, 5, 8, 13} → {2, 3, 5, 7, 13} (substitution of '7' for '8')
- {2, 3, 5, 7, 13} → {2, 3, 5, 7, 11} (substitution of '1' for '3')
Levenshtein distance between two trees
The Levenshtein distance between two trees (or tree edit distance) is defined as the minimum number of edits needed to transform one tree into the other, with the allowable edit operations being
- insertion (of a single node),
- deletion (of a single node), or
- substitution (of a single node).
Notes
- ↑ В.И. Левенштейн (1965). “Двоичные коды с исправлением выпадений, вставок и замещений символов”. Доклады Академий Наук СCCP 163 (4): pp. 845–8. Appeared in English as: Levenshtein VI (1966). “Binary codes capable of correcting deletions, insertions, and reversals”. Soviet Physics Doklady 10: pp. 707–10 .
External links
- Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "Levenshtein distance," in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 14 August 2008. (accessed 23 July 2012) Available from: http://www.nist.gov/dads/HTML/Levenshtein.html
- David Camacho Fernández, String Distance and Record Linkage.
- Dynamic Programming Algorithm (DPA) for Edit-Distance, © L. Allison.
- ALEXANDR ANDONI AND KRZYSZTOF ONAK, APPROXIMATING EDIT DISTANCE IN NEAR-LINEAR TIME.
- Levenshtein distance—Wikipedia.org.