Academic Paper


Title: A novel string distance metric for ranking Persian respelling suggestions
Author: Omid Kashefi
Institution: Iran University of Science and Technology
Author: Mohsen Sharifi
Institution: Iran University of Science and Technology
Author: Behrooz Minaie
Institution: Iran University of Science and Technology
Linguistic Field: Computational Linguistics
Abstract: Spelling errors in digital documents are often caused by operational and cognitive mistakes, or by the lack of full knowledge about the language of the written documents. Computer-assisted solutions can help to detect and suggest replacements. In this paper, we present a new string distance metric for the Persian language to rank respelling suggestions of a misspelled Persian word by considering the effects of keyboard layout on typographical spelling errors as well as the homomorphic and homophonic aspects of words for orthographical misspellings. We also consider the misspellings caused by disregarded diacritics. Since the proposed string distance metric is custom-designed for the Persian language, we present the spelling aspects of the Persian language such as homomorphs, homophones, and diacritics. We then present our statistical analysis of a set of large Persian corpora to identify the causes and the types of Persian spelling errors. We show that the proposed string distance metric has a higher mean average precision and a higher mean reciprocal rank in ranking respelling candidates of Persian misspellings in comparison with other metrics such as the Hamming, Levenshtein, Damerau–Levenshtein, Wagner–Fischer, and Jaro–Winkler metrics.
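The core idea in the abstract, ranking respelling candidates with a distance that treats likely typing slips as cheaper than unlikely ones, can be illustrated with a small sketch. This is not the metric proposed in the paper, which also accounts for homomorphs, homophones, and diacritics. It is a plain weighted edit distance under stated assumptions: the `ADJACENT` table is a hypothetical fragment of a Latin QWERTY layout used only for illustration, the 0.5 cost for adjacent keys is an arbitrary choice, and all function names are invented here. The last function shows how mean reciprocal rank, one of the two evaluation measures named in the abstract, is computed.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical adjacency table for a few QWERTY keys (illustration only;
# a Persian layout would need its own table).
ADJACENT: Dict[str, FrozenSet[str]] = {
    "a": frozenset("qwsz"),
    "s": frozenset("awedxz"),
    "d": frozenset("serfcx"),
    "e": frozenset("wsdr"),
    "r": frozenset("edft"),
    "t": frozenset("rfgy"),
}


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for equal characters, 0.5 for adjacent keys, 1 otherwise."""
    if a == b:
        return 0.0
    if b in ADJACENT.get(a, frozenset()) or a in ADJACENT.get(b, frozenset()):
        return 0.5  # assumed discount for a plausible keyboard slip
    return 1.0


def weighted_distance(source: str, target: str) -> float:
    """Edit distance with keyboard-aware substitution costs (two-row DP)."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, sc in enumerate(source, start=1):
        curr = [float(i)]
        for j, tc in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete sc
                curr[j - 1] + 1.0,                      # insert tc
                prev[j - 1] + substitution_cost(sc, tc),  # substitute
            ))
        prev = curr
    return prev[-1]


def rank_candidates(misspelling: str, candidates: Sequence[str]) -> List[str]:
    """Sort candidates by distance to the misspelling, ties alphabetically."""
    return sorted(candidates, key=lambda c: (weighted_distance(misspelling, c), c))


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         gold: Sequence[str]) -> float:
    """Average of 1/rank of the correct word; a missing word contributes 0."""
    if not gold:
        return 0.0
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        if answer in ranked:
            total += 1.0 / (list(ranked).index(answer) + 1)
    return total / len(gold)
```

With this table, `rank_candidates("tesr", ["tesk", "test"])` places "test" first, because "r" and "t" are neighbouring keys, whereas an unweighted Levenshtein distance would score both candidates equally.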

This article appears in Natural Language Engineering, Vol. 19, Issue 2.


