Publishing Partner: Cambridge University Press CUP Extra Wiley-Blackwell Publisher Login
amazon logo
More Info


New from Oxford University Press!

ad

The Vulgar Tongue: Green's History of Slang

By Jonathon Green

A comprehensive history of slang in the English speaking world by its leading lexicographer.


New from Cambridge University Press!

ad

The Universal Structure of Categories: Towards a Formal Typology

By Martina Wiltschko

This book presents a new theory of grammatical categories - the Universal Spine Hypothesis - and reinforces generative notions of Universal Grammar while accommodating insights from linguistic typology.


New from Brill!

ad

Brill's MyBook Program

Do you have access to Dynamics of Morphological Productivity through your library? Then you can by the paperback for only €25 or $25! Find out more about Brill's MyBook program!


Academic Paper


Title: Extraction of multi-word expressions from small parallel corpora
Author: Yulia Tsvetkov
Institution: Language Technologies Institute Carnegie Mellon University
Author: Shuly Wintner
Institution: University of Haifa
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.

CUP at LINGUIST

This article appears in Natural Language Engineering Vol. 18, Issue 4, which you can read on Cambridge's site or on LINGUIST .



Back
Add a new paper
Return to Academic Papers main page
Return to Directory of Linguists main page