



Academic Paper


Title: Extraction of multi-word expressions from small parallel corpora
Author: Yulia Tsvetkov
Institution: Language Technologies Institute Carnegie Mellon University
Author: Shuly Wintner
Institution: University of Haifa
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.
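The abstract describes the candidate-proposal step only at a high level. The Python sketch below shows one way to read it, using made-up English-French toy data. It is not the authors' implementation. The function names (propose_candidates, rank_and_filter and the helpers) are hypothetical. It also rests on several assumptions:

- A "misaligned" span is a maximal run of two or more source words, none of which takes part in a 1:1 link.
- The span's translation is whatever target words those source words are linked to.
- Ranking by raw n-gram frequency in a monolingual corpus, with a minimum-count threshold, stands in for whatever ranking and filtering measure the paper actually uses.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: an alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]
Candidate = Tuple[Tuple[str, ...], Tuple[str, ...]]


def one_to_one_links(alignment: Alignment) -> Set[Tuple[int, int]]:
    """Links whose source word and target word each occur in exactly one link."""
    src_degree = Counter(i for i, _ in alignment)
    tgt_degree = Counter(j for _, j in alignment)
    return {(i, j) for i, j in alignment
            if src_degree[i] == 1 and tgt_degree[j] == 1}


def propose_candidates(src: Sequence[str], tgt: Sequence[str],
                       alignment: Alignment, max_len: int = 4) -> List[Candidate]:
    """Assumed reading: 1:1 links act as anchors; each maximal run of 2..max_len
    unanchored source words between them is an MWE candidate, paired with the
    target words it is linked to (possibly none)."""
    anchored = {i for i, _ in one_to_one_links(alignment)}
    candidates: List[Candidate] = []
    i = 0
    while i < len(src):
        if i in anchored:
            i += 1
            continue
        start = i
        while i < len(src) and i not in anchored:
            i += 1
        # src[start:i] is delimited by anchors or by the sentence edges
        if 2 <= i - start <= max_len:
            linked = sorted({j for s, j in alignment if start <= s < i})
            candidates.append((tuple(src[start:i]), tuple(tgt[j] for j in linked)))
    return candidates


def ngram_counts(corpus: List[List[str]], n_max: int) -> Counter:
    """Count all n-grams of length 2..n_max in a tokenized monolingual corpus."""
    counts: Counter = Counter()
    for sent in corpus:
        for n in range(2, n_max + 1):
            for k in range(len(sent) - n + 1):
                counts[tuple(sent[k:k + n])] += 1
    return counts


def rank_and_filter(candidates: List[Candidate], monolingual: List[List[str]],
                    min_count: int = 2) -> List[Candidate]:
    """Stand-in ranking: keep candidates whose source side occurs at least
    min_count times in the monolingual corpus, most frequent first."""
    n_max = max((len(s) for s, _ in candidates), default=2)
    counts = ngram_counts(monolingual, n_max)
    kept = [c for c in set(candidates) if counts[c[0]] >= min_count]
    return sorted(kept, key=lambda c: (-counts[c[0]], c))


# Toy example (invented data): "kicked the bucket" is translated as "est mort".
src = ["he", "kicked", "the", "bucket", "yesterday"]
tgt = ["il", "est", "mort", "hier"]
alignment = {(0, 0), (1, 1), (1, 2), (2, 2), (3, 2), (4, 3)}
mono = [["he", "kicked", "the", "bucket"],
        ["she", "kicked", "the", "bucket", "too"],
        ["kicked", "the", "ball"]]
cands = propose_candidates(src, tgt, alignment)
print(cands)
print(rank_and_filter(cands, mono))
```

Here only "he/il" and "yesterday/hier" are 1:1 links. They therefore delimit the misaligned span "kicked the bucket", which is proposed with the translation "est mort". It then survives filtering because it occurs twice in the toy monolingual corpus.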


This article appears in Natural Language Engineering Vol. 18, Issue 4, which you can read on Cambridge's site or on LINGUIST.


