


Academic Paper


Title: Statistical Translation After Source Reordering: Oracles, Context-Aware Models, and Empirical Analysis
Author: Maxim Khalilov
Homepage: http://staff.science.uva.nl/~khalilov/index.html
Institution: University of Amsterdam
Author: Khalil Sima'an
Institution: University of Amsterdam
Linguistic Field: Computational Linguistics
Abstract: In source reordering, the source words are permuted to minimize word-order differences with the target sentence, and the reordered sentence is then fed to a translation model. Earlier work highlights the benefits of resolving long-distance reorderings as a pre-processing step to standard phrase-based models. However, the potential performance improvement of source reordering and its impact on the components of the subsequent translation model remain unexplored. In this paper we study both aspects of source reordering. We set up idealized source reordering (oracle) models with and without syntax and present our own syntax-driven model of source reordering. The latter is a statistical model of inversion transduction grammar (ITG)-like tree transductions manipulating a syntactic parse and working with novel conditional reordering parameters. Having set up the models, we report translation experiments showing significant improvement on three language pairs, and contribute an extensive analysis of the impact of source reordering (both oracle and model) on the translation model regarding the quality of its input, phrase table, and output. Our experiments show that oracle source reordering has untapped potential in improving translation system output. Besides solving difficult reorderings, we find that source reordering creates more monotone parallel training data at the back-end, leading to significantly larger phrase tables with higher coverage of phrase types in unseen data. Unfortunately, this nice property does not carry over to tree-constrained source reordering. Our analysis shows that, from the string-level perspective, tree-constrained reordering might selectively permute word order, leading to larger phrase tables but without an increase in phrase coverage in unseen data.
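
The idea of an idealized (oracle) source reordering mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a word alignment is already available, and the function name `oracle_reorder`, the averaging of aligned target positions, and the handling of unaligned words are all hypothetical choices made for illustration. The German-like example sentence is likewise invented and not taken from the paper.

```python
from typing import Dict, List, Sequence


def oracle_reorder(source: Sequence[str],
                   alignment: Dict[int, List[int]]) -> List[str]:
    """Permute source tokens into an approximately target-like order.

    `alignment` maps a source index to the target indices it is aligned to.
    This is an illustrative assumption, not the oracle defined in the paper.
    """
    keys = []
    last_key = -1.0  # unaligned tokens at the very start sort first
    for i in range(len(source)):
        targets = alignment.get(i)
        if targets:
            # Use the mean aligned target position as the sort key.
            last_key = sum(targets) / len(targets)
        # Unaligned tokens inherit the key of the nearest aligned
        # token to their left, so they move together with it.
        keys.append(last_key)
    # sorted() is stable, so tokens with equal keys keep source order.
    order = sorted(range(len(source)), key=lambda i: keys[i])
    return [source[i] for i in order]


# Invented example: the verb-final source is moved toward English order.
source = ["ich", "habe", "das", "Buch", "gelesen"]
alignment = {0: [0], 1: [1], 2: [3], 3: [4], 4: [2]}
print(" ".join(oracle_reorder(source, alignment)))
```

After such a permutation the source and target sides are closer to monotone, which is the property the abstract connects to the changes in phrase-table size and coverage.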

CUP at LINGUIST

This article appears in Natural Language Engineering Vol. 18, Issue 4, which you can read on Cambridge's site or on LINGUIST.


