Academic Paper


Title: Comparing example-based and statistical machine translation
Author: Andy Way
Institution: Dublin City University
Author: Nano Gough
Institution: Dublin City University
Linguistic Field: Computational Linguistics
Abstract: In previous work (Gough and Way 2004), we showed that our Example-Based Machine Translation (EBMT) system improved in both coverage and quality when seeded with increasing amounts of training data, so that it significantly outperformed the on-line MT system Logomedia according to a wide variety of automatic evaluation metrics. While it is perhaps unsurprising that system performance correlates with the amount of training data, in this paper we address the question of whether a large-scale, robust EBMT system such as ours can outperform a Statistical Machine Translation (SMT) system. We obtained a large English-French translation memory from Sun Microsystems, from which we randomly extracted a test set of nearly 4K sentence pairs. The remaining data was split into three training sets of roughly 50K, 100K and 200K sentence pairs in order to measure how increasing the size of the training data affects the performance of the two systems. Our main observation is that, contrary to received wisdom in the field, there appears to be little substance to the claim that SMT systems are guaranteed to outperform EBMT systems when given 'enough' training data. Our tests on a 4.8-million-word bitext indicate that while SMT appears to outperform our system for French-English on a number of metrics, for English-French our EBMT system outperforms the baseline SMT model on all but one automatic evaluation metric.
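The data preparation described in the abstract (a randomly extracted test set of nearly 4K sentence pairs, with the remainder used for training sets of roughly 50K, 100K and 200K pairs) can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function name `make_splits`, the fixed random seed, and the choice to make each training set a nested prefix of the shuffled remainder are all assumptions, since the abstract does not say how the sets were drawn.

```python
import random

def make_splits(pairs, test_size=4000, train_sizes=(50_000, 100_000, 200_000), seed=0):
    """Hold out a random test set, then build training sets of increasing size.

    Hypothetical helper. Assumption: the training sets are nested prefixes of
    the shuffled remainder, so each larger set contains the smaller ones; the
    abstract does not state whether the original sets were nested or disjoint.
    """
    if test_size + max(train_sizes) > len(pairs):
        raise ValueError("not enough sentence pairs for the requested split")
    shuffled = list(pairs)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    test = shuffled[:test_size]             # held-out test set
    remainder = shuffled[test_size:]        # never overlaps with the test set
    train_sets = {n: remainder[:n] for n in train_sizes}
    return test, train_sets

# Toy usage with ten (English, French) pairs instead of a real translation memory.
pairs = [(f"en {i}", f"fr {i}") for i in range(10)]
test, train = make_splits(pairs, test_size=2, train_sizes=(2, 4, 8))
```

Nesting the sets keeps the comparison across sizes focused on the amount of data rather than on which sentences happened to be sampled; since 50K + 100K + 200K fits within the remaining data, disjoint sets are an equally plausible reading of "split".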

This article appears in Natural Language Engineering, Vol. 11, Issue 3, which you can read on Cambridge's site or on LINGUIST.