LINGUIST List 12.353

Sun Feb 11 2001

Confs: Machine Translation Evaluation - Geneva

Editor for this issue: Lydia Grebenyova <>

Please keep conference announcements as short as you can; LINGUIST will not post conference announcements which in our opinion are excessively long.


  1. Reeder,Florence M., Machine Translation Evaluation, Geneva

Message 1: Machine Translation Evaluation, Geneva

Date: Fri, 09 Feb 2001 12:52:13 -0500
From: Reeder,Florence M. <>
Subject: Machine Translation Evaluation, Geneva

MT Evaluation: An invitation to get your hands dirty!
(in conjunction with the MT Evaluation Working Group 
of the ISLE project)

- --------
A workshop on MT evaluation organised during the AMTA conference in 
Cuernavaca in 2000 included a series of practical exercises on machine 
translation evaluation. Carrying out the exercises provided insights 
into the difficulties and subtleties of MT evaluation, thus inspiring 
several of those present to suggest the organisation of a longer 
workshop whose primary focus would be to design and carry out portions 
of a thorough evaluation.

At the same time, the Evaluation Working Group of the ISLE project 
(funded by the EU, the NSF in the USA and by the Swiss and Danish 
governments) has been working on the provision of support material for 
those involved in MT evaluation. This material takes the form of 
classification schemes intended to be helpful in the definition of user 
needs, the choice of system characteristics of importance to the 
specific evaluation and the choice of metrics to be applied to system 
characteristics. The current version of the ISLE proposals can be seen 
at . 

Date and Place
- ------------
We invite you to a practical workshop to be held in Geneva between 
April 19th and 24th 2001. 

Organisation and activities
- -------------------------
Participants in the workshop will be provided with a scenario 
describing a practical situation in which an evaluation of an MT system 
or systems might be undertaken. The organisers will ensure that the 
scenario(s) reflect real-life situations. Participants will then spend 
two days designing an evaluation which is appropriate to their scenario, 
using a unified framework (ISLE) described in the introductory talks. 
They may choose to work alone or in small groups. Participants will 
have free access to the machine translation systems available on the 
web, and to the considerable computing support available at the 
University of Geneva School of Translation and Interpretation. As much 
as possible, the evaluations will be carried out. Results and experience 
will be pooled and discussed in the final day of the workshop. This 
workshop can be seen as one in an ongoing series (LREC 2000, AMTA 2000, 
NAACL 2001, MT Summit VIII 2001), where each workshop builds on the 
experience and the results of previous workshops. Potential participants 
need not, however, have participated in earlier workshops, although we 
would of course like to encourage them to participate in later ones.

More information about the conferences where workshops have been held 
or will be held can be found at
NAACL 2001:
MT Summit VIII:

Week outline timetable
- --------------------
* April 19th, morning : Introduction of the ISLE proposals. 
	Distribution and discussion of scenarios, formation of working 
	groups.
* April 19th afternoon, April 20th : Design of evaluations.
* April 21st, morning : Execution of evaluations.
* April 21st, afternoon : Free time.
* April 22nd : Free time.
* April 23rd : Interpretation of evaluation scores and metrics.
* April 24th : Reports and discussion of results.

Major themes of interest
- ----------------------
	* What metrics are suitable for assessing what system 
	 characteristics?
	* What system characteristics reflect what user needs?
	* Is there a radical difference between evaluation focusing on 
	 research or development needs and evaluation focusing on 
	 end-user needs?
	* When should real world data be used, and what is the impact of 
	 using it?
	* What constitutes a valid metric? How can you demonstrate that 
	 a metric is valid?
	* What metrics can be automated?
	* What are the advantages and disadvantages of specific 
	 metrics?
	* For the metric(s) selected for the evaluation, what are the 
	 difficulties in applying them? 
	* For a given metric, what variations in scores are typically 
	 observed?
	 What are the statistical error variances? 
	* For a given metric, what are the score ranges for 'good' and 
	 for 'bad' systems? 
	* Are there metrics which correlate with one another? Are there 
	 metrics which indicate an overall quality score?
	* Are there metrics which work better with specific language 
	 pairs?
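As an illustration of the "What metrics can be automated?" theme, here is a minimal sketch (our own example, not part of the ISLE proposals) of one fully automatic metric: word-level edit distance between a system output and a human reference translation, normalised by reference length, so that lower scores indicate closer agreement.

```python
# Hypothetical sketch of an automatable MT metric: word error rate,
# i.e. the minimum number of word insertions, deletions and
# substitutions needed to turn the candidate into the reference,
# divided by the reference length.

def word_error_rate(candidate: str, reference: str) -> float:
    cand = candidate.split()
    ref = reference.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(cand) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(cand) + 1):
            cost = 0 if ref[i - 1] == cand[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(cand)] / max(len(ref), 1)

# One missing word against a six-word reference gives 1/6.
print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))
```

Such a score is cheap to compute and repeatable, but it is exactly the kind of metric whose validity, score ranges and correlation with human judgement the questions above ask about.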

- -----------
Participation in the workshop is free of charge, although participants 
must pay their own travel and living expenses. Because of the nature of 
the exercise, participation is limited to a maximum of 20 persons, and 
will be on a first-come, first-served basis. Note, though, that if a 
team would like to participate as a group, these restrictions may be 
relaxed in order to accommodate them. 

Further information can be obtained from 
	Maghi King at or 
	Florence Reeder at

How to register
- -------------
Send your registration request to Gisella Anspach at as soon as possible and at the 
absolute latest by March 15th. Gisella will also be able to help 
you if necessary with finding accommodation in Geneva, as will the 
Geneva Tourist Office whose site at 
will provide you with much information about the city. 
