LINGUIST List 17.3066
Wed Oct 18 2006
FYI: RTE 3 Preliminary Announcement
Editor for this issue: Amy Renaud
Message 1: RTE 3 Preliminary Announcement
From: Danilo Giampiccolo <infocelct.it>
Subject: RTE 3 Preliminary Announcement
Apologies for cross-postings
3RD PASCAL TEXTUAL ENTAILMENT CHALLENGE AND RESOURCES POOL
Encouraged by the success of the two previous rounds of the Recognizing
Textual Entailment (RTE) challenge (more details can be found at the RTE-2
website: http://www.pascal-network.org/Challenges/RTE2/), the RTE organizing
committee would like to announce the 3rd round of the PASCAL Recognizing
Textual Entailment (RTE) Challenge.
RTE has been proposed as a generic empirical framework for evaluating
semantic inference in an application-independent manner. The first RTE
challenge proved to be of great interest, and the community's response
encouraged us to gradually extend its scope. In the 2nd RTE Challenge, 23
participating groups presented their work at the PASCAL Challenges Workshop
in April 2006 in Venice. The event was successful and the number of
participants and their contributions demonstrated that Textual Entailment
is a quickly growing field of NLP research. Already, the workshops have
spawned a large number of publications in major conferences, with more work
in progress (see RTE-2 website for a comprehensive reference list).
RTE 3 HIGHLIGHTS: WHAT IS NEW IN THE NEXT CHALLENGE
RTE 3 will follow the same structure as the previous campaigns, in order to
facilitate the participation of newcomers and to allow assessment of
improvements over earlier systems. Nevertheless, the following innovations
will be introduced to extend the challenge:
+ A limited number of longer texts - i.e. one or two paragraphs long - will
be introduced as a first step towards addressing broader settings which
require discourse analysis.
+ An RTE Resource Pool has been created as a shared central repository and
evaluation forum for resource contributors and users (see details below).
+ A pilot task is tentatively proposed, with a dataset based on the results
of the Answer Validation Exercise in the QA track at CLEF 2006.
TASKS AND DATA DESCRIPTION
RTE is the task of recognizing whether the meaning of one text can be
inferred (is entailed) from another. The input to the challenge task
consists of pairs of text units, termed T(ext), the entailing text, and
H(ypothesis), the entailed text. The task is to recognize a directional
relation between the two text fragments, i.e., to decide whether T entails
H. More
specifically, we say that T entails H if, typically, a human reading T
would infer that H is most likely true. System results will be compared to
a human-annotated gold-standard test corpus.
The following T-H pairs exemplify the task proposed in the challenge:
T: Dr. George Carlo, an epidemiologist, asserts that medical science
indicates increased risks of tumors, cancer, genetic damage and other
health problems from the use of cell phones.
H: Cell phones pose health risks.
T: The available scientific reports do not show that any health problems
are associated with the use of wireless phones.
H: Cell phones pose health risks.
T: Exposure therapy is the main therapy used for treating agoraphobia. As
agoraphobic problems tend to be more widespread, treatment can take longer
- from three to six months.
H: Agoraphobia is a widespread disorder.
T: With agoraphobia there is widespread avoidance and restriction of
activities and places.
H: Agoraphobia is a widespread disorder.
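To make the task concrete, pairs like the ones above can be handled by a
trivial lexical-overlap baseline, a common starting point for entailment
systems. The sketch below is purely illustrative and is not part of the
challenge: the tokenization, the function names, and the 0.75 threshold are
all assumptions of this example.

```python
import re

# Hypothetical sketch of a lexical-overlap baseline for RTE-style T-H pairs.
# The tokenizer and threshold are illustrative choices, not challenge settings.

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def entails(t, h, threshold=0.75):
    """Guess that T entails H when most of H's words also occur in T."""
    h_words = tokenize(h)
    if not h_words:
        return False
    overlap = len(h_words & tokenize(t)) / len(h_words)
    return overlap >= threshold

t = ("Dr. George Carlo, an epidemiologist, asserts that medical science "
     "indicates increased risks of tumors, cancer, genetic damage and "
     "other health problems from the use of cell phones.")
h = "Cell phones pose health risks."
print(entails(t, h))
```

Such a baseline illustrates why the challenge pairs are instructive: the
second cell-phone example above shares almost all of its content words with
its text yet is not entailed, which is exactly the kind of case that pure
word overlap cannot distinguish.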
The test and development data sets will be based on multiple data sources
and are intended to be representative of typical problems encountered by
applied systems. Examples will be a mixture of pairs that could/could not
be successfully handled by existing systems. As in RTE-2, data types
corresponding to the following application areas will be used (see the
RTE-3 website for more detail):
Question Answering (QA):
Simulating a QA scenario in which the hypothesized answer has to be
inferred from the candidate text passage.
''Propositional'' Information Retrieval (IR):
Propositional queries (e.g. ''Women are poorly represented in Parliament'')
from IR evaluation datasets are chosen as hypotheses, and (correct and
incorrect) sentences retrieved by IR systems are proposed as texts.
Information Extraction/Relation Extraction (IE):
Existing systems will be trained on several IE-style relations, and
positive and negative examples from the system's output will be picked to
generate T-H pairs.
Multi-document Summarization (SUM):
Using the output of multi-document text summarization systems, sentence
pairs that have high content overlap are converted into T-H pairs. We also
plan to exploit the Pyramid method introduced as an evaluation methodology
in the DUC 2005 competition.
ANSWER VALIDATION PILOT TASK
We are tentatively planning to introduce a pilot task where T/H pairs are
taken from system results from the Answer Validation Exercise in the QA
track at CLEF 2006. This data is contributed by UNED (Universidad Nacional
de Educación a Distancia).
LONGER TEXTS
This year, the aim will be to include a limited proportion of longer texts
- one or two paragraphs long - moving toward more comprehensive scenarios
which require discourse analysis.
THE RTE RESOURCE POOL AT NLPZONE.ORG
One of the key conclusions at the 2nd RTE Challenge Workshop was that
entailment modeling requires vast knowledge resources that correspond to
different types of entailment reasoning. Examples of useful knowledge
include ontological and lexical relationships, paraphrases and entailment
rules, meaning entailing syntactic transformations and certain types of
world knowledge. Textual entailment systems also utilize general NLP tools
such as POS taggers, parsers and named-entity recognizers, sometimes posing
specialized requirements on such tools. With so many resources being
continuously released and improved, it can be difficult to know which
particular resource to use.
In response, RTE-3 will include a new activity for building an RTE Resource
Pool, which will serve as a portal and forum for publicizing and tracking
resources, and reporting on their use. We actively solicit both RTE
participants and other members of the NLP community who develop or use
relevant resources to contribute to the RTE Resource Pool. Contributions
include links and descriptions of relevant resources as well as
informational postings regarding resource use and accumulated experience.
Utilized resources will be cited and evaluated by the RTE-3 participants,
and their impact will be reviewed in the RTE-3 organizers' paper, which we
hope will reward contributors of useful resources.
The RTE Resource Pool is hosted as a sub-zone of NLPZone.org, a new
community portal. The resource pool has been seeded with a few resources;
however, its usefulness relies on the community's (including your own!)
contributions. Details on how to contribute to the RTE Resource Pool are
available at http://www.NLPZone.org.
TENTATIVE SCHEDULE
Development Set Release: Early December, 2006.
Test Set Release and Submissions: Early March, 2007.
Workshop: Early Summer, 2007.
(We plan to propose the RTE-3 workshop as an ACL 2007 workshop, to be held
in late June in Prague.)
ORGANIZING COMMITTEE
Danilo Giampiccolo, CELCT (Trento), Italy (coordinator)
Bernardo Magnini, ITC-irst (Trento), Italy (advisor)
Ido Dagan, Bar Ilan University, Israel (supervisor and scientific advisor)
Bill Dolan, Microsoft Research, USA
Patrick Pantel, ISI, USA (RTE Resources Pool)
For further information, contact Danilo Giampiccolo: infocelct.it, and put
[RTE3] in the subject line.
The preparation and running of this challenge has been supported by the
EU-funded PASCAL Network of Excellence on Pattern Analysis, Statistical
Modelling and Computational Learning.
Microsoft Research and CELCT will provide assistance in the creation and
annotation of the data sets.
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics