FYI: 6th Recognizing Textual Entailment Challenge
Sixth Recognizing Textual Entailment Challenge at TAC 2010
The Recognizing Textual Entailment (RTE) task consists of developing a
system that, given two text fragments, can determine whether the meaning of
one text is entailed, i.e. can be inferred, from the other text.
Since its inception in 2005, RTE has enjoyed a constantly growing
popularity in the NLP community. After the first three highly successful
PASCAL RTE Challenge campaigns held in Europe, in 2008 RTE became a track
at the Text Analysis Conference (TAC), bringing it together with
communities working on NLP applications. The interaction has provided the
opportunity to apply RTE to specific application settings and move it
towards more realistic scenarios. In particular, the RTE-5 Pilot Search
Task represented a step forward, as for the first time textual entailment
recognition was performed on a corpus, instead of isolated H-T pairs, and
on a real NLP application, namely Summarization.
Encouraged by the positive response obtained so far, the RTE Organizing
Committee is glad to launch the Sixth Recognizing Textual Entailment
Challenge, proposed for the third year as a track of TAC.
Organizations interested in participating in the RTE-6 Challenge are
invited to submit a track registration form by May 21, 2010, at the TAC
2010 web site:
What is new in RTE-6
1) RTE-6 does not include the traditional RTE Main Task which was carried
out in the first five RTE challenges, i.e. there will be no task to make
entailment judgments over isolated T-H pairs drawn from multiple applications.
2) A new Main Task based on only the Summarization application setting is
proposed, together with a subtask:
- Main Task: Recognizing Textual Entailment within a Corpus.
A close variant of the Pilot Search Task in RTE-5, the RTE-6 Main Task
differs significantly in two ways:
* Unlike in RTE-5, where the Search Task was performed on the whole corpus,
in RTE-6 a preliminary Information Retrieval filtering phase is performed
using Lucene, in order to select for each H a subset of candidate entailing
sentences to be judged by the participating systems.
* In the RTE-6 data set some of the H's have no entailing sentences.
- Novelty Detection subtask. This task has the same structure as the Main
Task, but it is separated out as a subtask to allow participants to
optimize their RTE engines for detecting novelty, i.e. judging whether the
information contained in each H is novel with respect to the information
contained in the corpus. A novel H is defined as one that has no entailing
sentences in the set of candidate T's. Systems' outputs will have the same
format as for the Main Task but will be specifically scored using metrics
designed for assessing novelty detection.
3) A KBP Validation Pilot, set in the Knowledge Base Population scenario,
is also proposed.
4) The exploratory effort on resource evaluation will also be extended to
tools. Mandatory ablation tests for both knowledge resources and tools will
be required of participants in the new RTE-6 Main Task.
RTE-6 Main Task - Recognizing Textual Entailment within a Corpus
In the RTE-6 Main Task, given a corpus, a hypothesis H, and a set of
'candidate' entailing sentences for that H retrieved by Lucene from the
corpus, RTE systems are required to identify all the sentences that entail
H among the candidate sentences.
The RTE6-Main data set is based on the data created for the TAC 2009 Update
Summarization task, consisting of a number of topics, each containing two
sets of documents, namely i) Cluster A, made up of the first 10 texts in
chronological order of publication date, and ii) Cluster B, made up of the
last 10 texts. H's are standalone sentences taken from Cluster B documents,
while candidate entailing sentences (T's) are the 100 top-ranked
sentences retrieved for each H by Lucene from the Cluster A corpus, using H
verbatim as the search query. Only this subset of candidate entailing
sentences must be judged for entailment. However, the sentences are not to
be treated as isolated texts: the entire Cluster A corpus to which they
belong is to be taken into consideration in order to resolve discourse
references and appropriately judge the entailment relation.
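The candidate-selection step described above can be pictured with a small
sketch. It is not Lucene and does not reproduce Lucene's ranking: a plain
word-overlap count stands in for the retrieval score, and the function
names, the tokenizer, and the sample sentences are all assumptions made for
illustration. Only the shape of the step comes from the description: H is
used verbatim as the query, and the top-ranked Cluster A sentences become
the candidate T's.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_candidates(hypothesis, cluster_a_sentences, top_k=100):
    """Rank Cluster A sentences by word overlap with H and keep the top_k.

    Assumption: overlap counting replaces Lucene's scoring for illustration.
    Sentences that share no word with H are dropped.
    """
    query = tokenize(hypothesis)
    scored = []
    for index, sentence in enumerate(cluster_a_sentences):
        overlap = len(query & tokenize(sentence))
        if overlap > 0:
            scored.append((overlap, index, sentence))
    # Highest overlap first; ties keep the original sentence order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:top_k]]


# Hypothetical sentences, not taken from the RTE-6 data.
cluster_a = [
    "The storm moved toward the Gulf of Mexico.",
    "Oil prices fell on Tuesday.",
    "The committee met in Geneva.",
]
candidates = select_candidates("A storm approached the Gulf of Mexico.",
                               cluster_a, top_k=2)
```

A participating system would then judge only the returned candidates for
entailment, while still reading them in the context of their source
documents.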
The example below presents a hypothesis referring to a given topic and some
of the entailing sentences found in the subset of candidate sentences (the
first entailing sentence entails H because 'new hurricane' can be seen to
resolve to 'Hurricane Rita' from the context in which it occurs in its
Cluster A document):
prices fell further on Tuesday, despite a new hurricane powering towards
oil facilities in the Gulf of Mexico, and as OPEC pledged to supply more
crude from the start of October if required.
headed toward the Gulf of Mexico, threatening Texas and Louisiana with
winds of 160 kilometers per hour (100 mph).
Rita pounded the fragile Florida Keys islands Tuesday as it barreled toward
the oil-rich Gulf of Mexico.
RTE-6 Novelty Detection Subtask
The Novelty Detection subtask is based on the Main Task and is aimed at
specifically addressing the interests of the Summarization community, in
particular with regard to the Update Summarization task, focusing on
detection of novelty in Cluster B documents.
The task consists of judging if the information contained in each H (drawn
from the cluster B documents) is novel with respect to the information
contained in the set of Cluster A candidate entailing sentences. If for a
given H one or more entailing sentences are found, it means that the
content of the H is not new. On the contrary, if no entailing sentences are
detected, it means that the information contained in the H is regarded as
novel.
The Novelty Detection Task requires the same output format as the Main Task
- i.e. no additional type of decision is needed. Nevertheless, the Novelty
Detection Task differs from the Main Task in the following ways:
1) The H's are only a subset of the H's used for the Main Task;
2) The system outputs are scored differently, using specific scoring
metrics designed for assessing novelty detection.
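Because a novel H is defined as one with no entailing sentences among its
candidate T's, novelty labels can be read directly off Main Task style
output. The sketch below illustrates that definition only. The pair layout,
the identifiers, and the function name are assumptions, not the official
submission format, and the official novelty scoring metrics are not
reproduced here.

```python
def novel_hypotheses(hypothesis_ids, entailment_judgments):
    """Return the ids of hypotheses that have no entailing candidate sentence.

    entailment_judgments: iterable of (hypothesis_id, sentence_id) pairs, one
    per candidate sentence that a system judged as entailing that hypothesis.
    This pair layout is an assumption made for illustration.
    """
    entailed = {h_id for h_id, _ in entailment_judgments}
    return [h_id for h_id in hypothesis_ids if h_id not in entailed]


# Hypothetical identifiers: H2 has no entailing sentence, so it is novel.
judgments = [("H1", "s4"), ("H1", "s9"), ("H3", "s2")]
novel = novel_hypotheses(["H1", "H2", "H3"], judgments)
```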
The Main and Novelty Detection Task guidelines for participants, together
with one sample topic taken from the Development Set, are available at the
RTE-6 Website (http://www.nist.gov/tac/2010/RTE/).
RTE-6 KBP Validation Pilot Task
Based on the TAC Knowledge Base Population (KBP) Slot-Filling task, the new
KBP validation pilot task is to determine whether a given relation
(Hypothesis) is supported in an associated document (Text). Each slot fill
that is proposed by a system for the KBP Slot-Filling task would create one
evaluation item for the RTE-KBP Validation Pilot: the Hypothesis would be a
simple sentence created from the slot fill, while the Text would be the
source document that was cited as supporting the slot fill.
The guidelines and the Development Set will be available by the end of
April 2010 at the RTE-6 website (http://www.nist.gov/tac/2010/RTE/).
Resource and Tool Evaluation through Ablation Tests
The exploratory effort on resource evaluation started in RTE-5 will
continue on the new RTE-6 Main Task and will be extended to tools. Ablation
tests are required for systems participating in the new RTE-6 Main Task, in
order to collect data to better understand the impact of both knowledge
resources and tools used by RTE systems and evaluate their contribution to
systems' performance. An ablation test consists of removing one module from
a complete system, and rerunning the system on the test set with the other
modules (excluding the module being tested). Comparing the results to those
obtained by the complete system, it is possible to assess the practical
contribution given by the individual module.
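The comparison described above amounts to a simple difference between the
complete system's score and the score of each ablated run. The sketch below
is one way to tabulate it. The module names and the scores are made up, and
no particular evaluation measure is assumed.

```python
def ablation_contributions(full_score, ablated_scores):
    """Map each removed module to (full score - score without that module).

    A positive value means the system did better with the module; a negative
    value means the module hurt performance on this test set.
    """
    return {
        module: full_score - score
        for module, score in ablated_scores.items()
    }


# Hypothetical scores in percentage points, for illustration only.
contributions = ablation_contributions(
    48.0,
    {"lexical_resource": 45.5, "coref_tool": 48.5},
)
# contributions == {"lexical_resource": 2.5, "coref_tool": -0.5}
```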
The RTE Resource Pool at ACLwiki
The RTE Resource Pool, set up for the first time during RTE-3, serves as a
portal and forum for publicizing and tracking resources, and reporting on
their use. All the RTE participants and other members of the NLP community
who develop or use relevant resources are encouraged to contribute to this
The RTE Resource Pool has been updated with a section specifically
dedicated to knowledge resources. The new page contains a list of the
'standard' RTE resources, which have been selected and most widely used in
the design of RTE systems during the RTE challenges held so far, together
with links to the locations where they are made available. Furthermore, the
results of the ablation tests carried out in RTE-5, and their description,
are also provided.
April 23 KBP Validation Pilot: Release of Development Set
April 30 Main Task: Release of Development Set
May 21 Deadline for TAC 2010 track registration
September 2 Main Task: Release of Test Set
September 9 Main Task: Deadline for task submissions
September 10 KBP Validation Pilot: Release of Test Set
September 16 Main Task: Release of individual evaluated results
September 17 KBP Validation Pilot: Deadline for task submissions
September 24 Main Task: Deadline for ablation test submissions
September 24 KBP Validation Pilot: Release of individual evaluated results
September 26 Deadline for TAC 2010 workshop presentation proposals
October 1 Main Task: Release of individual ablation test results
October 20 Deadline for systems' reports
Track Coordinators and Organizers
Luisa Bentivogli, CELCT and FBK, Italy (Track coordinator, firstname.lastname@example.org)
Danilo Giampiccolo, CELCT, Italy (Track coordinator, email@example.com)
Hoa Trang Dang, NIST, USA
Ido Dagan, Bar Ilan University, Israel
Peter Clark, Boeing, USA