Academic Paper


Title: Datasets for generic relation extraction
Author: B. Hachey
Institution: Macquarie University
Author: Claire Grover
Institution: University of Edinburgh
Author: R. Tobin
Institution: University of Edinburgh
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g. PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database or RDF triplestore that can be more effectively used for querying and automated reasoning. A number of resources have been developed for training and evaluating automatic systems for relation extraction in different domains. However, comparative evaluation is impeded by the fact that these corpora use different markup formats and notions of what constitutes a relation. We describe the preparation of corpora for comparative evaluation of relation extraction across domains based on the publicly available ACE 2004, ACE 2005 and BioInfer data sets. We present a common document type using token standoff and including detailed linguistic markup, while maintaining all information in the original annotation. The subsequent reannotation process normalises the two data sets so that they comply with a notion of relation that is intuitive, simple and informed by the semantic web. For the ACE data, we describe an automatic process that converts many relations involving nested, nominal entity mentions to relations involving non-nested, named or pronominal entity mentions. For example, the first entity is mapped from ‘one’ to ‘Amidu Berry’ in the membership relation described in ‘Amidu Berry, one half of PBS’. Moreover, we describe a comparably reannotated version of the BioInfer corpus that flattens nested relations, maps part-whole to part-part relations and maps n-ary to binary relations. Finally, we summarise experiments that compare approaches to generic relation extraction, a knowledge discovery task that uses minimally supervised techniques to achieve maximally portable extractors. These experiments illustrate the utility of the corpora.
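
To make two ideas from the abstract concrete, token standoff and the mapping of n-ary to binary relations, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' conversion process: the `Mention` class, the `to_binary` helper, the entity type labels and the all-pairs expansion policy are hypothetical, and the corpora's actual document type is not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """Hypothetical entity mention stored as a standoff span over token indices."""
    start: int   # index of the first token (inclusive)
    end: int     # index of the last token (inclusive)
    etype: str   # illustrative entity type label

    def text(self, tokens):
        # Recover the surface string from the token list; the tokens themselves
        # are never modified, which is the point of standoff annotation.
        return " ".join(tokens[self.start:self.end + 1])


def to_binary(rel_type, args):
    """Expand one n-ary relation into binary relations over every pair of arguments.

    Assumption: all-pairs expansion is only one possible policy and is not
    necessarily the one used for the reannotated BioInfer corpus.
    """
    return [(a, rel_type, b) for a, b in combinations(args, 2)]


# Example phrase taken from the abstract.
tokens = ["Amidu", "Berry", ",", "one", "half", "of", "PBS"]
person = Mention(0, 1, "PER")  # named mention 'Amidu Berry'
org = Mention(6, 6, "ORG")     # named mention 'PBS'

# Subject, relation, object triples of the kind an RDF triplestore would hold.
triples = [
    (a.text(tokens), rel, b.text(tokens))
    for a, rel, b in to_binary("membership", [person, org])
]
print(triples)
```

Because mentions refer to token indices instead of embedding markup in the text, several annotation layers can share one token sequence, and a relation with three arguments would expand to three binary relations under this policy.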


This article appears in Natural Language Engineering Vol. 18, Issue 1, which you can read on Cambridge's site.


