LINGUIST List 18.2645

Tue Sep 11 2007

Diss: Computational Ling/Text & Corpus Ling: Hasler: 'From Extracts...'

Editor for this issue: Hunter Lockwood <hunterlinguistlist.org>


Directory
        1.    Laura Hasler, From Extracts to Abstracts: Human summary production operations for computer-aided summarisation


Message 1: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation
Date: 10-Sep-2007
From: Laura Hasler <L.Haslerwlv.ac.uk>
Subject: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation

Institution: University of Wolverhampton
Program: School of Humanities, Languages and Social Sciences
Dissertation Status: Completed
Degree Date: 2007

Author: Laura Hasler

Dissertation Title: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation

Dissertation URL: http://clg.wlv.ac.uk/papers/hasler-thesis.pdf

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics

Dissertation Director:
Michael Hoey
Ruslan Mitkov
Constantin Orasan

Dissertation Abstract:

This thesis is concerned with the field of computer-aided summarisation,
which has emerged at the confluence of the separate but related fields of
human and automatic summarisation. Because the readability and coherence of
automatically produced extracts are poor, computer-aided summarisation
(CAS) is a viable working alternative to fully automatic summarisation. CAS
allows a human summariser to post-edit
automatically produced extracts to improve their readability and coherence.
In order to best utilise the concept of computer-aided summarisation,
reliable ways of improving the coherence and readability of extracts when
transforming them into abstracts must be established.

To achieve this, a corpus-based analysis of the operations a human
summariser applies to extracts to transform them into abstracts is
presented. The corpus developed here pairs news texts annotated for
important information (i.e., human-produced extracts) with the
human-produced abstracts corresponding to these extracts. The creation of
this corpus simulates the computer-aided summarisation process to enable a
reliable investigation into the operations used. A detailed classification
of human summary production operations is proposed, with examples which
highlight the common linguistic realisations and functions of the
operations identified in the corpus. The classification is then used as a
basis for guidelines which can be given to users of computer-aided
summarisation systems in order to ensure that the summaries they produce
are of a consistently high quality.

The human summary production operations are applied to extracts using the
guidelines in order to evaluate them. Evaluation is performed using a
metric developed for Centering Theory, a discourse theory of local
coherence and salience, which constitutes a new evaluation method. This is
appropriate because existing methods of evaluating summaries are
unsuitable. A set of both automatic and human-produced extracts and their
corresponding abstracts is evaluated, and the results are compared with
evaluations given by a human judge. The evaluation shows that when the
operations are applied to extracts using the guidelines, there is an
improvement in the readability and coherence of the resulting abstracts.





