Academic Paper


Title: Automated unsupervised authorship analysis using evidence accumulation clustering
Author: Robert Layton
Institution: University of Sheffield
Author: Paul Watters
Homepage: http://www.comp.mq.edu.au/~pwatters
Institution: University of Sheffield
Author: Richard Dazeley
Institution: The University of Ballarat
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents.
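The abstract names two ingredients: n-gram features and an evidence accumulation cluster ensemble. The sketch below is only a toy illustration of that general idea, not the NUANCE implementation, whose details are not given on this page. It assumes character n-gram count profiles, cosine similarity, and a deliberately simple base clusterer that assigns each document to the nearest of two seed documents. It also assumes a final step that links documents whose co-association is at least 0.5. All function names, parameters, and sample texts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ngram_profile(text, n):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items() if gram in q)
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def base_clustering(profiles, seeds):
    """Assign each document to its most similar seed (ties go to the first seed)."""
    labels = []
    for p in profiles:
        sims = [cosine(p, profiles[s]) for s in seeds]
        labels.append(sims.index(max(sims)))
    return labels


def co_association(docs, ngram_sizes=(2, 3)):
    """Fraction of base clusterings in which each pair of documents co-occurs."""
    n_docs = len(docs)
    votes = [[0] * n_docs for _ in range(n_docs)]
    runs = 0
    for n in ngram_sizes:
        profiles = [ngram_profile(d, n) for d in docs]
        # The ensemble: one base clustering per n-gram size and seed pair.
        for seeds in combinations(range(n_docs), 2):
            labels = base_clustering(profiles, seeds)
            runs += 1
            for i in range(n_docs):
                for j in range(n_docs):
                    if labels[i] == labels[j]:
                        votes[i][j] += 1
    return [[v / runs for v in row] for row in votes]


def final_clusters(coassoc, threshold=0.5):
    """Connected components over pairs with co-association >= threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Made-up texts from two imaginary authors with disjoint letter habits.
docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "buzz fuzz buzz fuzz",
    "fuzz buzz fuzz buzz",
]
matrix = co_association(docs)
labels = final_clusters(matrix)
print(labels)  # -> [0, 0, 1, 1]
```

No single base clustering here is reliable: when both seeds come from the same author, the other author's documents are lumped in with one of them. The co-association matrix averages over those runs, so pairs that are grouped together in most runs end up linked in the final clustering.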

This article appears in Natural Language Engineering Vol. 19, Issue 1, which you can read on Cambridge's site or on LINGUIST.


