EDITORS: Torsello, Carol Taylor; Ackerley, Katherine; Castello, Erik
TITLE: Corpora for University Language Teachers
SERIES: Linguistic Insights: Studies in Language and Communication. Vol. 74
PUBLISHER: Peter Lang
Wendy Anderson, Department of English Language, University of Glasgow, UK
This volume should appeal to a wide audience, from language teachers intrigued
by the potential of corpora in the classroom, to corpus linguists looking for an
overview of projects and research being undertaken in universities across Italy.
The volume arose from an event, ''Corpora: Seminar and Workshops'', held at the
University of Padua in March 2007. The event was to have been opened by John
Sinclair, but he died a couple of weeks before it took place and instead the
proceedings are offered as a tribute to his pioneering work in corpus
linguistics. In addition to the editors' introduction, which summarizes each
paper, Carol Taylor Torsello offers personal comments on the pleasure of working
with John Sinclair, and Guy Aston offers a fitting tribute to Sinclair's
contribution to corpus linguistics.
The papers are presented in three groups: Part I ''Getting Started'' contains four
papers which together aim to provide an overview of aspects of corpus
linguistics which will be useful for readers unfamiliar with this growing field.
Part II ''Ideas and Suggestions for Corpus Work'' contains seven papers which are
likely to stimulate ideas which can be applied to language teachers' own
context. Part III ''Corpora in University Foreign Language Teaching'' is a group
of six papers united by a focus on successful corpus projects. This part also
presents more in-depth discussion of techniques, such as mark-up and multimodal
concordancing, and of types of corpora, including learner corpora.
Part I begins with a paper which will no doubt encourage readers unfamiliar with
corpora to try out some analysis for themselves. This is Elena Tognini Bonelli's
paper on ''Corpora and LSP: Issues and Implications''. It offers a beautifully
concise overview of Sinclair's contextual theory of meaning, introducing
concepts such as collocation, colligation, semantic preference and semantic
prosody, and demonstrating the ways in which they work together in the creation
of meaning in texts. It is often assumed that scientific terms do not enter into
co-selection relationships with other words, instead standing as independent
units of meaning; however, using examples from the field of economics, Tognini
Bonelli shows that patterning can be identified through corpus methodology and
that even terms acquire pragmatic dimensions through repeated use. This
naturally raises the question of whether a word can ever be said to exist in
isolation.
After this accessible introduction to corpora, and corpora of Language for
Specific Purposes, there follows an article which will appeal strongly to the
university language teachers picked out as the target audience in the title of
the volume. Maria Teresa Prat Zagrebelsky's paper, ''Learner Corpora at the
Crossroads of Computer Corpus Linguistics, Foreign Language Pedagogy and Second
Language Acquisition Research'', provides a personal perspective on the Italian
component of the International Corpus of Learner English (ICLE, for more on this
see Granger, Dagneaux and Meunier, eds, 2002), as well as an overview of the
value of learner corpora and current research trends. Admittedly, the article
offers little which is not available elsewhere, but as an introduction to work
using learner corpora aimed at enthusiastic novices, this repackaging is useful.
The next two papers, Maria Teresa Musacchio and Giuseppe Palumbo's ''Shades of
Grey: A Corpus-driven Analysis of LSP Phraseology for Translation Purposes'' and
Francesca Coccetta's ''Multimodal Corpora with MCA'' highlight the value of two
further types of corpus. These are the comparable corpus, which has applications
in translation and translator training, and the multimodal corpus (also
discussed by Baldry, this volume). To an extent, Musacchio and Palumbo's article
develops ideas presented by Tognini Bonelli, with its focus on phraseology and
collocation in Language for Specific Purposes, again selecting examples from the
field of economics. The article goes into little depth, but offers some useful
thoughts on corpora and translation studies. Coccetta, on the other hand,
discusses the very real issues involved in using spoken corpora for promoting
communicative competence: representing spoken data through orthographic
transcription alone means that features such as stress patterns which contribute
to the dynamicity of language are lost. She also introduces the multimodal
concordancer tool and tagging system used in the Padova Multimedia English
Corpus, showing how the corpus can be interrogated to study language functions
which rely on more than one mode. She draws on the example of the expression of
negative appreciation, which is achieved by language users through combinations
of verbs, constructions, and gesture.
Part II begins with one of the highlights of the volume, Alan Partington's
article entitled ''The Armchair and the Machine: Corpus-assisted Discourse
Research''. This shows, through an analysis of Clinton and Bush press briefings,
the interaction of quantitative and qualitative analysis, and of observation and
contemplation, needed for non-obvious meaning to emerge from texts.
The next paper, Caroline Clark's ''A CADS Analysis of Television Reports from
Iraq: Were Embeds 'in Bed' with the Coalition?'' follows naturally, presenting an
extended example of a corpus-assisted discourse approach to the question of
reporter objectivity. In particular, Clark demonstrates how corpus methodology
offers a different picture from the media commission analyses of the same
question. In both Partington's and Clark's articles, language teachers may find
inspiration for student projects and activities.
Sara Gesuato's discussion of ''Linguistic Research with Large-scale Corpora''
turns the focus from detailed analysis of small genre-specific corpora to
consider the syntactic and lexical patterns which emerge from large general
corpora. Gesuato uses the online version of the Bank of English to demonstrate
how teachers and students can obtain a different perspective on constructions
like _be going_ + progressive infinitive, and nuances of meaning between pairs
of near synonyms like _incredible_ and _unbelievable_. The relevance of the
discussion to the language classroom is made evident (though language teachers,
and indeed linguistics researchers, may find that clearer illustration would be
useful, perhaps through extracts of concordances).
Margherita Ulrych and Amanda Murphy, in their contribution, ''Descriptive
Translation Studies and the Use of Corpora: Investigating Mediation Universals'',
use a monolingual parallel corpus to show how descriptive translation studies
and contrastive linguistics can be brought closer together with the use of
corpora. The article prefigures more in-depth work on mediated discourse which
aims to distinguish mediation universals from language differences.
Continuing the focus on translation, Christopher Taylor, in ''Predictability in
Film Language: Corpus-assisted Research'', explores the nature of film language
compared with other genres of written and spoken language. He makes the
intriguing finding that film transcriptions show less discrepancy from authentic
spoken language than do scripts and subtitles, and suggests that actors
consciously or unconsciously approximate real usage. The predictability of film
language means that translation memory can be useful, but features such as
foreignization cannot be easily approached in this way. Here, instead, corpora
may have a role to play.
Careful corpus design is at the core of the research described by Erik Castello
in his paper, ''A Corpus-based Study of Text Complexity''. Through a corpus of
texts used in language testing, Castello shows how it is possible to compare
texts according to three types of feature (lexical richness, syntactic
complexity, readability) to establish their relative difficulty levels. His
preliminary findings suggest that density and length are the more crucial factors.
The final paper in Part II is Larissa D'Angelo's ''Creating a Corpus for the
Analysis of Identity Traits in English Specialised Discourse''. The specialized
discourse in question is academic discourse, specifically as represented in
CADIS (Corpus of Academic Discourse), with its subcorpora of texts from the
fields of economics, legal studies, linguistics and medicine. The paper will be
useful for university language teachers who want to build their own corpus, as
it covers the basics well, and makes interesting suggestions about how to
implement ideas. For example, she suggests that it could be useful for students
to be involved in the creation of the corpus they are to use.
The papers in Part III are grouped to highlight applications in the language
classroom. These are prefaced by an overview of the XML edition of the British
National Corpus (BNC) by Guy Aston, which might in fact have sat better in Part I,
but which is likely to be relevant to all groups of readers: those who use or
plan to use the BNC in teaching, and those who are creating corpora and need to
be aware of mark-up standards. Aston presents a very clear and balanced
description of the BNC, regretting the lack of a more recent counterpart which
could be used for comparative analysis, but highlighting the advantages which
come from the use of a corpus which has undergone a lot of correction. In
particular, he draws attention to the new features of the Xaira analysis
software which accompanies the BNC-XML.
Giuseppe Brunetti tackles the subject of ''Tagging the Lexicon of Old English
Poetry'', with a description of his online electronic edition of Beowulf designed
to facilitate students' access to the lexis and grammar of Old English. The
resource will be of use in a small number of language classrooms, and the
step-by-step discussion of the mark-up system developed is easily applied to
The final four papers return to themes already touched on. Anthony Baldry
explores the use of multimodal corpora for language courses, in ''Turning to
Multimodal Corpus Research for Answers to a Language-course Management Crisis''.
After a discussion of the nature of multimodal corpora, Baldry takes three
extended examples of how these can help overcome specific problems of syllabus
construction, such as how to integrate resources.
Picking up again the theme of English for Specific Purposes, Katherine Ackerley,
in ''Using Comparable Expert-writer and Learner Corpora for Developing
Report-writing Skills'', shows how corpora can be used to raise student awareness
of genre features in report-writing. This exploits word-lists, concordances and
clusters, applied to both learner corpora and corpora of exemplar texts.
Potential corpus users will find good ideas here, expressed clearly and without
assuming much prior familiarity.
The recurrent focus on translation appears again in Silvia Bernardini's paper,
'''What Students Want'...? Practical Suggestions for Corpus-aided Translator
Education''. Bernardini explains how a corpus can be exploited as a ''technology
companion'' for specialist translation courses. She outlines the MeLLANGE
resources for translation professionals, stemming from an EU Leonardo-funded
project, and works through an example to show how such tools can aid the
translator in making appropriate lexical choices. While little evaluation of the
project is presented here, the paper offers a good illustration of how
translation can benefit from technology.
The final contribution, by Fiona Dalziel and Francesca Helm, is entitled
''Exploring Modality in a Learner Corpus of Online Writing'', and returns to the
notion of a learner corpus, in this case the ''Padova Learner Debate Corpus'' of
online student production. Taking further D'Angelo's suggestion that students be
involved in corpus creation, they describe how students can help to build a
corpus containing their own written production. While there are methodological
issues to overcome here, this approach is likely to increase student engagement
with language, and certainly partially overcomes the problem of the
decontextualization of corpus texts. Dalziel and Helm use examples of the
modal verbs 'must' and 'should', and confirm Aijmer's finding (Aijmer 2002) that
there is a ''high degree of topic sensitivity in the use of certain modals'' (p. 292).
This volume is broad in scope, and as such gives a very good picture of the
range of uses which corpora, in all their various forms, have found in language
learning and teaching. Many of their applications have drawn inspiration to a
greater or lesser degree from the work of John Sinclair, to whom these
proceedings are dedicated.
The papers themselves are mixed in quality and length. Some papers go into
little depth (e.g. Musacchio and Palumbo, D'Angelo), and while they may well
still provide inspiration for corpus projects in language teaching, novice users
will be obliged to look elsewhere for sufficient information to be able to
implement suggestions. On the whole, however, the discussion in the various
papers is presented in ways which should appeal to language teachers who are not
particularly familiar with corpora but want to find out more. Most papers begin
with a basic-level discussion, demonstrate successful existing work, and provide
ideas of how tools and techniques can be applied to different contexts. There is
a consistency of format in the presentation of papers. A couple of issues remain
where the editors could have intervened more, for example to tone down the very
intrusive and persistent use of italics for emphasis (as opposed to linguistic
examples) in Gesuato's paper. But this is a minor point.
The grouping of papers into three Parts is somewhat arbitrary. Most papers could
quite easily fit into any of the three; on the other hand, a number of papers do
not sit particularly comfortably where they currently are. Coccetta's article,
while excellent in its own right, describes a more specific type of corpus than
one would expect to be treated in detail in a section on ''Getting Started''.
Indeed, Aston's straightforward introduction to the BNC-XML may have been more
appropriately located here. It is difficult in fact to see the rationale behind
the grouping of papers in Parts II and III, as all provide ideas and
suggestions, and almost all make their relevance to university foreign language
teaching evident. It would have been possible to group papers according to the
type of corpus exploited (for example, learner, translation, ESP). If the
editors were keen to retain this order of presentation, however, a more thorough
introductory essay would have served to bring together the various strands of
research. Links are made between papers, but a more ambitious discussion of the
state of the art would have allowed the reader to contextualize the papers more
The steep price of the paperback volume - £39.00 / €52 / $80.95 - is likely to
discourage individuals, but it should find a place in university libraries as a
good source of insight and ideas.
Aijmer, K. 2002. Modality in Advanced Swedish Learners' Written Interlanguage.
In Granger, S., Hung, J. and Petch-Tyson, S. (eds) _Computer Learner Corpora,
Second Language Acquisition and Foreign Language Teaching_. Amsterdam:
Granger, S., Dagneaux, E. and Meunier, F. (eds) 2002. _International Corpus of
Learner English_. Louvain-la-Neuve: Presses Universitaires de Louvain.
ABOUT THE REVIEWER
Wendy Anderson is Lecturer in the Department of English Language, University of
Glasgow, Scotland. Her teaching and research interests include: semantics,
corpus linguistics, English, Scots and French, and translation. Between 2004 and
2008, she was Research Assistant for the Scottish Corpus of Texts and Speech
(SCOTS), and Corpus of Modern Scottish Writing projects, at the University of
Glasgow. Recently, with John Corbett, also of the University of Glasgow, she
published _Exploring English with Online Corpora_ (Palgrave Macmillan, 2009).