Review of Teaching and Learning by Doing Corpus Analysis.

Reviewer: Svetlana Kurteš
Book Title: Teaching and Learning by Doing Corpus Analysis.
Book Author: Bernhard Kettemann, Georg Marko
Publisher: Rodopi
Linguistic Field(s): Computational Linguistics, Text/Corpus Linguistics
Subject Language(s): English
Language Family(ies): New English
Issue Number: 13.2739

Date: Sun, 20 Oct 2002 08:00:14 -0700 (PDT)
From: svetlana kurtes
Subject: Applied Linguistics: Kettemann and Marko (2002) Teaching and Learning by Doing
Corpus Analysis

Kettemann, Bernhard and Georg Marko (eds) 2002.
Teaching and Learning by Doing Corpus Analysis, Rodopi, viii+390pp,
hardback ISBN 90-420-1450-4, Language and Computers: Studies in
Practical Linguistics 42.

Reviewed by Svetlana Kurtes, Language Centre,
University of Cambridge, UK


'Teaching and learning by doing corpus analysis',
edited by Bernhard Kettemann and Georg Marko
(henceforth the editors), represents the proceedings
of the Fourth International Conference on Teaching and
Language Corpora, held in Graz (Austria), 19-24 July
2000. There are 23 papers in the volume, grouped into
six thematic chapters: 'General aspects of corpus
linguistics', 'Corpus-based teaching material',
'Data-driven learning', 'Learner corpora', 'Corpus
analysis of ESP for teaching purposes' and 'Corpus
analysis and the teaching of translation'. The
editors' introduction points out that
'there is [...] a growing number of people who believe
that learning a language, learning about a language
and learning through a language might greatly benefit
from an inductive approach. Through the analysis of
large corpora of authentic language [...], learners do
no longer have to rely on the intuitions of
prescriptive scholars but can inductively draw their
own conclusions, which seems to be a highly desirable
goal in the age of "learner autonomy"' (p.1). There is
also a short introductory word by Tony McEnery
entitled 'TALC 4 - where are we going?', giving a
historical background on the Teaching and Language
Corpora (TALC) conferences. A list of contributors
(with brief biographical details) and a subject index
are appended.

Guy Aston's paper 'The learner as corpus designer',
opening the first thematic chapter 'General aspects of
corpus linguistics', discusses the pedagogical benefits
of 'home-made' corpora, maintaining that they should
be seen as more appropriate than pre-compiled ones
'insofar as they can be specifically targeted to the
learner's knowledge and concerns [...], permit[ting]
analyses which would not otherwise be readily feasible
[...].' The examples are taken from the BNC Sampler.

In her paper 'The time dimension in modern English
corpus linguistics' Antoinette Renouf highlights the
importance of developing an automated system able to
identify and record features of a language seen as a
synchronic entity, but also to take into account
important features of its diachronic change. The
author focuses in particular on ongoing lexical
change in Modern English, concluding that modern
diachronic English corpus linguistics 'is an area ripe
for growth' (p.39).

Mike Scott's contribution 'Picturing the key words of
a very large corpus and their lexical upshots or
getting at the Guardian's view of the world' reports
on the results of an analysis of some 800,000
newspaper articles taken from 'The Guardian' from 1984
to the present. An extensive key word database has
been compiled and produced as a CD-ROM, which is
enclosed with the volume. Interrelationships between
the key words are briefly presented and illustrated
with appropriate examples, and further applications
for language teaching are noted.

'Where did we go wrong? A retrospective look at the
British National Corpus' is the title of Lou Burnard's
paper, which reviews the design and management issues
and decisions taken during the construction of the
BNC. It also describes the new World Edition of the
BNC and the associated SARA retrieval package. The
author is of the opinion that it would be very useful
to build a series of BNC-like corpora at regular
intervals, preferably every decade, enabling
linguists to watch 'the river of language flow and
change across time' (p.68).

Chapter 2 ('Corpus-based teaching material') opens with
Averil Coxhead's paper 'The academic word list: a
corpus-based word list for academic purposes',
outlining the principles of vocabulary learning and
corpus linguistics which guided the development of the
Academic Word List (Coxhead 1998), based on a corpus
of approx. 3,500,000 running words of written academic
prose. The author maintains that the most prominent
principles underpinning the study are those claiming
that teachers should teach materials which are
relevant to the learners, that they should teach the
most useful vocabulary no matter what the students'
subject area is, and, finally, that the most important
words should be dealt with first. The word-lists are

Dieter Mindt's article 'A corpus-based grammar for
ELT' presents the major characteristics of a new grammar
(Mindt 2000), the result of ten years' work
on the English verb system. It is fully corpus-based
and especially geared to the requirements of ELT,
addressing in particular the needs of advanced
learners of English. All examples provided are
authentic and frequency data are given wherever

Tim Johns' article 'Data-driven learning: the
perpetual challenge' opens Chapter 3 ('Data-driven
learning'). The author outlines the development of an
approach to the use of corpus data in language
learning and teaching, tracing it briefly from the
early 1980s, when the COBUILD project, directed by
John Sinclair, was set up at Birmingham University
(Sinclair 1987).

'Empowering non-native speakers: the hidden surplus
value of corpora in Continental English departments'
is the title of Christian Mair's contribution, in
which the author discusses the role of corpora in
enabling non-native speaking students of English 'to
develop a rational view of the authority and
limitation of native-speaker intuition, thus
dispelling an unfounded and unproductive mystique
frequently surrounding the native speaker and his/her
judgement [...]' (p. 125). English departments in German
universities are taken as an illustration.

Gunter Lorenz's article, entitled 'Language corpora
rock the base: on standard English grammar, perfective
aspect and seemingly adverse corpus evidence',
discusses how English language corpora, by making
authentic language available for language teaching,
have helped to redefine the notion of standard
language to which language learners should aspire.
Taking the perfective verbal aspect as an example, the
paper re-examines the concept of 'grammatical rule' in
learning and teaching English.

'Toward automating a personalized concordancer for
data-driven learning: a lexical difficulty filter for
language learners' is a contribution by David Wible,
Chin-Hwa Kuo, Feng-yi Chien and C C Wang in which the
authors present a novel teaching tool, called the
Lexical Difficulty Filter (LDF), developed to increase
the control over the examples retrieved from corpus
and concordancing resources, in particular the control
over the level of difficulty of the retrieved
material. The authors also propose further refinements
and extensions to the LDF.

John M Kirk's paper 'Teaching critical skills in
corpus linguistics using the BNC' proposes a
methodology comprising two main pro formas: one for
corpus searching and one for reading scholarly
articles, through which students prepare themselves
for a project-based assessment. The author maintains
that corpora can be used 'for the purpose of enabling
students to learn about the structure of English,
develop a descriptive and theoretical vocabulary, and
cultivate a methodology for dealing analytically with
[...] language' (p. 154; also Kirk 1994:29).

Silvia Bernardini's contribution is entitled 'Exploring
new directions for discovery learning'. The author
discusses the role of corpora in providing rich
sources of autonomous learning activities. Learners
'are introduced to a number of corpus tools and guided
to progress from more convergent activities to
autonomous browsing' (p.165). Positive and negative
sides of this approach are discussed and some
suggestions for further improvements are put forward.

'The CWIC project: developing and using a corpus for
intermediate Italian students', a contribution by
Claire Kennedy and Tiziana Miceli, presents major
issues of the compilation of a corpus of contemporary
written Italian (CWIC) and its integration into the
Italian studies programme at Griffith University in
Australia. The authors discuss some linguistic,
pedagogical and practical issues in the selection and
preparation of the material, concluding with some
observations on the evaluation process.

Natalie Kubler ('Linguistic concerns in teaching with
language corpora') discusses how the web-based
environment for language teaching can enable students
to understand sentence segmentation, multi-word units,
ambiguity problems and other linguistic phenomena. The
model was developed at the University of Paris 7 at
the department of Intercultural Studies and Applied

Chapter 4 ('Learner corpora') opens with Ylva Berglund
and Oliver Mason's article 'The influence of external
factors on learner performance'. The authors report on
the initial stage of a research project examining the
relationship of different types of texts exclusively
on the basis of external parameters. The proposed
method will enable the analysis of language learner
data, identifying 'how such data differs from the
production of native speakers' (p.205). The paper
presents the reasoning behind the project and
describes the method developed in more detail.

'How to trace the growth in learners' active
vocabulary?' is the title of Agnieszka
Lenko-Szymanska's article, in which the author reports
on a study 'whose aim was to compare the validity,
applicability and meaningfulness of two measures of
lexical richness, lexical variation and lexical
sophistication, for tracing the growth in learners'
free active lexicon' (p. 217). The research was based
on a selection of texts from the PELCRA corpus of
learner English compiled at the University of Lodz.
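
The review does not say how the study operationalised the two
measures. As a rough illustration only, the sketch below assumes
that lexical variation is a simple type-token ratio and that
lexical sophistication is the share of tokens falling outside a
basic vocabulary list. The word list, the sample sentence and all
function names are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_variation(tokens):
    """Type-token ratio: distinct word forms divided by running words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_sophistication(tokens, basic_words):
    """Share of tokens that fall outside a list of basic vocabulary."""
    if not tokens:
        return 0.0
    advanced = [t for t in tokens if t not in basic_words]
    return len(advanced) / len(tokens)


# Hypothetical basic-vocabulary list and learner sentence (assumptions).
BASIC_WORDS = {"the", "a", "is", "and", "was", "very", "good", "film"}
sample = "The film was good and the plot was very intricate"

tokens = tokenize(sample)
variation = lexical_variation(tokens)
sophistication = lexical_sophistication(tokens, BASIC_WORDS)
print(f"variation={variation:.2f} sophistication={sophistication:.2f}")
```

A plain type-token ratio falls as texts get longer, so a
comparison of this kind is usually made on samples of equal
length.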

John Flowerdew's contribution entitled
'Computer-assisted analysis of language learner
diaries: a qualitative application of word frequency
and concordancing software' demonstrates a more
qualitative application of word frequency and
concordancing programmes. The author presents the
experience of the English language teacher education
programme at Hong Kong University, where the students
are asked to focus on various aspects of the learning
process and keep a weekly diary in which they record
their reflections. The author analyses the students'
notes and reports on their preoccupations as language
learners, identifying the key words used by means of
a word frequency programme.

Chapter 5 ('Corpus analysis of ESP for teaching
purposes') opens with David Lee's paper entitled
'Genres, registers, text types, domains and styles:
clarifying the concepts and navigating a path through
the BNC jungle'. The author clarifies the notions of
register, text type, domain, style, sublanguage,
message form, etc., checking them against the BNC
files. The author proposes that a database containing
genre labels will hugely facilitate genre-based
research (such as EAP, ESP, discourse analysis,
lexico-grammatical and collocational studies).

'Some thoughts on the problem of representing ESP
through small corpora', a contribution by Laura
Gavioli, discusses the problem of corpus
representativeness. In particular, the author raises
the issue of small corpus representativeness and the
criteria used in the design of small corpora of
specialized language used in ESP teaching and learning.
In his paper 'Modal verbs in academic writing', Paul
Thompson reports on an investigation of the uses of
modal auxiliary verbs in a corpus of PhD theses
written by native speakers of English. The Reading
Academic Text corpus, established in 1996, is composed
of 39 PhD theses coming from two departments:
Agricultural Botany and Agricultural Economics. It was
established as a resource for research into academic
writing practices and EAP pedagogy.

The last chapter, 'Corpus analysis and the teaching of
translation', opens with Federico Zanettin's article
'CEXI: designing an English Italian translational
corpus'. The author reports on a project aiming to
construct a bilingual corpus at the School for
Translators and Interpreters of the University of
Bologna in Forli. It is a bi-directional, parallel,
translation-driven corpus, consisting of over 4
million words found in text samples published between
1975 and 2000.

'Mandative constructions in English and their
equivalents in French: applying a bilingual approach
to the theory and practice of translation' is Noelle
Serpollet's paper, the objective of which is to
analyse those French constructions that are translated
by occurrences of mandative 'should' in English (e.g.
'I insisted that he should change his clothes').
Serpollet reports on a systematic analysis of two
grammatically tagged corpora of British English (The
Lancaster-Oslo/Bergen Corpus and the Freiburg-LOB
Corpus), as well as the bilingual corpus INTERSECT
(The International Sample of English Contrastive Texts
Corpus). The author briefly explores the impact of
corpus linguistics on translation studies.

Claudia Claridge's paper is entitled 'Translating
phrasal verbs' and it brings up the question of
phrasal and prepositional verbs in English and German,
focusing in particular on potential problems German
learners of English can encounter while acquiring this
part of English idiomaticity.


The present volume encompasses a variety of papers
raising relevant issues in the theory and practice of
corpus linguistics as applied to language pedagogy.
The editors have successfully selected a
representative body of contributions delivered at the
4th International Conference on Teaching and Language
Corpora (TALC), involving both practitioners and
theorists from various academic and non-academic
fields. The papers are carefully grouped by theme
into six chapters, whose descriptive labels name
the central thematic category around which the
papers cluster. In spite of the impressive variety of
topics discussed and approaches deployed, the editors'
choice exhibits a real mastery in maintaining a strong
theoretical and methodological coherence of the
volume. This is no doubt one of the main reasons why it
will attract the attention of a wide audience,
comprising both academics and professionals in the
fields of corpus and computational linguistics,
language pedagogy, theory and practice of translation,
stylistics (genre analysis in particular),
lexicography, information retrieval, etc.

The originality of the ideas expressed and the practical
applications illustrated and discussed represent a
genuine contribution to the subject fields mentioned,
advancing our understanding of their key issues and
pointing to possible directions to be taken in
research and its implementation. Therefore, we have no
hesitation in recommending the volume to the attention
of the target audience.

Just one suggestion, perhaps. Although the majority of
contributions deal with the various issues of English
electronic corpora, a number of papers also take into
account work with multilingual corpora, which is
certainly praiseworthy. It would be very informative,
though, to include discussions dealing with the
problems of the corpora of less commonly taught
languages and endangered languages, their compilation,
practical implementation, etc. Maybe one of the future
TALC conferences can tackle the issue.


Coxhead, A 1998. 'The development and evaluation of an
academic word list', unpublished MA thesis, Victoria
University of Wellington, Wellington.

Kirk, J M 1994. 'Teaching and language corpora: the
Queen's approach'. In Wilson A and A McEnery: Teaching
and language corpora, University of Lancaster
Department of Modern English Language and Linguistics
Technical Reports, Lancaster.

Mindt, D 2000. 'An empirical grammar of the English
verb system', Cornelsen, Berlin.

Sinclair, J M 1987. 'Looking up: an account of the
COBUILD project in lexical computing', Collins Cobuild,

About the Reviewer: Svetlana Kurtes holds a BA in English Philology and an MA in Sociolinguistics from Belgrade University and an MPhil in Applied Linguistics from Cambridge University. She worked as a Lecturer in English at Belgrade University and is currently affiliated to Cambridge University Language Centre. Her research interests involve contrastive linguistics, sociolinguistics, pragmatics/stylistics, translation theory and language pedagogy.

