Review of Small Corpus Studies and ELT. Theory and practice

Reviewer: Raphael Salkie
Book Title: Small Corpus Studies and ELT. Theory and practice
Book Author: Mohsen Ghadessy, Alex Henry, Robert L. Roseberry
Publisher: John Benjamins
Linguistic Field(s): Applied Linguistics
Computational Linguistics
Subject Language(s): English
Book Announcement: 13.884

Ghadessy, Mohsen, Alex Henry, and Robert L. Roseberry, eds.
(2001) Small Corpus Studies and ELT: Theory and Practice.
John Benjamins Publishing Company, xxiii+419pp, hardback ISBN
1-58811-035-4 (US & Canada), USD 114.00, 90-272-2275-4 (ROW),
EUR 125.00, Studies in Corpus Linguistics 5.
Announced in

Raphael Salkie, University of Brighton, England.

Corpus enthusiasts have often had an interest in language
teaching. Traditional grammars, dictionaries and
textbooks tended to offer learners artificial,
decontextualised examples of language. A corpus of real
language enables teachers to use data that is both more
interesting and more natural. General-purpose corpora
have to be large, however, in order for significant
patterns to emerge. This has restricted their use to
people with powerful computers and strong programming
skills.

This book offers a way out of that problem. The fourteen
papers collected here aim to show that "the analysis of
small textual corpora ... has yielded discoveries about
language that are no less remarkable or important than
those derived from the study of huge corpora" (from the
editors' introduction). The book is aimed at teachers and
students of language teaching, as well as fellow

In his preface, John Sinclair denies having ever believed
that only large corpora can yield interesting results. He
observes that with a large corpus a common method is to
have the computer do a lot of the preliminary analysis;
while with a smaller corpus it is normal for 'human
intervention' to come at an earlier stage. He notes that
comparison of different text-types or genres is a common
investigative technique with small corpora. This is
certainly true: All except three of the contributors to
this volume work with collections of texts that are
specialised in some way.

Two of the three exceptions are chapters which describe
computational tools for working with small corpora. Paul
Nation's 'Using small corpora to investigate learner
needs: two vocabulary research tools' presents a pair of
Windows programs. 'Vocabprofile' compares the vocabulary
of a text against a list of the most frequent words in
English, or against a list that the user prepares.
'RANGE' compares the vocabulary of up to 32 texts at a
time. The programs can be used to measure the richness of
the vocabulary of a text presented to learners, or of a
text produced by learners. They can be downloaded for
free from <>.
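
The idea behind a vocabulary profile can be sketched in a few
lines of Python. This is not Nation's program: the function
name vocab_profile, the simple tokeniser and the tiny word
list are all assumptions made for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def vocab_profile(text, word_list):
    """Return the share of tokens found in word_list, plus a count of off-list words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, Counter()
    known = set(w.lower() for w in word_list)
    off_list = Counter(t for t in tokens if t not in known)
    coverage = 1 - sum(off_list.values()) / len(tokens)
    return coverage, off_list

# Hypothetical mini word list; a real frequency list would be far longer.
frequent_words = ["the", "a", "of", "in", "is", "cell", "and"]
coverage, off_list = vocab_profile(
    "The cell is suspended in a solution.", frequent_words
)
print(f"{coverage:.0%} of tokens are on the list")
print(off_list.most_common())
```

A low coverage figure suggests that a text may be lexically
demanding for learners who know only the words on the list;
the off-list counts show which words account for the gap.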

The title of Mike Scott's paper is self-explanatory:
'Comparing corpora and identifying key words,
collocations, and frequency distributions through the
WordSmith Tools suite of computer programs'. Details of
the program are available at:

The third paper which is not about language for
specialised purposes is Ann Lawson's 'Collecting, aligning
and analysing parallel corpora'. Lawson describes the
different types of multilingual corpus and the software
for analysing them, and makes some suggestions about how
they can be used in the language classroom. The paper is
the best survey of this area that I have seen.

The other eleven papers have essentially the same
structure. The first part is always a theoretical
discussion of an issue. The issues are often about text
types: The notion of 'genre' gets a good going-over from
Alex Henry and Robert L. Roseberry in 'Using a small
corpus to obtain data for teaching a genre', and from
Marina Bondi in 'Small corpora and language variation:
reflexivity across genres'. Vincent Ooi discusses the
differences between printed and written texts in his
'Investigating and teaching genres using the World Wide

Several of the contributors are keen on Halliday's
systemic-functional grammar. Peter H. Ragan revisits the
concepts of 'field', 'tenor', 'mode', 'register' and
'thematic structure' in 'Classroom use of a systemic
functional small learner corpus', while 'Small corpora and
translation: comparing thematic organisation in two
languages' by Mohsen Ghadessy and Yanjie Gao provides even
more variations on the notion of theme. Geoff Barnbrook
and John Sinclair begin their paper 'Specialised corpus,
local and functional grammars' with a quote from Halliday
and later describe their analytical framework as 'more of
a functional grammar than [those] of Dik or Halliday'

Some of the authors discuss the theory of language
teaching. Chris Tribble surveys theories about teaching
L2 writing in his 'Small corpora and teaching writing',
while Geoff Thompson expands on trends in English language
teaching more generally in 'Corpus, comparison, culture:
doing the same things differently in different cultures'.
In a similar vein, Lynne Flowerdew discusses the theory
and practice of learner corpora (collections of texts
produced by language learners) in 'The exploitation of
small learner corpora in EAP materials', and Peter H.
Ragan journeys through similar territory.

For two of the papers, the main theoretical issue is the
big one: What is language? Robert de Beaugrande's paper
'Large corpora, small corpora and the learning of
"language"' reflects on this matter, as do Barnbrook and
Sinclair. The only paper of the eleven which almost
resists the urge to theorise is John Flowerdew's
'Concordancing as a tool in course design', although even
here the first part of the paper spends some time
comparing vocabulary size in a specialised corpus, a
traditional dictionary, and the brains of native speakers.

The second part of each paper is a description of a
specialised corpus. In the same order as hitherto, these
are:

- Job applications, and introductions by academics of
guest lecturers (Henry & Roseberry).

- Abstracts of economics articles and introductory
chapters of economics textbooks (Bondi).

- 'Personal advertisements' (known in the UK as 'lonely
hearts') on the web by people from Singapore and the USA
(Ooi).

- Instructions by 50 learners of English about a task
involving coloured wooden blocks (Ragan).

- Political commentaries in English and Chinese (Ghadessy
& Gao).

- Definitions from the COBUILD Students' Dictionary
(Barnbrook and Sinclair).

- Adverts on the web for MA courses in Applied Linguistics
(Tribble).

- Tourist brochures in English and Chinese, and job
adverts in English and German (Thompson).

- Project reports written by University students who are
non-native speakers (Lynne Flowerdew).

- Novels by Jane Austen, and texts about the geography of
deserts (de Beaugrande).

- Texts about biology (John Flowerdew).

The third and final part of these eleven papers contains
suggestions about how the corpus or the information
derived from it can be used in language teaching and
learning. Sometimes the suggestions are very limited:
Barnbrook & Sinclair, for instance, do little more than
hint that 'new kinds of reference books' for learners of
English could be developed on the basis of their
analytical system. They also note that their
semi-automatic analysis of the definitions in the COBUILD
Students' Dictionary made possible a thorough review of
the way the dictionary was constructed, and that such a
review was used in a project to translate this dictionary
into other languages. Similarly, the brief suggestion by
Ghadessy and Gao about how to use their analysis of
thematic structure in English and Chinese texts
essentially says 'present this material to students', and
Thompson's proposals about how to show students his
comparison of tourist brochures and job adverts in
different languages are heavily modalised ('the primary
focus could be ... it would seem sensible to encourage the
learners to ...' etc).

Some of the contributors describe how they used their
corpus material with a specific group of students. Ragan
compared the instructions written by non-native speakers
with those produced by native speakers for word frequency
and various Hallidayan features; he then used this
material with the non-native speakers to help them improve
their English. John Flowerdew used his biology corpus as
part of an activity in which his students studied how to
use verbs such as ENCLOSE, SUSPEND, and SEPARATE which
occurred frequently in the texts.

A more thoughtful discussion of how the materials were
organised for students is given by Henry & Roseberry.
Aware that simply giving students a concordance with
numerous examples of a word or phrase can be intimidating
or uninteresting, they translated their analysis of job
applications into a hierarchical structure which students
accessed gradually to help them write their own sample
applications. Bondi reproduces a four-page worksheet that
she used with her students to help them study the language
of abstracts of economics articles: It includes
concordances of verbs like SHOW and ARGUE, and asks
students to say, for example, whether the subjects of
these verbs are discourse participants (WE, THE AUTHORS)
or discourse units (THIS PAPER, THE NEXT SECTION).
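
For readers who have not met a concordance, the following is
a minimal sketch of a keyword-in-context display in Python. It
is not taken from Bondi's worksheet: the function name kwic,
the sample sentences and the context width are assumptions,
and the sketch matches a single word form rather than all the
forms of a verb.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, for illustration only.
sample = ("In this paper we show that prices adjust slowly. "
          "The next section shows the data, and the authors show why it matters.")
for line in kwic(sample, "show"):
    print(line)
```

With the keyword aligned in a column, students can scan the
words immediately to its left and decide whether each subject
is a discourse participant or a discourse unit.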

The contributors to this book are clearly corpus devotees
who get a buzz from compiling and analysing electronic
texts. There is nothing wrong with this: Lively
presentations of research and teaching experiences are
surely to be welcomed. The book also demonstrates
repeatedly that a small collection of texts can reveal
significant linguistic patterns, which is encouraging for
people without the time, expertise or computing resources
to handle large corpora.

Two problems make me doubt whether language teachers will
be convinced. Firstly, several of the papers compare
linguistic features in their specialised corpus with the
same features in a large reference corpus such as the
British National Corpus or the Bank of English. As a
research method this is impeccable, but of course it
requires access to a large corpus, which is precisely what
many language teachers do not have.

Secondly, none of the contributors attempts to show that
their teaching method actually works. This is a serious
weakness: Someone should take two similar groups of
students, teach one of them using corpus-derived material
and the other group in some other way, and measure the
outcomes. Until this type of study has been conducted,
using corpora in the classroom is an act of faith. Such a
study would have to factor in the many hours that teachers
need to spend compiling and analysing the corpus and
preparing teaching materials, when they could be devoting
the time to other useful activities such as resting, or
resisting globalisation. The study would also have to
take account of teacher and student enthusiasm, which is
precious but hard to measure. To use the current medical
buzzword, this is 'evidence-based teaching' - but only in
the sense that it uses authentic evidence in the
classroom, not because it is based on evidence about the
effectiveness of the teaching method.

Will this volume persuade more linguists to use computer
corpora in their research? I think that the title doesn't
help in this respect: An alternative would have been
'Specialised Corpora: Design, Research, Teaching Methods',
which would have been more accurate and might have attracted
linguists who are less interested in L2 pedagogy.

Furthermore, the research presented in the book is entirely
about specialised texts, which is likely to interest people
working on language variation rather than grammarians. On
the other hand, the relationship between descriptions of
sublanguages and descriptions of the general language system
is an issue which many linguists can usefully think about.
Barnbrook & Sinclair borrow the term 'local grammar' from
Gross (1993) for an analysis of a restricted domain of
texts, and it is clear that most of the contributors to this
book are constructing partial local grammars of this kind.
Barnbrook & Sinclair speculate that a battery of local
grammars will be able in a few years' time to 'analyse
satisfactorily the bulk of open text', with 'general
grammar' having only a residual role. An initial reaction
might be that it is hard to see how young people could
acquire a set of local grammars plus a general grammar, but
there is certainly food for thought here.

Gross, Maurice. 1993. 'Local grammars and their
representation by finite automata'. In Michael Hoey
(ed.), Data, Description, Discourse: Papers on the English
Language in Honour of John McH. Sinclair (London, Harper
Collins), 26-38.

Raphael Salkie is Principal Lecturer in Language Studies
at the University of Brighton. His interests include
modality, translation and contrastive linguistics, and he
is the author of several papers about parallel corpora.