Review of Exploring Corpora for ESP Learning

Reviewer: Philip McCarthy
Book Title: Exploring Corpora for ESP Learning
Book Author: Laura Gavioli
Publisher: John Benjamins
Linguistic Field(s): Applied Linguistics
Text/Corpus Linguistics
Subject Language(s): English
Issue Number: 17.3033

AUTHOR: Gavioli, Laura
TITLE: Exploring Corpora for ESP Learning
SERIES: Studies in Corpus Linguistics 21
PUBLISHER: John Benjamins
YEAR: 2005

Philip M. McCarthy:
Institute for Intelligent Systems, Department of Psychology, The University
of Memphis

As a teacher and researcher in corpus linguistics, I have become
increasingly frustrated over recent years by the mysterious lack of corpus
analysis applications. There has been an avalanche of articles, books, and
conferences dedicated to the glories of corpus linguistics and yet, there
has been a dearth of practical, simple, solid, and (above all) interesting
pedagogical applications. For me then, corpus linguistics in the classroom
has been like Mars glittering in the night sky: so full of possibility and
yet no one can tell us much more than there might be water there. To
justify a journey to Mars, we need to know what we could do with it once we
have invested so much to get there. To justify the investment in building
corpora for the classroom, we need to know, clearly, how we investigate
corpora, and what corpora offer us that a traditional classroom text book
does not. In “Exploring Corpora for ESP Learning,” I am happy to report
that the Eagle has landed, and that there is an abundance not only of
water but of life itself.

TESOL has grown to considerable prominence over recent years, and despite
its less than glamorous reputation, it has also managed to produce its own
offspring in the form of English for Specific Purposes (ESP). And while ESP
(lacking a fancy academic bloodline) may never be considered a legitimate
heir, the wealth and interest such a field appears to be producing has made
TESOL (in general) and ESP (in particular) a field that academia will, at
the very least, have to send an invitation to for all forthcoming
linguistic balls. Of course, with English being something of a world
language, the wealth generated by TESOL (in general) and ESP (in
particular) will forever mean that these fields cannot be completely
ignored. What few seem to have recognized, however, and Laura Gavioli is
clearly a notable exception, is that the golden child of (corpus)
linguistics appears to have taken rather a shine to ESP. That is, corpora
are relatively small collections of samples of a particular language
register (e.g. science texts, public speeches, telephone conversations
etc.) and ESP is a field dedicated to teaching these registers as
specialized, distinct, (almost quirky) modules of the blurry and so often
contradictory superstructure. Thus, what a corpus is, is what ESP teaches,
and the two are forever joined at the “lip.” In this succinct yet
thoroughly informative offering, Gavioli has managed to demonstrate this
happy marriage of beauty and the beast. In seven concise chapters, Gavioli
outlines just why it is that corpus analysis is an invaluable component of
language research and language teaching, making this text essential
reading for any teacher of ESP and any researcher interested in corpus studies.


In the introductory chapter, Gavioli lays out her theme that small
specialized corpora can help students better understand the idiosyncratic
(or specialized) language of a given discourse community. Naturally, the
emphasis on specialized language and specialized corpora gears the text
very much towards an audience of ESP teachers; however, as Gavioli points
out, even the most general of language courses will also feature many
classes specializing in a variety of registers (e.g., letter writing,
telephone calls, and job applications).

Chapter 2 serves as a brief history of corpus linguistics and a brief
outline of the importance that corpus work can play in language pedagogy.
Concerning the history, Gavioli relates the story of modern corpus
development starting in the 60s and 70s and highlighted by the development
of the Brown and LOB Corpora. Gavioli then moves on to the 70s and 80s and
outlines how Chomsky’s work led to what many view as a hiatus in corpus
investigations. In the 1990s, Gavioli argues, there was a revival in corpus
work led by such figures as John Sinclair and his Cobuild project. Gavioli
also explains that the 1990s saw a significant development not only in
technology, but critically, in the availability of that technology. This
technology, Gavioli argues, has allowed researchers (and teachers and
students) to collect, store, and analyze data easily and cheaply.

Of course, the revival of corpus research has not been without its
problems, and Gavioli is careful to detail such considerations. For
example, Gavioli explains that corpora are _samples_ of language, rather
than language itself, and we must be careful to recognize the limits these
samples provide. Gavioli also cites Widdowson’s (1998) argument that
“reality does not travel with the text” (p. 711). That is, a corpus can
give us examples of language, but it does not provide the context from
which the language was produced. As such, the information that the corpus
supplies is not synonymous with all aspects of meaning. From a pedagogical
standpoint, Gavioli also reminds us of two further limitations of the corpus.
First, we must remember that exposing students to the “real language” of a
corpus does not mean that students’ language will improve. That is,
exposure itself is not enough: students also need to be trained how to use
and interpret corpus data. And second, Gavioli cites Carter (1998) who
argues that while corpora often supply excellent authentic examples of
language use, invented language examples are often more concise and more
useful to students.

But while there are many concerns that researchers, teachers, and students
need to consider when using corpora, there are clearly many benefits too.
For example, corpus and concordance work may assist teachers in
syllabus design (Flowerdew, 1993); in the analyses of discourse markers
(Zorzi, 2001); in translation, synonymy, and issues of false friends
(Partington, 1998); and in helping students to become autonomous
researchers, noticing problems and forming theories as to usage (Johns,
1991). Perhaps of most importance to the theme and goal of Gavioli’s text,
however, is the “idiom principle” (Sinclair, 1991; 1996). This principle
posits that there are numerous and ubiquitous regularities within language
that are indicative of certain registers. These regularities cannot be
explained by grammar or lexical-logical systems alone. They can, however,
be revealed by corpus investigations. This idiom principle, more so than
any other benefit of corpus work, forms the backbone of Gavioli’s argument
for the necessity of presenting corpus analysis in the ESP classroom. While
the remaining chapters see Gavioli develop many of the arguments listed
above, the idiom principle will be a constant theme, underlining and
emphasizing the marriage of corpus analysis and ESP classes.

Chapter 3 focuses on developing the theory and evidence for the Sinclair
(1991) notion of the “idiom principle” and how, given such a foundation,
students need to be trained to recognize and understand the often
complicated output generated in corpus analysis. The idiom principle is a
sociolinguistic/conventional explanation of language features that
contrasts with the rationalistic principle associated with the work of
Chomsky. Gavioli argues that proponents of the idiom principle supplied
much evidence suggesting that word combinations (collocations) and
lexico-grammatical combinations (colligates) occur as a matter of
convention or fashion rather than as simple logical or grammatical
inevitabilities. In this chapter, Gavioli highlights the evidence for the
idiom principle and outlines the importance of teaching its conclusions to
language students. Gavioli also makes clear that while concepts such as the
idiom principle remain key for demonstrating the importance of corpus work,
we cannot expect students to recognize that importance without training. As
such, the chapter spends much time outlining how, where and why students
need to be guided on the interpretation of data generated from corpus
investigations. Gavioli demonstrates that without this guidance students
will tend to use corpus data poorly and, consequently, will neither benefit
from nor enjoy the experience.
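
To make the notion of a collocation count concrete, here is a minimal sketch of how one might tally the words that occur next to a chosen node word. It is not taken from Gavioli's book: the function names, the window size, and the sample sentences are all invented for illustration, and a real study would use a proper corpus and a statistical association measure rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, window=1):
    """Count the words found within `window` tokens of each `node` occurrence.

    Sentence boundaries are ignored, which is a simplification.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented toy "corpus" (hypothetical medical register).
sample = (
    "The patient made a full recovery. The doctor expects a full recovery "
    "within weeks. A speedy recovery is unlikely, but a full recovery is possible."
)

counts = collocates(tokenize(sample), "recovery", window=1)
for word, n in counts.most_common(3):
    print(word, n)
```

In this toy sample, "full" turns up beside "recovery" more often than any other word, which is the kind of conventional pairing that grammar alone would not predict and that students would need guidance to interpret.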

In Chapter 4, Gavioli focuses on the emergence of specialized corpora of
registers and how they might be used in comparison to general corpora.
Gavioli begins by outlining the 1990s debate on corpus size, where ‘large
corpora’ were often viewed as not large enough, and ‘quite small corpora’
were equally often viewed as perfectly sufficient. Gavioli points out that
ESP is often served better by the smaller, more specific, “specialized”
variety of corpora. These smaller corpora focus on instances of language
indicative of a particular register. As a consequence, results of such
analyses tend to be more reliable inasmuch as the idiosyncrasies of the
register are not drowned by features that are common in more general corpora.

In Chapter 5, Gavioli offers more advice and guidance for introducing
students to interpreting corpus analyses. Gavioli begins by pointing out
that students will probably be unfamiliar with tools such as concordancers
and the output that they generate. Students may also not appreciate that a
concordancer produces results based on samples and that these samples
cannot be relied on to be “the truth.” Students also need to understand
that a concordancer does not explain results, as a dictionary, a book, or a
teacher might. As such, students need to understand that they have a
responsibility in unearthing meaning and usage from the data. A final point
that Gavioli raises is one that might easily have been overlooked: students
need to understand that results generated from corpora are representative
of a certain register, and that whatever is concluded from those results
may not necessarily be true of the language in general. Once again, then,
the marriage of corpus analysis and ESP classes is emphasized.

Gavioli goes on to explain that interpreting concordance data is an
inductive activity. Such activities, she argues, are not untypical of the
classroom where a teacher often presents examples and the students learn to
generalize rules from them. That having been acknowledged, however, Gavioli
also makes clear that there is a difference between “samples” and
“examples.” The blackboard is generally the place of a limited number of
good, clear examples. A concordance, on the other hand, may be a list of
hundreds of cases, few of which are particularly clear examples of a
meaning or usage. Sheer volume of raw ‘samples’ is not good enough to help
students determine the meaning or usage of a phrase or word.
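
For readers who have not seen concordance output, the following is a rough sketch of a keyword-in-context (KWIC) listing, the raw "samples" that a concordancer returns. It is an illustration only, not the tool or the data used in the book: the function name `kwic`, the context width, and the sample sentence are assumptions made for this example.

```python
import re


def kwic(text, node, width=30):
    """Return one keyword-in-context line per whole-word match of `node`."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both contexts so the node word lines up in a single column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


# Invented sample text (hypothetical financial register).
sample = "Rates rose. The interest rate fell; rate cuts followed."

lines = kwic(sample, "rate", width=10)
for line in lines:
    print(line)
```

Each output line is a truncated stretch of text centred on the node word, with no explanation attached. Even in this tiny case the reader has to work out usage from fragments, which is the interpretive burden the chapter describes.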

To help begin working with corpus data, Gavioli recommends simply starting
by telling students what a concordance is. This is followed by gradually
introducing how a concordance is different from blackboard examples. Much
of the remainder of the chapter shows how this might be approached. The
examples and tasks provided are clear and broad, and should provide an
adequate base for many teachers to adapt the approach to the needs of their
own classrooms.

In Chapter 6, Gavioli discusses how the student can become part of the
relevant discourse community. Gavioli argues that corpora of specialized
language offer students a collection of language samples that are
indicative of the discourse community, its beliefs, and its conventions.
Corpus tools allow students to filter this information, highlighting the
aspects of language that are particularly relevant to the student. Chapter
6 differs from the previous chapter in that the emphasis on investigation
switches to the student interest rather than the teacher interest. As such,
examples and approaches are presented for a more autonomous student.

Chapter 7 summarizes the goals and findings of the preceding chapters.
Gavioli also takes this opportunity to pose a number of questions for
future research to consider.


While Gavioli presents a succinct, informative, and much needed text that
any teacher of ESP would certainly benefit from, the limited size and scope
of the book (and a not inconsiderable price tag) leave a number of areas
open to criticism. First, Gavioli maintains that her book is more about
'how we learn' than 'how we teach,' and that her presentations are only a
guide: it is for individual teachers themselves to decide
how such activities and resources might suit their own classes. Such
admissions are honest, accurate, fair, and yet insufficient. The text
certainly presented a number of examples of student reaction to class
exercises, but, to this reader at least, the text actually provided little
on 'how we learn', and may have been not so much a guide as it was a
pointer. That is, I am perhaps less convinced than Gavioli that many ESP
teachers can so easily turn her well intended (though limited) examples
into appropriate exercises.

A second criticism of Gavioli’s approach is a criticism common to most
texts concerning corpus analysis. That is, why is there always the
assumption that it must begin and end with a concordancer? Even if we
accept completely Gavioli’s argument that corpus analysis allows us to go
beyond lexico-syntactic language learning and into the idiosyncratic
register-relevant language, then why can we not discuss the abundance of
other kinds of tools that are available? For example, there is Coh-Metrix
(Graesser, McNamara, Louwerse, & Cai, 2004), LIWC (Pennebaker & Francis,
1999), Landscape (Tzeng, van den Broek, Kendeou, & Lee, 2005), and many other
(often far simpler) methods through which corpora can be studied and
language registers can be appreciated. This is not to say that Gavioli
should have reported on these tools as often as she did the concordancer;
however, it is to say that corpus analysis for researchers, teachers, and
students must consider the advantages and benefits of tools other than
the concordancer.

A final criticism concerns the text’s lack of quantitative evidence.
Obviously, ESP does not have a strong tradition of statistical analysis,
but as demonstrated elegantly in Mind and Context in Adult Second Language
Acquisition (Sanz, 2005), it is more than possible to bring quantitative
terminology and evidence to a text without overburdening even a complete
novice. Texts on corpus analysis that ignore quantitative evidence are
doomed to endless instances of phrases such as “it seems to me”: the
prevalence of this phrase in Gavioli’s text became ever more frustrating
with each encounter, assuring us that whatever the merits of the text (and
there are many) the real evidence may be little more than Gavioli’s
heartfelt opinion.

While these three criticisms detract from the value of Gavioli’s text, the
book remains an excellent and concise analysis of the benefits and
approaches to corpus studies in ESP classrooms. Perhaps too brief to be a
course book on its own, even for an undergraduate class, “Exploring Corpora
for ESP Learning” is certainly an accessible, interesting, and insightful
text that all teachers in any walk of corpus studies should consider.


Carter, R. (1998). Orders of reality: CANCODE, communication and culture.
ELT Journal, 52, 43-56.

Flowerdew, J. (1993). Concordancing as a tool in course design. System, 21,

Graesser, A., McNamara, D. S., Louwerse, M., & Cai, Z. (2004). Coh-Metrix:
Analysis of text on cohesion and language. Behavior Research Methods,
Instruments, & Computers, 36, 193-202.

Johns, T. (1991). Should you be persuaded: Two examples of data-driven
learning. In Johns & King (Eds.), 1-6.

Partington, A. (1998). Patterns and Meanings. Amsterdam: John Benjamins.

Pennebaker, J. W. & Francis, M. (1999). Linguistic Inquiry and Word Count:
LIWC. Mahwah, NJ: Erlbaum.

Sanz, C. (Ed.). (2005). Mind and context in adult second language
acquisition: methods, theory, and practice. Washington, D.C.: Georgetown
University Press.

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: OUP.

Sinclair, J. (1996). The search for units of meaning. Textus, 9, 75-106.

Tzeng, Y., van den Broek, P., Kendeou, P., & Lee, C. (2005). The
computational implementation of the Landscape Model: Modeling inferential
processes and memory representations of text comprehension. Behavior
Research Methods, Instruments, & Computers, 37, 277-286.

Widdowson, H.G. (1998). Context, community and authentic language. TESOL
Quarterly, 32, 705-716.

Zorzi, D. (2001). The pedagogic use of spoken corpora: Learning discourse
markers in Italian. In Aston (Ed.), 85-107.

Philip McCarthy is a research scientist at the Institute for Intelligent
Systems, Department of Psychology, at The University of Memphis. His main
research interests are developing and testing algorithms for textual
analysis. McCarthy also teaches a variety of linguistics courses and has
over 10 years’ experience in teaching English as a foreign language.