Review of Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching
Date: Mon, 14 Apr 2003 19:21:22 +0700
From: Viatcheslav Iatsko
Subject: Computer Learner Corpora, Second Language Acquisition and Foreign
Granger, Sylviane, Joseph Hung and Stephanie Petch-Tyson, eds. (2002)
Computer Learner Corpora, Second Language Acquisition and Foreign
Language Teaching, John Benjamins Publishing Company, Language Learning
and Language Teaching 6.
Viatcheslav Iatsko, Department of English, Katanov State University of Khakasia
The book under review is a collection of articles which focus on
interrelationships between computer learner corpora (CLC), second language
acquisition (SLA) and foreign language teaching (FLT). The contributors are
qualified experts in CLC from different countries. Each contribution is followed
by an extensive "References" section; the book is supplied with useful name and
subject indexes. Since emphasis is placed on theoretical as well as practical
aspects of computer learner corpora analysis, this book may be of interest to
researchers, teachers and practitioners engaged in CLC, SLA and FLT studies.
The volume is divided into three sections.
The first section entitled "The role of computer learner corpora in SLA research
and FLT" is an introductory chapter written by Sylviane Granger (Belgium), which
provides a general overview of learner corpus research and situates learner
corpora within SLA studies and FLT. This chapter can be divided into two parts.
The first deals with the characteristics and typology of learner corpora, the
methodology of their linguistic analysis (contrastive and error analyses) and
the software tools applied in such analysis (text retrieval programs,
part-of-speech tagging, error tagging). This part contains valuable observations
about techniques of CLC analysis drawn from the author's personal experience.
The second part concentrates on pedagogical aspects of CLC research, namely
curriculum and materials design.
I can't help mentioning a disputable and perhaps contradictory statement
formulated by Granger. While describing the field of corpus linguistics, the
author on the one hand states: "It is neither a new branch of linguistics nor a
new theory of language..." (p.4); on the other hand, Granger agrees with the
experts who characterize corpus linguistics as "new research enterprise" (p.4).
This statement seems strange, since for at least the last decade corpus
linguistics has been considered a linguistic discipline by the majority of
representatives of the linguistic community. As Granger correctly writes on the
same page, corpus linguistics has its own methodology, primarily aimed at
quantitative analysis of corpora and at describing frequency features of
linguistic phenomena. The author should have added that corpus linguistics has
its own theory, the foundations of which are Bradford's law of scattering
(Bradford, 1953) and Zipf's law (Zipf, 1935). Finally, the existence of corpus
linguistics as a linguistic subfield is confirmed by numerous books and
conferences regularly announced on the Linguist List.
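As a rough illustration of the frequency-based reasoning behind Zipf's law, the
following Python sketch ranks the words of a toy sentence by frequency. In a
large corpus, Zipf's law predicts that rank multiplied by frequency stays
roughly constant; a sample this small will not show that. The sample sentence
and the function name rank_frequency are invented for illustration and do not
come from the book under review.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Toy input, invented for this sketch (an assumption, not corpus data).
sample = "the cat saw the dog and the dog saw the cat"
table = rank_frequency(sample)

for rank, word, count in table:
    # The last column is rank * frequency, the quantity Zipf's law
    # predicts to be approximately constant in large corpora.
    print(rank, word, count, rank * count)
```

On real data one would run the same function over a whole corpus and inspect
how far the last column drifts from a constant.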
The second section "Corpus-based approaches to interlanguage" illustrates a
range of corpus-based approaches to interlanguage analysis. It comprises three
chapters written by Bengt Altenberg (Sweden), Karin Aijmer (Sweden), and Alex
Housen (Belgium). In the opening chapter "Using bilingual corpus evidence in
learner corpus research" B. Altenberg carries out comparisons of
original-version and translated Swedish to test the hypothesis that overuse of
causative "make" with adjective complements by Swedish L2 writers is due to L1
transfer. Using an aligned Swedish-English corpus, the author finds that the
overuse is due to an overgeneralization of the cross-linguistic similarity
between "make" and its Swedish counterpart. Altenberg's research is based on a
sound methodology that comprises a thorough contrastive analysis of a given
language feature in a bilingual corpus and a check of the results against a
learner corpus to see whether the learners' output shows evidence of transfer
from their L1.
In the second chapter "Modality in advanced Swedish learners' written
interlanguage" Aijmer uses computer learner corpora to compare the range and
frequency of some modal words in native English writing and in the English L2
writing of advanced-level university students. Although the primary focus of her
investigation is Swedish L2 writers, she regularly conducts comparisons with
French and German L2 writers in an attempt to ascertain whether features of
Swedish L2 writing are likely to be L1-induced or more generally shared by L2
writers of different language backgrounds. This investigation compares modal
forms (modal verbs and adverbs) in compositions produced by non-native and
native speakers and reveals a considerable overuse of these forms by the
non-native writers, a tendency that may be partly developmental, partly
interlingual.
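The kind of overuse comparison described above can be sketched as follows. This
is a minimal Python illustration, not Aijmer's actual procedure: the two
one-sentence "corpora", the modal list MODALS, and the helper per_thousand are
all hypothetical, and a real study would use large corpora and a significance
test rather than a bare ratio.

```python
import re


def per_thousand(text, targets):
    """Frequency of the target words, normalized per 1,000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for token in tokens if token in targets)
    return 1000 * hits / len(tokens) if tokens else 0.0


# Hypothetical list of modal verbs used for this sketch.
MODALS = {"can", "could", "may", "might", "must",
          "shall", "should", "will", "would"}

# Toy stand-ins for a learner corpus and a native corpus (invented data).
learner_text = "we must act now and we should act fast because we can"
native_text = "we act now and we can act fast because time is right"

learner_rate = per_thousand(learner_text, MODALS)
native_rate = per_thousand(native_text, MODALS)

# A ratio above 1 suggests overuse by learners, below 1 underuse.
overuse_ratio = learner_rate / native_rate if native_rate else float("inf")
print(learner_rate, native_rate, overuse_ratio)
```

Normalizing per 1,000 tokens is what makes corpora of different sizes
comparable; the ratio itself is only a first indication of overuse.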
In the third chapter Housen presents the results of a cross-sectional, corpus-
based study into the acquisition of the basic forms and functions of the English
verb system. Using rather sophisticated techniques for processing annotated oral
CLC data, the author managed to single out developmental patterns for the
acquisition of verbal morphology by L2 learners grouped into four different
levels of proficiency. Apart from that, Housen investigated patterns of use of
various verb form categories and found that learners fluctuate between overuse
and underuse as they fine-tune form-meaning associations. It also turned out
that there may be significant individual variation in the route of development,
even between learners of the same proficiency level and L1 background.
Though Housen's study is based on the output of Dutch and French L2 learners,
the results of the investigation are sure to be of interest to researchers and
practitioners who work with L2 learners of different language backgrounds. These
results may be especially important for those who work with L2 learners whose L1
doesn't have such a variety of verb forms as English. For example, the
acquisition of English verb tense forms presents many difficulties for
Russian-speaking students, since Russian has only three basic tense forms,
progressive and perfective meanings being expressed either lexically or by verb
affixes.
The third section of the book "Corpus-based approaches to foreign language
pedagogy" comprises five chapters written by Fanny Meunier (Belgium); Angela
Hasselgren (Norway); Ulla Connor, Kristen Precht, Thomas Upton (USA); Quentin
Grant Allan (China); Barbara Seidlhofer (Austria). Meunier's contribution "The
pedagogical value of native and learner corpora in EFL grammar teaching" is
divided into two parts. In part one the author examines the field of EFL grammar
teaching from an SLA perspective, considering current thinking and current
practice within the SLA community. Meunier points out that native corpus
research has contributed to a more adequate description of English grammar: the
frequency of the same grammatical features varies across text types, which is
why English grammar is no longer seen as a monolithic entity but rather as being
composed of several specific grammars pertaining to different discourse types.
Meunier provides convincing evidence that the development of native and learner
corpus research has caused profound changes in curriculum design, reference
tools, and classroom EFL grammar teaching. For example, a frequency list of
English irregular verb forms obtained from native corpora has enabled teachers
to sequence the study of these verbs in order of frequency instead of presenting
them in alphabetical order; learner corpus research makes it possible to
identify forms problematic for L2 learners and to take into account learners'
mother tongue; modern dictionaries provide frequency and register information;
native corpora are a rich source of authentic examples included in modern
textbooks.
In the second chapter "Learner corpora and language testing: small words as
markers of learner fluency" Hasselgren analyzes spoken data obtained from
14-15-year-old Norwegian L2 learners to demonstrate how the use of small words,
such as "well", can distinguish more fluent speech from less fluent speech.
Automatically retrieving a core group of these words and phrases from the speech
of groups differentiated by mechanical fluency markers, the author provides
evidence that greater fluency is accompanied by greater quantity and variety of
small words. Hasselgren also proposes a possible sequence for the acquisition of
small words and a set of fluency descriptors.
Though Hasselgren's research is innovative in nature, its main thesis seems
doubtful and not well substantiated. Small words (such as "well", "right", "you
know", "not really") are treated by the author as discourse markers, which make
a crucial contribution to coherence: "The ability to create coherence in
Shiffrin's terms is compatible with the way fluency is identified in this
article" (p.149). In modern grammars (Downing & Locke, 2002; Brinton, 2000;
Iatsko, 2001a), the words and phrases indicated by Hasselgren are considered to
be modal words/phrases, modal adverbs, or modal parentheses expressing such
notions as possibility, probability, volition, etc. For example, "well"
expresses hesitation (Downing & Locke, pp. 554-555), while "really" (in a
negative context) expresses doubt (Downing & Locke, p.384). It's rather unlikely
that words expressing doubt and hesitation contribute to speech fluency. The
author should have provided a more profound analysis of the semantic features of
small words.
In the third chapter "Business English: learner data from Belgium, Finland and
the US" Connor, Precht, and Upton demonstrate the value of combining traditional
textlinguistic tools of genre analysis, such as the identification of rhetorical
moves, with a genre-specific corpus to make broader statements about how
different writers approach writing for a specific purpose. The learner corpus
used in this study is an intercultural collection of job application letters
from native and non-native speakers of English. The investigation revealed that
while some rhetorical moves were used by all three groups, others were more
group-specific, suggesting that different cultural norms might exist for the
genre. Connor et al. highlight the sometimes unexpected impact that such
differences may have for people attempting to apply for jobs across languages
Though the results of Connor et al.'s research are well substantiated, some of
its theoretical assumptions seem superficial. For example, the authors state
that "...the interweaving of discourse, syntax and lexicon have been overlooked
by most previous research" (p.176). The point is that such interweaving, that
is, the correlation between different planes of discourse (semantic,
communicative, modal, relational), is the focus of the integrational discourse
analysis conception, which I have been developing since 1996 (Iatsko, 2001b).
According to another statement, "...a great deal of the corpus-based, more
applied work has focused on the lexico-grammatical patterning of text, producing
collocations and lists of fixed phrases; much of this work has centered on the
propositional level of texts with less regard to functional and rhetorical
aspects" (p.177). It might be of interest to the authors that a corpus-based
methodology for analyzing rhetorical aspects of discourse has been developed in
W. Mann's (1998) conception. Since both Iatsko's and Mann's conceptions are
available on the Internet, Connor et al. could have taken the trouble to find
and study them.
In the fourth chapter Allan describes the TELEC Secondary Learner Corpus (TSLC),
a resource which uses corpus data in systematic ways to raise the language
awareness of secondary-level English teachers in Hong Kong. TSLC, accessible via
a computer network, is used in conjunction with a number of modern English
corpora. Together, these corpora are an invaluable resource for answering
teachers' questions about aspects of grammar and usage through Language Corners,
and for systematic linguistic analysis of areas of English in which Hong Kong
students experience difficulty.
To the best of my knowledge, there is nothing like TSLC in my country and
methods described by Allan can be adopted, fine-tuned to local conditions and
fruitfully used in teacher training here, in Russia as well as in some other
In the fifth chapter "Pedagogy and local learner corpora: working with
learning-driven data" Seidlhofer suggests a methodologically innovative corpus
analytic approach, which she calls "learning-driven data", enabling students to
be both participants in and analysts of their own language. According to this
approach, computer tools are used for compiling and collaboratively analyzing a
written learner corpus consisting of short complete texts (summaries and
"accounts") produced by students. Seidlhofer describes the success of the
approach in motivating students to adopt corpus analysis techniques for research
in linguistics and for work on language awareness.
It should be noted that because summaries for the corpus were prepared manually,
Seidlhofer missed a good opportunity to introduce her students to techniques of
automatic text summarization, such as compiling a dictionary of speciality
terms, determining summary size, and editing the summary (Iatsko 2001c).
An advantage of the publications in this book is a new type of contrastive
analysis, contrastive interlanguage analysis (Granger, 1998), which is aimed at
providing data from L1 (learners' mother tongue), L2 (English), and
interlanguage. To reinforce the interpretative power of this analysis, the
authors use the output of different groups of L2 learners, thus getting more
reliable results. For example, Altenberg compares the output of French and
Swedish L2 learners; Aijmer uses the output of Swedish, French, and German L2
writers.
This book is a significant contribution to learner corpus research, a new area
of linguistic inquiry that has emerged as an important link between the two
previously disparate fields of corpus linguistics and foreign/second language
research.
Bradford, S.C. (1953) Documentation. London: Crosby & Lockwood.
Brinton, L. (2000) The structure of modern English. Amsterdam; Philadelphia:
Downing, A., Locke, Ph. (2002) A university course in English grammar. London;
New York: Routledge.
Granger, S. (1998) The computer learner corpus: a versatile new source of data
for SLA research. In: S.Granger, ed. Learner English on Computer. London; New
Iatsko, V. (2001a) English syntax for Russian speaking students. Abakan: Katanov
State University of Khakasia Press.
Iatsko, V. (2001b) Integrational discourse analysis. Abakan: Katanov State
University of Khakasia. http://www.khsu.ru/ida
Iatsko, V. (2001c) Linguistic aspects of summarization. In: Philologie im Netz.
2001. N 18. www.fu-berlin.de/phin/phin18/p18i.htm
Mann, W. (1998) Rhetorical structure theory.
Zipf, G.K. (1935) The Psycho-Biology of Language. Boston: Houghton Mifflin.
ABOUT THE REVIEWER
V. Iatsko is a professor in the Department of English and Head of the Computational Linguistics Laboratory at Katanov State University of Khakasia, located in Abakan, Russia. His research interests include text summarization, text grammar, TEFL, and contrastive analysis of English and Russian syntax.