Review of Linguistic Informatics - State of the Art and the Future

Book Title: Linguistic Informatics - State of the Art and the Future
Book Author: Yuji Kawaguchi, Susumu Zaima, Toshihiro Takagaki, Kohji Shibano, Mayumi Usami
Publisher: John Benjamins
Linguistic Field(s): Applied Linguistics
Computational Linguistics
Discourse Analysis
Text/Corpus Linguistics
Language Acquisition
Subject Language(s): Dutch
Frisian, Western
French, Old
Issue Number: 16.3263

Date: Wed, 2 Nov 2005 17:40:53 +0700
From: Viatcheslav Yatsko
Subject: Linguistic Informatics - State of the Art and the Future

EDITORS: Kawaguchi, Yuji; Zaima, Susumu; Takagaki, Toshihiro;
Shibano, Kohji; Usami, Mayumi
TITLE: Linguistic Informatics - State of the Art and the Future
SUBTITLE: The first international conference on Linguistic Informatics
SERIES: Usage-Based Linguistic Informatics 1
PUBLISHER: John Benjamins
YEAR: 2005

Viatcheslav Iatsko, Department of English, Katanov State University of
Khakasia

This book is a collection of papers presented at the international
conference held at Tokyo University of Foreign Studies (TUFS) in
December 2003. Before giving a detailed description of the book's
content, I'd like to say a few words about "linguistic informatics", since
one of the organizers of the conference claims that this is a "new
synthetic field" (p. 3). A scientist who claims the emergence of a new
subject field is expected to give some evidence of its existence;
in T. Kuhn's (1962) terms such evidence may be theoretical notions
and/or laws/paradigm underlying the field, and common
methodologies employed by the members of the new scientific community.
Nothing of the sort can be found in the book, which presents a motley
collection of separate papers that may be of interest to experts in
various fields, such as computational linguistics, corpus linguistics,
and applied linguistics.

The book opens with a welcoming speech by S. Ikehata, the President
of Tokyo University of Foreign Studies, in which he says that TUFS
engages in education and research activities in over 50 languages,
cultures and societies all over the world. The University has
introduced a double-major system that requires that students should
specialize in both a foreign language and a discipline-related course.
TUFS started a Usage-Based Linguistic Informatics project supported
by a grant subsidy from the Japanese Ministry of Education, Sports,
Culture, Science, and Technology.

I can note that the double-major system is in line with recent
developments in foreign language education. At Russian universities
such a system was introduced in 2002, when Moscow State
Linguistic University, the leading authority in the field, developed a
syllabus for the specialty "Theoretical and Applied Linguistics" that
involves the study of two foreign languages as well as programming
languages and other subfields of computer science. Thus students
can master foreign languages and acquire substantial programming
skills.

S. Ikehata's address is followed by an introductory paper
entitled "Center for User-Based Linguistic Informatics". The author,
Yuji Kawaguchi, outlines activities of TUFS and touches upon the
structure of linguistic informatics. Linguistic informatics is considered
to be a synthetic field resulting from integration of theoretical and
applied linguistics on the basis of computer science. TUFS's activities
are based on a modularized view of language, according to which each
language unit is composed of four relatively independent modules:
pronunciation, dialogue, grammar, and vocabulary. These modules
are to be implemented on the Web to provide Web-based
language education.

It should be noted that neither the term "linguistic informatics" nor the
idea of modularized language teaching is in any way new. In 1996 I
published a paper (Yatsko, 1996) where I suggested using the
term "linguistic informatics" to denote the subject field that deals with
problems of automatic text summarization, automatic information
retrieval, citation clustering, and hypertext technologies. All these
subfields can be considered parts of one and the same domain
because they employ the same methodologies, such as normalization
algorithms (e.g. stemming); lexicographic techniques; frequency
techniques. The domain of linguistic informatics is united by the same
theoretical assumptions, methodologies and laws, such as Zipf's law
and Bradford's law of scattering. These characteristics distinguish
linguistic informatics from other domains, including computational
linguistics. Though the editors carefully avoid using the
term "computational linguistics", it is this term that best suits the
contents of the book under review.
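
As a rough illustration of the frequency techniques mentioned above, the
following minimal Python sketch builds the kind of rank-frequency table
used to check Zipf's law, under which the frequency of a word is roughly
inversely proportional to its rank. The function name `rank_frequency` and
the toy sentence are assumptions made for this example; they are not taken
from Yatsko (1996).

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first.

    Under Zipf's law the product rank * frequency stays roughly constant
    in a large corpus; a toy text is far too small to show this reliably.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat"
    for row in rank_frequency(sample):
        print(row)
```

On a real corpus one would normally apply a normalization step such as
stemming before counting, so that inflected forms of a word share one rank.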

As for modularized language teaching, it has been used at Russian
universities since the 1960s. Students who specialize in foreign
languages have separate classes for phonetics, speech practice,
grammar, and home reading. This approach has proved to be
effective and works well for foreign language learning. The idea of
making language materials available on the Internet is not new either.
The Internet abounds in such materials. I would strongly recommend
that Japanese colleagues read about the TelNex network
implemented in Hong Kong. The network, described by Q. G. Allan
(2002), provides English language learners and teachers with access
to large databases containing information about English grammar and
usage, and teaching materials.

The rest of the papers in the book are grouped into 5 sections:
Computer-Assisted Linguistics, Corpus Linguistics, Applied Linguistics,
Discourse Analysis and Language Teaching, and TUFS Language
Modules.

The first section "Computer-Assisted Linguistics" comprises 6 papers.
1. "One or Two Phonemes: /ø/ - /u/ in Old French, /s/ - /z/ in Dutch
and Frisian - New Solutions to an Old Problem". The authors, P. van
Reenen and A. Jongkind, resort to complex statistical and
probabilistic analyses to provide data about the phonemes indicated in the
title of the paper. The background for the analysis of French
phonemes was the fact that in Old French poetry such words
as 'dolur' and 'amur' were often used as rhyme words at the end of
lines, though in Modern French they do not rhyme, representing two
different phonemes - /ø/ and /u/. This means that either Old French
didn't distinguish between the two phonemes, or rhyme in Old French
poetry was not perfect. The authors analyzed several corpora of Old
French poetry to provide convincing evidence that in one part of the
Old French-speaking area there was no differentiation between /ø/ and
/u/; in the other part of France the poets who were aware of the
difference could respect it or not. Regional differences are also
important for the opposition /s/-/z/ in Flemish Dutch (the area of
Belgium where Dutch is spoken), Dutch-Dutch (spoken in the
Netherlands) and Frisian. The authors analyzed various diachronic and
synchronic corpora to find out that the opposition between the phonemes
was introduced in Dutch-Dutch in about the 14th century. It became well
established in Flemish Dutch, has never been observed in Frisian and is
disappearing in present-day Urban Dutch. Thus the authors managed to
detect some
systematic patterns thanks to the use of contemporary computer

2. "The Lexicon Grammar of French Verbs - A Syntactic Database".
The author, Christian Lecler, describes the lexicon grammar of French
verbs developed at Laboratoire d'Automatique Documentaire et
Linguistique. Currently this grammar comprises 60 tables with 15 000
entries giving syntactic, semantic and distributional characteristics of
5 000 lexical "simple" French verbs. Each table presents a group of
verbs sharing the same "defining property", i.e. the essential syntactic
and semantic characteristics that make up a frame in which the given
verbs are used. Apart from the defining property, each table lists important
combinatory characteristics for each verb, such as prepositions, noun
complements and their semantic features (e.g. human, non-human).
The author has successfully combined the methodologies of
componential analysis and generative grammar to obtain substantial
research results that may be of interest to many grammarians. A
drawback of the paper is the clumsy classification of verbs into simple,
support, and compound. In linguistics simple words are traditionally
opposed to derivative and compound words according to morphological
criteria. The author actually uses semantic criteria: "simple" verbs in
his terms are lexical verbs, support verbs are desemantized verbs in
composite predicates (Iatsko, 2003a), and compound verbs are
desemantized verbs in set expressions.

3. "A Formal Analysis of Spanish Adjective Position". The author, M.
Miyamoto, tests the hypothesis that the syntactic position of Spanish
adjectives depends on their length and the length of modified nouns.
The author employs the following methodology, which may be of interest
to experts in the field. 1. Make up lists of adjectives and nouns by
extracting them from dictionaries. 2. Assign to adjectives and nouns
tags to denote part of speech, number of syllables, and accent
position. 3. Create a corpus by extracting adjectives and nouns from
natural language texts. 4. Process the corpus by statistical methods.
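
Steps 2 to 4 of this methodology can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not Miyamoto's
actual procedure: the function names are hypothetical, syllables are
approximated by counting vowel groups (which undercounts hiatus, as in
'día'), only adjective length is tabulated, and the word pairs are
invented toy data rather than items from the corpora used in the paper.

```python
import re
from collections import Counter

VOWEL_GROUP = re.compile(r"[aeiouáéíóúü]+")


def syllable_count(word):
    """Approximate syllables as vowel groups (hiatus such as 'día' is undercounted)."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def length_class(adjective):
    """Tag an adjective as short (1-2 syllables) or long (3 or more)."""
    return "short (1-2 syll.)" if syllable_count(adjective) <= 2 else "long (3+ syll.)"


def position_by_length(pairs):
    """Cross-tabulate adjective position by adjective length.

    `pairs` holds (adjective, noun, position) triples, where position is
    "pre" (adjective before the noun) or "post" (adjective after the noun).
    """
    table = Counter()
    for adjective, _noun, position in pairs:
        table[(length_class(adjective), position)] += 1
    return table


if __name__ == "__main__":
    # Invented toy data, not drawn from Miyamoto's corpora.
    toy_pairs = [
        ("gran", "ciudad", "pre"),
        ("buen", "amigo", "pre"),
        ("nueva", "casa", "pre"),
        ("interesante", "libro", "post"),
        ("económico", "problema", "post"),
        ("rojo", "coche", "post"),
    ]
    for key, count in sorted(position_by_length(toy_pairs).items()):
        print(key, count)
```

A fuller version would also tag noun length and accent position, since the
hypothesis under test concerns the relative length of adjective and noun.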

As a result the author obtained frequency distributions for the
combinations short noun + long adjective, long noun + short adjective,
adjective + noun of the same length and vice versa. Having compared a
newspaper corpus with a corpus of spoken Spanish, Miyamoto came to the
conclusion that adjectives of one or two syllables are more frequently
preposed than postposed while adjectives of three or more syllables
are mostly postposed; classifying adjectives are generally postposed. I
wish the author had done a more thorough analysis of the syntactic
positions of different semantic classes of adjectives. Another
opportunity is contrastive analysis. For example Russian, prima facie,
exhibits the opposite characteristic: longer adjectives tend to be

4. "On the language of Portuguese 'Estoria do Muy Nobre
Vespesiano' - Linguistic Change and its Documented Evidence Based
on the Corpus Study" by N. Kurosawa. This paper has 2 essential
faults that diminish its scientific quality.
1) The author doesn't state the aim of his research. The paper opens
with the description of a prose text written in medieval Portuguese and
different studies of this text and then proceeds to the description of
some patterns of phonemic change in Portuguese. The goal of the
research is not clear as well as its correlation with the previous
2) Unlike the author of the previous paper, Kurosawa doesn't describe
the methodology of his research. On the 6th page of the paper the
author mentions that the Portuguese book was converted to an
electronic format and processed by a concordance program. Neither
the methodology nor the aim of the processing is explained. Moreover, I
think the analysis of a corpus consisting of a single text cannot provide
reliable information about language change.

5. "Analysing Texts in a Specific Domain with Local Grammars" by T.
Nakamura. This paper touches upon some problems of automatic
discourse analysis and recognition. The author conducted substantial
linguistic analysis to reveal semantic and syntactic structures of
sentences in a corpus of 560 reports on the stock exchange from the
French daily 'Le Monde'. The analysis is based on a detailed
description of sentence patterns that include variation predicates
(e.g. 's'apprecier', 'chuter') and their arguments. Based on this
analysis the author constructed a local grammar that made it possible
to recognize 22% of syntactic constructions in the corpus. This
research constitutes a good foundation for developing machine
translation systems and systems of automatic discourse analysis and

6. "Multivariate Analysis in Dialectology - A Case Study of the
Standardization in the Environs of Paris" by K. Yarimizu, Y.
Kawaguchi, and M. Ichikawa. This paper is an example of
dialectometrical analysis of the corpus "L'Atlas Linguistique et
Ethnographique de l'Ile-de-France et de l'Orléanais" by means of two
methods: cluster analysis and multi-dimensional scaling. The paper is
richly illustrated with maps showing directions of standardization,
correlation between geographical distribution and dialect distribution,
and diachronic features of standardization based on two types of
synchronic data - standard language preference data versus
non-standard language preference data.

The next section of the book entitled "Corpus Linguistics" comprises
four papers.
1. "Corpora of Spoken Spanish Language - The Representativeness
Issue" by F. Moreno-Fernandes. The paper is a review of existing
Spanish corpora. The author gives extensive lists of Spanish corpora
created for the development of speech technologies and linguistic
study of spoken language, describes requirements for them, and
some of their faults. I wish he had also paid attention to their
architecture, methods of annotation, and the characteristics of their
user interfaces, to make the analysis more thorough.

2. "Methods of 'Hand-made' Corpus Linguistics - A Bilingual Database
and the Programming of Analyzers" by H. Ueda. The author describes
a methodology for integrating functions of MS Word and Excel into
the processing of a bilingual corpus. He also demonstrates the possibility
of creating simple tools to extract collocations of key words using the
Visual Basic for Applications language built into Microsoft Office
applications. The paper is supplied with macro code that provides
algorithms for developing such tools. As the author correctly remarks,
there is always a choice: to use existing tools to process corpora
(e.g. various concordancers) or to develop one's own tools. I agree with
the author that the second option may be preferable in many cases. I also
agree that it's important for foreign language students to acquire
programming skills.
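
To give an idea of how small such a hand-made tool can be, here is a
minimal sketch of a collocation extractor. It is written in Python rather
than the Visual Basic for Applications used in the paper, and it is not a
reconstruction of Ueda's macros: the function name `collocates` and the
`window` parameter are assumptions made for this example.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count words occurring within `window` tokens of each occurrence of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Tokens to the left and right of the key word, clipped at the text edges.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


if __name__ == "__main__":
    sample = "strong tea and strong coffee but weak tea"
    for word, freq in collocates(sample, "tea").most_common():
        print(word, freq)
```

Raw co-occurrence counts like these are only a starting point; ranking
collocates usually requires an association measure on top of the counts.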

3. "Multilateral Interpretation of Corpus-based Semantic Analysis - The
Case of German verb of movement 'fahren'" by Y. Muroi. The author
analyzes frequencies of occurrence of different arguments of the
German verb and concludes that, depending on the
semantic and syntactic structure of the sentence, the emphasis may
be on the Goal argument, the Path argument, or the human subject. The
essential fault of this paper is the lack of a methodological scheme for
the analysis. The author just states that the given argument is Path, or
Goal, or Source without giving any interpretation of these thematic
roles. If Muroi had consulted the numerous works on case grammar he
would have found out that various authors suggest different
inventories of thematic roles and different interpretations of individual
thematic roles. For example Cook (1998) distinguishes 5
roles and Brinton (2000) 14 roles. Brinton (2000) and Van
Valin (2001) both distinguish the Theme role but their interpretations
of this role differ. Muroi should have analyzed these conceptions to
explain what scheme of analysis he applies. But the impression is that
the author is unaware of the existence of case grammar,
since there isn't a single reference to it in the paper. The paper is also
an overambitious attempt to revise Saussurean theory: "According to
Saussure (1978)", writes Muroi, "the difference is the principle
constructing the structure of language. The revised concept
introduced here assumes that the principle is not restricted to the
structural level, namely to semantics, but is to be applied to pragmatic
processes" (p. 177). This statement seems strange, to put it mildly. It
is common knowledge that semantic and pragmatic approaches to the
study of language are in complementary distribution; there is nothing
to revise.

4. "Tools for creating Online Dictionaries Judeo-Spanish - A Case
Study" by A. R. Tinoco. The paper describes a methodology for
creating an online dictionary. Since Judeo-Spanish speakers are
scattered all over the world a Web interface was created to collect
data and support collaboration between members of the research
group living in different countries. The architecture of the whole
distributed system (called LAMP) includes the Apache Web server,
MySQL relational databases, and PHP for the interface and scripts. This
system was used to provide access to, and to process, a corpus of
900,000 words in Judeo-Spanish. Currently there are three
incomplete bilingual dictionaries online: Judeo-Spanish - Spanish,
Judeo-Spanish - English, and Judeo-Spanish - Turkish. This project is
an example of coordinated group research via the Internet.

The next section of the book entitled "Applied Linguistics" comprises 5
papers.
1. "Socio-pragmatic Aspects of Workplace Talk" by Janet Holmes. This
paper is the result of seven years' research on talk in New Zealand
workplaces. Drawing on a database that comprises more than 2500
interactions, the author discusses two aspects of workplace
interaction: the importance of small talk and humour at work, and the
speech act of refusal. The author considers factors influencing the
discursive strategies of speakers, such as gender, relative status, and
the degree of their personal acquaintance. An interesting part of the
paper is the attempt to conduct a contrastive analysis of the discourse
strategies of people belonging to different cultures. A special section
of the paper deals with methods for integrating the results of the
research into teaching English as a foreign language.

2. "What Do We Mean by 'second' in Second Language Acquisition"
by D. Block.
The author conducts a componential analysis of the term "second
language acquisition" (SLA) assigning to it such semes as +/-
classroom (SLA in classroom setting or in naturalistic setting) and +/-
language in the community (foreign language community vs. native
language community). The rest of the paper is devoted to a review of
different approaches to and interpretations of SLA. Finally the author
comes to the conclusion that the term "second" is inappropriate and
misleading and must be replaced with the term "additional", i.e.
additional language acquisition.

I can't agree with the author on this point. Of course the meaning of
the term "SLA" may be ambiguous when it is taken out of context, but
when used in a specific research work it acquires a meaning assigned
to it by the author of the work. I personally (based on Block's
componential analysis) would distinguish between second language
acquisition, second language learning, and second language teaching
as subclasses of a generic term "second language education". Since
the term "acquisition" doesn't imply conscious efforts on a person's
part, its meaning can be restricted to "mastering of a nonnative
language in the environment, in which that language is spoken";
second language learning is a self-study process that presupposes
conscious efforts on the learner's part (e.g. using CD programs); second
language teaching takes place in a classroom environment. Second
language education differs from foreign language education that can
be interpreted as mastering of a nonnative language in the
environment of one's own language and that can be in its turn divided
into foreign language teaching and foreign language learning.

I think one's aim should be to specify the meanings of existing terms
rather than to invent neologisms that are very unlikely to be accepted
by the linguistic community.

3. "Integrating Applied Linguistics Research Outcome into Japanese
Language Pedagogy - A Challenge in Contrastive Pragmatics" by S.
Nishihara. The paper describes research based on interviews with
1) Japanese people who had worked in 6 foreign countries; 2) foreigners
working in Japan. The informants were asked to watch videos on 6
different topics and then to select a variant of the verbal response
they would give if they were in the same situation as a
character in the video recording. This methodology seems interesting,
but the author's conclusions are unsubstantiated because the paper
gives no information about the number of informants or the database
size. The paper also lacks any methodology for integrating the research
results into language teaching.

4. "Computer Assisted Language Learning (CALL) - Moving into the
Network Future" by M. Peterson. This paper is a review of
contemporary network technologies that enable synchronous
interaction between users. The author outlines the advantages of Internet
relay chats, multiple user object oriented domains (MOOs), and virtual
reality technologies and concludes that participation in
network-based learning engineers a major shift in classroom dynamics
from the traditional teacher-led view of learning toward a
learner-centered model. The role of the teacher in the online classroom is
transformed to that of facilitator.

Agreeing with that conclusion, I must remark that computer assisted
language learning (CALL) is not restricted to network technologies. An
important part of CALL is computer technologies that can be used in
the classroom to facilitate interaction between the teacher and the learner
and that can be integrated into existing curricula. An example is a
semi-automatic text summarization system integrated into the Text Theory
course described in Yatsko et al. (2005).

5. "Beyond the Novelty - Providing meaning in CALL" by M. H. Field.
This paper is in line with the previous one. The author, a lecturer in
English at a Japanese university, describes his experience of creating
a learner-centered environment by means of a so-called Bulletin
Board accessible via the university's network. Actually, the Bulletin
Board was a non-interactive chat, where students could discuss
issues suggested by the lecturer and communicate with each other.
The lecturer didn't correct their mistakes to control the learning
process. Then some issues were discussed in class. The
questionnaire poll conducted at the end of the academic year made it
clear that most of the students believed that interacting on the Bulletin
Board helped them learn and use language in other situations. Thus
the author seems to have reached his goal of preventing the
students from regarding computer technologies as a toy that is given
much attention at first and gradually neglected later.

While appreciating Field's efforts, I'd like to draw the author's attention
to another (opposite) way of solving the problem when students are
compelled to learn and use computer technologies. Such an approach is
realized in the TITE (translation in teaching English) system developed
at my laboratory. The system stops at every mistake made by the
student, and the whole translation is deleted if the student fails to keep
to the time limit. So the student has to resume the translation again and
again until he/she learns most of it by heart to get credit in the course.
I think the best solution is to combine a friendly learner-centered
environment for out-of-class activities with compulsory use of computer
technologies in class. Of course much depends on cultural traditions.
A compulsory approach taken for granted by Russian students may turn
out to be unacceptable for learners in such highly democratic countries as
the USA. I am not sure about Japan; to the best of my knowledge it was
a totalitarian state for a long time and compulsory approaches
may be applicable there as well (it's a mere conjecture, of course).

The next section of the book "Discourse Analysis and Language
Teaching" contains two papers. Both are focused on contrastive
analysis of three databases: "Talk That Works" (TTW) - a video
communication training kit based on the findings of the "Language in
the Workplace Project" described in Holmes's paper (see
above); "Dialogue Module" (D-Module) developed at Tokyo University
of Foreign Studies (TUFS); "Japanese 2 by Basic Transcription
System for Japanese" (BTSJ) also developed at TUFS. TTW contains
authentic English conversations; D-Module comprises non-authentic,
constructed dialogues in 17 languages; BTSJ contains discourse
samples of authentic Japanese conversations.

1. "Why Do We Need to Analyze Natural conversation Data in
Developing Conversation Teaching Materials" by M. Usami. This
paper deals with contrastive analysis of TTW, the Japanese section of
D-Module, and BTSJ. Having analyzed the 7 most frequent functions
(e.g. 'Asking for Information' or 'Giving a Reason') in the TTW data, the
author revealed that a given function may be realized in discourse
together with the corresponding linguistic forms (type 1), or without them
(type 2), or linguistic forms may be present in discourse, the function
not being realized (type 3). Then the author analyzed the same
functions in BTSJ to find out that most of the functions are realized
with corresponding linguistic forms. After that Usami compared
realizations of the requesting speech act in a BTSJ telephone conversation
recording and in the Japanese section of D-Module. Authentic
conversation turned out 1) to be longer because of the extensive use of
parenthetical phrases and repetitions, and 2) to allow
one linguistic form to manifest different linguistic functions.

Assessing this paper, I must note again that it lacks quantitative data,
without which the author's conclusions are not substantiated. For example
the author writes that in 73.9% of the examples extracted from BTSJ
discourse functions are realized with a corresponding linguistic form
(p. 283). This statement is pointless because the author gives
quantitative data neither about BTSJ's size nor about the number of
extracted examples. And the paper is full of such pointless
statements: "...the function 'asking for information' is realized
frequently without a corresponding linguistic form"
(p. 283); "...'requesting' is a very common function which occurs
frequently..." All these statements should have been substantiated by
exact figures characterizing the frequency of the corresponding phenomena.
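
To see why a bare percentage is uninformative, consider the following
minimal Python sketch. It computes a Wilson score interval for the same
proportion, 73.9%, under two purely hypothetical sample sizes (17 of 23
and 1700 of 2300 examples). The paper reports neither figure, so these
numbers are assumptions chosen only for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical sample sizes; both give a proportion of about 73.9%.
    for successes, n in [(17, 23), (1700, 2300)]:
        low, high = wilson_interval(successes, n)
        print(f"{successes}/{n}: {low:.3f} to {high:.3f}")
```

With the smaller hypothetical sample the interval is several times wider
than with the larger one, which is why the number of extracted examples
needs to be reported alongside the percentage.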

The description of the correlation between discourse functions and
linguistic forms is superficial. The author states that a linguistic form
may be present in discourse while the corresponding function is not
realized. I personally share the opinion of representatives
of the Prague Linguistic School, according to which linguistic form and
linguistic meaning are inseparable. If there are linguistic forms in
discourse they must have a meaning, and must be associated with
some function. Perhaps this function is not the one expected by the
author. In Iatsko (1998a, 1998b) I distinguished three types of
correlation between deep and surface structures of discourse: 1)
Deep structure is manifested in surface structure by corresponding
lexical and grammatical units (correspondence between deep
structure and surface structure); 2) deep structure is not manifested in
surface structure (inexplicability of deep structure); 3) non-
correspondence, contradiction between surface structure and deep
structure that takes place when the meaning of lexical and
grammatical units in the surface structure contradicts the nature of the
deep structure.

2. "An Analysis of Teaching Materials Based on New Zealand English
Conversation in Natural Settings - Implications for the Development of
Conversation Teaching Materials" by T. Suzuki, K. Matsumoto, and M.
Usami. The paper focuses on a contrastive analysis of TTW and the English
section of D-Module to investigate how discourse functions featured in
the D-Module are realized in TTW and to seek implications for the
development of conversation textbooks. The authors selected 7
discourse functions (asking for information, stating an opinion, making
a comparison, giving a reason, giving a direction, giving an example,
giving advice) and analyzed their distribution and the distribution of the
corresponding linguistic forms in the TTW corpus of 21 conversations.
It was revealed that functions that presuppose visual perception of
the object of conversation (e.g. 'asking for information') are used more
often without corresponding linguistic forms while functions that
presuppose mental activity (e.g. 'giving a reason') are more often
accompanied by linguistic forms. The authors reasonably conclude
that form-function mappings must be taken into account in teaching

The last section of the book entitled "TUFS Language Modules"
comprises two papers.
1. "The Creation of TUFS Pronunciation Module" by T. Kigoshi. The
paper deals with the pronunciation module accessible via the Internet
and designed for Japanese-speaking learners of foreign languages.
Currently the module supports 11 languages but the total number of
languages is planned to be 17. Each language section of the module
consists of 4 parts. Studying the introductory part, learners familiarize
themselves with the sounds of the target language. Part 1,
entitled "For Survival", enables learners to read words, phrases and
sentences. Part 2, "For Smooth Communication", is aimed at improving
listening comprehension. Part 3, "To Master the Pronunciation",
enables learners to acquire the feel of the target language. The
learners can choose which part to start with depending on their
purpose. The functioning of the Pronunciation Module is exemplified by its
Spanish section. Its introductory part contains a five-line Spanish
poem; Part 1 comprises 22 units, each dealing with separate Spanish
phonemes; Part 2 consists of 5 units focused upon prosodic features
of Spanish; Part 3 has 16 units focused on combinatory features of
Spanish phonemes and contrastive analysis of some Spanish and
Japanese sounds.

The paper leaves open some essential questions. 1. The author didn't
say a single word about the Web interface used by learners to access
the system. 2. The author didn't describe the effectiveness of the
Pronunciation Module, its impact on learners' skills. 3. The paper is
illustrated with one test. There is no systematic description of
assignments and exercises given to learners. For example,
contemporary systems distributed on CD widely employ speech
recognition programs to assess learners' pronunciation. Does the
Pronunciation Module provide learners with this opportunity? Perhaps
the Pronunciation Module is really a great achievement, but the author
failed to prove that.

2. "Development and Assessment of TUFS Dialogue Module -
Multilingual and Functional Syllabus" by K. Yuki, K. Abe, and Ch. Lin.
The paper deals with the process of constructing the Dialogue Module, which
currently comprises materials for 17 languages: English, German,
French, Spanish, Portuguese, Russian, Chinese, Korean, Mongolian,
Indonesian, Pilipino, Laotian, Cambodian, Vietnamese, Arabic,
Turkish, and Japanese. The materials of each language have 40
lessons, each having one dialogue and concentrating on one target
function. Each dialogue has two interlocutors and is supplied with
explanations of the vocabulary, grammar and exercises.

While developing the Dialogue Module the authors adopted a
functional approach and conducted large-scale research to create
an inventory of 40 functions. Then they conducted a questionnaire
poll among the persons who were assigned to write dialogues and
arranged the functions in order of priority. Thus they managed to get
reliable descriptions of the 40 discourse functions included in the Dialogue
Module.

The book ends with "Concluding Remarks" by Y. Kawaguchi, leader of
the COE (Center of Excellence) program launched by the Japanese Ministry
of Education, Sports, Culture, Science, and Technology, which provided
financial support to the TUFS Language Modules project.

In conclusion I'd like to point out the main drawbacks that prevent me
from giving a positive evaluation of the book.

1. A strong inclination toward applied aspects of research and
underestimation of its theoretical foundations. The editors of the book
failed to provide a theoretical and methodological background for
"linguistic informatics"; their declaration of the emergence of a new
subject field is empty of content. Since the editors included the
term "informatics" in the name of the new discipline, they are expected
to apply general methodologies of informatics, such as architectural
specification and functional specification of the information
technologies studied. Only 3 papers (by
Miyamoto, Nakamura, and Ueda) have specifications of algorithms.
The authors of the papers devoted to TUFS language modules provide
neither architectures nor algorithms to specify the functioning of these
modules. Authors of some papers use methodologies of case
grammar and speech act theory without displaying any knowledge of
theoretical works in these domains.

2. The book is badly and carelessly edited and doesn't conform to
internationally recognized editorial practices. Two years ago I
reviewed "Computer Learner Corpora, Second Language Acquisition and
Foreign Language Teaching" (Iatsko, 2003b), which also deals with a new subject
field - computer learner corpora research. In the introductory paper, S.
Granger, one of the book's editors, gave a detailed description of the
subject field, described its methodologies and theoretical notions.
Each section in the book was preceded by a summary as well as each

The book under review lacks these hallmarks of high-quality
research work. Only two papers (by Holmes and Block) have
abstracts. Section titles are given in the table of contents but are not
found in the body of the book. Reviewing of papers submitted for
publication in the book cannot have been organized properly, since
some obviously weak papers were accepted for publication.

I was surprised by the low graphic quality of the book, not characteristic
of such an esteemed publishing house as John Benjamins. Maps in van
Reenen and Jongkind's paper are completely unreadable. The reader is
also baffled by enormous blank spaces interrupting the texts of papers:
for example, p. 107 has only 9 sentences at the top, and the rest of it is
white space.


V. Iatsko (last name also spelt 'Yatsko') is a full professor in the
Department of Information Technologies and Systems, part-time
professor in the Department of English and Head of the Computational
Linguistics Laboratory at Katanov State University of Khakasia, located
in Abakan, Russia. His research interests include automatic text
summarization and information retrieval, text grammar,
computer-assisted FLT, contrastive analysis of English and Russian
syntax, and corpus linguistics.