Gaustad, Tanja, ed. (2003) Computational Linguistics in the Netherlands 2002: Selected Papers from the Thirteenth CLIN Meeting, Rodopi, Language and Computers: Studies in Practical Linguistics 47.
Announced at http://linguistlist.org/issues/14/14-2102.html
Vittoria Prencipe, Università Cattolica ''Sacro Cuore'' di Milano, unaffiliated scholar.
DESCRIPTION OF THE BOOK
This volume is the result of the 13th CLIN (Computational Linguistics in the Netherlands) Meeting, in 2002.
The book opens with the address, in Dutch, by the invited speaker, Hugo Brandt Corstius: De desillusie van mijn leven of Remember November, a discussion of the impossibility of Machine Translation.
Of the 18 papers delivered at the conference, only 10 are published in this book; they are ordered alphabetically by author. The papers cover a wide range of topics, reflecting the diversity of current research in Computational Linguistics (CL).
The second paper, Extending a Finite State Approach for Parsing Commas in English to Dutch, by Sebastian van Delden and Fernando Gomez, aims to identify syntactic differences in comma usage between English and Dutch using a comma-tagging system. ''This approach combines a set of simple deterministic finite state automata and a greedy learning algorithm to assign descriptive tags to the commas in a sentence'' (p. 25), and it is presented as a necessary component of any finite state partial parser. Tests on several Dutch and English corpora show that, as in English, a Dutch comma tagger plays a crucial role in a language processing system by resolving important syntactic issues.
The next contribution, Handling Disfluencies in Spontaneous Language Models, by Jacques Duchateau, Tom Laureys, Kris Demuynck, and Patrick Wambacq, deals with the automatic recognition of spontaneous speech. This is one of the main topics in speech research, and its practical applications include voice-operated telephone services, automatic transcription of meetings, automatic closed captioning of TV programmes, and control of handheld devices. The paper is organized as follows. First, the authors enumerate the obstacles to the accurate recognition of spontaneous speech and announce their experiments in spontaneous language modelling for automatic speech recognition. They then outline the standard architecture of a large-vocabulary spontaneous speech recognizer (pp. 41-42), explain the problems of spontaneous language modelling and present their research (pp. 42-45). Finally, they describe the experimental set-up and give results on a recognition task (pp. 46-48).
In the first experiment, on manipulated context, disfluencies are automatically removed: ''this turned out to be beneficial for repetitions, while having a bad effect on contexts containing hesitations'' (p. 48). In the second experiment, on the manipulated and non-manipulated prediction context, the result is disappointing: in some cases, in fact, disfluencies are strongly correlated with lexical choice. The authors therefore suggest combining their model with acoustic-prosodic information and guiding the language model (LM) towards a more accurate automatic context selection.
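To make the idea of a manipulated prediction context concrete, here is a minimal sketch in Python. It is not the authors' implementation: the set of hesitation tokens, the function names and the use of a two-word (trigram) context are all assumptions made for illustration, since the review does not record those details.

```python
from typing import List, Tuple

# Hypothetical filled-pause tokens; the paper's actual inventory is not given here.
HESITATIONS = {"uh", "um", "eh"}


def clean_context(history: List[str],
                  drop_hesitations: bool = True,
                  drop_repetitions: bool = True) -> List[str]:
    """Return the word history with the selected disfluencies removed."""
    cleaned: List[str] = []
    for word in history:
        if drop_hesitations and word in HESITATIONS:
            continue  # skip filled pauses
        if drop_repetitions and cleaned and cleaned[-1] == word:
            continue  # skip an immediate repetition of the last kept word
        cleaned.append(word)
    return cleaned


def trigram_context(history: List[str], **kwargs) -> Tuple[str, ...]:
    """Last two words of the (possibly cleaned) history, i.e. what a
    trigram language model would condition its prediction on."""
    return tuple(clean_context(history, **kwargs)[-2:])
```

For the history "i i uh want", trigram_context returns ("i", "want") when both kinds of disfluency are removed. Calling it with drop_hesitations=False keeps the hesitation in the context, which corresponds to the setting the quoted result favours.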
In the following paper, Learning to Segment Speech with Self-Organising Maps, James Hammerton presents a new approach that employs a self-organising map (SOM) to create an unsupervised connectionist model of speech segmentation. ''The SOM was chosen because it is both biologically plausible and is an unsupervised learner'' (p. 53). The author first describes how the standard SOM operates (pp. 53-55) and then adapts it to speech segmentation. The first modification concerns memory: the standard SOM has none, so it can map individual inputs but not sequences. The modified SOM proceeds as usual at the start of a sequence, but when subsequent inputs are presented, the value of the previous input and the pattern representing the next input are added together, and so on until the end of the sequence. The author then presents a series of experiments (pp. 56-60) and discusses the results, which are encouraging: the SOM is sensitive to the phonotactic regularities in utterances and, as the experiments demonstrate, can become sensitive to phonotactics in child-directed speech, so it could be applied to the problem of speech segmentation with good results. Still, ''the modelling of speech segmentation is a field in its infancy''... (p. 55).
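The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hammerton's model: the map is one-dimensional, the neighbourhood is Gaussian, and the "memory" is read here as a plain running sum of the input patterns. The exact combination rule and all names below are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

Vector = List[float]


def best_matching_unit(weights: List[Vector], x: Sequence[float]) -> int:
    """Index of the unit whose weights are closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def som_update(weights: List[Vector], x: Sequence[float],
               lr: float = 0.5, radius: float = 1.0) -> int:
    """One standard SOM step on a 1-D map: move the winning unit and its
    neighbours towards x, weighted by a Gaussian of their map distance."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        for d in range(len(w)):
            w[d] += lr * influence * (x[d] - w[d])
    return bmu


def accumulate_sequence(inputs: Sequence[Sequence[float]]) -> Iterator[Vector]:
    """Yield running sums of the input patterns, so that each step carries
    a trace of the earlier inputs (assumed reading of the 'memory')."""
    total = [0.0] * len(inputs[0])
    for x in inputs:
        total = [t + v for t, v in zip(total, x)]
        yield list(total)
```

Feeding each vector produced by accumulate_sequence to som_update, instead of the raw inputs, gives the map access to the sequence so far. The standard SOM would see each input in isolation.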
The next paper, How is Grammatical Gender Processed?, by Christer Johansson, introduces ''a problem for computational models that process language by learning and using generalizations'': the existence of paradigmatic gaps (p. 65). The author analyses the case of the Swedish and Norwegian adjective paradigms, presenting a corpus study and a reaction time experiment. ''The corpus study estimated how exclusive the problematic context is. The reaction time experiment shows that the problematic adjectives have significantly longer decision times than congruent or non-congruent...'' (p. 66). The experiment thus shows that the problematic adjectives are perceived differently from both ordinary incongruent forms and nonsense words. So a ''lazy learner is a more plausible model, as it first stores positive exemplars, and later it may find out that there are non examples of a specific combination of factors, some of which factors may have emerged after exemplars are collected'' (p. 74).
In the next paper, BaseNP Chunking using ILP, Stasinos Konstantopoulos discusses ''the application of Inductive Logic Programming (ILP) to the task of BaseNP Chunking'' (p. 77). The first part of the contribution (pp. 77-80) is devoted to the description of ILP, which generates ''knowledge, a hypothesis, within the bounds of a given theoretical framework and prior world knowledge, the background knowledge'' (ibid.). The second part (pp. 80-83) examines text chunking, ''a form of shallow parsing that amounts to identifying non-recursive, non-overlapping constituents chunks in a sentence, without assigning internal structure to the chunks'' (p. 80). Finally (pp. 83-88), the author focuses on the experimental use of ILP to construct a BaseNP chunker in Prolog, and reports the results of the experiment (pp. 88-90).
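For readers unfamiliar with the task, the following Python sketch shows what a set of non-recursive, non-overlapping base NP chunks looks like. It deliberately uses one hand-written rule over part-of-speech tags where the paper induces its rules with ILP. The tag sets (Penn Treebank style) and the rule itself are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed Penn-Treebank-style tag sets; hypothetical, not taken from the paper.
DET_TAGS = {"DT", "PRP$"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
NP_TAGS = DET_TAGS | NOUN_TAGS | {"JJ"}


def base_np_chunks(tagged: Sequence[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of flat base NPs.

    Hand-written rule: a maximal run of determiners, adjectives and nouns
    that contains at least one noun; a determiner after a noun starts a
    new chunk. Spans never overlap and have no internal structure."""
    chunks: List[Tuple[int, int]] = []
    start, has_noun = None, False
    for i, (_, tag) in enumerate(list(tagged) + [("", "<END>")]):
        inside = tag in NP_TAGS
        # "the girl | a book": a determiner following a noun opens a new chunk.
        restart = inside and has_noun and tag in DET_TAGS
        if start is not None and (not inside or restart):
            if has_noun:
                chunks.append((start, i))
            start, has_noun = None, False
        if inside:
            if start is None:
                start = i
            has_noun = has_noun or tag in NOUN_TAGS
    return chunks
```

On the tagged sentence "The/DT old/JJ man/NN gave/VBD the/DT girl/NN a/DT book/NN ./." the function returns the spans (0, 3), (4, 6) and (6, 8). A learned rule set would replace the hard-coded conditions but produce output of the same shape.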
The following contribution, A Dutch Chunker as a Basis for the Extraction of Linguistic Knowledge, by Kristina Spranger and Ulrich Heid, describes the functioning of a ''robust and efficient tool for the extraction of linguistic information from large text corpora'' (p. 93). The authors first adopt a definition of a chunk and describe two grammars providing a deep syntactic analysis (pp. 94-99); then they describe their model and the three levels of the chunking process: 1. the introduction of most of the lexical information and the building of chunks; 2. the main chunking; 3. the checking of structures and the building of syntactic hierarchies. Finally they describe a real application of the chunker (pp. 99-102), and present the results (pp. 102-108).
The following paper, Morpho-Syntactic Agreement and Index Agreement in Dutch NPs, by Frank Van Eynde, focuses on the existence of morphosyntactic agreement and index agreement and on the relation between them. Wechsler and Zlatic (2000, p. 508) contend that ''the index agreement does not apply to NP-internal elements such as determiners and adjectives'', but Van Eynde argues that ''this claim is too strong for Dutch'' (p. 112). First he analyses Dutch NPs, adopting the distinction between marked and unmarked nominals proposed in Allegranza (1998) and Van Eynde (2003); then he describes the use of the type head-functor-phrase to model the combination of a noun with its prenominal dependents (p. 113). Next he spells out the details of NP-internal morphosyntactic agreement (pp. 114-121) and discusses two instances of NP-internal index agreement: marked nouns (pp. 122-124) and predeterminers (pp. 125-126). The conclusions of this paper are twofold: 1. ''the combination of prenominals with unmarked nominals is subject to morphosyntactic agreement in case, declension, number and gender''; 2. ''the combination of prenominals with marked nominals is not subject to morphosyntactic agreement, but to index agreement'' (p. 126).
In the next paper, Harvesting Dutch Trees: Syntactic Properties of Spoken Dutch, Ton van der Wouden, Ineke Schuurman, Machteld Schouppe, and Heleen Hoekstra address word order phenomena in Dutch. Their research uses the Spoken Dutch Corpus (CGN), a major resource for contemporary spoken Dutch. Word order in Dutch is said to be relatively free, but in practice this is not really true. ''This paper seeks to investigate in a quantitative way some of the peculiarities of Dutch word order'' (p. 129). First the authors describe the corpus (pp. 129-130) and introduce some of the tools for exploring it (pp. 130-131). Then they present the results of their exploration of the CGN with respect to syntactic aspects of Dutch (pp. 132-138), particularly the position of the subject and the verb cluster. Naturally, only the surface of the possibilities has been scratched, but the first results corroborate the assumption that in the unmarked case subjects occupy the first position in main clauses (p. 139).
In the last paper, Improving a Spelling Checker for Afrikaans, Menno van Zaanen and Gerhard van Huyssteen describe the development of an improved spelling checker for Afrikaans. First the authors examine the existing spelling checkers for Afrikaans and evaluate them for user-friendliness and performance (pp. 144-146). Since the results are not very encouraging, they try to improve the model. They then describe the general architecture of existing spelling checkers (pp. 148-149) and that of the improved one, which adds morphological information, an n-gram analysis and an error lexicon (pp. 151-152). Finally they discuss the remaining problems (p. 154).
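How two of these additions can work together is sketched below in Python. This is a toy illustration, not the authors' system: the class and method names, the character-trigram plausibility test and the word lists are all assumptions, and the morphological component is left out entirely.

```python
from typing import Dict, Iterable, Optional, Set


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Character n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class SpellChecker:
    """Toy checker combining a lexicon, an error lexicon and an n-gram test."""

    def __init__(self, lexicon: Iterable[str],
                 error_lexicon: Dict[str, str], n: int = 3) -> None:
        self.lexicon = set(lexicon)
        self.error_lexicon = dict(error_lexicon)  # known misspelling -> correction
        self.n = n
        self.known_ngrams: Set[str] = set()
        for word in self.lexicon:
            self.known_ngrams |= char_ngrams(word, n)

    def check(self, word: str) -> str:
        """Classify a word as 'ok', 'known_error', 'plausible' or 'suspect'."""
        w = word.lower()
        if w in self.lexicon:
            return "ok"
        if w in self.error_lexicon:
            return "known_error"
        # Unknown word: plausible only if all its n-grams occur in the lexicon.
        if char_ngrams(w, self.n) <= self.known_ngrams:
            return "plausible"
        return "suspect"

    def suggest(self, word: str) -> Optional[str]:
        """Correction from the error lexicon, if the misspelling is listed."""
        return self.error_lexicon.get(word.lower())
```

With the toy lexicon {"huis", "boek", "boeke", "koek"} and the error lexicon {"hius": "huis"}, the unlisted plural "koeke" is classified as plausible because all of its trigrams are attested, "hius" is a known error with the suggestion "huis", and "kxek" is suspect.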
The aim of this book is clearly to mirror the diversity of current research in CL, and this aim is achieved. The individual contributions result from the empirical application of different models, all theoretically well founded, so the intended audience for this volume is professionals and advanced students. All the contributions are very interesting and well organized, but the writing is dense and technical.
ABOUT THE REVIEWER:
Vittoria Prencipe, Ph.D., works as a postdoctoral researcher in the field of Translation Studies at the Università Cattolica ''Sacro Cuore'', Milan (Italy). Her current research deals with the application of a Sense-Text model to the field of linguistic translation.