Date: Sat, 23 Jul 2005 09:37:51 +0800
From: David Deterding
Subject: Introducing Speech and Language Processing

AUTHOR: Coleman, John
TITLE: Introducing Speech and Language Processing
SERIES: Cambridge Introductions to Language and Linguistics
YEAR: 2005
PUBLISHER: Cambridge University Press
David Deterding, NIE/NTU, Singapore
This book introduces two separate but related areas: speech analysis and language processing. It is aimed at readers with some knowledge of phonetics and grammar but little or no background in the computer analysis or manipulation of speech and language, and it covers such techniques as digital filtering, linear predictive coding, deterministic and non-deterministic parsing, and Markov modelling of speech.
Most of the computer programs that are discussed in the text are provided on an accompanying CD-ROM, including C programs for signal processing and Prolog programs for parsing, and the reader is encouraged not just to run these programs but to modify them so as to become fully familiar with their structure and operation.
After an introductory chapter outlining the contents and aims of the book, Chapters 2, 3 and 4 introduce some signal processing techniques with illustrative programs all written in C. Chapter 2 deals with the generation of a simple cosine wave, Chapter 3 presents basic digital filters, and Chapter 4 covers linear predictive coding for modelling the spectral characteristics of speech. In all these areas, the presentation introduces the techniques step-by-step, making a commendable effort to explain all aspects of the programs in a style that is accessible to readers with no background in signal processing or computer programming.
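To give a flavour of the kind of short C program involved in basic digital filtering, here is a minimal sketch of a moving-average filter. It is not taken from Coleman's book or CD-ROM: the function name moving_average and its parameters are hypothetical, and a moving average is simply one of the simplest filters that could serve as an illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not code from the book.
 * Causal moving-average filter: y[i] is the sum of the most recent
 * `taps` input samples divided by `taps`. Samples before the start
 * of the signal are treated as zero. */
void moving_average(const double *x, double *y, size_t n, size_t taps)
{
    assert(x != NULL && y != NULL && taps > 0);
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        /* Add x[i], x[i-1], ... but never read before x[0]. */
        for (size_t k = 0; k < taps && k <= i; k++) {
            sum += x[i - k];
        }
        y[i] = sum / (double)taps;
    }
}
```

Averaging neighbouring samples smooths out rapid fluctuations, so this behaves as a crude low-pass filter; increasing taps smooths more heavily.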
In Chapter 5 the focus shifts to the use of Prolog programs to demonstrate the implementation of finite-state machines, in order to parse and also generate phonologically well-formed strings of phonemes in English. Once more, the reader is taken through the example programs line by line, to ensure that even those with no previous knowledge of Prolog can easily understand the code and modify it if they choose.
Chapter 6 covers speech recognition techniques, including dynamic time warping and vector quantization, and Chapter 7 deals with the importance of incorporating probability estimates in finite-state models, including a substantial discussion of the need for probabilistic parsing despite the theoretical objections of many linguists such as Chomsky. Neither Chapter 6 nor Chapter 7 includes illustrative programs, presumably because implementations of some of the techniques discussed, such as Hidden Markov Models, would be just too long and complicated for an introductory book, though it is not obvious why a simple implementation of dynamic time warping would not have been feasible.
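As a rough indication of how compact a simple dynamic time warping program can be, here is a minimal sketch in C. It is an assumption-laden illustration rather than anything from the book: the name dtw_distance is hypothetical, each frame is reduced to a single number instead of a vector of spectral coefficients, and the local cost is the absolute difference between frames.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical illustration, not code from the book. */
static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Dynamic time warping distance between a[0..n-1] and b[0..m-1].
 * Returns -1.0 if memory cannot be allocated. */
double dtw_distance(const double *a, size_t n, const double *b, size_t m)
{
    assert(a != NULL && b != NULL && n > 0 && m > 0);
    size_t cols = m + 1;
    /* cost[i*cols + j] = cheapest alignment of the first i frames of a
     * with the first j frames of b. */
    double *cost = malloc((n + 1) * cols * sizeof *cost);
    if (cost == NULL) {
        return -1.0;
    }
    for (size_t k = 0; k < (n + 1) * cols; k++) {
        cost[k] = DBL_MAX;              /* "unreachable" marker */
    }
    cost[0] = 0.0;                      /* empty prefixes align at no cost */
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            double best = min3(cost[(i - 1) * cols + j],        /* repeat b[j-1] */
                               cost[i * cols + (j - 1)],        /* repeat a[i-1] */
                               cost[(i - 1) * cols + (j - 1)]); /* advance both  */
            cost[i * cols + j] = abs_diff(a[i - 1], b[j - 1]) + best;
        }
    }
    double result = cost[n * cols + m];
    free(cost);
    return result;
}
```

Two sequences that differ only in timing, such as 1, 2, 3 and 1, 2, 2, 3, come out at a distance of zero, which is the property that makes the technique useful for comparing utterances spoken at different rates.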
Chapter 8 introduces syntactic parsing, with some basic programs written in Prolog for parsing of a very limited set of English sentences. And Chapter 9 discusses the practical issues of incorporating probability into the parsing algorithm, clearly demonstrating that there is no reason why sentences that have never been uttered before should pose a problem for probabilistic parsers, as was once claimed by Chomsky. Finally, at the end of Chapter 9, the implementation of a simple probabilistic context-free grammar is illustrated in Prolog.
One issue with regard to this book can be illustrated by the effort to clarify a single line of code in the first C program that is presented:
x = (short int *) calloc(length,sizeof(short int));
Over half a page (pp. 37-38) is spent carefully explaining that this allocates memory for an array of short integers, but unfortunately many potential readers, even some with a substantial interest in the analysis and manipulation of speech, will probably find some of this explanation impenetrable.
In fact, the text never fully explains what the first part of this line does: calloc returns a generic pointer, and (short int *) casts it to a pointer to a short integer. Presumably it is assumed that going into too much detail about the use of pointers in C is not appropriate for an introductory book on speech processing. But this means that those readers who do not have any problems with the technical aspects of the text might end up frustrated when the whole of the code is not explained.
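For readers who would like to see the quoted line in context, the following minimal sketch wraps it in a small function. The wrapper alloc_samples and its comments are hypothetical additions for illustration; only the calloc line itself comes from the book.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around the line quoted above.
 * Allocates room for `length` short integers, all initialised to zero.
 * Returns NULL if the memory cannot be obtained; the caller must
 * eventually release the array with free(). */
short int *alloc_samples(size_t length)
{
    short int *x;

    assert(length > 0);
    /* calloc(count, size) reserves count * size bytes, sets them to zero,
     * and returns a generic pointer (void *). The cast (short int *)
     * converts that generic pointer to a pointer to short int. In C the
     * conversion would also happen implicitly; in C++ the cast is required. */
    x = (short int *) calloc(length, sizeof(short int));
    return x;
}
```

Once allocated, the array is used with ordinary indexing such as x[0], x[1] and so on, and it should be passed to free() when it is no longer needed.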
So, has Coleman got it right, in attempting to explain as much as possible about how the code works but not necessarily going into all the details? I think he has, and the level of detail is about right. One probably needs to accept that it is necessary for readers to run the programs and also manipulate them if they are to gain a reasonable understanding of the material covered in this kind of practical textbook, and if some readers find they cannot cope with the analysis and compilation of the code, well so be it.
Another example of technical details that some readers may find a bit daunting is the discussion of big endian and little endian computers (p. 32). Most of us really do not care how our computers store integers so long as they work fine. So is it really necessary to go into these details about how integers are stored? Well, yes it probably is. If readers are to be able to load speech data into programs and then manipulate the data in various ways, then they probably do need to find out if they are working on a big endian machine (Motorola) or a little endian machine (Intel). So, once more, distasteful as this discussion might be to some readers, Coleman probably has got it right. Indeed, throughout the book, he always makes an admirable effort to present the material in a style and format that is accessible even to those with no background in computer programming, and by and large these efforts are probably highly successful, even if it may be necessary to acknowledge that some readers will not be able to grasp all the concepts.
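The practical consequence of byte order can be shown in a few lines of C. This is a minimal sketch, not code from the book: the function names is_little_endian and read_le16 are hypothetical, and the second function assumes the speech data are stored as 16-bit signed samples with the low byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not code from the book.
 * Returns 1 if this machine stores the low-order byte of an integer
 * first (little endian), 0 otherwise. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);   /* inspect the first byte in memory */
    return first_byte == 1;
}

/* Decode one 16-bit signed sample stored low byte first, whatever the
 * byte order of the machine running the program. */
int read_le16(const unsigned char *p)
{
    int v;

    assert(p != NULL);
    v = p[0] | (p[1] << 8);                  /* 0 .. 65535 */
    return v >= 0x8000 ? v - 0x10000 : v;    /* map to -32768 .. 32767 */
}
```

Decoding samples byte by byte, as read_le16 does, sidesteps the question of which kind of machine the program is running on, at the cost of a little extra code.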
Coleman makes no claims to expertise in syntax. In fact he admits (p. 223) that he probably knows rather less about syntax than many readers. And, indeed, a few aspects of the syntactic models that he presents are a bit suspect. For example, he adopts a rather traditional generative model of English, with rules such as np --> det, adj, n (p. 232), eschewing the use of determiner phrases that are proposed in many more recent models. But then the first rule is ip --> np, vp, and this use of ip to represent a sentence makes no sense when the sentence includes no inflectional component, i, that can act as the head of the ip. It would have been better here to stick to the traditional use of s to represent the top node of a sentence (as indeed is done in Chapter 9, with no explanation for the switch). But such minor quibbles miss the point: this is not a textbook on syntax. It is an introductory text on signal processing and language parsing, and it presents these topics exceptionally well and very clearly.
Occasionally, gaps remain in the implementation of some techniques. For example, the use of a finite state transducer is described (pp. 144-149) for matching simple sequences of vowels and consonants against stored arrays of linear prediction coefficients, but many readers will wonder how the closest match is computed between a new set of LPC values and the stored data. Although this is (partially) resolved when vector quantization is introduced (p. 179), thirty pages is a bit long to leave readers pondering over this rather central issue. Furthermore, with regard to the implementation of the finite state transducer, the simple matching algorithm only mentions vowels and fricatives, and this fails to deal with the obvious issue that plosives are characterised by silence so that the only way /b, d, g/ can be differentiated from each other is by means of their transitions from and to the neighbouring sounds, something which cannot be handled by means of single targets for each phoneme. But once more, maybe this is missing the point: the aim of the book is to introduce a wide range of speech processing techniques in a practical and straightforward manner, not to go into all the details of their implementation. And this it does extremely well, so we should not quibble too much about some minor flaws in the simple implementations, or worry if all of the details are not fleshed out.
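One simple way in which a closest match might be computed is sketched below in C. This is an assumption for the sake of illustration and is not claimed to be the method used in the book: the function name nearest_entry is hypothetical, the stored vectors are held in one flat array, and the distance measure is plain squared Euclidean distance, which is not necessarily a good choice for raw linear prediction coefficients.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not code from the book.
 * codebook holds `entries` stored vectors of `dim` values each, laid
 * out one after another. Returns the index of the stored vector with
 * the smallest squared Euclidean distance from `query`. */
size_t nearest_entry(const double *codebook, size_t entries, size_t dim,
                     const double *query)
{
    size_t best = 0;
    double best_dist = 0.0;

    assert(codebook != NULL && query != NULL && entries > 0 && dim > 0);
    for (size_t e = 0; e < entries; e++) {
        double dist = 0.0;
        for (size_t d = 0; d < dim; d++) {
            double diff = codebook[e * dim + d] - query[d];
            dist += diff * diff;
        }
        /* Keep the first entry unconditionally, then any closer one. */
        if (e == 0 || dist < best_dist) {
            best = e;
            best_dist = dist;
        }
    }
    return best;
}
```

Replacing each incoming frame by the index of its nearest stored vector is the basic step of vector quantization, so a function of this kind would fill the gap between the two passages mentioned above.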
Overall, Coleman is to be congratulated on this handsomely produced, easily accessible, fascinating book which many, many students of speech and language will undoubtedly find exceptionally valuable.
ABOUT THE REVIEWER:
David Deterding is an Associate Professor at NIE/NTU, Singapore, where he teaches phonetics, phonology, syntax, and Chinese-English translation.