Review of The Handbook of Phonetic Sciences, 2nd Edition
The second edition of ‘The Handbook of Phonetic Sciences’ is organized into five parts covering the main areas of phonetic research: experimental phonetics (Part I), a survey of biological perspectives in phonetic research (Part II), models of speech production and perception (Part III), linguistic correlates of phonetic research (Part IV), and a synthesis of the main speech technologies available nowadays (Part V). Across these five parts, a total of 22 chapters were written by experts in each sub-field.
The first part (Experimental Phonetics) comprises four chapters dedicated to techniques, measurements, and instruments in phonetics research. Maureen Stone (Laboratory Techniques for Investigating Speech Articulation) offers a survey of methods for investigating the oral/vocal tract both directly and indirectly during speech, with consideration given to the applications, strengths, and weaknesses of each method. One of the most widespread methods of studying what happens in the mouth during speech is electropalatography (EPG), which uses small electrodes embedded in a pseudopalate. This method allows the researcher to see what happens in the mouth in real time, and provides useful information about tongue motion as well as tongue-palate contact. EPG can provide good information for lingual consonants, but not for those sounds which do not involve tongue-palate contact (e.g. vowels). In any case, EPG is especially used for studying tongue movements in different languages and in speakers with speech disorders.
Christine H. Shadle (The Aerodynamics of Speech) reviews the basic concepts of fluid statics and dynamics, and applies these concepts to the aerodynamic description of vocal tract behaviours (e.g. breathing, frication, and so on). The author also explores the main methods for measuring pressure in the vocal tract during phonation. The manometer is the basic instrument for studying pressure variation in the vocal tract, but scholars have also used other indirect and medical procedures (e.g. tracheal puncture). The major disadvantages of such methods are that they are very invasive and require medical supervision.
Jonathan Harrington (Acoustic Phonetics) examines the basic concepts of experimental phonetics, in particular, as related to differences between vowels and consonants in spectrographic representations and spectral shape. Concerning vowels, the author concentrates on formant analysis, while also taking into account the whole-spectrum approach, which argues that “vowel identity is based on gross spectral properties such as auditory spectral density” (p. 89).
In the last chapter of the section (Investigating the Physiology of Laryngeal Structures), Hajime Hirose surveys the very latest techniques available for observing the larynx during speech in order to understand laryngeal adjustments in different phonetic conditions. A general difficulty with these techniques is providing high-quality images of the vocal tract in addition to good time resolution of laryngeal movements. For instance, ultra-high-speed photography was, until recently, one of the most widespread systems used to study vocal fold vibration; it provided good-resolution images but, on the other hand, forced the researcher to work on a frame-by-frame analysis (p. 132). Other available techniques include laryngeal electromyography (EMG), which is also used for the analysis of non-speech gestures such as throat clearing or sniffs. Hirose also points out that research in this particular phonetic field is still attempting to find the best way to describe laryngeal activity through phonation. Moreover, particular interest has developed in investigating the nature of pathological voice production, with the aim of finding parameters that quantitatively describe “the degree of voice abnormality” (p. 149).
The second section (Biological Perspectives) contains three chapters related to two fundamental apparatuses in speech production: the vocal apparatus and the brain. Janet Mackenzie Beck (Organic Variation of the Vocal Apparatus) analyses differences in speech production in terms of different anatomical characteristics of the vocal organs or different uses of the vocal organs. The author highlights that not only age, but also genetic and environmental factors affect the way people use their vocal apparatus. For instance, the author points out that “given any group of people of the same age and gender, there will still be marked differences in vocal tract morphology” (p. 157) due to genetic and environmental conditioning. Moreover, organic consequences of trauma or disease are another source of variation in the coordination and timing of voice among individuals.
Hermann Ackermann and Wolfram Ziegler (Brain Mechanisms Underlying Speech Motor Control) address the problem of brain organization during speech production through a review of findings related to neural diseases or lesions. One example of such a disruption in the organization of speech is the syndrome known as apraxia of speech (AOS), which mostly arises from ischemic infarctions within the left hemisphere (i.e. the language-dominant hemisphere). Psycholinguistic experiments have shown that in patients affected by this disorder, the capability of planning speech movements at the level of the syllable may be seriously compromised (Ziegler 2005).
The neural correlates of articulatory movements in speech production are also at the core of Anne Smith’s contribution (Development of Neural Control of Orofacial Movements for Speech). The author presents a vast and accurate review of studies focusing on both central and peripheral mechanisms involved in speech production by taking into account biological differences related mainly (but not limited) to age and gender. Scholars have demonstrated that younger speakers show a high degree of variability in the coordination of speech movements (e.g. lip aperture), and that this variability decreases with age. This pattern has been interpreted as an index of neuromotor maturation.
The third section (Modeling Speech Production and Perception) moves from the organisation of speech production in the brain to the issue of modelling speech production and perception. This is the largest part of the handbook, incorporating seven contributions. Barbara L. Davis (Speech Acquisition) reviews past and very recent research paradigms of speech acquisition by contrasting formalist phonological perspectives with functionalist phonetic science perspectives. The main theoretical difference between the two approaches is that, from a formalist phonological perspective, every child’s utterance reflects an underlying innate system of phonological knowledge, whereas functionalist phonetic perspectives investigate the relation between biology, physiology and the social function of speech forms in the acquisition of phonological patterns. Davis also points out that different perspectives influence not only analyses, but also methods of data collection; for instance, small corpora are generally used in formalist perspectives, whereas functionalist research uses a “mosaic of data and information about peripheral subsystem capacities during the process of speech acquisition” (p. 304).
Edda Farnetani and Daniel Recasens (Coarticulation and Connected Speech Processes) explore models analysing connected speech from both a theoretical and an experimental point of view. The authors point out that two general control principles have been repeatedly advocated by scholars dealing with coarticulation: the principle of economy, and the principle of output constraints. The first principle was proposed by Lindblom (e.g. Lindblom 1983), who stated that phonetic variation is “a continuous adaptation of speech production to the demands of the communicative situation” (p. 329). Lindblom also introduced the concept of the acoustic target, which is an ideal spectral configuration in a context-free situation. However, the acoustic target is rarely completely reached; in different speaking situations, speakers adapt their production strategies to approach the target by avoiding or reducing coarticulation. However, the degree of variation is limited by the so-called output constraints, which basically preserve the contrast between phonological units in each language. Thus, the degree of variation in vowel-to-vowel coarticulation is limited by the need to preserve the perceptual contrast between vowels (see also Van der Harst 2011). Given these two general principles, the authors explore the main models that deal with coarticulation in different languages, concluding that for anticipatory coarticulation, “no model in its present version can account for the diverse results within and across languages” (p. 347) because languages differ in anticipatory coarticulation strategies for both the lips and the velum.
Theories and models of speech production are also addressed in Anders Löfqvist’s contribution (Theories and Models of Speech Production), which focuses on how different parts of the vocal tract are accommodated by speakers, and to what degree of control and with how much freedom such accommodation takes place. As a matter of fact, during normal speech production, articulators are synchronized in a proper temporal and spatial sequence to convey the speech signal. When spatial or temporal conditions change (e.g. if speaking rate increases), articulatory movements vary as well in order to maintain a degree of intelligibility.
Christer Gobl and Ailbhe Ní Chasaide (Voice Source Variation and Its Communicative Functions) address the problem of the acoustic representation of phonation, and discuss the role of voice quality in signalling speakers’ mood and attitude. The authors also discuss the sociolinguistic correlates of voice quality, like the identification of a speaker, or differences in either regional, linguistic or social groups based on variation in voice quality. For instance, at the suprasegmental level, tonal languages often show specific voice qualities associated with specific tones, e.g., the fourth falling tone in Mandarin is often associated with creaky voice (p. 407).
Kenneth N. Stevens and Helen M. Hanson (Articulatory-Acoustic Relations as the Basis of Distinctive Contrasts) analyse in depth the acoustic properties of speech sounds by reviewing works within quantal/enhancement theory. Changes in articulators result in changes in the acoustic parameters of sounds produced in the vocal tract, which is captured by quantal theory’s notion that “a feature can be defined by a quantal relation between an articulatory parameter and an acoustic parameter” (p. 429, see Stevens 1972). This means that the acoustic parameters change in a non-monotonic way, and that certain regions in the articulatory space may increase acoustic parameters more than other regions. The authors also explore the acoustic correlates of articulatory variations during phonation, and their perceptual correlates. For instance, the authors analyse variation produced in the acoustic resonator during the production of nasalized vowels, in which the opening of the front cavity is seen as an “enhancing gesture” (p. 442) that affects the formant values of vowels, resulting in reduced amplitudes of formants in nasalized vowels when compared with oral vowels.
Perception and auditory processing are at the core of Brian C. J. Moore’s contribution (Aspects of Auditory Processing Related to Speech Perception). The author takes into account the main features affecting speech perception, and concludes that “speech perception is robust, and resistant to distortion of the speech and to background noise” (p. 454). For instance, during phone calls, the fundamental frequency (f0) of male speakers often gets lost, but hearers are usually able to perceive a low pitch. In this respect, Ritsma (1967) introduced the principle of dominance, according to which hearers select the region in which some harmonics are dominant, even if this region is not absolute.
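The telephone example can be made concrete with a small sketch. The Python below is not taken from Moore’s chapter or from Ritsma (1967): it assumes a hypothetical 100 Hz voice of which only harmonics 3 to 5 survive an idealised band limit, an 8 kHz sampling rate, and a plain autocorrelation peak-picker as a very crude stand-in for pitch perception. All function names and parameter values are illustrative.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed telephone-like sampling rate


def harmonic_complex(f0, harmonics, duration=0.1, sample_rate=SAMPLE_RATE):
    """Sum of equal-amplitude sine waves at the given multiples of f0."""
    n_samples = int(round(duration * sample_rate))
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) for k in harmonics)
        for n in range(n_samples)
    ]


def component_magnitude(signal, freq, sample_rate=SAMPLE_RATE):
    """Magnitude of a single Fourier component of the signal at freq (Hz)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    return math.hypot(re, im)


def autocorrelation_pitch(signal, fmin=60.0, fmax=400.0, sample_rate=SAMPLE_RATE):
    """Estimate pitch as the lag that maximises the raw autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def acf(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=acf)
    return sample_rate / best_lag


if __name__ == "__main__":
    # Hypothetical male voice: f0 = 100 Hz, but only 300, 400 and 500 Hz remain.
    signal = harmonic_complex(100.0, [3, 4, 5])
    print("energy at 100 Hz:", component_magnitude(signal, 100.0))
    print("energy at 300 Hz:", component_magnitude(signal, 300.0))
    print("estimated pitch (Hz):", autocorrelation_pitch(signal))
```

Under these assumptions the signal carries essentially no energy at 100 Hz, yet its waveform still repeats every 10 ms, so the periodicity estimate should come out at about 100 Hz. This only illustrates why a low pitch can be recovered from the surviving harmonics; it is not a model of the dominance principle itself.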
The last chapter of the section (Cognitive Processes in Speech Perception), by James McQueen and Anne Cutler, addresses the problem of speech perception from a cognitive point of view. The authors analyse how the speech signal is perceived, extracted, mapped, analysed and finally stored as cognitive representations. They focus on the relationship between lexical and prelexical processing, that is, between the recognition of words and the representation of words starting from acoustic input. The degree of interaction between these two features of speech perception has been tested in various ways. For instance, hearers are more precise in distinguishing /d/ and /t/ when they occur in real words. Moreover, the authors show that suprasegmental features influence the ability to discriminate boundaries among words, as demonstrated by Cho et al. (2007).
The fourth section (Linguistic Phonetics) presents five contributions. Janet Fletcher (The Prosody of Speech: Timing and Rhythm) explores suprasegmental features, such as stress and prosody, as related to durational patterns of segments and syllables in various languages. For instance, many studies have investigated differences in rhythm by dividing languages into two main categories: stress-timed languages and syllable-timed languages. Languages in the first group include English and Arabic, in which stressed syllables are sources of rhythm that recur at equal intervals of time. In syllable-timed languages (e.g. Italian or Yoruba), on the other hand, syllables are the source of rhythm even when they are not accentually prominent. This classification has been discussed and criticized, but is still a reference point.
Mary Beckman and Jennifer Venditti (Tone and Intonation) also address speech prosody, from the point of view of tone and intonation. The authors provide a vast review of the literature concerning both the representation of tone and intonation in the phonetic sciences and the parameters proposed by various scholars in order to reach a formal model of tone and intonation across languages. However, the authors point out that previous studies differ considerably in both theoretical orientation and results. Thus, in the concluding remarks of the chapter, Beckman and Venditti ask whether it is really necessary to distinguish different linguistic typologies based on features like tone and intonation. In particular, they emphasize that for the examples provided in the literature, it is possible to find an equal number of counter-examples. Moreover, in-depth descriptions exist for the intonation systems of only about two dozen languages, which is not much considering that thousands of languages are spoken in the world. A taxonomy or typology of intonational features is therefore judged premature (p. 643).
John J. Ohala (The Relation between Phonetics and Phonology) explores how in the last century, phonological paradigms have affected phonetic research by emphasizing the need for both of these fields of research to be informed by the other. The author points out that the split between these two fields was emphasized during the rise of structuralism, even if, historically speaking, it was more common to see the two fields integrated. Ohala then states that “the integration of phonetics and phonology is evident” (p. 664), and that many questions about phonology may benefit from phonetic studies. For instance, the issue of how phonemes are represented in the minds of speakers could be informed by phonetic research on speech processing, as has been fully illustrated in other chapters of the handbook (e.g. Moore’s and McQueen & Cutler’s chapters). Moreover, Ohala points out that phonetic-based analyses may enhance phonological theories of sound change in different languages.
John H. Esling (Phonetic Notation) traces the history of phonetic symbols and diacritics in the International Phonetic Alphabet (IPA) up until the last revised version of the IPA chart of 2005. The author points out that differences in each version of the IPA chart accounted for the debate surrounding articulatory and auditory systems, as well as the structure of the vocal tract during speech. However, vocal tract theories need to be balanced with practical issues, such as avoiding a chart that is too wide. A clear example of this is the decision to include epiglottals in the category “other symbols” in the 1989 IPA chart; although the prevalent theory at that time considered epiglottals as a categorical place of articulation stricture to be placed between pharyngeal and glottal, the editors judged that it was better to not add another column among the pulmonic consonants (p. 688). Thus, phonetic notation is based on the articulatory shape of the vocal tract, which can be reinterpreted thanks to improvements in articulatory and auditory phonetic research. For instance, Esling (2005), among others, has shown that both the larynx and pharynx play a role in the production of “back” vowels, which could be more clearly divided into “raised” and “retracted” (p. 689).
In the last contribution of this section, Paul Foulkes, James M. M. Scobbie, and Dominic Watt (Sociophonetics) explore theoretical and methodological issues in a new field of research, sociophonetics, whose unifying theme is “the aim of identifying, and ultimately explaining, the sources, loci, parameters, and communicative functions of socially structured variation in speech” (p. 704). The authors discuss the main sources and loci of variation, and then explore some methodological issues related to sociophonetic research. Firstly, the authors emphasize that inter-speaker variation has been judged as “undesirable noise in the data” (p. 716) by a large part of the phonetic literature. They then discuss the main loci of variation, that is, which segmental and suprasegmental features may reveal interesting socially structured variation in speech. The authors point out that the majority of studies have focused on vowels (e.g. Van der Harst 2011), whereas there are very few works dealing with consonantal variables. In the remaining part of the chapter, some methodological issues are discussed, with the emphasis that data collection in sociophonetic research has “no fixed protocol” (p. 729), since the amount of data and range of samples strongly depend on the main aims of the research. Finally, the authors state that the analysis of data in this field addresses fine-grained variation in both production and perception, using statistical analysis to account for complex sources of variation (i.e. linguistic and social variation) that may affect the data.
The fifth and last section of the handbook (Speech Technology) collects three contributions on automated speech processing, synthesis and recognition. Daniel P.W. Ellis (An Introduction to Signal Processing for Speech) first introduces basic concepts related to digital processing and manipulation of the speech signal (e.g. Fourier analysis, spectral analysis, and so on).
Then, Rolf Carlson and Björn Granström (Speech Synthesis) review the main methods of speech synthesis available in phonetic research and discuss the advantages and disadvantages of each method. The most widespread applications of speech synthesis are text-to-speech systems, which are supposed to read aloud a text given as input. The main problem with this is the degree of naturalness and acceptability of the output produced by the synthesizer. In this respect, one interesting solution is provided by articulatory models, which work not only on speech synthesis, but also on the extraction of vocal tract configurations. However, such models need to be integrated with models of speech production based on volumes, masses, and airflow. For the authors, the future of text-to-speech systems lies in the inclusion of these models in a comprehensive system.
Finally, Steve Renals and Simon King (Automatic Speech Recognition) describe the purposes, models and techniques for the automatic transcription of acoustic speech into words. As has been shown in the third section of the handbook, speech recognition is a very complex task which must take into account many possible sources of variation. On the other hand, the authors emphasize that automatic speech recognition systems show great improvements annually due to increased computational resources and the availability of transcribed speech corpora. Speech corpora are important in order to train the system with large amounts of data. Hidden Markov Models (HMMs), for instance, are trained on very large amounts of data and offer a mathematically and computationally clear model. However, these models are based on the unrealistic assumption that each acoustic feature is independent of all past and future observations (p. 810). In the remainder of the chapter, the authors also provide a discussion of models that use acoustic features of the speech signal to reduce errors in speech recognition systems. The main purpose in this respect is to emphasize relevant features while also removing unimportant ones. One example of these models is the very widespread phone model, which is generally based on pronunciation dictionaries as training data; moreover, phone models may be enhanced by adding HMM systems. In the conclusion, the authors point out that there is still a considerable gap between human and automatic speech recognition, and that automatic speech recognition needs to be informed by developments in research on human speech recognition.
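The independence assumption behind HMMs can be illustrated with a toy example. The sketch below is not drawn from Renals and King’s chapter: it assumes a hypothetical two-state HMM over two discrete symbols, with made-up probabilities, and all names are illustrative. In the standard formulation, each observation depends only on the current hidden state, which is what lets the likelihood of a sequence factorise and be computed efficiently with the forward algorithm.

```python
from itertools import product


def forward_likelihood(observations, initial, transition, emission):
    """P(observations) under a discrete HMM, computed with the forward algorithm."""
    n_states = len(initial)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n_states)) * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(observations, initial, transition, emission):
    """Same quantity, summing the factorised joint probability over every state path."""
    n_states = len(initial)
    total = 0.0
    for path in product(range(n_states), repeat=len(observations)):
        # Each factor involves only the previous state, the current state,
        # and the current observation: this is the independence assumption.
        p = initial[path[0]] * emission[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition[path[t - 1]][path[t]] * emission[path[t]][observations[t]]
        total += p
    return total


# Toy parameters, invented for illustration only.
INITIAL = [0.6, 0.4]                    # P(first state)
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]   # P(next state | current state)
EMISSION = [[0.9, 0.1], [0.2, 0.8]]     # P(symbol | state)

if __name__ == "__main__":
    sequence = [0, 1, 1, 0]
    print(forward_likelihood(sequence, INITIAL, TRANSITION, EMISSION))
    print(brute_force_likelihood(sequence, INITIAL, TRANSITION, EMISSION))
```

Both functions return the same likelihood, but the brute-force version enumerates every state path, while the forward recursion needs only one pass over the sequence. That efficiency is bought by the factorisation, which real speech, with its strong correlations between neighbouring frames, does not strictly satisfy.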
The handbook ends with a general index containing notes, figures, and tables.
The paperback edition of ‘The Handbook of Phonetic Sciences’ does not differ from the hardcover edition from 2012. Thus, the main differences of interest are between the first edition of the handbook (1997) and the current one; this second edition provides an excellent, updated review of the latest theoretical and methodological approaches to phonetic sciences.
As the editors point out in the Introduction, all chapters were updated by the authors in order to inform the reader of the very latest developments, especially concerning speech technology and phonological theory. Some authors were also asked to offer a new treatment of particular topics, and this resulted in chapters by Harrington (Acoustic Phonetics), Ackermann and Ziegler (Brain Mechanisms Underlying Speech Motor Control), Smith (Development of Neural Control of Orofacial Movements for Speech), Davis (Speech Acquisition), Ellis (An Introduction to Signal Processing for Speech), and Renals and King (Automatic Speech Recognition).
Moreover, new chapters were added to this edition, which are all included in Part IV (Linguistic Phonetics), together with a revised chapter by Ohala. Two commissioned chapters reflect the growing interest in phonetic sciences related to suprasegmental features, which are addressed by Fletcher (The Prosody of Speech: Timing and Rhythm), and Beckman and Venditti (Tone and Intonation). The main strength of these chapters is the discussion of various models used in the analysis of suprasegmental aspects of speech, which is done through numerous examples from different languages (e.g. German, Japanese, and Chinese). Furthermore, Esling’s new chapter on phonetic notation is a very useful and precise update to previous treatises on this subject (e.g. Laver 1994), and also offers helpful advice on the use of phonetic notation, in particular, how to read the phonetic chart. This is particularly useful for young scholars, especially for those moving from general to more fine-grained phonetic transcriptions. Finally, the new chapter by Foulkes, Scobbie and Watt (Sociophonetics) introduces this new paradigm by emphasizing its main aim, as well as possible applications of investigating phonetic variation within and between groups of individuals. The authors also emphasize the theoretical implications of sociophonetic research for both sociolinguistic studies and phonological theory (e.g. exemplar-based models such as Pierrehumbert 2002). By including this chapter in the handbook, the editors recognize the growing interest in sociophonetics as a branch of phonetic sciences in recent years, and also emphasize that sociophonetics is a rich and promising field of research whose results may positively affect both sociolinguistic and phonetic sciences.
Overall, the main strength of this volume is the high degree of expertise of all the contributors in their respective fields, which is directly reflected in the chapters. Additional useful features include examples from a vast array of languages, as well as the complete lists of references at the end of each chapter. The handbook also fulfills the editors’ main goal of offering an updated and multidisciplinary orientation to the phonetic analysis of speech.
In conclusion, the second edition of ‘The Handbook of Phonetic Sciences’ is an invaluable reference. The clarity of its explanations, its accurate and updated review of theories and methods, and its analysis of both the strengths and weaknesses of each tool at the disposal of researchers will all be of great help to scholars involved in various degrees of speech analysis.
Cho, Taehong & James M. McQueen. 2005. Prosodic influences on consonant production in Dutch: effects of prosodic boundaries, phrasal accent and lexical stress. Journal of Phonetics. 33, 121-57.
Hardcastle, William J., John Laver & Fiona E. Gibbon, eds. 1997. The Handbook of Phonetic Sciences. First Edition. Oxford: Blackwell.
Laver, John. 1994. Principles of Phonetics. Cambridge: Cambridge University Press.
Lindblom, Björn. 1983. Phonetic invariance and the adaptive nature of speech. In B.A.G. Elsendoorn & H. Bouma, eds. The Production of Speech. New York: Springer Verlag, 217-45.
Pierrehumbert, J. B. 2002. Word-specific phonetics. In C. Gussenhoven & N. Warner, eds. Laboratory Phonology VII. Berlin: Mouton de Gruyter, 101-39.
Ritsma, R. J. 1967. Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of America. 42, 191-8.
Stevens, Kenneth N. 1972. The quantal nature of speech: evidence from articulatory-acoustic data. In P. B. Denes & E. E. David Jr., eds. Human Communication: a unified view. New York: McGraw-Hill, 51-66.
Van der Harst, Sander. 2011. The vowel space paradox. Utrecht: LOT.
Ziegler, Wolfram. 2005. A nonlinear model of word length effects in apraxia of speech. Cognitive Neuropsychology. 22, 603-23.
ABOUT THE REVIEWER:
Chiara Meluzzi is a PhD student in Linguistics at the University of Pavia and the Free University of Bozen (Italy). After an MA dissertation on the sociolinguistics of Ancient Greek comedy (University of Eastern Piedmont-Vercelli), her PhD thesis provides a sociophonetic analysis of the Italian variety spoken in Bozen (South Tyrol, Italy). Her main research interests include sociolinguistics, sociophonetics, language variation and change, as well as historical linguistics and pragmatics.