Review of Phonology in Perception

Reviewer: Andrew R Blyth
Book Title: Phonology in Perception
Book Author: Paul Boersma Silke Hamann
Publisher: De Gruyter Mouton
Linguistic Field(s): Linguistic Theories
Issue Number: 21.3465

EDITORS: Boersma, Paul; Hamann, Silke.
TITLE: Phonology in Perception
SERIES TITLE: Phonology and Phonetics [PP] 15
PUBLISHER: Mouton de Gruyter
YEAR: 2009

Andrew Blyth, Faculty of Arts and Design, TESOL, University of Canberra


This volume is the fifteenth in the Phonology & Phonetics series from Mouton de
Gruyter (now de Gruyter Mouton), a collection of nine papers that aim to
contribute to ''the interaction of phonology and phonetics within linguistics''
(back-cover). The publisher acknowledges a 'tumultuous' relationship between
phonology and phonetics, and seeks to engage both fields through academic
dialogue afforded in this series. Though at first this particular volume appears
to focus more on phonology, the theoretical assumptions contributors rely on are
a pragmatic mix of knowledge from both phonetics and phonology. Reassuringly,
some very recognisable names have contributed to the nine chapters, including
Ellen Broselow, Paul Boersma, and James McClelland. For me as a language
teacher, this book provides some comfort for what some of us have felt might be
true: that phonological knowledge assists in perception (pp. 19, 60, 103; see
Altenberg 2005, Celce-Murcia, Brinton and Goodwin 1996 p. 10), as opposed to the
common assumption of many linguists that phonological behaviour was influenced
by what was perceivable (back-cover). The editors aim to demonstrate this
concept with these nine papers. Many of the contributors lean heavily on
Optimality Theory as the basis of their work, as well as on Boersma's BiPhon
model described in chapter two.


''Introduction: Models of phonology in perception'', by Paul Boersma and Silke
Hamann, opens the book and the editors clarify the notation for both phonetic
and phonological representations used. They then provide the reader with a brief
historical account of the relationship between comprehension and production,
where in the past, comprehension was assumed to be the reverse of production.
Amusingly, Boersma and Hamann highlight the past lack of interest in
comprehension models by contrasting twenty-three past and present comprehension
and production models, where the comprehension column for the first eight models
was filled in with a small question mark. The editors show Smolensky's (1996,
cited on p. 8) bidirectional grammar model as the first to consider production
and comprehension as separate from one another. Boersma and Hamann conclude with
models representing the perception process as being not the same as the phonetic
articulation process (termed 'phonetic interpretation' and 'phonetic
implementation'), reinforcing to the reader the notion that comprehension is not
the reverse of production.

Chapter one, ''Why can Poles perceive Sprite but not Coca-Cola? A Natural
Phonological account'' by Anna Balas, focuses on how Optimality Theory (OT)
cannot fully explain perception. It begins with an anecdote about how, given
American English input of /spraɪt/, Poles will repeat back [sprajt]. Balas then
demonstrates that Poles will substitute American English diphthongs such as
those in Coca-Cola /koʊkəkoʊlə/ with other American diphthongs, rather than
Polish vowel plus glide sequences as would be expected. She tests how
Natural Phonology (NP) and OT explain the vowel substitution problem. She first
provides a phonetic description of the phenomena and compares NP with OT. Balas
describes NP-based perception and underlying representations. Balas argues that
NP can account for Polish listeners' perception of American English diphthongs,
and demonstrates how OT cannot. In contrast to OT, which requires the listener
to already have knowledge of the phonological construct of the language being
uttered, NP can deal with ''uncategorised, auditory phonetic input'' (p. 46) and
thus NP accounts for the Polish pronunciation of Sprite and Coca-Cola.

Chapter two, ''Cue constraints and their interactions in phonological perception
and production'' by Paul Boersma, has the stated aim of demonstrating ''how one
can formalise the phonology-phonetics interface'' within OT and Harmonic Grammar
(p. 55). Boersma reasserts a tentative model to represent the ecology of
phonology and phonetics within a five-layered system called the BiPhon Model
(Apoussidou 2007, and Boersma 2007). The five levels are ,
|underlying form|, which relate to the lexicon; /surface form/ and [auditory form],
which form the phonetics-phonology interface; and [articulatory form] (p. 56). Boersma
says that previously phonologists attempted to fit phonetic detail within the
phonological levels (underlying and surface forms), but Boersma suggests that
this should not be the case. The five-layered system allows both phonological
and phonetic theories to be represented without compromise, and bidirectionally.
Boersma demonstrates the model using the perception of foreign-language words,
including loanwords. His examples are the Japanese perception of Russian [tak]
('so'; perceived as 'taku') and of English 'drama', perceived as 'dorama'. He
details various alternative perceptions, and explains
the reasons for the failure of these alternatives using the BiPhon model. In
essence, Boersma demonstrates that the perception process is restricted by the
hearer's phonological constraints in the same way as the production process is
restricted by phonological constraints (p. 103).

In chapter three, ''The learner of a perception grammar as a source of sound
change'', Silke Hamann argues that auditory mapping cues and phonological
categories differ from generation to generation. Ohala holds that
generational change is phonetic, whereas Hamann argues that it involves both
phonetic and phonological knowledge, and that perception is phonological,
especially the perception 'of auditory cues and their mapping onto
language-specific phonological categories' (p. 111). She argues that this can be
mapped onto the BiPhon Model (see chapter 2). Hamann says that sound change in
phonemes occurs because younger generations assign weightings to cues that differ
from those of previous generations. This may be due to some cues being regarded as
less reliable. This would imply that phonological categories are not universal
as some argue, but are 'emerging' (p. 137). Hamann also acknowledges that sound
change does occur within an individual's lifetime.

Chapter four, ''The linguistic perception of SIMILAR L2 sounds'' by Paola
Escudero, attempts to explain native and second language (L2) perception using
the Linguistic Perception Model (LP) originally by Escudero (2005, cited on p.
155), which itself evolved from work by Boersma (1998, cited on p. 155), and
Escudero & Boersma (2003, cited on p. 155). As in previous chapters, Escudero
uses linguistic arguments to explain this theory. She compares Canadian English
(CE) and Canadian French (CF) and demonstrates how similar but non-identical
phonological categories can be acquired for L2 speech perception.
Escudero argues and demonstrates three main points. Firstly, L1 listeners are
optimal perceivers of their own language. Secondly, L2-learner listeners
initially impose their L1 perception categories on the L2. Thirdly, the L2
learner ''adjusts her L1 perception to become an optimal L2 listener'' (p. 184),
which was modelled using both CE and CF. She argues well that the Second
Language Linguistic Perception (L2LP) model is the most comprehensive account of
the acquisition of L2 perception.

Chapter five, ''Stress adaptation in loanword phonology: perception and
learnability'' by Ellen Broselow, attempts to explain loanword adaptation into
Huave (a language of Mexico) and Fijian (a Pacific island language). She uses
the Huave-Spanish and Fijian-English pairings to demonstrate how the
respective languages impose their grammar on the adaptation. The nature of
the adaptations depends heavily on the placement of stress in the
language of origin, and on how the borrowing language can adapt the word within
its perception grammar. For instance, the final stressed syllable in 'bazaar',
in Fijian, is perceived as a lengthened vowel, and is adapted in that way (p.
216). Broselow concludes that apparently unlearnable rankings are in fact a
'reflection of input frequency of the working of a perception grammar' (p. 228).

Chapter six, ''Perception of intonational contours on given and new referents: a
completion study and an eye-movement experiment'' by Caroline Féry, Elsi Kaiser,
Robin Hörnig, Thomas Weskott, and Reinhold Kliegl, describes two
experiments done in German. The first was a sentence completion task testing new
and given referents indicated by accenting. The results show a preference for
completion with 'given-referents', that is, a preference for low tones. The
second experiment is an eye-movement task using the same sample sentences as in
the first experiment. It assumed that test participants would reveal
anticipation based on intonation through eye movements as they viewed a picture
and heard intonation-laden sentences (p. 251). It was assumed that particular
intonational contours would lead the listener to expect an object (one that
appears in the picture) to be included in the remainder of the sentence. Féry et
al. assumed that, for unaccented objects, there would be a fixation on the
'discourse-given referent' (p. 251) rather than on the new one, whereas a late
fall would lead the listener to expect an accented discourse-new object. The
results of the second experiment indicate listeners were able to respond to
intonational cues and reflect their expectations of the remainder of the
sentence in anticipatory eye-movements across a picture. As a consequence, Féry
et al. argue that this supports Boersma's model that includes phonetic and
underlying representations. The paper concludes by stating firmly that listeners
of intonational languages, like German, can use intonation to predict upcoming
sentential constituents.

Chapter seven, ''Lexical access, effective contrast, and patterns in the lexicon''
by Adam Ussishkin and Andrew Wedel, examines Catalan and Hebrew data to
demonstrate that efficient lexical access relies on neighbourhood density, word
frequency, and allomorphy (p. 287). They argue that words like 'orange' have few
phonologically similar neighbours, and are therefore recognised more quickly
than words like 'cat', which is similar to 'mat', 'sat', and 'at'. In the
same way, words that are more frequently used are often recognised more quickly
than less common phonological neighbours. Similarly, the nature of allomorphic
words helps to distinguish the target word from its near neighbours. The
authors acknowledge that this is a relatively new direction of research, but one
that warrants further enquiry.
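
For readers unfamiliar with the notion of neighbourhood density, the 'cat' and 'orange' contrast above can be illustrated with a minimal sketch. It assumes the common working definition that two words are neighbours if they differ by exactly one segment (one substitution, insertion, or deletion). The toy lexicon, the function names, and the use of spelling as a stand-in for phonemic transcription are all assumptions made here for illustration; none of this is taken from Ussishkin and Wedel's chapter.

```python
# Illustrative sketch only: spelling stands in for phonemic transcription,
# and the lexicon below is a hypothetical toy list, not data from the book.

def is_neighbour(a: str, b: str) -> bool:
    """Return True if a and b differ by exactly one substitution,
    insertion, or deletion of a single segment."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: make `a` the shorter word, then check whether
    # deleting one segment from `b` yields `a`.
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def neighbourhood(word: str, lexicon: set[str]) -> list[str]:
    """Return the sorted list of neighbours of `word` within `lexicon`."""
    return sorted(w for w in lexicon if is_neighbour(word, w))


# Hypothetical toy lexicon.
LEXICON = {"cat", "mat", "sat", "at", "cot", "dog", "orange"}

print(neighbourhood("cat", LEXICON))     # dense neighbourhood
print(neighbourhood("orange", LEXICON))  # sparse neighbourhood
```

In this toy lexicon 'cat' has four neighbours and 'orange' has none, which is the kind of contrast in neighbourhood density that the authors relate to speed of lexical access.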

In chapter eight, ''Phonology and perception: a cognitive scientist's
perspective'', the psycholinguist James L. McClelland reflects on and reviews a
number of questions explicitly or implicitly raised in the preceding chapters.
His enlightening review helps the reader to see the volume's cohesion, as McClelland attempts
to position this collection within the wider context of speech perception.
McClelland uses the list of questions to guide the reader and explore a number
of pertinent points that are not being explored by other disciplines which also
have an interest in speech perception. McClelland ends the chapter with a call
for a continued effort to describe the structure of languages with
interdisciplinary approaches.


As shown in the chapter descriptions above, this volume represents a variety of
new thought on speech perception from a phonological perspective. This
collection makes clear that the perception process is neither the same as, nor
simply the reverse of, production. This volume elegantly demonstrates this, whilst also
providing valuable new insights into speech perception. The BiPhon Model is the
central element of this book, and forms the basis of some of the theoretical
assumptions in some chapters. The BiPhon model includes phonetic principles
within its design, and looks to cognitive science for some grounding.

This volume was released at a time of burgeoning interest in speech
perception from a variety of disciplines. For instance, lexical segmentation as
researched by McQueen, Cutler, Otake, and others at the Max Planck Institute for
Psycholinguistics (see Otake 2006 for a review) is of great interest to some
language teachers and some linguists. Similarly, contributors to this volume
refer to cognitive science researchers, especially McQueen & Cutler, though such
citations were limited to a few paragraphs in a few chapters. However, the
contributors to the volume appear not to have looked very widely at research
from the Max Planck Institute. There is a near-plethora of articles produced by
McQueen, Cutler and their colleagues, but this volume mostly referred to only
McQueen and Cutler (1997). Perhaps consequently, some readers may find the
contributors' attempts to link their results with the article by McQueen and
Cutler (1997) a little tenuous (see chapters 2, 3, and 6). Though the authors
attempt to find compatibility with cognitive science, it would be preferable to
see specific research that explicitly establishes these links much more firmly.

Further, on a number of occasions, some authors refer to unpublished or yet to
be published articles, which perhaps is a reflection of the newness of the ideas
being presented in this volume (see pp. 47, 113, 249, 261, 287). It should be of
interest for some readers to note that in the first chapter Balas shows that
Natural Phonology, rather than OT, appears to successfully account for the
treatment of English L2 by Polish L1 listeners, while other chapters use OT.

Despite these criticisms, this volume contains exciting and potentially valuable
new contributions that attempt to expand our understanding of the role of
phonology and phonetics in speech perception. This volume has much to contribute
not just to linguistics but to psycholinguistics more generally, and so concepts
contained in this volume should form the basis of many discussions in future
speech perception studies.


Altenberg, E. (2005) The perception of word boundaries in a second language.
Second Language Research. 21(4), pp. 325-358.

Celce-Murcia, M., Brinton, D., and Goodwin, J. (1996) Teaching Pronunciation: A
Reference for Teachers of English to Speakers of Other Languages. New York, USA:
Cambridge University Press.

Otake, T. (2006) Speech segmentation by Japanese listeners: its
language-specificity and language universality. In M. Nakayama, and Y. Shirai
(eds); P. Li (general editor). The Handbook of East Asian Psycholinguistics,
Volume II: Japanese. New York, USA: Cambridge University Press.

Andrew Blyth is a doctoral student in the TESOL department at the University of Canberra, Australia. His main interests are teaching and researching listening and pronunciation for English language teaching. He currently teaches English as a foreign language at various universities in central Japan.

