Date: Tue, 20 May 2003 14:53:09 +0930 From: "Estival, Dominique" <Dominique.Estival@dsto.defence.gov.au> Subject: Briscoe, ed. (2002). Linguistic Evolution through Language Acquisition
Briscoe, Ted, ed. (2002). Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge University Press. Hardback ISBN 0-521-66299-0, vii+349pp. (This book was announced on LINGUIST List 13.2378, 19 Sept 2002.)
Dominique Estival, Defence Science and Technology Organisation, Australia.
I cannot phrase the book's purpose and contents better than the dust jacket itself: "the volume proceeds from the basis that we should address not only the language faculty per se, but also the origins and subsequent development of languages themselves; languages evolve via cultural rather than biological transmission on a historical rather than genetic time scale. The book is distinctive in utilizing computational simulation and modeling to ensure that the theories constructed are complete and precise. Drawing on a wide range of examples, the book covers the why and how of specific syntactic universals; the nature of syntactic change; the language-learning mechanisms needed to acquire an existing linguistic system accurately and to impose further structure on an emerging system; and the evolution of language(s) in relation to this learning mechanism."
The audience for this volume will primarily be advanced graduate students and researchers in language evolution, language learning, and computational models of language learning and of language change. It would not be suitable as a textbook or an introduction to any of these fields, but could be used as supplementary reading or reference material for advanced courses in these disciplines. The book consists of 10 chapters, with the first being an introduction by the editor, Ted Briscoe, and the last by James Hurford concluding with a comparison of several systems, including 2 described in this volume (Kirby and Batali). All the chapters, except the Introduction, describe experiments involving autonomous computational agents learning to communicate through some form of language and propose hypotheses concerning the emergence and evolution of language based on the results of those experiments. Following the usual book review format for LINGUIST, I will first summarize the main points of each chapter, before giving an overall evaluation of the volume as a whole. However, my summaries of Ch.1 and Ch.10 also contain more general observations about the other contributions.
1. Introduction (Ted Briscoe)
This first chapter lays out the main themes and issues for the volume. TB emphasises the "centrality of acquisition to insightful accounts of language emergence, development, variation and change" (p.19) and advocates an evolutionary perspective on language. The methodology followed by all the contributors assumes that speakers/hearers are "language agents" and all take languages as dynamical systems, in fact complex adaptive systems. The following quotation from Deacon (1997) sets the scene: this is about language evolution, how this evolution was set in motion and how it is played out every time a child learns to use language. "Languages have had to adapt to children's spontaneous assumptions about communication, learning, social interaction, and even symbolic reference, because children are the only game in town... languages need children more than children need languages" (p.14). One of the main hypotheses explored throughout the volume is that "linguistic universals need not be genetically-encoded constraints, but instead may just be a consequence of convergent evolution towards more learnable grammatical systems" (p.14). Another controversial (cf. Bickerton 1998) hypothesis explored by several contributors is that of genetic assimilation in the evolution by natural selection of the human language learning procedure. This is rejected by Worden (Ch.4) who argues that there is no specific language faculty, and no natural selection for language evolution. On the other hand, Turkel (Ch.8) argues that genetic assimilation is needed to explain convergence on a shared language faculty. In the same vein, Briscoe (Ch.9) proposes a model which integrates Bayesian learning with a Principles and Parameters (P&P, Chomsky 1981) account of language acquisition and in which the language faculty would be refined by genetic assimilation. A related issue discussed by a number of contributors concerns the effects which a universally shared and preadapted language learning procedure would have on the evolution of language itself. In the section on methodological issues, TB discusses the limitations and benefits of using computer simulation and modeling, and argues that "methodologically rigorous simulation is a critical and indispensable tool in the development of evolutionary dynamical models of language" (p.16).
2. Learned systems of arbitrary reference: The foundation of human linguistic uniqueness (Michael Oliphant)
MO makes a distinction between symbolic and non-symbolic, and between innate and learned, signaling behavior. His main thesis is that "human language is the only existing system of learned symbolic communication" and he explores the computational problems involved in supporting such a learned system of communication, hypothesizing that "maybe the problem of transmitting a learned symbolic system is the primary factor limiting the evolution of the language ability" (p.49). The model he proposes is a learning simulation with no feedback and no reinforcement, using a Hebbian network (Cumulative Association Networks plus lateral inhibition). Like the other models presented in this volume, it assumes that the agents have access to signal/meaning pairs, but MO is particularly concerned with the question of how the agents can observe meaning. The main conclusion is that it is easier to learn an iconic or indexical system than a symbolic system and that humans must be able to use context in order to learn human languages.
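The kind of observational learning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the update rule (add 1 to the observed signal/meaning association, subtract 1 from its competitors) is a simple stand-in for Hebbian learning with lateral inhibition, not MO's actual network, and the names `observe`, `produce` and `interpret` are hypothetical.

```python
from typing import List

def observe(weights: List[List[int]], meaning: int, signal: int) -> None:
    """Update associations after observing one signal/meaning pair.

    Assumed rule: strengthen the observed pair (Hebbian step) and weaken
    every competing pair that shares its meaning or its signal
    (a crude form of lateral inhibition).
    """
    for m, row in enumerate(weights):
        for s in range(len(row)):
            if m == meaning and s == signal:
                row[s] += 1
            elif m == meaning or s == signal:
                row[s] -= 1

def produce(weights: List[List[int]], meaning: int) -> int:
    """Pick the signal most strongly associated with a meaning."""
    row = weights[meaning]
    return max(range(len(row)), key=lambda s: row[s])

def interpret(weights: List[List[int]], signal: int) -> int:
    """Pick the meaning most strongly associated with a signal."""
    return max(range(len(weights)), key=lambda m: weights[m][signal])

# A learner with 3 meanings and 3 signals, starting with no associations.
weights = [[0] * 3 for _ in range(3)]

# The learner observes (meaning, signal) pairs with no feedback or reward.
for meaning, signal in [(0, 1), (1, 0), (2, 2)]:
    observe(weights, meaning, signal)

learned = [produce(weights, m) for m in range(3)]
```

Note that the sketch takes for granted exactly what MO questions: each observation hands the learner the meaning along with the signal.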
3. Bootstrapping grounded word semantics (Luc Steels and Frederic Kaplan)
S&K experiment with a population of "visually grounded robotic agents capable of bootstrapping their own ontology [...] while playing a language game". This is called the 'Talking heads experiment', where the robots try to guess what the other robots are trying to communicate (the 'guessing game'). Using what they call 'semiotic dynamics', S&K make clever use of the distinction between 'meaning' (sense, an internal state of the agent) and 'referent' (the world objects) in the representations their agents use: this is an RMF (referent/meaning/form) graph representation, instead of the usual meaning/form pairs of most other models. The question of how words get their meaning, an issue also raised in Ch.2 by MO, becomes here the question of how agents acquire word-meaning and meaning-object relations. The solution proposed is that a conceptualisation module creates an ontology from the environment while a verbalisation module creates a lexicon. In the set of experiments reported here, the agents have "very minimal forms of intelligence" but usually converge on a shared language. An interesting side effect which is explored is how synonymy and ambiguity arise as emergent properties in the lexicon.
4. Linguistic structure and the evolution of words (Robert Worden)
In this very well written and pleasant to read chapter -- no linguist could resist such statements as "The full OED is a museum of (mainly extinct) word species" (p.92) -- RW proposes a theory of language learning formulated in the framework of unification-based grammar, in which the evolution of word feature structures (memes) is driven by the selection pressure of use. In this analogy, "each language is an ecology and each word is one species in the ecology" and "language change can be regarded as a form of evolution - not of the language itself, but of the individual words which constitute the language" (p.75). The only concepts used in this model are those of feature structure, unification and generalization (the complement of unification). The slogan here is "Unify to use, Generalize to learn" (p.80), and RW claims that generalization learning offers an effective solution to the problem of "noisy" input (p.84). From the point of view of the evolution, or emergence, of language, RW claims that "for many features of language, there is no need to suppose that they reflect any innate structure in the brain" (p.76) and that this model leads to a simpler theory of the human mind. He assumes that the feature structures used by the brain for language have evolved from the feature structures used for primate social intelligence and that the learning mechanisms and inference mechanisms are the same for language and for social intelligence (p.89). From the point of view of language change, the hypothesis is that "words replicate via unification and generalization, they evolve over generations; as words evolve, the language changes" (p.91). RW lists 6 selection factors in the evolution of words: useful meaning, productivity, economy, minimum ambiguity, ease of learning and social identification, and proposes that "a language universal may arise just from the convergent evolution of words" (p.92).
As an alternative to Hawkins' account (1994) for the rise of language universals, which is attributed to the need for "ease of parsing", RW proposes the "need to minimise ambiguity". However, I was not convinced that this is a case of either/or: the preference for short Constituent Recognition Domains put forward by Hawkins may also lead to better ambiguity resolution, not just to ease of parsing. Nevertheless, RW accounts successfully for the mixture of regularity and irregularity found in languages, and claims to do so better than P&P, for which the 'core' is regular, and irregularity has to be at the 'periphery': in his model, irregularity is fully supported since the theory is fully lexicalized. Three selection factors, productivity, minimum ambiguity and ease of learning lead to an increase in language regularity (p.101) and the theory can also explain the distribution of irregularity (p.103). I found the section discussing speed limits for evolution a bit dry, and too speculative in some places. For instance, statements such as "humans typically produce only about 3 offspring" or "the selection pressure for language proficiency is not more than about 20%" (p.105) leave the reader wondering where those numbers come from. However, it contains interesting speculations about the rate of change for language and the rate at which the language faculty would evolve. The conclusion is that languages change too fast to force the language faculty to adapt. In summary, RW proposes a theory based on the 3 concepts of feature structure, unification and generalization which aims to explain how languages evolve and are learned, i.e. word by word. This model is also shown to account for diverse syntactic means used to define semantic roles, for domains of syntactic regularity and irregularity, and for some language universals. 
One of the main theses of this paper is that the structure of language reflects the functional requirements for language, not language-specific structures in the human brain (p.103).
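The two operations behind the slogan "Unify to use, Generalize to learn" can be illustrated with a minimal Python sketch. It rests on a strong simplifying assumption: feature structures are flat attribute/value dictionaries, whereas the structures in RW's model are richer. The function names `unify` and `generalize` and the example features are hypothetical.

```python
from typing import Dict, Optional

FeatureStructure = Dict[str, str]

def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Combine two feature structures; return None if they conflict."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # incompatible values: unification fails
        result[feature] = value
    return result

def generalize(a: FeatureStructure, b: FeatureStructure) -> FeatureStructure:
    """Keep only the features on which both structures agree."""
    return {f: v for f, v in a.items() if b.get(f) == v}

# "Generalize to learn": two observed uses of a word differ only in subject.
use_1 = {"cat": "verb", "tense": "past", "subj": "dog"}
use_2 = {"cat": "verb", "tense": "past", "subj": "cat"}
learned = generalize(use_1, use_2)

# "Unify to use": the learned entry combines with new, compatible material...
used = unify(learned, {"subj": "bird"})
# ...but not with material that contradicts it.
clash = unify(learned, {"tense": "present"})
```

Here generalization discards the feature that varied across the two examples, which is one way to picture how a learner could abstract away from idiosyncratic or noisy input.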
5. The negotiation and acquisition of recursive grammars as a result of competition among exemplars (John Batali)
As the title indicates, this paper is an "investigation of how recursive communication systems come to be" and of how such systems can emerge. JB argues that conventional communication systems are learned through negotiation, and that language agents use structure to construct meaning representations. JB makes the strong claim that "the ability to create and manipulate mappings between structures of representations [...] is prior to, and independent of, language" (p.118). In the model presented here, there are no rules or principles, but learners build 'exemplars' and a communication system emerges through competition between the exemplars. The computational model is based on 'meanings' (internal representations of situations) expressed as formula sets, and paired with 'signals' (sequences of characters), expressed as strings, through 'mappings' expressed as phrases which have a numeric cost attached to them. Phrases can be simple tokens or complex phrases (using binary branching). At the start of negotiation, the agents have no exemplars at their disposal, but they have the ability to use embedded phrase structure. They can only communicate with each other when they have negotiated a shared system. An interesting aspect of this paper is the way communicative accuracy is measured, based on precision and recall. However, JB also points out that "the agents' success at communication comes not from their possessing identical sets of exemplars, but from regularities in the structure of phrases constructed from their exemplars" (p.144). Another aspect of the paper which will be of interest to linguists is JB's analysis of the emergent negotiated systems, a "cartoon version of linguistic fieldwork" (p.144), which turns out to reveal some fascinating aspects of those systems.
In particular, partitioning variables lead to a rudimentary alignment of syntax/semantics (p.146-147), empty tokens are created by the language agents and used as argument markers and as end markers for sequences of properties in a way which seems strikingly similar to subordination and relative clause syntax (p.151-153), reflexive markers are used to mark the collapsing of arguments, and inverting argument markers function in a way similar to passive (p.154). Finally, although JB calls them "extremely simple" because there is no embedding of meaning and no semantic or syntactic categories, these systems also exhibit some amount of constituent ordering (p.162-163). A study of these negotiated systems also "provides a simple account of how regular and irregular forms can coexist" (p.12). Furthermore, like MO and S&K, JB argues that "learnability depends on the learners having access to the meaning of the string" (p.166). JB concludes that "attention should be directed at understanding the influences that learning mechanisms can have on the languages that emerge from their use" (p.168) and suggests that "a significant fraction of the syntactic phenomena used in human languages can, in principle, emerge among agents with the cognitive and social sophistication required to instantiate the model" (p.170).
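For readers unfamiliar with precision and recall, the measure of communicative accuracy mentioned above can be sketched as follows. The sketch assumes that a meaning is a set of formulas written as plain strings and that accuracy compares the sender's intended set with the receiver's interpreted set; the exact definitions in JB's chapter may differ, and the function name and example formulas are hypothetical.

```python
from typing import Set, Tuple

def precision_recall(intended: Set[str], interpreted: Set[str]) -> Tuple[float, float]:
    """Compare a receiver's interpretation with the sender's meaning.

    precision: share of the interpreted formulas that the sender intended.
    recall:    share of the intended formulas that the receiver recovered.
    """
    shared = intended & interpreted
    precision = len(shared) / len(interpreted) if interpreted else 0.0
    recall = len(shared) / len(intended) if intended else 0.0
    return precision, recall

# The sender means "a snake chased a pig"; the receiver recovers two of the
# three formulas and adds two that were not intended.
intended = {"snake(1)", "chased(1,2)", "pig(2)"}
interpreted = {"snake(1)", "chased(1,2)", "cow(2)", "fat(2)"}

precision, recall = precision_recall(intended, interpreted)
```

In this example two of the four interpreted formulas are correct (precision 0.5) and two of the three intended formulas are recovered (recall of two thirds), so the two numbers separate over-interpretation from under-interpretation.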
6. Learning, bottlenecks and the evolution of recursive syntax (Simon Kirby)
In his model for the evolution of language, SK addresses the interaction of two unique aspects of human language, the way it is learned and its syntactic structure. With respect to learning, SK claims that human language is different from animal communication in that "some of the content of the mapping (from meanings to signals) is learned by children through observation of others' use of language" (p.173). Concerning the structure of language, SK focuses on the two properties of compositionality, whereby an expression's meaning is a function of the meanings of parts of that expression and the way they are put together, and recursion, the property of languages with finite lexica and rule sets in which some constituent of an expression can contain a constituent of the same category. SK's conclusion is that "the basic structural properties of language such as recursion and compositionality will inevitably emerge over time through the complex dynamical process of cultural transmission, i.e. without being built in to a highly constraining innate LAD [Language Acquisition Device]" (p.174). Unlike other experiments with larger populations of agents reported in this volume, the simulation in SK's set of experiments contains only two agents, an adult speaker and a new learner. At the end of the first cycle, the grammar is non-compositional and non-syntactically structured, but successive cycles lead to the emergence of both internal structure and categories (p.188). SK doesn't point this out, but the emergent categories are strongly reminiscent of Categorial Grammar categories. The next step in the experiment is to allow for infinite languages by including predicates which may take other predicates as arguments. This leads to a dramatic reduction in size of the grammar. SK makes use of the distinction between I-language, the language user's knowledge about language, and E-language, the set of observable utterances.
An important concept in this model is that of 'replicators', I-language units or rules that may or may not persist through time. A more general rule, which can express more meaning, is a better replicator than an idiosyncratic rule, even though it is not learned as easily as a more idiosyncratic rule. This explains the success of I-languages which contain general rules. Another important concept in this paper is that of the mapping between spaces, here the space of I-language and the meaning space. SK argues that it is the structure of the mapping between spaces which is important, not the syntactic structures of the language itself. "Constraints on variation are not built into the learner, but are instead emergent properties of the social dynamics of learned communication systems and the structure of the semantic space that the individuals wish to express" (p.196-197). SK claims that his model gives a "neat explanation of why human languages use syntactic structure to compositionally derive semantics, use recursion to express infinite distinctions in a digital way, have words with major categories such as noun and verb, and use syntactic rules of realization (such as ordering rules) to encode meaning distinctions" (p.197) and the sections describing these should be of special interest to linguists.
7. Theories of cultural evolution and their application to language change (Partha Niyogi)
In this chapter, PN looks at language change rather than language evolution, i.e. not how language could have developed phylogenetically but how and under what constraints it can change historically. He addresses the "problem of characterizing the evolutionary dynamics of linguistic populations over successive generations" (p.205) using the framework of Cavalli-Sforza and Feldman (1981) [CF] for the treatment of cultural evolution and maps that approach to that of Niyogi and Berwick (1995, 1997) [NB] for language change, with the goal of showing how the P&P approach to grammatical theory is amenable to the CF framework. The comparison between the two approaches of CF and NB is interesting in itself and one result is the observation that the essential difference between their two update rules arises from different assumptions made in the modeling process: NB assume that all children receive input from the same distribution while CF assume that children can be grouped into 4 classes depending on their parental types. The latter approach is better able to arrive at alternative evolutionary dynamics. PN shows that the Triggering Learning Algorithm (TLA, Gibson & Wexler 1994) can be integrated within the NB model and yield the dynamics of linguistic population under the CF model. PN's goal is to "characterize the dimensions along which human languages change over time and explain why they do so" (p.205) especially under conditions of language contact. The second modeling experiment reported in this paper is grounded in a historical example: the evolution of Old English into Modern English and the syntactic changes associated with the different settings of the Head Parameter and the V2 Parameter.
The presentation of the data (from Gibson & Wexler 1994) does not make it clear that the +V2 parameter is more than just a restriction on surface word order (i.e., a finite verb must be in 2nd position in root clauses), but a shorthand for an analysis that allows a non-subject to raise to Spec, C position and the finite verb to move to C (Wexler, p.c.). Nevertheless, the analysis of this particular historical change is quite interesting and the simulations make predictions which should be pursued for the study of language evolution. In a final section, PN reports on experiments to partition the children population according to the distribution of languages they are exposed to. If the exposure is limited to the languages spoken by the parents, there are 4 discrete possibilities; if the distribution models neighborhood, there is a linear map. PN then shows that the reorganisation into homogeneous neighborhoods leads to interesting predictions.
8. The learning guided evolution of natural language (William Turkel)
The main thesis of this paper is that "even if one accepts the claim that language cannot exist in intermediate forms, it is still possible that it could have evolved via natural evolution" (p.235). To argue this position, WT makes two assumptions: 1) humans have some degree of plasticity and are capable of learning; 2) successful communication confers a reproductive advantage, i.e. language is adaptive as long as it is shared (p.235). While Pinker & Bloom (1990) argue that there is a continuum of viable communicative systems and that species can gradually increase in communicative ability over time, the assumption that there can be no intermediate forms would seem to lead to the conclusion that language is not a product of selection but rather of preadaptation or exaptation. However, WT argues for the possibility of individual learning after evolutionary search, as 'learning guided evolution'. Although WT stresses that this is not to be confused with Lamarckianism, but is due to the 'Baldwin effect' (Baldwin 1896), whereby "learning can affect the direction of evolution" and "the learned behavior can become genetically determined", I must admit that the difference seemed rather tenuous and could have been better clarified. However, Pinker (1994), Briscoe (2000) and Deacon (1997) all argue that the Baldwin effect must have played a role in the evolution of human language and Hinton & Nowlan (1987) provide an explanation by showing that "the adaptations learned during a lifetime of an organism guide the course of evolution by altering the shape of the search space" (p.238). Furthermore, although it has been argued that arbitrariness is evidence against adaptation, WT claims that this is a misguided argument and that with complex objects, such as natural languages, history plays a more important role. In fact, as long as arbitrariness is shared, it can be adaptive. WT then presents a set of simulations of a P&P system, with a variant of Hinton & Nowlan's model.
An important aspect of this simulation is the distinction between fixed vs. plastic parameters. Here all P&P parameters are plastic, i.e., they can be learned (given starting values of "?" rather than "0" or "1"). The simulations show the Baldwin effect and demonstrate that learning can accelerate the evolutionary process. The results also indicate that the amount of plasticity (i.e., the number of parameters set at "?" at the beginning of the simulation) was inversely proportional to the speed with which the population converged to a single genotype (i.e. a set of parameter settings). From a linguistic point of view, convergence to "0" or "1" represents the evolution of a principle of grammar, while convergence to "?" represents the evolution of a true parameter (p.248).
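The role of plastic ("?") versus fixed ("0"/"1") parameters can be made concrete with a small Python sketch of a fitness function in the style of Hinton & Nowlan. It is not WT's simulation: the single fixed target string, the number of learning trials and the fitness scale are illustrative assumptions taken from the usual description of Hinton & Nowlan's setup, and the names `fitness`, `TRIALS` and `target` are hypothetical.

```python
import random

TRIALS = 1000  # assumed number of learning trials per lifetime

def fitness(genotype: str, target: str, rng: random.Random) -> float:
    """Score a genotype made of fixed ("0"/"1") and plastic ("?") parameters.

    A wrongly fixed parameter cannot be repaired by learning, so the agent
    gets baseline fitness. Otherwise the agent guesses values for its
    plastic parameters; the sooner it finds the target, the fitter it is.
    """
    if any(g != "?" and g != t for g, t in zip(genotype, target)):
        return 1.0
    plastic = [i for i, g in enumerate(genotype) if g == "?"]
    for trial in range(TRIALS):
        # One learning trial: a random guess for every plastic parameter.
        if all(rng.choice("01") == target[i] for i in plastic):
            return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
    return 1.0  # the target was never found by learning

rng = random.Random(0)
target = "1011"

innate = fitness("1011", target, rng)     # everything correctly fixed
hopeless = fitness("0011", target, rng)   # one parameter wrongly fixed
learner = fitness("1?1?", target, rng)    # two parameters left to learning
```

Because partially plastic genotypes that are otherwise correct earn intermediate fitness, selection has a gradient to climb even when only the complete target confers any benefit, which is the sense in which learning can guide evolution.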
9. Grammatical acquisition and linguistic selection (Ted Briscoe)
TB's main hypothesis is that an innate LAD (Language Acquisition Device) could have coevolved with human protolanguage, and he tests the ability of the model he proposes against a documented process of creolization. While the standard model of the LAD for grammatical acquisition incorporates "a set of constraints defining a possible human grammar, and a set of biases (partially) ranking possible grammars by markedness", TB's account suggests that "biases as well as constraints can evolve through a process of genetic assimilation" and that "those constraints and biases in turn influence subsequent development of language via linguistic selection" (p.256). In the model of linguistic selection proposed here, languages are taken to be dynamical systems which adapt to their "niche of human language learners and users" and language change is primarily located in parameter setting (reanalysis) during language acquisition. This innate LAD is composed of 1) a theory of Universal Grammar (UG) with a finite set of finite-valued parameters defining the space of possible grammars, 2) a parser for the grammars, and 3) an algorithm for updating parameter settings. For TB, the theory of UG (1) is Generalized Categorial Grammar, where the category set and the rule schemata are defined as a default inheritance network characterizing a set of (typed) feature structures. I would complain that in this paper, TB assumes too much knowledge of GCG on the part of the reader, for instance there is no explanation for a number of terms introduced in the exposition, e.g., "gendir" presumably means "generic direction of functors" and "Vt" presumably means "transitive verb", but the reader shouldn't have to guess. (2), the parser, is a deterministic, bounded-context shift-reduce algorithm. For (3), the parameter setting algorithm, TB proposes a statistical extension of an n-local partially-ordered error-driven parameter setting algorithm utilizing limited memory. 
Here, I would complain that in spite of the great amount of technical details (pp.263-271) provided about the parsing algorithm, the set-up, and the implementation of the algorithms, we are not given enough explanation of what those technical choices mean theoretically. With respect to the simulation experiments reported here, they are more ambitious in scope and in the phenomena they attempt to model than most of the other contributions to this volume, and for that reason interested me even more. TB first reports on a set of acquisition experiments on "feasible and effective grammatical acquisition" (section 9.3), with both unset and default learners for a combination of parameters, which show convergence at the individual level. This is followed by a set of experiments at the level of the population of language agents (section 9.4), with simulation of different rates of reproduction for the language agents. A set of linguistic selection experiments at the population level (section 9.5), shows that linguistic selection is a viable approach to accounting for some types of language change; it also shows that the Bayesian approach to parameter setting accords with the behavior of learners in situations with either multiple dialects or with contact with speakers of other languages. TB then introduces variations amongst the language agents, in two ways. To investigate the coevolution of the LAD and of language, the simulation uses a population of learners with different parameter settings, the simulation of reproduction being as before, and also including population movements. The results show that "a minimal LAD, incorporating a Bayesian learning procedure, could evolve the prior probabilities and UG configuration which define the starting point for learning, in order to attenuate the acquisition process by making it more canalized and robust" (p.282). The last, even more linguistically realistic, experiment is the simulation of creolization. 
Bickerton (1988) and Roberts (1998) argue for an abrupt transition from pidgin to creole, but TB wants to investigate whether and how the element of 'invention' assumed to be necessary in creolization could arise in a model where the parameter-setting algorithm is purely selectionist and largely data-driven. I found this section of the paper the most fascinating and the analysis of the linguistic data presented should be of interest to other linguists as well. The hypothesis being tested in the simulation is that "the primary linguistic data that creole learners are exposed to is so uninformative that they retain their prior default-valued parameter settings, as a direct consequence of the Bayesian parameter-setting procedure" (p.287). However, the learners only need minimal exposure to the superstratum language to converge on the creole grammar. Thus, "creolization could result as a consequence of a Bayesian parameter setting learner having default setting for some parameters, acquired via genetic assimilation" (p.293), under the assumption that "richer triggers expressing parameters for more complex categories [are] present in the primary linguistic data" (p.294). TB's claim is not only that this account of creolization requires no invention or special mechanism beyond those already posited for language change, but also that the timing for the simulation is remarkably consistent with the time course documented by Roberts (1998). However, he is careful to point out that this has only been validated for cases with SVO superstratum languages. In conclusion, TB proposes that a robust and effective account of parameter setting, broadly consistent with Chomsky's (1981) proposals, can be developed by integrating GCG, embedded in a default inheritance network, with a Bayesian learning framework, which will be compatible with a robust convergence to a target grammar.
However, linguistic selection for more learnable variant constructions during language acquisition offers "a promising formal framework to account for language change where language learners converge to a grammar different from that of the preceding generation" (p.295). An extreme example is creolization, which is potentially challenging for a selectional and essentially data-driven account, but the model of the LAD developed here predicts that creolization will occur within the timeframe identified by researchers. The success of the coevolutionary scenario, where "there is reciprocal interaction between natural selection for more efficient learners and linguistic selection for more learnable grammars" (p.295) and whose consequence is highly-biased language learning, leads TB to the conclusion that "there is little reason to retain the parameter setting framework" (p.296), and that a minimal LAD combined with UG is enough. TB argues that this minimal LAD would have required only minor reconfiguration of cognitive capacities in the hominid line, and that (since TB assumes that UG = GCG) "the categorial logic underlying the semantic component of GCG was already in place". Thus, "much of the domain-specific nature of language acquisition, particularly grammatical acquisition, would follow not from the special nature of the learning procedure per se, as from the specialized nature of the morphosyntactic rules of realization for the language of thought" (p.297).
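The logic of the creolization hypothesis (uninformative data leave the learner with its default settings) can be illustrated with a generic Bayesian sketch in Python. This is not TB's parameter-setting algorithm: it assumes a single binary parameter, a prior expressed as pseudo-counts (a Beta-Bernoulli model), and triggers that simply vote for or against the default value. The names `posterior_default` and `retains_default` and all the numbers are hypothetical.

```python
from typing import Sequence

def posterior_default(prior_for: float, prior_against: float,
                      triggers: Sequence[bool]) -> float:
    """Posterior probability that a binary parameter keeps its default value.

    prior_for / prior_against: pseudo-counts encoding the innate bias.
    triggers: True for each input sentence supporting the default value,
              False for each sentence supporting the other value.
    """
    votes_for = prior_for + sum(1 for t in triggers if t)
    votes_against = prior_against + sum(1 for t in triggers if not t)
    return votes_for / (votes_for + votes_against)

def retains_default(probability: float) -> bool:
    """The learner keeps the default while it remains the likelier value."""
    return probability > 0.5

# An innate bias of 3:1 in favour of the default setting (assumed values).
no_data = posterior_default(3, 1, [])               # uninformative input
contrary = posterior_default(3, 1, [False] * 5)     # five contrary triggers
```

With no triggers the posterior equals the prior (0.75) and the default is retained, whereas five contrary triggers pull it down to one third and the setting flips; how strong the prior is relative to the available evidence determines which outcome occurs.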
10. Expression/induction models of language evolution: dimensions and issues (James R. Hurford)
JH compares 5 models of language evolution: 2 by Batali (including the one given in Ch.5), 2 by Kirby (including the one given in Ch.6) and 1 by himself (Hurford, 2000). This is a very dense paper, which is nevertheless rewarding for the reader as it illuminates a number of aspects shared by those models and it explores some of the assumptions made by the other contributors. The first commonality between the models discussed here is that they share a large amount of idealization and simplification and the emergent language systems they produce are very simple. The specific questions JH then asks are: In what sense do the evolved systems actually exhibit syntax? To what extent is this syntax truly emergent? In what ways do the evolved systems resemble Natural Language? JH posits an underlying framework for all these models, which he calls 'Expression/Induction' (the E/I acronym is not a coincidence, and is meant to remind the reader of I-language and E-language). Some aspects of E/I are common to many views of language, in particular the view that language is a dynamical system, and the distinction between the mental grammars of individuals (I-language) and their public behavior (E-language). The models discussed here share further assumptions: computational implementation for modeling; use of populations of agents, with agents alternating between speakers/teachers and hearers/learners; and assuming both expression/invention capacity and grammar induction capacity. These models also all start from a situation with no language, thus they are not primarily models of historical change, but of language emergence. They do not invoke biological evolution, but are "models of cultural evolution of learned signaling systems [...] not models of the rise of innate signaling systems" (p.304).
Unlike in the models proposed by Turkel and by Briscoe (which are not discussed by JH), communicative success is not a driving force; rather, the "basic driving force is the learning of behavior patterns by observation of the behavior of others" (p.305). Another important shared assumption is that of emergence: "the essential dynamic of an E/I model itself produces certain kinds of language structure as a highly likely outcome. The interaction of assumptions produces non-obvious outcomes, explored by simulation" (p.306). An important concept which JH explores throughout this chapter is that of 'bottlenecks' (see also Ch.6). The set of possible meaning-form pairs is infinite, or at least very large, while the set of example utterances used for acquisition is necessarily finite. This leads to two kinds of bottlenecks: a 'semantic bottleneck' where learners only observe a fraction of all possible meanings, and a 'production bottleneck' where speakers only produce a subset of the possible utterances for a given meaning. All the models implement a semantic bottleneck, which is crucial since otherwise "no agent would ever be forced to generalize beyond its learning experience" (p.333). All the models also implement a production bottleneck, since otherwise the model would be unrealistic. JH steps through a few examples to show how the E/I model handles the evolution of vocabulary: without any bottleneck (no change, an unrealistic model), with only a production bottleneck (leading to the elimination of synonyms over time), and with only a semantic bottleneck (leading to an increase in synonymy). Moving to modeling the emergence of syntax, all the models surveyed have evolved syntactic means of expressing their meanings, and JH identifies 3 phases in the emergence of syntax which all models go through. However, the models differ in the agents' representation of syntax and they contrast "in the degree to which learners postulate autonomous syntactic structure" (p.315).
This echoes the debate on the rival merits of rules (Chomsky 1971) and analogies (Bolinger 1968). An important point is that in all the experiments, the population of agents has converged on a set of representations over which a generalization is possible, even if it was not actually made. JH also raises the empirical psycholinguistic issue of the degree to which humans store exemplars rather than rules and points out that the issue of rules vs. chunks also arises in computational parsing theory. He argues that both approaches (rules and stored chunks) are "compatible with some of the most basic facts of language organization" and that "computational evolutionary models have a long way to go in complexity before they can begin to shed light on such issues" (p.319). The models all converge on systems which exhibit compositionality, and JH argues that this is achieved in more or less stipulative ways inasmuch as the "semantic representations incorporated into the evolutionary models already have a syntax" (p.321); Kirby, however, claims that compositionality emerges in his model without being deliberately coded in. Concerning the invention and production algorithms, "invention for E/I models is an essentially random process, constrained by the in-built assumptions" (p.325). On the other hand, grammar induction is implemented very differently in these models: in the early stages of the simulation, the mode of learning is incremental in all models, while at later stages they differ as to how much internal rearrangement of an agent's previously stored information takes place. The models also differ in their population dynamics, which can be multi-generational or uni-generational, but they all assume a constant size population, with agents periodically removed from the population, which raises the issue of what would happen with more sophisticated models of population expansion (cf. Ch.9).
In conclusion, JH considers that the factors which facilitate the emergence of recursive, compositional syntactic systems in E/I models boil down to: pre-defined syntactic representations; an invention (and/or production) algorithm which constructs new expressions in conformity with the principle of compositionality; a learning algorithm which internalizes rules generalizing over form-meaning mappings in a compositional way; a strong semantic bottleneck effect; and a production bottleneck. JH then speculates that a hybrid model stripped down to a flat semantic structure, an invention algorithm with no bias towards compositional structures, a learning algorithm which may induce general compositional rules but is not restricted to doing so, uni-generational population dynamics with only a single agent, and the feedback effect of a production bottleneck would still yield an emergent recursive compositional syntactic system. He concludes that this is an experiment waiting to be done!
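The vocabulary case from the Ch.10 summary above, in which a production bottleneck eliminates synonyms over time, can be illustrated with a toy simulation. This is only a rough sketch under strong simplifying assumptions and is not code from the book or from any of the models reviewed: there is a single chain of one speaker and one learner per generation, every meaning is observed (so there is no semantic bottleneck), and the names `transmit`, `synonym_count`, `run` and `exposures` are hypothetical.

```python
import random

def transmit(lexicon, exposures, rng, bottleneck=True):
    """One generation: a learner acquires a lexicon from a speaker's output.

    lexicon maps each meaning to the set of forms (synonyms) the speaker knows.
    With the production bottleneck, the speaker utters one randomly chosen
    form per exposure; without it, the speaker utters every form it knows.
    The learner stores exactly the forms it has heard.
    """
    learned = {}
    for meaning, forms in lexicon.items():
        if bottleneck:
            options = sorted(forms)  # sorted so that a seeded run is reproducible
            heard = {rng.choice(options) for _ in range(exposures)}
        else:
            heard = set(forms)
        learned[meaning] = heard
    return learned

def synonym_count(lexicon):
    """Total number of forms stored across all meanings."""
    return sum(len(forms) for forms in lexicon.values())

def run(lexicon, generations, exposures, seed=0, bottleneck=True):
    """Iterate transmission and record the number of forms after each generation."""
    rng = random.Random(seed)
    history = [synonym_count(lexicon)]
    for _ in range(generations):
        lexicon = transmit(lexicon, exposures, rng, bottleneck)
        history.append(synonym_count(lexicon))
    return lexicon, history

# Hypothetical starting vocabulary: three meanings with 3, 2 and 1 forms.
start = {"dog": {"a", "b", "c"}, "cat": {"d", "e"}, "run": {"f"}}
final, history = run(start, generations=20, exposures=2)
print(history)
```

In this sketch a form that is never uttered in a generation is lost for good, so the number of forms can only stay the same or shrink, while with `bottleneck=False` the lexicon is passed on unchanged. The increase in synonymy under a semantic bottleneck is not modeled here, since it would additionally require invention of new forms for unobserved meanings.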
This is a very interesting book, quite challenging for the reader because of the technical level of most of the papers, and it certainly raises a number of controversial issues. Before addressing these issues, I have to say that I would have expected the distinction between language emergence (phylogenetic) and language acquisition (ontogenetic) to be made clearer throughout the volume by all contributors. The term 'evolution' is itself ambiguous and in some cases it is not clear how the results of the experiments should be interpreted: as a model of how language evolved in humans, or of how humans learn language. In other cases, of course, 'evolution' refers to language change, and RW takes the view that not only languages, but individual words, should be regarded as evolving species. However, from both the vantage points of language emergence and of language acquisition, the main issue concerns the status of the human language faculty and of the language acquisition device (LAD). The question is how much of the human language faculty, and of the LAD, is innate to humans, and to what extent the properties of human languages arise from the constraints of communication. Indeed, the very existence of the LAD as a separate cognitive faculty (cf. Chomsky) as opposed to general purpose learning mechanisms is under investigation. The latter view is argued by Worden, while Briscoe argues for a minimal LAD refined by genetic assimilation. Two related issues are those of the co-evolution of language and the language faculty (Batali and Turkel) and of convergent evolution as opposed to genetic encoding (Briscoe, Worden, Turkel). Both the hypothesis that human language and the human language faculty evolved together and the hypothesis that the language faculty emerged through convergent evolution require some amount of genetic assimilation.
Genetic assimilation allows for the possibility of intermediate forms of language and of the LAD, in sharp contrast to the hypothesis of saltation, where the human language faculty (and the LAD) is unique and fully specific (genetically encoded) to the human species. Collectively, the contributors to this volume have demonstrated that to a large extent, some of the properties assumed to be constitutive of human language, i.e., the arbitrary pairing of meanings with forms, showing recursion and compositionality, and the syntactic structure of language, can be made to emerge under certain assumptions. The assumptions which are shared by all researchers, and are not likely to be controversial, are that there is some sort of pairing of meanings with forms, and that successful communication is rewarded in one way or another. Further assumptions regard the particular implementations used for the simulations, and there is room for disagreement as to the extent to which the implementations actually determine some of the observed outcomes (Hurford). Although all the papers in one way or another address the methodology of modeling and simulation, they all assume that such modeling is worth undertaking. However, the reader may still be left with the question of whether we can actually learn something about all these issues from simulation experiments with computational language agents. This volume will convince people who are inclined to believe that such simulations can provide some answers that progress has been made, and it may convince others that such simulations are worth performing and that the results are more illuminating and relevant to linguistics than they might have expected.
Having assumed that computational simulation and modeling is one way to explore some of the issues related to the evolution of language, it is interesting to note that although all the contributors use language agents to simulate humans, they group these agents into populations of very different sizes and composition. They also make vastly different assumptions about how learning and teaching across generations is accomplished, with most authors remaining modest and only considering simple models with fixed numbers and limited variety of interaction. They have different models of population growth, contact and migration (see especially the contributions by Niyogi and Briscoe). They also differ markedly with respect to the nature of the computational representations they use (e.g., trees, RMF, formula sets). All these choices are factors which determine the observed outcomes. Finally, there are several important assumptions about the nature and properties of human languages which underlie the work reported here: recursion, which most authors assume; compositionality, which most authors try to account for; and the nature of the linguistic representations, which is always some form of meaning-form pairs. As a collection of papers, the volume is very well put together and it forms a coherent whole. The authors obviously know each other's work and cross-reference each other, in some cases as work in progress, so the reader is left with the impression of a lively growing community of researchers. However, references to other work outside this community are also sufficient to allow the reader to explore alternative approaches. The progression between the papers is quite logical and the placement of Hurford's contribution, with its systematic comparison of 5 models, at the end serves as a kind of overall review of many of the concepts and issues introduced or assumed by the other contributors.
The book itself is very well produced and, in general, well edited. The index is rather limited, but adequate. I have a few minor quibbles about the layout: in Ch.4 in particular, most figures are too far away from their reference in the text (e.g., Fig. 4.3 on p.87 is mentioned on p.78), and in Ch.10 we find "3 exemplars along the lines of the 3 shown on the next page" (p.318) but no examples are given. There are very few typos (missing parenthesis in Table 7.2, p.211; Fig. 7.6 on p.227 is referenced as Fig. 7.4). My worst criticism in this regard would be that the paper by the editor might have benefited from more rigorous editing and trimming, as it is one of the longest and arguably most densely written of the volume, but I also found it extremely interesting and stimulating.
Bibliography
Baldwin, J. M. (1896). "A new factor in evolution". American Naturalist, 30, pp.441-451.
Bickerton, D. (1998). "Catastrophic evolution: the case for a single step from protolanguage to full human language". In Approaches to the Evolution of Language: Social and Cognitive Bases. J. Hurford, M. Studdert-Kennedy, C. Knight, eds. pp.341-358. Cambridge: Cambridge University Press.
Bolinger, D. (1968). Aspects of Language. New York: Harcourt, Brace and World.
Briscoe, E. J. (2000). "Grammatical acquisition: inductive bias and coevolution of language and the language acquisition device". Language, 76(2), pp.245-296.
Cavalli-Sforza, L. and M. W. Feldman. (1981). Cultural Transmission and Change: A Quantitative Approach. Princeton, N.J.: Princeton University Press.
Chomsky, N. (1971/1965). Paper read at the Northeast Conference on the Teaching of Foreign Languages, 1965. Reprinted in J. P. B. Allen and P. van Buren, eds. Chomsky: Selected Readings. Oxford: Oxford University Press.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.
Deacon, T. (1997). The Symbolic Species: Coevolution of Language and Brain. Cambridge, MA: MIT Press.
Gibson, E. and K. Wexler. (1994). "Triggers". Linguistic Inquiry, 25(3), pp.407-454.
Hinton, G. E. and S. J. Nowlan. (1987). "How learning can guide evolution". Complex Systems, 1, pp.495-502.
Hurford, James R. (2000). "Social transmission favours linguistic generalization". In Approaches to the Evolution of Language: The emergence of phonology and syntax. C. Knight, M. Studdert-Kennedy, J. R. Hurford, eds. pp.324-352. Cambridge: Cambridge University Press.
Niyogi, P. and R. C. Berwick. (1995). The Logical Problem of Language Change. MIT AI Memo no. 1516.
Niyogi, P. and R. C. Berwick. (1997). "Evolutionary Consequences of Language Learning". Linguistics and Philosophy, 20, pp.697-791.
Pinker, S. and P. Bloom. (1990). "Natural Language and Natural Selection". Behavioral and Brain Sciences, 13(4), pp.707-784.
Roberts, S. (1998). "The role of diffusion in the genesis of Hawaiian creole". Language, 74(1), pp.1-39.
ABOUT THE REVIEWER:
Dominique Estival is a Senior Research Scientist at DSTO (the Australian Defence Science and Technology Organisation), working on various aspects of human interfaces and language technologies. She received her PhD in linguistics from the U. of Pennsylvania in 1986 with a thesis on diachronic syntax and since then has been actively involved in Computational Linguistics and Natural Language Processing, in industrial R&D and in academic environments. She has worked on various areas of NLP: linguistic engineering, grammar formalisms for NLP, Machine Translation, reversible grammars, evaluation for NLP, and spoken dialogue systems. Dominique.Estival@dsto.defence.gov.au.