Date: Sat, 14 May 2005 15:56:03 +0200
From: Luis Vicente
Subject: Efficiency and Complexity in Grammars
AUTHOR: John A. Hawkins
TITLE: Efficiency and Complexity in Grammars
PUBLISHER: Oxford University Press
YEAR: 2004
Luis Vicente, ULCL, Leiden University
"Efficiency and Complexity in Grammars" (ECG henceforth) represents one further step in Hawkins' attempt to incorporate performance factors into the theory of grammar. The opening sentence of the book represents his research programme better than anything else: "Has performance had any significant impact on the basic design features of grammars?" He answers that this is indeed the case. In his view, most of the properties of natural languages are a result of the pressure to optimise parsing and production. ECG is an extended elaboration of this very brief statement. The elaboration is actually so extended that, to keep the review from being nearly as long as the book, I will concentrate on some major points of the argumentation throughout, and simply sketch others. I believe the theory developed in ECG is best understood by focusing on how particular analyses work, rather than by giving a bird's eye overview of the entire work.
The book can be divided roughly into two parts. In the first one (chapters 1-4), Hawkins gives his "big picture" idea of how performance factors interact with grammar. These chapters are heavy on theory, and consequently demand one's full and undivided attention to follow the reasoning, which is otherwise rather neat, unless one makes the mistake of taking ECG for what it is not. This is a book about how performance factors can influence the shape of languages. It deals with the issue of how typological tendencies reflect parsing preferences, not with what the formal internal underpinnings of the grammar are. If one reads these chapters bearing in mind which questions Hawkins is interested in answering, one will find enough exciting ideas to make the heavy reading more than worthwhile.
In the first two chapters ("Introduction" and "Linguistic forms, properties, and efficient signaling"), Hawkins explains why performance factors must be considered an integral part of the theory of grammar. The main thesis of the book is summarised in the Performance-Grammar Correspondence Hypothesis (p. 3), which runs as follows:
1) Performance-Grammar Correspondence Hypothesis (PGCH): Grammars have conventionalised syntactic structures in proportion to the degree of preference in performance.
Hawkins implements this idea through a number of principles, the most important being Minimise Forms, Minimise Domains, and Maximise Online Processing (MiF, MiD, and MaOP respectively). The definitions are the following:
2) Minimise Forms: The human processor prefers to minimise the formal complexity of each linguistic form F, and the number of forms with unique conventionalised property assignments, thereby assigning more properties to fewer forms. These minimisations apply in proportion to the ease with which a given property P can be assigned in processing to a given form F.
3) Minimise Domains: The human processor prefers to minimise the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed.
4) Maximise Online Processing: The human processor prefers to maximise the set of properties that are assignable to an item X as X is processed.
Although the wording might be somewhat cumbersome, the leading idea behind these principles is clear: the forms and patterns preferred by grammar are those that place the least burden on parsing and production. This can be achieved in several ways: by using morphologically simple forms for frequently used elements, by favouring word order patterns that don't overload the parser, and so on.
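The word-order side of this idea can be made concrete with a small sketch. The Python below is an illustration only, not Hawkins' own metric: it assumes, as a simplification, that each constituent following the verb is recognised at its first word, and it measures a domain as the number of words from the verb through the first word of the last constituent. The function name `domain_size` and the example phrases are hypothetical.

```python
def domain_size(verb, constituents):
    """Words from the verb through the first word of the last constituent.

    Simplifying assumption (not from the book): every post-verbal
    constituent is recognised as soon as its first word is parsed.
    """
    if not constituents:
        return len(verb.split())
    # Every constituent before the last one must be parsed in full.
    full = sum(len(c.split()) for c in constituents[:-1])
    # The last constituent contributes only its first word.
    return len(verb.split()) + full + 1


short_pp = "to Mary"
long_np = "the book she had been looking for"

short_first = domain_size("gave", [short_pp, long_np])
long_first = domain_size("gave", [long_np, short_pp])
print(short_first, long_first)  # prints "4 9"
```

Under these assumptions the short-before-long order yields the smaller domain, which is the kind of pattern the principles are meant to favour.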
Chapters 3 and 4 ("Defining the efficiency principles and their predictions" and "More on form minimisation") consist of a more detailed exploration of how the performance principles introduced earlier on can influence the shape of languages, thus setting the stage for later chapters. Some of the topics he deals with here include grammaticalisation (e.g., verbs of wish and desire turning into modals or tense markers, or the stages of the evolution of definiteness marking) and markedness hierarchies (e.g., nominative being less marked than accusative, which is in turn less marked than dative, and so on). Hawkins argues that these phenomena and more can be easily accommodated under his framework. For instance, he argues that, in a number of hierarchies, the amount of phonological and morphological complexity will be equal or greater at each position further down the hierarchy. He works his way through number marking, case marking, degree forms of adjectives, and so on, showing that the more marked a certain case or number specification is, the more morphophonologically complex it usually is. The obvious question is whether this is not simply circular reasoning (that is, something is more complex because it is more marked, and we know it is more marked because of the extra morphophonological complexity). Although Hawkins doesn't deal with this question directly, he hints at a way in which markedness hierarchies can be derived. Introducing ideas that will be further developed in chapter 8, he claims that some positions in the hierarchy are dependent on others.
For instance, the comparative and superlative forms of adjectives ("bigger", "biggest") are taken to be dependent on the base ("big"). The base form simply requires the adjective to modify an entity. Comparative and superlative forms require this, but also the existence of (at least) a second entity to establish a comparison with. Similarly, nominative case can be assigned to the sole argument of a clause. However, accusative usually requires the presence of an argument that has already been marked as nominative, and datives usually require the presence of nominative and accusative arguments (this, of course, ignoring quirky case marking, which is something for which I'm willing to give Hawkins the benefit of the doubt). If these dependency relations can be established in the way argued in this chapter, then it would be possible to avoid circularity. This is, however, a notion that I would be happy to see formalised in a more explicit way.
In the second part (chapters 5-8), Hawkins turns to a more detailed examination of certain phenomena. I found here considerably more food for thought, and I can imagine that these chapters will be more interesting than the first four to anyone who wants to focus on fine language-specific data, rather than broad typological tendencies. This said, though, I found chapter 5 somewhat disappointing despite its title ("Adjacency effects within phrases"). I was anticipating a discussion of phenomena such as the obligatory verb-object adjacency in English, or the requirement in several languages that wh-words and foci immediately precede the verb. Instead, Hawkins discusses the conditions under which separation of associated phrases causes parsing degradation (or not). In a nutshell, his idea is that the more dependent on each other two phrases are, the more difficult it will be to separate them. One pair of English examples he discusses (amongst many others) is "take X to the library" vs. "take X into account". His claim is that displacement of the PP is less likely to occur in the latter than in the former, because "take X into account" is what he dubs an "opaque collocation". It has a somewhat idiomatic meaning, so the larger the separation between its parts, the harder it will be to assign a meaning to it. He discusses several similar cases, though one problem that I find is that, while phrases dependent on each other tend to be close together, they cannot appear closer than the rules of grammar allow. That is, there are several idioms that consistently forbid word orders that would certainly result in a shorter MaOP domain. For instance:
5) give Mary the sack vs. *give the sack Mary
6) throw John to the lions vs. *throw (to) the lions John
7) drive Peter bananas vs. *drive bananas Peter
See Harley 2002 for discussion of similar examples. The way I understand Hawkins' analysis, the sentences on the right-hand side should be grammatical. They certainly bring the idiomatic parts of the VP together, thus resulting in the shortest MaOP domain possible. Yet, despite the gain in processing efficiency, the sentences are totally out. Why should this be so? It seems that the grammar can yield a number of outputs, which are more or less difficult to parse. This is where Hawkins' theory finds its place. The reverse, nonetheless, doesn't seem to be true: parsing considerations cannot force a structure that is not independently generable by grammar. MaOP and other performance constraints can select the most parser-friendly structure amongst a number of alternatives, but they cannot generate the best possible structure independently of syntax.
The discussion of the correlation between dependency and proximity continues in Chapter 6 ("Minimal forms in complements/adjuncts and proximity"), where most of the space is devoted to a discussion of the parsing preferences between "which", "that", and zero relative clauses, and how they can be accommodated under the theory developed so far. The general idea is that there is a tension in relative clause marking: while an overt complementiser/relative pronoun unambiguously identifies the clause as a relative, it also increases the parsing domain. Thus, the conflict between MaOP and MiF gives rise to various patterns of preference that Hawkins explores in detail.
The much longer chapter 7 ("Relative clause and wh- movement universals") is, I think, one of the strongest parts of the book. Of interest in this chapter is the correlation he tries to establish between wh- movement and VO/OV order. It has long been noted that VO languages tend to have overt wh- fronting, whereas OV languages tend to have wh- in situ constructions. Within OV, those languages that have wh- fronting tend to be the ones that are "partial OV" (e.g., West Germanic languages, where V2 coexists with V-final). Hawkins' claim is that these correlations follow from his general theory: since the wh- word is subcategorised for by the verb, preference is given to word orders where both of them are close together, so that the link can be established without overloading the parser too much. Thus, in VO languages, where the verb tends to stay closer to the left periphery of the clause, wh- fronting can minimise the wh- parsing domain. In OV languages, however, wh- fronting will increase it, except in partial OV languages, where V2 applies in the case of wh- movement. He excludes rightward wh- movement on the general assumption that wh- words must precede their gaps in order to allow for efficient parsing. A [gap > wh-] order is deemed inefficient enough in parsing to exclude rightward wh- movement nearly across the board. (With regard to this last point, one may wonder how prenominal relatives can be accommodated, where the head noun follows the gap. Hawkins' answer is that in these languages, NPs in general tend to be N-final. Thus, the inefficiency of [gap > head noun] orders can be compensated by the creation of an N-final noun phrase, in concordance with the general pattern of the language.)
Another point of interest in this chapter is its approach to certain wh- movement restrictions. He claims, for instance, that some phenomena like island violations and *that-trace effects are not the result of violating any grammatical principles, but of processing factors and complexity hierarchies. He assumes the following hierarchy for types of embedding:
non-finite clauses < finite clauses < complex NPs
where complex NPs represent the type of embedding with the greatest processing difficulty (possibly due to the extra structure). Drawing a parallel with Keenan & Comrie's (1977) Accessibility Hierarchy, Hawkins claims that individual languages can select a "cut-off" point in this hierarchy. Extraction is possible out of types of embedded phrases below this point, but not above it. Thus, there are languages that allow for extraction only out of non-finite clauses; a subset of these can add finite clauses on top of it; from these, a subset allows for extraction out of complex NPs under some circumstances (e.g., Mainland Scandinavian languages). What is not attested, according to Hawkins, is a language that allows extraction from a certain type of embedded phrase but not from a lower one (i.e., a language that allows extraction out of finite clauses, but bans it out of non-finite clauses).
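The cut-off logic described here can be sketched in a few lines of Python. The ordering of embedding types is taken from the paragraph above (non-finite clauses, then finite clauses, then complex NPs, in increasing processing difficulty); the function names and the sample sets are hypothetical and carry no claim about any particular language.

```python
# Embedding types ordered from least to most difficult to process,
# following the description in the review.
HIERARCHY = ["non-finite clause", "finite clause", "complex NP"]


def allowed_extractions(cutoff):
    """Embedding types that permit extraction, given a language's cut-off.

    `cutoff` is the most difficult type that still permits extraction.
    """
    return HIERARCHY[: HIERARCHY.index(cutoff) + 1]


def respects_hierarchy(permitted):
    """True if no type is permitted while an easier type is banned."""
    flags = [t in permitted for t in HIERARCHY]
    # Once a type is disallowed, no more difficult type may be allowed.
    return all(earlier or not later for earlier, later in zip(flags, flags[1:]))


print(allowed_extractions("finite clause"))
print(respects_hierarchy({"non-finite clause", "finite clause"}))  # True
print(respects_hierarchy({"finite clause"}))  # False
```

The last call corresponds to the pattern Hawkins claims is unattested: extraction out of finite clauses without extraction out of non-finite clauses.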
The discussion continues in chapter 8 ("Symmetries, asymmetric dependencies, and earliness effects"), where Hawkins tackles issues like the preference for subjects to precede objects (including scopal interpretations), or for topics to be left-dislocated. The idea introduced in this chapter is that asymmetric dependencies tend to appear in a specific word order. Consider a pair of elements X and Y, such that Y is dependent on X for interpretation or for some other reason. Hawkins claims that, in cases like this, the preferred order is one in which X precedes Y. If the order were the reverse, one would have to hold in memory the variable provided by Y until X is encountered and a suitable value can be assigned. This would use up the resources of the parser, leading to less efficient processing. This can be avoided by making X precede Y, so that a value can be provided for the variable as soon as it is introduced.
The general conclusions achieved in ECG appear in chapter 9, which also serves as a general manifesto of sorts for Hawkins' research programme. In this chapter, he repeats time and again that his work is a reaction to Chomskyan linguistics. Hawkins accuses generative grammarians of neglecting the impact of performance factors on the properties of grammars, focusing exclusively, instead, on grammar-internal theorising. As has already become clear after 250 pages, Hawkins advocates the opposite view, where the pressure for efficient parsing underlies many of the language-specific and cross-linguistic peculiarities of language. I agree with Hawkins that a complete theory of language will ultimately have to account for parsing and performance. However, I think the views he expresses in this book show an overconfidence in the power of performance as a means to explain grammatical phenomena (see below).
The basic idea underlying the reasoning in ECG is that there exist a number of complexity hierarchies in language, and that languages tend to aim towards the more unmarked values of these hierarchies. Sometimes, though, I had the impression that this idea is only worked out at an intuitive level. For instance, most of the discussion about word order preferences is based on counting words (see also Hawkins 1994). However, nowhere in the book can we find a definition of "word". I'm not being picky here. This is a serious objection, since the notion of "word" is possibly the fuzziest notion in contemporary linguistics, to the extent that some researchers have claimed that the concept of word is not relevant in any morphosyntactic sense (e.g., Julien 2002). For most of the discussion, he seems to assume something like a "dictionary entry" notion of wordhood. This does the work, but only because Hawkins does not consider cases where a more explicit definition would be needed. One can easily think of several such cases. For instance, should clitics be considered separate words, or should they be counted together with whatever their host is? Should particle verbs in West Germanic count as one single word or two when the particle is not stranded? What is the status of compounds (e.g., "screwdriver" or "overthrow")? How should one treat the output of incorporation (i.e., much of the discussion in Baker 1988 and Hale & Keyser 1993)? What happens with contractions like "would have" --> "would've", or "I have" --> "I've"? One particularly interesting case could be the German negative word "kein" ("no"), which has been argued to actually be a combination of two words (negation plus an indefinite determiner) spelled out as one (cf. Penka 2002). Should the parser have access to this hidden structure or not? Although I haven't thought about these objections in detail, it seems to me that a more explicit definition of word would be needed to handle these cases.
Maybe one should also make reference to morphemes, to phonological weight, prosody, syntactic operations like head movement, or even all of the above.
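How much a word count depends on the definition of "word" is easy to illustrate. The Python sketch below compares two deliberately naive, hypothetical definitions: whitespace-delimited strings, and strings additionally split at apostrophes so that contracted clitics count separately. Neither definition is proposed in the book, and the example sentence is invented for illustration.

```python
import re


def count_whitespace_words(sentence):
    """Count words as maximal runs of non-whitespace characters."""
    return len(sentence.split())


def count_clitic_split_words(sentence):
    """Count words after also splitting at apostrophes ("would've" counts as 2)."""
    return len([w for w in re.split(r"[\s']+", sentence) if w])


sentence = "I would've taken it into account"
print(count_whitespace_words(sentence), count_clitic_split_words(sentence))  # prints "6 7"
```

Any metric based on counting words will return different values for the same string depending on which of these (or other) definitions is adopted.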
On a different level, I had the impression through most of the book that the proposed complexity metric could be more useful as a theory of language change than as a grammaticality evaluation device in synchronic terms. That is, a sentence is rarely marked as ungrammatical because it violates any of the constraints Hawkins introduces (MaOP, MiF, and so on). It simply ranks as more difficult to process than an equivalent that doesn't violate such a constraint, but in and of itself it is not ungrammatical. Of course, after a certain level of complexity is reached, ungrammaticality results. However, this seems to be just a parsing failure, rather than "real" ungrammaticality rooted in syntax alone (i.e., nobody would want to claim that centre-embedded structures are ungrammatical after the third or fourth level of embedding: the grammar has no problem in generating them, it's simply that the parser's resources cannot cope with them). It seems, rather, that language change can be biased towards structures and patterns that don't overload the parser's resources. I find this a reasonable conclusion, and I wouldn't have much trouble adopting it myself. Nonetheless, I am not convinced that this theory can be used to explain ungrammaticality patterns, as Hawkins tries to do at some points (e.g., his discussion of *that-trace effects at the end of chapter 7).
Notwithstanding these comments, one should take ECG for what it really stands for. Hawkins makes a strong point that performance factors ought to be incorporated into the general theory of grammar, rather than being used as a waste basket for certain phenomena one cannot explain in grammatical terms alone. This I agree with, and I think Hawkins' work (not only this book, but his earlier publications as well) represents an important contribution to the understanding of how performance affects language. What I disagree with is Hawkins' somewhat hidden assumption that performance can be held as a universal solution for grammar theory. It is true that an ultimate theory of language must be able to explain the kinds of phenomena discussed in this book, but I think one should not blur the competence/performance dichotomy as easily as Hawkins does. As mentioned above, it seems that processing factors cannot force structures that are not allowed by grammatical principles. This can be taken as an indicator that the grammar and parsing are best kept as mainly separate systems, even though they are subparts of the language faculty, and one can see their interaction in specific phenomena.
REFERENCES
Baker, Mark (1988), Incorporation: a theory of grammatical function changing, University of Chicago Press, Chicago
Hale, Ken, and Samuel J. Keyser (1993), Argument structure and the lexical expression of syntactic relations, in Hale & Keyser (eds.), The view from building 20, 53-109, MIT Press, Cambridge, Massachusetts
Harley, Heidi (2002), Possession and the double object construction, Linguistic Variation Yearbook 2, 29-68, John Benjamins, Amsterdam
Hawkins, John (1994), A performance theory of order and constituency, Cambridge University Press, Cambridge
Julien, Marit (2002), Verbal inflection and word formation, Oxford University Press, Oxford
Keenan, Ed, and Bernard Comrie (1977), Noun phrase accessibility and Universal Grammar, Linguistic Inquiry 8, 63-99
Penka, Doris (2002), Kein muss kein Rätsel sein, MA thesis, Tübingen University
ABOUT THE REVIEWER:
I am a 3rd year graduate student at Leiden University, specialising in formal syntax. Topics I've worked on include relativisation, the syntax-phonology interface, head movement, remnant movement, scrambling, argument licensing, and the structure of VP.