Review of Word Order

Reviewer: Matthew Reeve
Book Title: Word Order
Book Author: Jae Jung Song
Publisher: Cambridge University Press
Linguistic Field(s): Linguistic Theories
One of the major concerns of any approach to linguistics is to account for cross-linguistic variation in word order. It has long been known that there are statistical correlations between the unmarked orders of certain grammatical categories (e.g. verb and object, adposition and its complement). Yet there are vast differences between and within theoretical frameworks in how they account for these correlations. Jae Jung Song’s book aims at providing a relatively concise and accessible survey of how word order variation is treated in four major frameworks -- linguistic typology, generative grammar, optimality theory (treated separately from generative grammar) and performance-based theories -- as well as a critical evaluation of the various approaches. I summarize the contents of the seven chapters below, and finally, provide a brief evaluation of the book.

Chapter 1: Word order: setting the scene
Song starts by considering the apparently banal fact that in English, a verb precedes its direct object, whereas Korean shows the opposite order. The question arises of whether this kind of word order variation is accidental or whether there are patterns underlying such variation. Song notes that since Behaghel’s (1909) classic work, a number of very different approaches to word order variation have emerged, and practitioners of a given approach are often unfamiliar with other approaches. Song’s aim in this book is to bridge the gap between these various approaches. He argues that within generative grammar, word order has been considered of lesser importance than other aspects of grammar, often being considered merely a matter of phonology. In the ‘linguistic typology’ (LT) tradition, on the other hand, word order takes centre stage, and word orders are important “in their own right” (p. 4). He claims that Optimality Theory (OT) pays more attention to surface properties than what we might call ‘standard’ generative grammar. The central difference between the various approaches, Song claims, has to do with “empirical validity” versus “theoretical explanation” (p. 5). LT is “unequalled” in terms of emphasis on “empirically (or statistically) tested word-order variation” (p. 5). OT and processing-based theories “draw heavily” on data from the LT tradition (the implication being that generative grammar, in general, does not).

Chapter 2: The Linguistic-Typological Approach: Empirical validity and explanation
Song defines LT as “the study of structural variation in human language with a view to establishing limits on this variation and seeking explanations for the limits” (p. 10). While this definition seems to be applicable to any approach to word order variation in terms of structure, Song emphasizes that accounting for variation is the priority of the LT approach, and that, according to LT, cross-linguistic data provide the best basis for generalizations about the nature of human language. This approach is also characterized by a lack of abstract entities and structures, and by an emphasis on ‘functional’ as opposed to ‘formal’ variables. Yet LT practitioners are often also concerned with theoretical elegance, and there have been various attempts to come up with single, simple principles that can account for certain types of word order variation. LT’s object of study is ‘language universals’ in the sense of Greenberg (1963): statistical correlations between different word order properties of a language. The methodology of LT involves large amounts of data, which is claimed to lessen the risk of treating rare things as universals. Song discusses the problem of determining the ‘basic word order’ of a language, and the fundamental importance given to the ordering of subject (S), verb (V) and object (O) as opposed to other orderings, and then goes on to discuss a number of competing LT approaches to word order variation. He begins with the classic work of Greenberg (1963), whose 45 universals have provided the basis for most typological work ever since, and then discusses the work of W. Lehmann (e.g. 1973), Vennemann (e.g. 1974), Hawkins (1983, 1994, 2004), Tomlin (1986) and Dryer (1992), among others. These authors differ on the question of which grammatical categories are the most reliable indicators of other word order properties (e.g. V for Lehmann 1973 and Vennemann 1974, P for Hawkins 1983), the importance of pragmatic/functional factors vs. formal/structural factors (e.g. Tomlin’s 1986 Theme First, Animated First and Verb-Object Bonding principles vs. the Head-Dependent Theory exemplified by Vennemann 1974 and Hawkins 1983 and Dryer’s 1992 Branching Direction Theory) and the relation between verb-initial, verb-final and verb-medial orders (e.g. Is SVO typologically closer to VSO or to SOV?). Also relevant is the question of whether there is a single principle underlying word order correlations (e.g. Hawkins’ 1983 Principle of Cross-Category Harmony or Dryer’s 1992 Branching Direction Theory), or whether we need to recognize heterogeneous factors (e.g. processing efficiency, pragmatics/discourse, grammaticalization) as being responsible for them (e.g. Dryer 2009).

Chapter 3: Entr’acte: historical and conceptual background of Generative Grammar
This chapter sets the scene for Chapter 4, providing background in generative grammar for readers who are unfamiliar with it. For space reasons, I will omit further discussion of this chapter.

Chapter 4: The generative approach: stipulation or deduction
This chapter discusses how word order generalizations are accounted for within what I will refer to as ‘mainstream generative grammar’ (MGG; a term borrowed from Culicover & Jackendoff 2005). Song claims that, while word order has always been central within LT, the status of word order within MGG has often been thought to be “at best, ambiguous, and, at worst, insignificant, if not totally irrelevant” (p. 100). He cites Chomsky’s discussion of structure-dependence and the impossibility of formulating transformations purely on the basis of linear order (e.g. in yes-no questions) as a case where surface word order takes a back seat to more abstract structural properties. On the other hand, while much work in the Minimalist Program (MP) has relegated word order to the Phonetic Form (PF) interface (i.e. phonology), syntax proper being concerned only with hierarchy (e.g. Chomsky 2004), another important strand of recent MGG work (starting with Kayne 1994) has taken word order to be fully determined by hierarchical structure. This discussion thus illustrates attempts to “move from stipulation to deduction” (p. 103). In Section 4.2, Song illustrates in more detail the treatment of word order at various stages of MGG: the stipulation of word order in phrase structure rules (e.g. Chomsky 1965); the attempt to remove redundancy and capture headedness, as well as the relatedness of verbs and derived nominals, in terms of X-bar theory (e.g. Chomsky 1970); later refinements of X-bar theory, such as the postulation of the Inflectional Phrase (IP) and the Determiner Phrase (DP) (e.g. Abney 1987) and the VP-internal subject hypothesis (e.g. Koopman & Sportiche 1991); the attempt to generalize over word order properties in terms of the Head Parameter (e.g. Stowell 1981); and the use of Case Theory and Theta Theory to account for certain VP-internal word order properties (e.g. Chomsky 1981, Stowell 1981, Travis 1984). 
As Song notes, subsequent MGG research on word order attempted to eliminate word order stipulations from grammar to a greater extent; he devotes the remainder of the chapter to Kayne’s (1994) proposal (in terms of the Linear Correspondence Axiom, or LCA) that linearization is determined entirely by properties of hierarchical structure (viz. asymmetric c-command). Song presents Kayne’s theory in some detail, as well as a number of his arguments for it, and then moves on to discuss how cross-linguistic word order variation has been accounted for within LCA-based approaches: in short, word orders differing from the underlying base order are derived via movement. Song discusses Cinque’s (2005) account of Greenberg’s Universal 20 in terms of the LCA, as well as Kayne’s (2000) account of the apparent fact that final complementizers are found only in OV languages, and the correlation between postpositions and OV order. In Section 4.5, Song discusses the relation between the LCA and MP, noting that “Kayne’s theory has met with mixed reactions from MP practitioners” (p. 141). He discusses Chomsky’s ‘bare phrase structure’, Uriagereka’s (1999) ‘multiple spell-out’ system, Epstein et al.’s (1998) attempt to derive c-command directly from Merge, and Hornstein’s (2009) proposal that Merge is an asymmetric operation, all of which have implications for the LCA. The chapter ends with a discussion of the increasing importance of ‘interface conditions’ in generative grammar.

Chapter 5: The Optimality-Theoretic approach: violable constraints and constraint ranking
Song chooses to treat OT in a distinct chapter from generative grammar, despite its origin within the generative paradigm. According to Song, OT represents a “radical conceptual shift” (p. 160) from MGG in its use of violable constraints. Furthermore, it does not constitute “a substantive theory of any phenomenon” (Legendre 2001:3); rather, the constraints and representations are “extensively imported directly from other theoretical models” (p. 160) – as Song notes, these could, in principle, involve functional factors, as well as the more ‘formal’ ones characteristic of MGG. What distinguishes OT from other frameworks is the ranking of these constraints in language-specific ways and the fact that they may be violated in principle. Song begins by discussing the move from rule-based, derivational theories of phonology to theories making use of constraints on representations to avoid redundancy; OT represents the culmination of this process in that it eliminates rules entirely in favour of constraints. Song then introduces the basic architecture of OT (i.e. the GEN and EVAL components, the interaction of violable constraints in language-specific ways, faithfulness and markedness constraints, harmonic bounding, the emergence of the unmarked, and factorial typology) and discusses how OT has been extended from phonology to syntax (giving a particularly detailed discussion of Grimshaw’s 1997 treatment of wh-movement, subject-aux inversion and do-support). A crucial aspect of OT that emerges from this discussion is that it shifts explanations of language variation from the lexicon (where it is generally located in modern MGG) to the grammar (i.e. constraint ranking). Most of the rest of the chapter discusses three OT accounts in detail: two accounts of word order in terms of ‘factorial typology’ – Costa (e.g. 1998) and Zepter (2003) – and López’s (2009) attempt to interpret Kayne’s LCA in OT terms. 
Costa’s main concern is the interaction of information structure with ‘basic word order’. Zepter aims at capturing various basic orders (S, V, O; Gen, N) in terms of head directionality constraints, as well as the orders that can be ruled out as impossible (e.g. ‘reverse German’, TSVO). López (2009) argues that the LCA should be reanalysed as a violable constraint in the sense of OT, basing his argument on Clitic Right-Dislocation in Spanish. Song highlights the important role played in all three cases by ‘the emergence of the unmarked’ (i.e. the idea that lower-ranked constraints are decisive when the effects of all higher-ranked constraints are neutralized).
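To make the notions of constraint ranking and factorial typology mentioned above more concrete, here is a minimal Python sketch. It is not drawn from the book or from any of the analyses Song discusses: the two candidates, the constraint names HEAD-LEFT and HEAD-RIGHT, and the violation counts are hypothetical toy values, chosen only to show how re-ranking the same violable constraints yields different winners.

```python
from itertools import permutations

# Toy violation profiles: candidate -> {constraint: number of violations}.
# Candidates and constraint names are hypothetical illustrations only.
VIOLATIONS = {
    "VO": {"HEAD-LEFT": 0, "HEAD-RIGHT": 1},
    "OV": {"HEAD-LEFT": 1, "HEAD-RIGHT": 0},
}


def eval_winner(ranking, violations=VIOLATIONS):
    """Pick the candidate whose violations, read from the highest-ranked
    constraint down, form the lexicographically smallest tuple."""
    def profile(candidate):
        return tuple(violations[candidate][c] for c in ranking)
    return min(violations, key=profile)


def factorial_typology(constraints, violations=VIOLATIONS):
    """Map each possible ranking of the constraints to its winning candidate."""
    return {
        ranking: eval_winner(ranking, violations)
        for ranking in permutations(constraints)
    }


if __name__ == "__main__":
    for ranking, winner in factorial_typology(["HEAD-LEFT", "HEAD-RIGHT"]).items():
        print(" >> ".join(ranking), "->", winner)
```

In this toy setting, ranking HEAD-LEFT above HEAD-RIGHT selects VO and the reverse ranking selects OV, so the two rankings together predict a two-way typology. Real OT analyses of the kind summarized above involve many more candidates and constraints.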

Chapter 6: The performance-based approach: efficiency in processing (and production)
This chapter focuses on “performance-based research that addresses, from a cross-linguistic perspective, the word-order patterns and their correlations by drawing on processing and/or production (efficiency) as the primary avenue of explanation” (p. 235). The main approaches considered are Hawkins’ (1994, 2004) Early Immediate Constituent (EIC) Theory, the ‘speaker-oriented’ work of Wasow (e.g. 2002) and associates, and Gibson’s (1998) Dependency Locality Theory. He first discusses Hawkins’ treatment of word order within the English VP in terms of the Principle of Early Immediate Constituents (PEIC): immediate constituents must be identified as early as possible. He then discusses various predictions and problems of Hawkins’ theory, concerning correlations between P-NP, A-NP and Gen-NP, and left-right asymmetries in word order (e.g. lack of final complementizers in VO languages, N-Rel order relative to verb position). He compares Hawkins’ EIC Theory with Dryer’s Branching Direction Theory (discussed under LT), concluding that EIC theory has advantages with respect to the frequencies of word orders within complex NPs. Song then considers theories grounded in the idea that the role of the speaker may have a bearing on word/constituent ordering (in contrast to Hawkins’ ‘hearer-based’ approach). Wasow (2002) highlights the frequency of PEIC violations (e.g. with Heavy Noun Phrase Shift (HNPS)) and dysfluencies in speech, proposing that the speaker aims at maximizing production efficiency rather than processing efficiency. Song then considers the role of length, heaviness and complexity, all of which have been proposed as motivations for phenomena such as HNPS, and Wasow’s conclusion that they are “statistically indistinguishable in corpus data as predictors” (2002:32), which is contrasted with Hawkins’ position that structural complexity is the “ultimate explanation” for word/constituent ordering. 
In addition to these factors, Wasow and associates show that ‘information status’ (esp. given/new) is a crucial factor in ordering, at least when structural complexity is not as relevant. Song next discusses three additional variables that have been identified as possible determinants of word/constituent ordering: semantic dependency (e.g. in collocations, complements vs. adjuncts), lexical bias and valency. Song next discusses Hawkins’ (2004) revision of his EIC theory, which (in contrast to his earlier single-principle-based theory) appeals to multiple performance-based principles (though still retaining processing efficiency as the central principle), and extends its remit to non-word-order phenomena such as relativization strategies, filler-gap dependencies, head- vs. dependent-marking and antecedent-anaphor relations. Finally, Song discusses Gibson’s Dependency Locality Theory (DLT), which is grounded in psycholinguistics. DLT gives more of a role to factors such as the complement/adjunct distinction, as part of an overall focus on “storage” or “memory cost”. Two aspects of memory cost are relevant: storage of the structure built so far, and integration of new words into this structure. Song provides the example of ‘centre-embedding’ of relative clauses to illustrate the workings of DLT. In his conclusion, Song notes the tension that exists in performance-based theories between single-principle-based and multiple-principles-based approaches, and between the roles of comprehension and production. Further issues include the relevance of grammatical knowledge (e.g. subcategorization requirements) and whether the processor constructs constituent structure anew each time or whether some degree of conventionalization of performance principles should be accepted.
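The intuition behind Hawkins’ PEIC and its bearing on HNPS can be illustrated with a rough calculation. The Python sketch below is my own simplification rather than anything given in the book: it assumes that each immediate constituent (IC) is recognized at its first word, so that the recognition domain of a phrase runs from the first word of the first IC to the first word of the last IC, and it scores an ordering by the ratio of ICs to words in that domain. The function name and the example lengths are hypothetical.

```python
from fractions import Fraction


def ic_to_word_ratio(ic_lengths):
    """Ratio of immediate constituents to words in the recognition domain.

    ic_lengths lists the word length of each IC in linear order.
    Simplifying assumption: every IC is identified at its first word,
    so only the first word of the final IC counts towards the domain.
    """
    if not ic_lengths or any(n < 1 for n in ic_lengths):
        raise ValueError("each IC must contain at least one word")
    domain_words = sum(ic_lengths[:-1]) + 1
    return Fraction(len(ic_lengths), domain_words)


if __name__ == "__main__":
    # English VP with a one-word verb, an eight-word NP and a two-word PP.
    print("V NP PP:", ic_to_word_ratio([1, 8, 2]))  # heavy NP kept next to V
    print("V PP NP:", ic_to_word_ratio([1, 2, 8]))  # heavy NP shifted to the end
```

Under these assumptions the unshifted order scores 3/10 and the shifted order 3/4, which is the sense in which postposing a heavy NP lets the hearer identify all ICs of the VP earlier.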

Chapter 7: Envoi: whither word-order research?
In the final chapter, Song briefly contrasts the evolution of the various approaches in terms of (1) deduction vs. induction, (2) single vs. multiple principles and (3) competence vs. performance. He then identifies the major challenges for each of the approaches: for LT, how to identify and tease out the various factors underlying word order variation; for MGG and OT, bridging the gap between abstract universal underlying word order and surface diversity; for performance-based approaches, investigating a wider range of performance factors, such as the speaker’s needs and conventionalized grammar. He finishes by identifying a “consensus emerging [...] among the approaches surveyed” (p. 308) on three points: word order does not seem to be something to be explained entirely by a single principle; cross-linguistic variation needs to be taken more seriously; and both formal and functional factors need to be taken into account in explanations of word order variation.


Because of its detailed and wide-ranging nature, I have not been able to do justice to the book’s contents in the relatively short summary above. Of course, the term ‘word order’ can potentially be interpreted as covering most topics in syntax, and anyone looking for a discussion of phenomena such as scrambling, topic/focus-movement or second-position cliticization will not find it here. Rather, the primary topic of the book is what modern-day generativists would call ‘linearization’: in particular, whether specifiers and complements are pronounced to the left or to the right of their heads (e.g. whether subjects and objects are pronounced to the left or right of verbs or auxiliaries), and what factors lie behind the choice. On this topic, the coverage is broad and up-to-date. Although the primary aim of the book is to provide a summary of each approach to cross-linguistic word order variation, some analyses are discussed in considerable detail. For example, Song devotes over 30 pages to Zepter’s (2003) OT analysis of cross-linguistic word order variation in the clausal and nominal domains. While this may seem like overkill for a ‘research survey’, in fact, it provides an admirable case study of OT syntax in action, giving the reader an appreciation of the framework’s advantages and disadvantages, which would not have been possible if the discussion had restricted itself to merely outlining the technicalities of OT and briefly summarizing a couple of analyses. On the other hand, the density of much of the discussion means that readers with no background at all in the relevant areas may find it heavy going. However, the book is so well-referenced that such readers will still find it useful as a pointer to further reading. Furthermore, those with some background in the relevant framework will find the introduction to that framework very useful as a ‘refresher’. The book also contains a considerable amount of critical discussion. 
Song is a typologist by background, but is even-handed in assessing the merits and drawbacks of each approach.

One potential danger of grouping chapters by framework is that it might give the impression that there are clear dividing lines between the frameworks, when in fact, there is a good deal of overlap between many of the approaches discussed. For example, although there are two chapters dealing with generative grammar (of the Chomskyan variety), all of the other frameworks discussed make some use of concepts from generative grammar – even if only basic phrase-structural notions – in at least some of their incarnations (e.g. Dryer 1992 from an LT perspective, Hawkins 1994 from a processing perspective, OT approaches in general). While these connections are not explicitly denied, the impression could be given that it is the ‘generative’ part of MGG that distinguishes it from other approaches, whereas in fact, the clearest distinguishing feature of the generative approaches discussed here is probably their mentalistic underpinnings (cf. other generative approaches, such as Head-Driven Phrase Structure Grammar (HPSG), which do not necessarily share these assumptions), which are closely connected with the importance MGG gives to purely ‘formal’ (as opposed to ‘functional’) factors. Furthermore, some of the comparisons Song makes between frameworks seem somewhat simplistic. For example, he states that OT “pays a greater deal of attention to surface (word order) properties than Generative Grammar does” (p. 4), and that it “seems to strike a balance, as it were, between data-driven LT and theory-driven GB and P&P” (pp. 183-4). Yet there is nothing about MGG that makes it inherently unsuitable for dealing with large amounts of data or with surface word order variation, and in fact, surface word-order properties are taken very seriously by most MGG practitioners, as emerges in Song’s discussion of the LCA in connection with Greenberg’s Universal 20, for example. 
In addition, the second quote above implies that OT and LT are not theory-driven, when it seems to me that what Song means is that they make different assumptions about the object of study (with OT and LT happening to share the assumption that surface word orders have a ‘primitive’ status). That is, the differences between different frameworks (to the extent one can generalize) have to do with how their theoretical commitments determine the relevance of particular kinds of data.

In a way, though, the concerns I have about Song’s comparison of the various approaches to word order variation simply reflect the fact that generalizations about theories and frameworks are difficult to make. Song’s discussion of the theories on their own terms, at least the ones with which I am familiar (I wouldn’t like to commit to a view on the others), is careful, detailed and reasonably accurate – an impressive achievement given the limited space. This book is therefore highly recommended for researchers and students interested in linearization and word-order typology.


Matthew Reeve currently teaches at University College London, where he obtained his PhD in 2010. His main research interests are in syntactic theory, and in particular the interface(s) between syntax and interpretation (information structure, ellipsis, binding theory).

