Date: Mon, 8 Sep 2003 23:33:47 +0200 From: Gisbert Fanselow <email@example.com> Subject: Movement in Language: Interactions and Architectures
Richards, Norvin (2001) Movement in Language: Interactions and Architectures, Oxford University Press.
Gisbert Fanselow, University of Potsdam
The insights of Chomsky (1964) and, in particular, Ross (1967) led to the establishment of a new research topic in syntax: constraints on movement. This new line of research generated an impressive number of empirical insights, and culminated in attempts such as Chomsky (1981), Chomsky (1986), or Baker (1988) to find one or two simple principles from which all constraints on movement can be derived.
However, empirical data that might prove fatal for a purely syntactic account of the restrictions on movement already came in around 1980. Huang (1982) observed that some of the local domains that restrict movement in English also constrain scope assignment in Chinese questions, in spite of the fact that question words do not undergo (visible/audible) wh-movement in this language but may stay in situ. Earlier, Erteschik (1973) had made the discovery that the degree of acceptability of extractions from certain domains is a function of information structure. These findings did not lead to a dismissal of syntax-based accounts of constraints on movement, however. Rather, models were developed in which the assignment of semantic scope to operators is conceived of as the construction of a formal level of representation (viz., Logical Form, LF), which involves essentially the same type of operations that we find in visible syntax, including movement (see, e.g., Huang (1982)). According to the GB-model (Chomsky (1981)), the ultimate target of a syntactic derivation is Logical Form. Much of the derivation of LF consists of a sequence of movement operations. During the derivation, there is a point (identified as a level of representation, S-structure, in early approaches, and simply called Spellout nowadays) at which the phonological and the syntactic aspects of the derivation split up. Movement taking place before Spellout has a phonological effect (visible displacement, overt movement); movement taking place after Spellout (covert movement) has no phonological effect: it is invisible/inaudible.
1. Overview of Movement in Language
This classical view of the distinction between covert and overt movement prevailed through the eighties, but in the nineties, its assumptions were questioned: is the difference between overt and covert movement really expressible in terms of a Spellout point in the derivation, or does it have to be specified independently, so that covert operations may precede overt ones? Are overt and covert movement really identical? Norvin Richards has written his book Movement in Language: Interactions and Architectures (MiL) as a contribution to this discussion (chapter 6), and he argues for a neo-classical concept of movement, in which the difference between overt and covert operations is (in principle) one of timing relative to Spellout.
MiL claims that the neoclassical view is supported by the existence of similarities among languages that only have overt (Bulgarian) or covert (Chinese, Japanese) wh-movement, respectively, as opposed to languages such as English that employ both types of movement. Models in which the difference between overt and covert movement is one of timing are distinctive in predicting that Bulgarian and Chinese type languages have common properties (because all wh-movement steps take place at the same point in the derivation, before Spellout in Bulgarian, after Spellout in Chinese), whereas the different instances of wh-movement in a multiple question are carried out in different parts of the derivation in English type languages.
In addition, MiL offers analyses for a number of phenomena that are formulated in terms of the neoclassical view and support it to the extent that these are compelling. These detailed analyses of various phenomena related to movement make the book extremely interesting and valuable. Chapter 2 presents evidence for the idea that UG allows two different types of multiple questions: those in which all wh-phrases cluster in the CP-domain, and those in which this clustering happens within IP. Chapter 3 discusses strict ordering effects among multiple specifiers of the same category (wh-phrases in Bulgarian, clitic sequences, certain types of A-movement, etc.) and argues that they can be derived from the Shortest Move condition and a particular way of encoding cyclicity in grammar.
Chapter 4 is concerned with a fundamental problem of the (neo-) classical model: there seem to exist positions P in natural languages that are normally targeted by covert movement, but are passed through by overt movement to higher positions Q. How can movement (to P) applying after Spellout precede movement (from P to Q) before Spellout? Richards solves this problem by formulating a model in which Pesetsky's Earliness Principle and a constraint related to the phonological realization of links in a chain imply that some instances of "covert" movement may take place before Spellout.
Chapter 5 gives a detailed discussion of "minimal compliance": sometimes, constraints such as subjacency or the superiority condition do not have to be fulfilled by all links created by movement - rather, it suffices that one dependency is in line with the constraint and thereby licenses the later creation of dependencies violating it.
2. Two ways of forming multiple questions
The theory developed in MiL presupposes and elaborates on a proposal originally made by Rudin (1988): in multiple questions, the wh-phrases may either be all adjoined to IP (as in Serbo-Croatian), or they may be made multiple specifiers of CP (as in Bulgarian). If long distance wh-movement proceeds via the specifier position of CP only, we understand why CP-absorption languages tolerate extractions from wh-clauses, while IP-absorption languages do not. According to MiL, "IP-absorption" languages are further characterized by allowing scrambling. They lack superiority effects with local wh-movement (wh-objects may be placed in front of wh-subjects in multiple questions), and they do not show weak crossover effects (as English does in ?who does his mother like). When multiple wh-phrases from the same clause interact, they have the same scope in IP-absorption languages, but CP-absorption languages are different: wh-phrases with different scope are possible, and they prefer crossing dependencies. Richards argues that this distinction also applies to languages with covert wh-movement only, and to languages such as German and English which combine overt and covert wh-movement in multiple questions.
The discussion in MiL sheds an interesting new light on the possible scope of a proposal that was originally made for languages with multiple fronting of wh-phrases. Two remarks are in order, however. First, some of the properties by which IP- and CP-absorbing languages are distinguished are straightforward consequences of scrambling. That scrambling languages show neither superiority nor weak crossover effects was observed, e.g., by Haider (1986), who related this property to the additional ordering possibilities created by scrambling. Since objects may be scrambled to a position P c-commanding the subject, the data in (1) have a derivation compatible with the conditions responsible for superiority and weak crossover: whenever object wh-movement starts in the position P c-commanding the subject, it neither crosses a wh-subject nor a pronoun which it binds.
(1) a. ich weiss wen t-WH [wer t-SCRA liebt]
       I know who.acc who.nom loves
    b. wen liebt [t-WH [seine Mutter t-SCRA]
       who.acc loves his mother
The question arises, then, whether the differences between German and English with respect to the descriptive properties of wh-movement do not just reduce to the fact that German is a free word order language, while English is not. Such a solution would be incorrect only if one could show that A-scrambling must not precede wh-movement, so that wh-movement in (1) cannot start from the position t-WH preceding the subject, but must originate in the object position t-SCRA c-commanded by the subject. Such a constraint on the interaction of scrambling and wh-movement has in fact been postulated by Müller & Sternefeld (1993), and it seems to be a consequence of the general approach pursued in MiL, since a chain resulting from a succession of A-scrambling and wh-movement contains two strong positions (see below). But empirically, the claim that wh-movement must not be preceded by A-scrambling is hard to defend, given data such as (2), in which the movement of the wh-operator "was" ('what') strands the rest of the object noun phrase in front of the subject, i.e., in a scrambling position (see Fanselow 2001).
(2) Wasi hätte denn [DP,acc t für Aufsätze] selbst Hubert nicht rezensieren wollen
    what had PTC [ t for papers ] even Hubert not review wanted
    'What kind of paper would even Hubert not have wanted to review?'
Apart from the question of whether the CP-IP-absorption distinction is really supported by English-German-contrasts, it also cannot be taken for granted that the properties in the clusters associated with CP- vs. IP-absorption always go hand in hand. Swedish does not show superiority effects in simple multiple questions (so it should be an IP-absorption language), but it is quite liberal with respect to wh-islands (a property claimed to be characteristic of CP-absorption languages) and does not have scrambling (but Object Shift). Spanish is like Swedish in this respect, but its word order is much more flexible.
Of course, one cannot exclude that the existence of languages that do not fall in line with the clustering of properties predicted in MiL is due to additional parameters and further structural distinctions. Nevertheless, the above remarks concerning German, English, and Swedish relativize the merits of an attempt to extend Rudin's proposal beyond the multiple wh-movement languages.
3. Tucking in
The superiority effect observed in English multiple questions has been a topic of syntactic theorizing for more than thirty years, and a number of diverging theories have been proposed. It was again Rudin (1988) who enriched this discussion with new data from multiple filler languages: in Bulgarian double questions, both wh-phrases must be fronted in overt syntax, and the order in which they appear in clause initial position must be identical to the order in which they were merged in IP. Rudin's own account involves the adjunction of wh-phrases to the specifier position of CP, which is unsatisfactory from a theoretical point of view, given that this analysis violates the strict cyclicity of derivations (but see Grewendorf 2001 for a modern version of this account).
MiL accounts for the contrast in (3) in the following way (chapter 3). Movement to Spec,CP is subject to a Shortest Move/Minimal Link Condition requirement: only the wh-phrase closest to the attracting position moves. Therefore, the subject koj is the first category in the derivation of (3) that moves to Spec,CP. Since Bulgarian is a multiple wh-movement language, the object kogo must be moved as well. The strict order effects in (3) follow if the second specifier position created by moving kogo must be created _below_ the position of the XP moved first, i.e., if XPs are "tucked in" below the phrase moved previously in multiple specifier constructions. This presupposes a specific definition of cyclicity that Richards takes over from Chomsky (1995).
(3) a. koj kogo vizda
       who whom sees
    b. *kogo koj vizda
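The effect of tucking in on the order in (3) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions, not Richards's formalization: the function name front_wh, the flag tuck_in, and the representation of the specifiers as a flat list (highest specifier first) are hypothetical, and Shortest Move is reduced to "attract the wh-phrases in their base c-command order".

```python
from typing import List


def front_wh(base_order: List[str], tuck_in: bool = True) -> List[str]:
    """Front all wh-phrases to specifiers of CP, closest phrase first.

    base_order lists the wh-phrases from highest to lowest base position,
    so iterating over it models attraction of the closest phrase first
    (a simplified stand-in for Shortest Move).
    The returned list is the specifier sequence, highest specifier first.
    """
    specifiers: List[str] = []
    for phrase in base_order:
        if tuck_in:
            # New specifier is created below the ones already present.
            specifiers.append(phrase)
        else:
            # Alternative for comparison: each new specifier is created on top.
            specifiers.insert(0, phrase)
    return specifiers


print(front_wh(["koj", "kogo"]))                 # ['koj', 'kogo'], cf. (3a)
print(front_wh(["koj", "kogo"], tuck_in=False))  # ['kogo', 'koj'], cf. (3b)
```

With tucking in, the surface order of the specifiers mirrors the base order, which is the order-preservation effect at issue; without it, the same sequence of movement steps yields the reversed, ungrammatical order.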
Chapter 3 is particularly interesting because Richards shows that the scope of the phenomenon captured by the tucking-in operation goes beyond multiple questions in Bulgarian: Object shift, cliticization, and certain types of A-scrambling and quantifier raising are further cases in point. It is a fairly new discovery that in quite a number of constructions, the c-command relations among moved phrases must be the same before and after movement!
Unfortunately, MiL does not contain a detailed comparison of its strictly derivational "tucking in"-model with the strictly representational accounts offered by Müller (2001) and Williams (2003). E.g., Müller proposes a (violable) constraint according to which c-command relations among phrases must be identical at all levels of representation (PF, LF, etc.). The representational models are compatible with a derivation of (3a) proceeding in a traditional way (kogo moves first). Independent evidence for the tucking in-idea thus seems to be called for, and would be extremely valuable, since it would strongly support a derivational model of grammar. MiL contains a brief discussion (pp. 49-53) of Bulgarian constructions in which local wh-movement mitigates subjacency violations of later non-local wh-movement. If the licensing local movement must precede the licensed long movement, insights into the order in which wh-phrases move to the specifiers of a CP seem possible, and Richards claims the empirical facts support his view. However, I do not find the "crucial" contrast between a "*" sentence (his (21) on p. 53) and a "??"-sentence (his (20) on p. 52) too impressive, in particular since it is based on the intuitions of a single native speaker only.
4. Strong and weak features
A timing model of the contrast between overt and covert movement in terms of a Spellout point in the derivation is confronted with the problem that some constructions seem to involve an application of covert movement that precedes overt movement steps. Richards dedicates the fourth chapter of his book to a discussion of this problem. His approach is framed in terms of the standard minimalist assumption that movement serves the purpose of feature checking, and that there are two types of features: strong and weak ones. In contrast to "standard" minimalism, the strong-weak distinction is framed in terms of effects on PF-chains (p. 105): a strong feature is an instruction that the position (in the chain) that checks this feature must be pronounced. Weak features do not imply any constraints in terms of pronunciation. On this basic assumption, Richards builds a simple and elegant algorithm for determining whether movement is overt or covert. The key idea lies in the assumption (p. 105) that PF must receive unambiguous instructions about which element in the chain must be pronounced.
If the chain contains one element only at PF, this condition is trivially fulfilled. A chain <A,B> with A checking a strong feature is licensed since the single strong feature constitutes an unambiguous instruction for PF pronunciation. On the other hand, a chain <A,B> with A checking a weak feature is not a legal PF object, since there are two positions that could be pronounced, and weak features do not come with any pronunciation instructions. Such illegal PF objects <A,B> with A checking a weak feature can be avoided by postponing movement to a weak position until after Spellout (so that <A,B> will be an LF object only).
Obviously, all chains containing a single strong position are predicted to be grammatical PF-objects in this approach. Chains <A,B,C> in which either A or B is a strong position (but not both) possess a unique pronunciation instruction. As MiL demonstrates, there are in fact many instances of heads and phrases that need to target weak positions when they undergo overt movement. Here are a few examples:
a. Case checking movement of objects is covert in English, but it needs to precede overt wh-movement in wh-questions. Likewise, objects have to move through specifier positions of participial phrases in French wh-questions (triggering agreement there), although this movement must be covert outside the context of wh-movement. [Final position strong, intermediate position weak]
b. There is no V-to-Infl movement in Mainland Scandinavian clauses, i.e., the feature of Infl checking V is weak in Mainland Scandinavian. In verb second clauses, V must pass through Infl on its way to Comp, however. [Final position strong, intermediate position weak]
c. In Malay, there is partial wh-movement: the wh-phrase undergoes overt movement to an intermediate specifier position only, and then moves to its scope position covertly. [Final position weak, intermediate position strong]
Particularly strong support for the view developed in MiL comes from the fact that weak movement may also be carried out visibly in the overt component of grammar when ellipsis and other reduction operations apply. Sluicing constructions such as (4) are a case in point: from which has moved to a specifier position of CP on the basis of a weak feature (normally, only one wh-phrase moves in English multiple questions), but this does not imply an illegal PF-chain, because the root position of this movement is deleted in the sluicing construction together with the whole VP, so that the wh-phrase in Spec,CP is the only link in the chain left at PF. Spellout can thus proceed in a unique and unambiguous way.
(4) I know that in each instance one of the girls got something from one of the boys. But they did not tell me which from which.
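The pronunciation algorithm described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation in MiL: the names Position and pronounced_position are hypothetical, the root of a chain is modeled as a position that checks no strong feature, and ellipsis is modeled by a flag that removes a position from the PF-chain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    label: str
    strong: bool = False  # True if the position checks a strong feature
    elided: bool = False  # True if the position is deleted at PF (e.g. sluicing)


def pronounced_position(chain: List[Position]) -> Optional[Position]:
    """Return the position PF is unambiguously told to pronounce.

    Returns None when the chain gives PF no unique instruction,
    i.e. when it is not a legal PF object under these assumptions.
    """
    visible = [p for p in chain if not p.elided]
    if len(visible) == 1:
        # A one-membered chain is trivially unambiguous.
        return visible[0]
    strong = [p for p in visible if p.strong]
    if len(strong) == 1:
        # Exactly one strong position: pronounce it.
        return strong[0]
    # No strong position, or more than one: ambiguous instruction.
    return None


# Pattern (a)/(b): final position strong, intermediate position weak.
overt = [Position("final", strong=True), Position("mid"), Position("root")]
# Pattern (c): intermediate position strong, final position weak.
partial = [Position("final"), Position("mid", strong=True), Position("root")]
# <A,B> with A checking a weak feature.
weak_only = [Position("A"), Position("B")]
# Sluicing as in (4): weak movement, root position deleted along with the VP.
sluiced = [Position("spec_cp"), Position("root", elided=True)]

print(pronounced_position(overt).label)    # final
print(pronounced_position(partial).label)  # mid
print(pronounced_position(weak_only))      # None
print(pronounced_position(sluiced).label)  # spec_cp
```

On this sketch, a weak movement step yields a legal PF-chain only when a single strong position or a deletion operation leaves exactly one candidate for pronunciation; otherwise the step would have to wait until after Spellout.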
I think this is a very elegant theory of the covert-overt distinction, and it is quite a pity that a number of data do not really fit into it - in particular, the theory disallows chains involving two strong features (both features force pronunciation, so there is an ambiguous instruction), and a substantial part of the book (pp. 147-195) is dedicated to a discussion of the problems that arise in this context. For example, English subjects move to Spec,IP in overt syntax (attraction by a strong feature), and, at least in long distance questions, they are able to move on to the specifier of CP (attraction by a second strong feature). Indeed, in quite a number of languages, subject wh-movement must start from a "weaker" subject position (perhaps in VP), but still, the grammaticality of (5a) must be accounted for. The solution offered in MiL is that the ban against two strong features in a chain is a violable one - it is violated whenever more important principles must be respected. In the case of (5a), clausal pied piping (who loves Irina do you think) is the competing derivation, but it is excluded because the constraint ruling out the pied-piping of complementizer-less CPs is stronger than the need to have unambiguous pronunciation instructions. Probably, the ban against pied-piping IP-complements is responsible for the possibility of overtly extracting who from an ECM Spec,IP position reached by overt movement itself (5b).
(5) a. who do you think t loves Irina? b. who do you expect t to kiss Irina?
Above, we have seen that overt scrambling may precede wh-movement in German, a constellation that is also incompatible with a ban against two strong features within a chain. Mahajan (1990, 1996) argues that overt A-scrambling may be followed by overt A-bar-scrambling in Hindi. Multiple strong features seem to be in general well-formed in PF-chains (p. 190) when the strong features are of the same "type": overt V to Infl and Infl to Comp movement may be combined in Icelandic (and phrases may go from one subject position to the next one higher up in cyclic A-movement). MiL leaves an account of these counterexamples open (p. 190).
Covert movement may also have to precede overt movement in constellations different from the ones considered in MiL. If islands are made transparent by head incorporation (as in Baker (1988)), it is often the case that overt movement is licensed by covert head incorporation. The Minimal Compliance effects in English multiple questions (see below) also presuppose that covert wh-movement of X may precede the overt wh-movement of Y (as proposed by Pesetsky (2000)). The general architecture of the model proposed in MiL allows such constellations: PF-chains with a root position and a position checking a weak feature do not come with a unique pronunciation instruction, but, as we have seen, the ban against such ambiguous chains is a violable one in the MiL model. Nothing in principle excludes that constraints other than the ban against certain types of clausal pied piping override the principle favoring unambiguous PF-chains.
Although it takes over from Chomsky (1981) the idea that covert movement is one that applies after Spellout, the model Richards proposes has various loopholes by which covert movement may be brought forward. Application after Spellout is thus a sufficient, but not a necessary, condition for a movement being covert.
5. Minimal Compliance
Examples such as (6) are a notorious problem for all theories of superiority: the presence of a third wh-phrase in a clause renders (certain) violations of the superiority condition possible. Similar facts hold in Bulgarian: while the order of indirect and direct object wh-phrases is not free in double questions, it is in triple questions, where, e.g., both the wh-cluster nom-acc-dat and the cluster nom-dat-acc are grammatical.
(6) what did who buy where
In MiL, Richards develops a very interesting account of such facts: within certain domains, grammatical constraints must be respected by a single grammatical relation only. Once the constraint has been "checked" within a domain, other relations of the same kind in the same domain need not obey the constraint in question. This is the Principle of Minimal Compliance. Again, PMC effects can be observed in a large number of different kinds of constructions: reflexivity in Dutch, weak crossover effects in English, VP-ellipsis, and scrambling.
MiL argues that PMC effects support a derivational approach, because the operation that satisfies the constraint in question must be applied before operations violating it for there to be a PMC effect saving the structure. Japanese and Bulgarian Shortest Move and Subjacency facts seem to support this conclusion. Section 5.6 shows that the PMC may be extended to an impressive number of further constructions.
The PMC proposal sheds a very interesting light on the way constraints are applied in natural language. The amount of data the PMC seems to characterize correctly is impressive.
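The derivational logic of the PMC, as summarized above, can be sketched as follows. This is a deliberately simplified illustration and not the definition given in MiL: the function name converges_under_pmc is hypothetical, a derivation is reduced to the ordered list of dependencies created within one domain, and each dependency is represented only by whether it obeys the constraint in question.

```python
from typing import List


def converges_under_pmc(obeys_constraint: List[bool]) -> bool:
    """Check a sequence of dependencies against one constraint, PMC-style.

    obeys_constraint[i] is True if the i-th dependency created in the
    domain respects the constraint. Once one dependency has respected it,
    later dependencies in the same domain are exempt.
    """
    constraint_checked = False
    for obeys in obeys_constraint:
        if obeys:
            constraint_checked = True
        elif not constraint_checked:
            # A violation before any compliant dependency is fatal.
            return False
    return True


print(converges_under_pmc([True, False]))  # True: compliance precedes violation
print(converges_under_pmc([False, True]))  # False: violation comes too early
```

The asymmetry between the two calls is what makes the PMC an argument for a derivational model: the same set of dependencies is licensed or not depending on the order in which they are created.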
One question that does not really find an answer in MiL is the question of what determines whether a principle is checked according to PMC or not. Thus, in Hindi, a wh-phrase in an embedded clause (with matrix scope) is not licensed by the presence of a wh-phrase in the matrix clause (rather, the kyaa-construction must be used, see Mahajan 1990), although short movement of the matrix wh-phrase should be sufficient for satisfying the subjacency requirement of the relevant matrix Comp. Clitic placement appears to be strictly local, independent of how many clitics are attached to a head. An anaphor cannot be bound non-locally just because a further anaphor is bound to the same antecedent in a local fashion. The application of local scrambling within a certain clause does not render long scrambling into that clause grammatical in German. Presumably, one can formulate a model that allows one to predict or at least describe whether a constraint is checked in the PMC way or not, but such a theory still needs to be developed.
6. General assessment
MiL is certainly one of the most important book-length contributions to minimalist syntax of recent years. It provides fresh insights into the nature of the shortest move / minimal link condition. The Principle of Minimal Compliance represents an original, stimulating way of dealing with the fact that syntactic constraints may fail to be respected by certain dependencies within a clause. And MiL offers an elegant theory of the distinction between overt and covert movement.
The only major weakness of the book I can see is one that is not uncommon in generative (minimalist) syntax: key decisions about the architecture and fundamental properties of the grammatical model are motivated on the basis of fairly complex constructions, the acceptability status of which is not really established beyond doubt. We have already mentioned one such case in section 3 of this review (a single speaker giving a contrast between "*" and "??"), but the situation may even be worse. As Richards himself concedes, "the claim that some covert wh-movement languages exhibit wh-island effects while others lack them is not uncontroversial. There are in fact Chinese speakers who reject wh-island violations and Japanese speakers who violate wh-islands fairly freely [...]. The possibility arises, then, the contrast to be discussed in this chapter is not a real one, but an accident of the particular Chinese and Japanese speakers who provide the data that have become standard in the literature. Refuting a possibility like this is not a simple one, and is beyond the scope of this chapter" (p. 12).
Unfortunately, the problem is not really confined to wh-islands. We know that speakers of a language show considerable variation in judging the well-formedness of sentences when these are complex, or when their well-formedness involves aspects of information structure (see, e.g., Schütze (1996)). Both factors play a crucial role in the relative degree of acceptability in a substantial part of the data used in MiL. Richards may be right in attributing certain properties to grammar and languages rather than to processing and individual speakers, but given that we have both the empirical (Schütze (1996), Cowart (1997), Keller (2000)) and theoretical (Keller (2000), Bresnan & Nikitina (2003), among others) means to deal with constellations that are characterized by variation and influences of information structure and processability, the question is whether the readers of MiL can really be satisfied by statements that substantiating certain claims has been "beyond the scope" of a certain chapter or of the book. I really like the theoretical approach taken in MiL, and I hope that it will stimulate the empirical research necessary for establishing whether the factual claims the book makes are correct.
Baker, M. 1988. Incorporation. Chicago.
Bresnan, J. & T. Nikitina. 2003. On the Gradience of the Dative Alternation. Ms., Stanford.
Chomsky, N. 1964. Current Issues in Linguistic Theory. Den Haag: Mouton.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht.
Chomsky, N. 1986. Barriers. Cambridge, Mass.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.
Cowart, W. 1997. Experimental Syntax. Thousand Oaks, CA.
Erteschik-Shir, N. 1973. On the nature of island constraints, MIT: Ph.D. Dissertation.
Fanselow, G. 2001. Features, θ-roles, and free constituent order. Linguistic Inquiry 32,3.
Grewendorf, G. 2001. Multiple wh-movement. Linguistic Inquiry 32: 87-122.
Haider, H. 1986. Deutsche Syntax - generativ. Habilitation thesis, Vienna.
Huang, C.-T. 1982. Move WH in a language without WH Movement. Texas Linguistic Review 1: 369-416.
Keller, F. 2000. Gradience in Grammar. Doctoral dissertation, Edinburgh.
Mahajan, A. 1990. The A/A-bar-distinction and movement theory. Doctoral dissertation, MIT.
Müller, G. 2001. Order Preservation, Parallel Movement, and the Emergence of the Unmarked. In: G. Legendre, J. Grimshaw and S. Vikner, eds., Optimality-Theoretic Syntax. MIT Press, Cambridge, Mass., pp. 279-313.
Müller, G. & W. Sternefeld. 1993. Improper movement and unambiguous binding. Linguistic Inquiry 24: 461-507.
Pesetsky, D. 2000. Phrasal Movement and Its Kin. Cambridge, Mass.
Ross, J.R. 1967. Constraints on Variables in Syntax. Doctoral dissertation, MIT.
Rudin, C. 1988. On multiple questions and multiple wh fronting. Natural Language and Linguistic Theory 6: 445-501.
Schütze, C. 1996. The Empirical Basis of Linguistics. Chicago.
Williams, E. 2003. Representation Theory. Cambridge, Mass.
ABOUT THE REVIEWER:
Gisbert Fanselow is a professor of syntax at the University of Potsdam, Germany. His research focuses on free word order phenomena (scrambling, discontinuous noun phrases) and aspects of wh-movement (scope marking constructions, MLC). He has done some experimental work on preferences in local ambiguities and processing influences on grammaticality judgements.