Review of Working Memory in Sentence Comprehension
Date: Thu, 21 Apr 2005 17:05:46 -0700 From: T. Florian Jaeger Subject: Working Memory in Sentence Comprehension: Processing Hindi Center Embeddings
AUTHOR: Vasishth, Shravan TITLE: Working Memory in Sentence Comprehension SUBTITLE: Processing Hindi Center Embeddings SERIES: Outstanding Dissertations in Linguistics PUBLISHER: Routledge YEAR: 2003
T. Florian Jaeger, Linguistics Department, Stanford University
Self-center-embedding constructions (henceforth SCEs) as in (1) have received an enormous amount of attention in the psycholinguistic literature because of the difficulty they impose on the human sentence processor. They have, however, only been studied for a small group of languages (mostly Dutch, English, German, Japanese, and Korean).
(1) Don't you find [that, with the right intonation, sentences [that people [that somebody has introduced to you] produce] are relatively easy to understand]?
Shravan Vasishth's Working Memory in Sentence Comprehension (henceforth WM) presents a detailed and carefully controlled psycholinguistic investigation of SCEs in Hindi. Three models of sentence comprehension (Gibson, 2000; Hawkins, 1994; Lewis, 1998) are evaluated. Weighing evidence from seven experiments, Vasishth concludes that none of the models is sufficient to capture the whole range of facts observed for Hindi (this extends to Hawkins, 2004).
Vasishth proposes a new model based on the Retrieval Interference Theory (Lewis, 1998; Lewis, 2001). Comprehenders generate a set of hypotheses (about possible parses) consistent with what they have heard so far. Whenever a word is processed, the associated processing complexity is derived from both memory-related processes (the retrieval/construction of elements from/in working memory) and the consistency of the word currently being processed with the set of established hypotheses.
WM is of considerable interest to psycholinguists working on sentence processing, and a must for researchers interested in processing-based constraints on cross-linguistic variation (cf. Hawkins, 2004). In the course of this review, I discuss several findings relevant to ongoing typological research on, e.g., (a) word order freedom in South Asian languages (and the extent to which it depends on discourse/information structure constraints); (b) case OCP effects (Obligatory Contour Principle, Leben, 1973); and (c) the cognitive basis of Differential Case Marking (Aissen, 2003). The data presented in WM also bear on the ongoing debate about the extent to which comprehension complexity is reducible to the predictability of a word given the information preceding it (e.g. Gibson, submitted; Hale, 2001; Levy, 2005).
WM has 262 pages in seven chapters, an appendix with all experimental stimuli (in Devanagari script only), a brief index, and a list of references, as well as 17 additional pages containing a table of contents, annotated lists of tables, figures, and algorithms, and a brief preface.
I first summarize each of WM's seven chapters (Section III of this review) and then present an overall evaluation (Section IV) and my recommendation. To make this review accessible to an audience beyond psycholinguists, I have attempted to couch the issues addressed in WM in general terms, providing additional background where necessary. Section VI provides information about the reviewer (i.e. me).
CHAPTER BY CHAPTER SUMMARY
In this section, I give a brief summary of each chapter. The first chapter (29 pp.) contains an introduction to the linguistic and psycholinguistic background assumed in WM. Chapter 2 (11 pp.) introduces and summarizes the predictions of three theories of sentence comprehension that WM evaluates. Chapters 3 to 5 (47, 43, and 11 pp.) describe seven experiments comparing the theories introduced in Chapter 2. Chapter 6 (17 pp.) takes the reader on a brief detour related to positional and word length effects on reading time relevant to several of the experiments. Finally, Chapter 7 (36 pp.) constitutes the theoretical heart of WM, presenting Vasishth's theory of sentence comprehension along with a discussion of its empirical coverage (based on data from Dutch, German, Hindi, and Japanese).
Chapter 1, the introduction, contains three main parts. The first part provides the reader with background on the role of working memory in the processing of SCEs. While all working memory-based processing theories agree that the increased processing load of SCEs is due to increased demands on working memory for those constructions, the accounts differ in what exactly is assumed to affect memory load and when during processing the effects of additional memory load surface. These differences are discussed in the summary of Chapter 2.
The second part of Chapter 1 summarizes those properties of Hindi that pertain to the processing of SCEs. The SCEs investigated in WM are control constructions (Bickel and Yadava, 2000):
(2) Siitaa-ne Hari-ko [PRO kitaab khariid-neko] kahaa. Sita-ERG Hari-DAT PRO book buy-INF told 'Sita told Hari to buy a book.'
As seen in (2), Hindi is a dependent-marking head final language. Two aspects of Hindi SCEs are of crucial importance for the current purpose. First, Vasishth claims that both the indirect object (Hari-ko) and the direct object (kitaab) can be fronted without ''rendering the sentence ungrammatical'' (p. 10):
(3a) Hari-ko Siitaa-ne [PRO kitaab khariid-neko] kahaa. Hari-DAT Sita-ERG PRO book buy-INF told (3b) Kitaab Siitaa-ne Hari-ko [PRO khariid-neko] kahaa. book Sita-ERG Hari-DAT PRO buy-INF told 'Sita told Hari to buy a book.'
Crucially, the experiments presented in WM investigate these fronting constructions out of context, which raises the question of the extent to which they are subject to discourse or information structure-based constraints (I return to this issue in Section IV). The second aspect of Hindi relevant here is Differential Object Marking (Aissen, 2003): while non-prototypical direct objects (e.g. definite human direct objects) must be case-marked in Hindi, the most prototypical direct objects (unspecific indefinite inanimates) must not be case-marked. Some types of direct objects (e.g. indefinite inanimates such as 'kitaab' - book) can occur with or without a case marker:
(4) Siitaa-ne Hari-ko [kitaab(-ko) khariid-neko] kahaa. Sita-ERG Hari-DAT book-ACC buy-INF told 'Sita told Hari to buy a book.'
In the final part of Chapter 1, Vasishth argues that (a) direct objects with case-marking are specific and conversationally imply definiteness, and (b) direct objects without case-marking are real indefinites. The comparison of two of the theories discussed in the next chapter (Gibson, 2000 vs. Lewis, 1998) crucially relies on these two assumptions.
Chapter 2 summarizes three models of sentence processing: Hawkins' Early Immediate Constituents (Hawkins, 1994, henceforth EIC), Gibson's Dependency Locality Theory (Gibson, 1998; Gibson, 2000, henceforth DLT), and two variants of Lewis' Retrieval Interference Theory (Lewis, 1998, henceforth RIT). Since a detailed discussion of these theories is beyond the scope of this review, I limit myself to a summary of the crucial differences. Both Gibson's DLT and Hawkins' EIC (as well as the revised theory in Hawkins, 2004) predict that processing cost increases with the amount of material that intervenes between a dependent (e.g. an argument) and the point of its integration (its head). This prediction rests on the assumption that working memory load at the point of integration is higher the more complex the material intervening between the dependent and its head. Thus both theories predict that (5b) is harder than (5a), since the distance between the direct object argument 'kitaab' (book) and the verb 'khariid-neko' (to buy) is larger in (5b):
(5a) Siitaa-ne Hari-ko [kitaab-ko khariid-neko] kahaa. Sita-ERG Hari-DAT book-ACC buy-INF told (5b) Kitaab-ko Siitaa-ne Hari-ko [khariid-neko] kahaa. book-ACC Sita-ERG Hari-DAT buy-INF told 'Sita told Hari to buy a book.'
Lewis' RIT on the other hand predicts (5a) to be harder to process than (5b). This prediction follows from the assumption that similar items (where similarity in this case is due to surface identical case-marking) interfere in working memory at the point of retrieval (the verb). RIT predicts that this difficulty is amplified if the identical items are adjacent (e.g. the two adjacent '-ko' marked phrases in (5a)).
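To make the two opposing predictions concrete, here is a toy sketch in Python. It is not Gibson's, Hawkins', or Lewis' actual model: the locality cost simply counts NPs between a dependent and its head, and the interference cost counts other NPs carrying the same surface case marker as the dependent, plus a hypothetical extra penalty of 1 for each such NP that is adjacent to it. The names `Token`, `locality_cost`, and `interference_cost` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    form: str
    case: Optional[str]  # surface case marker ("ne", "ko", ...) or None
    is_np: bool


def locality_cost(sent: List[Token], dep: int, head: int) -> int:
    """Toy DLT/EIC-style cost: number of NPs strictly between dependent and head."""
    lo, hi = sorted((dep, head))
    return sum(1 for t in sent[lo + 1:hi] if t.is_np)


def interference_cost(sent: List[Token], dep: int, adjacency_penalty: int = 1) -> int:
    """Toy RIT-style cost: other NPs sharing the dependent's case marker,
    plus a hypothetical penalty for each such NP adjacent to the dependent."""
    target = sent[dep]
    cost = 0
    for i, t in enumerate(sent):
        if i == dep or not t.is_np or t.case is None or t.case != target.case:
            continue
        cost += 1
        if abs(i - dep) == 1:
            cost += adjacency_penalty
    return cost


# (5a): canonical order, object adjacent to Hari-ko and to its verb.
S5A = [Token("Siitaa-ne", "ne", True), Token("Hari-ko", "ko", True),
       Token("kitaab-ko", "ko", True), Token("khariid-neko", None, False),
       Token("kahaa", None, False)]
# (5b): fronted object.
S5B = [Token("Kitaab-ko", "ko", True), Token("Siitaa-ne", "ne", True),
       Token("Hari-ko", "ko", True), Token("khariid-neko", None, False),
       Token("kahaa", None, False)]

for name, sent, dep in [("5a", S5A, 2), ("5b", S5B, 0)]:
    head = 3  # index of khariid-neko in both sentences
    print(name, locality_cost(sent, dep, head), interference_cost(sent, dep))
```

With these toy definitions, the locality count is 0 for (5a) and 2 for (5b), favoring (5a), while the interference count is 2 for (5a) and 1 for (5b), favoring (5b). This mirrors the opposite predictions described above; the numbers themselves carry no empirical weight.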
WM exploits this property of RIT to further distinguish RIT empirically from DLT. DLT predicts that the processing cost at the point of integration is higher (a) the more discourse referents intervene between the dependent and the head (see above) and (b) the less accessible these interveners are (Gibson, 2000; Warren and Gibson, 2002). (Note that this is not quite how Vasishth summarizes DLT; I address this discrepancy in Section IV.) This predicts that definite interveners (e.g. a -ko marked direct object) cause less processing cost than indefinite interveners (e.g. a direct object without -ko marking). RIT, on the other hand, does not attribute any processing cost to the accessibility of referents. Instead, as stated above, it predicts increased processing cost for cases with two or more -ko marked objects. Thus DLT predicts that (4) with a -ko marked object incurs less processing cost on the verb than (4) with an indefinite (not -ko marked) object. RIT makes the opposite prediction.
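The contrast for (4) can be sketched the same way, again as a toy rather than either theory's actual formulation. The accessibility weights below are invented for illustration (DLT only states that less accessible interveners cost more). Following Vasishth's assumption that -ko marking implies definiteness, the sketch also assumes that the object intervenes between the verb and an earlier dependent such as Hari-ko. The similarity load counts other NPs bearing the same marker as the object retrieved at the verb.

```python
from typing import List, Optional, Tuple

# (form, surface case marker or None, definiteness) for each NP.
NP = Tuple[str, Optional[str], str]

# Hypothetical weights: DLT only says that less accessible (e.g. indefinite)
# interveners cost more; these particular numbers are invented.
ACCESSIBILITY_WEIGHT = {"definite": 0.5, "indefinite": 1.0}


def dlt_intervener_cost(interveners: List[NP]) -> float:
    """Toy DLT-style cost at a verb: summed weights of the NPs intervening
    between the verb and the dependent it integrates."""
    return sum(ACCESSIBILITY_WEIGHT[definiteness] for _, _, definiteness in interveners)


def rit_similarity_load(nps: List[NP], retrieved: int) -> int:
    """Toy RIT-style load: other NPs bearing the same surface case marker
    as the NP retrieved at the verb (an unmarked NP interferes with nothing)."""
    marker = nps[retrieved][1]
    if marker is None:
        return 0
    return sum(1 for i, (_, case, _) in enumerate(nps) if i != retrieved and case == marker)


# Example (4) with and without -ko on the object (index 2).
WITH_KO = [("Siitaa-ne", "ne", "definite"), ("Hari-ko", "ko", "definite"),
           ("kitaab-ko", "ko", "definite")]
WITHOUT_KO = [("Siitaa-ne", "ne", "definite"), ("Hari-ko", "ko", "definite"),
              ("kitaab", None, "indefinite")]

for label, nps in [("kitaab-ko", WITH_KO), ("kitaab", WITHOUT_KO)]:
    # Assumed: the object is the only intervener between Hari-ko and khariid-neko.
    print(label, dlt_intervener_cost([nps[2]]), rit_similarity_load(nps, retrieved=2))
```

Under these assumptions the DLT-style cost is lower with -ko (0.5 vs. 1.0), while the RIT-style load is higher with -ko (1 vs. 0), which is the opposition the Hindi experiments were designed to test.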
Chapter 3 presents three experiments testing the effect of identical case marking. The first two experiments (acceptability elicitation and moving window self-paced reading) compare the effect of -ko marking in SCEs with either one level of embedding, as in (4) above, or two levels of embedding, as in (6).
(6) Siitaa-ne Hari-ko [Ravi-ko [kitaab(-ko) khariid-neko] bol-neko] kahaa. Sita-ERG Hari-DAT Ravi-DAT book-ACC buy-INF tell-INF told 'Sita told Hari to tell Ravi to buy a book.'
Both experiments reveal main effects of nesting (double-nested SCEs are harder than single-nested SCEs) and of case-marking. Crucially, -ko marked objects were harder to process (both the object itself and the verb integrating it) than objects without -ko. Since -ko marking in Experiments 1 and 2 always results in two adjacent -ko marked phrases, these results support Lewis' RIT over DLT and EIC. Recall that, contrary to the facts in Hindi, DLT predicts that definite interveners will result in less processing load at the integrating verb. Vasishth points out that all evidence provided for the validity of this claim (Warren and Gibson, 2002) comes from intervening subjects, whereas all experiments in WM contain intervening objects. The observed difference in definiteness effects on processing complexity could thus be related to (violations of) expectations about prototypical subjects and objects. (Although Vasishth does not relate this intriguing evidence to the research on Differential Case Marking, his findings provide experimental support for accounts describing Differential Case Marking in terms of harmonic alignment of grammatical functions and markedness hierarchies.)
Similar results are found in the third experiment (self-paced reading), which investigates more complex structures. Interestingly, -ko marking in the absence of another -ko marker also leads to a (small) increase in processing load, which none of the theories predicts. Furthermore, adjacency of two ablatives (marked by -se) does not lead to increased processing load. Since several other comparisons remain inconclusive (e.g., contrary to RIT, adjacent -ko phrases are not harder to process than non-adjacent ones), Vasishth tentatively concludes that evidence from processing associated with case-marking favors RIT over DLT and EIC but provides ''only limited support for Lewis' similarity-based interference hypothesis [i.e. RIT]'' (p. 102; for an overview of the results, see p. 100).
Chapter 4 presents three experiments investigating the effect of object-fronting. One self-paced reading experiment tests the effect of direct object fronting, and another investigates indirect object fronting. Since both experiments yield largely identical results, I describe only the direct object-fronting cases, illustrated above in (5b) vs. (5a).
The experiment supports the sensitivity of DLT and EIC to dependency length: reading times on the integrating verb ('khariid-neko' - to buy, in (5)) were significantly longer when the direct object was fronted (i.e. when more material intervenes between the dependent and its head). RIT cannot account for this effect without additional assumptions (a potential revision of RIT is discussed in Chapter 5, p. 156).
Potential but inconclusive support for RIT comes from the effect of -ko marking (as in the first three experiments): for objects in canonical position, -ko marking results in a slowdown, whereas fronted objects show no effect of case-marking. Under the assumption that adjacent -ko marked NPs interfere more than non-adjacent ones (for which no evidence was found in the first three experiments), this effect is compatible with RIT (and not predicted by DLT and EIC), since the canonical word order results in adjacent -ko marked phrases, cf. (5a) vs. (5b).
Chapter 5 contains the final experiment, which provides evidence against EIC and RIT that is also not predicted by DLT (but see Gibson, submitted). The experiment shows a decrease in reading times on the verb ('khariid-neko' - to buy) if an adverb intervenes before the most deeply embedded verb, as in (7) but not (6):
(7) Siitaa-ne Hari-ko [Ravi-ko [kitaab-ko jitnee-jaldi-ho-sake khariid-neko] bol-neko] kahaa. Sita-ERG Hari-DAT Ravi-DAT book-ACC as-soon-as-possible buy-INF tell-INF told 'Sita told Hari to tell Ravi to buy a book as soon as possible.'
Chapter 6 contains a methodological discussion of the most adequate way to analyze reading time effects (the dependent measure in several of the experiments presented above). Although the evidence presented is of interest to researchers concerned with positional and word length effects on reading times, it is neither particularly strong (it mostly stems from null effects), nor suited for a review intended for a broader audience. Importantly, Vasishth concludes that the effect observed in Chapter 5 is not due to a positional confound (the verb is read later in those examples that contain an intervening adverb and such positional differences have been argued elsewhere to result in a speed-up).
Chapter 7 closes the discussion of the experiments with a concise evaluation of each theory's predictions (see the table on p. 189; overall, DLT fares better than EIC and RIT) and introduces a new model of sentence comprehension, termed the Abductive Inference Model (henceforth AIM). Like the revised Retrieval Interference Theory (Lewis, 2001), and in contrast to DLT, EIC, and the original RIT (Gibson, 2000; Hawkins, 1994; Lewis, 1998), AIM combines memory-based principles with the construction of expectations about the structure yet to be processed, given the information encountered so far.
AIM uses abductive reasoning to generate sets of hypotheses about possible parses given the information encountered so far. Crucially, only minimally consistent hypotheses are entertained. That is, AIM assumes that, out of all parses consistent with the current input, comprehenders only consider the minimal ones. For example, comprehenders do not consider any parses that would require more NP arguments than minimally necessary to complete the sentence (e.g. in German, if the first word is a nominative case-marked NP, comprehenders would not consider that this could be followed by a transitive verb; instead, only an intransitive verb is considered at this point). In other words, comprehenders construct as 'cheap' a hypothesis space as possible given the available input (this bears some resemblance to Frazier's (1987) Active Filler Strategy). AIM calculates the processing difficulty at each word as the sum of three main factors: (a) the construction of a referent for each NP encountered; (b) the number of predicates expected given the current hypotheses about possible parses; and (c) the number of available minimally consistent hypotheses. A fourth factor, termed Mismatch Cost, can add to the overall processing cost: whenever a verb is processed, the processing load is increased for each failed attempt to match the verb with one of the hypothesized predicates (see (b) above). The verb-predicate matching algorithm is assumed to proceed from the outermost predicate inwards.
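The cost computation summarized above can be sketched as follows. This is a toy reading of the summary, not Vasishth's implementation: each factor is weighted 1 (an assumption), the abductive generation of hypotheses is not modeled (the expected predicates and the number of hypotheses are passed in by hand), and factor (b) is taken to count the predicates expected when the word arrives. The frame labels "three-place" and "two-place" are hypothetical.

```python
from typing import List, Optional


def mismatch_cost(verb_frame: str, expected_outermost_first: List[str]) -> int:
    """Count failed attempts to match a verb against the hypothesized
    predicates, trying the outermost predicate first and moving inwards."""
    failures = 0
    for predicate in expected_outermost_first:
        if predicate == verb_frame:
            return failures
        failures += 1
    return failures  # no hypothesized predicate matched at all


def aim_word_cost(is_np: bool,
                  expected_outermost_first: List[str],
                  n_hypotheses: int,
                  verb_frame: Optional[str] = None) -> int:
    """Toy AIM-style cost for one word, every factor weighted 1 (assumed):
    (a) one new referent if the word is an NP, (b) the number of predicates
    expected, (c) the number of minimally consistent hypotheses, plus
    Mismatch Cost if the word is a verb."""
    cost = (1 if is_np else 0) + len(expected_outermost_first) + n_hypotheses
    if verb_frame is not None:
        cost += mismatch_cost(verb_frame, expected_outermost_first)
    return cost


# At 'khariid-neko' in (2): assume one hypothesis expecting an outer
# three-place predicate (kahaa 'told') and an inner two-place one (buy).
expected = ["three-place", "two-place"]
print(aim_word_cost(False, expected, n_hypotheses=1, verb_frame="two-place"))
```

Because matching starts from the outermost predicate, the embedded verb first fails against the outer three-place slot and only then matches, adding one unit of Mismatch Cost in this toy setup.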
In the final part of Chapter 7, Vasishth discusses evidence in favor of AIM coming from Dutch, German, Japanese, and Hindi.
EVALUATION
WM is the result of an impressive research project. With almost no earlier processing literature on Hindi to draw on, Vasishth presents thoroughly controlled experimental studies that yield intriguing insights into the structure of Hindi as well as into the processing of dependencies in head-final, dependent-marking languages. Several of the experimental findings pertain to important questions in the processing literature (e.g. the nature of working memory effects in sentence processing; predictability vs. locality effects of the distance between a head and its dependents). The argumentation and presentation of the results are clear and well structured throughout WM. Thanks to this clarity, the book should be very accessible even to readers unfamiliar with the literature on SCEs. Below I briefly discuss three issues raised in WM that I deem of particular interest to a broad community of researchers.
First, the effect of multiple -ko marking and the lack of such an effect for -se relate to the research on case OCP effects in Hindi. Moreover, similarity-based interference in working memory can be seen as providing a motivation for the case OCP effects discussed in the linguistic literature. It is somewhat unfortunate that this issue is not raised in WM, especially since Mohanan (1994: 208) presents a potentially revealing example: (8) is supposedly more acceptable with the additional -ko marked intervener raat-ko:
(8) Ramm-ko (raat-ko) bacco-ko samhaalnaa paadaa. Ram-DAT night-at children-ACC take-care-INF fall-PERF 'Ram had to take care of the children at night.'
This lends support to Vasishth's observation that RIT may be too restrictive if only form-similarity is considered (in which case (8) should be harder with the additional -ko intervener). Apparently, raat-ko is not similar enough to cause interference (due to its different semantic and syntactic status).
Second, the object-fronting effect is relevant for ongoing research on word order freedom in Hindi and other South Asian languages (see several articles in Butt et al., 1994). Interestingly, Vasishth cites several follow-up studies (conducted by him) showing that fronting effects disappear for indirect objects, but not for direct objects, once an appropriate discourse context is provided. This may be taken to indicate that indirect object fronting is subject to discourse/information structure constraints while direct object fronting is not.
Third, the speed-up on the verb observed in the final experiment pertains to predictability-based models of processing such as Hale (2001) (and more recently Gibson, submitted; Levy, 2005). The effect adds to similar evidence from German (Konieczny, 2000) and Japanese and reiterates the need for a predictability-based component in theories of sentence processing. While Vasishth (ibid) fairly comments that precise predictions are hard to derive for such accounts given the lack of large parsed corpora of Hindi, it seems rather clear that predictability-based accounts would, at least for some cases, make predictions similar to those of Vasishth's AIM. WM could therefore have benefited from a more detailed discussion of the role predictability plays in language processing (e.g. Hale's 2001 model is only mentioned in passing, p. 221).
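For readers unfamiliar with predictability-based accounts, the core quantity in Hale (2001) is surprisal: the cost of word i is log2 of the ratio between the prefix probability before and after the word, which equals -log2 P(word | preceding words). The sketch below only computes surprisal from probabilities supplied by hand; it does not implement Hale's probabilistic Earley parser, and any probabilities plugged in are hypothetical rather than estimates for Hindi.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of a word whose conditional probability is `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def surprisal_from_prefixes(prefix_before: float, prefix_after: float) -> float:
    """Surprisal of word i as log2(P(w1..w_{i-1}) / P(w1..w_i)),
    i.e. the prefix-probability formulation used by Hale (2001)."""
    if not 0.0 < prefix_after <= prefix_before:
        raise ValueError("need 0 < prefix_after <= prefix_before")
    return math.log2(prefix_before / prefix_after)


# Hypothetical numbers: if an intervening adverb raised the probability of
# the upcoming verb from 0.25 to 0.5, its surprisal would drop from 2 to 1 bit.
print(surprisal(0.25), surprisal(0.5))
```

On such an account, anything in the preceding context that makes the verb more expected lowers its surprisal and hence its predicted reading time, which is why the adverb speed-up is relevant to these models.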
Given the task it takes on, it is unsurprising that WM also has some minor shortcomings. Here, I briefly mention one: at times, Vasishth presents a slightly distorted version of Gibson's DLT (this is pervasive throughout the book and potentially confusing). Contrary to Vasishth's claims, DLT considers indefinite NPs (in this case bare direct objects) to cause a higher processing load on the verb only if they intervene between the verb and its dependent. Definiteness of the dependent itself is not predicted to matter (Gibson, 2000, and personal communication).
Researchers working on aspects of Hindi morphology and/or syntax may find some of the assumptions Vasishth makes in the introduction problematic, but I strongly recommend approaching WM with an open mind, bearing in mind that Vasishth accomplishes what relatively few even attempt: typologically interesting, experimentally well-controlled work on sentence processing. In sum, I highly recommend WM. It provides crucial insights into the nature of the human language processor that cannot be obtained from the study of English alone.
REFERENCES
AISSEN, JUDITH. 2003. Differential Object Marking: Iconicity vs. Economy. Natural Language and Linguistic Theory, 21.435-83.
BICKEL, B. and YADAVA, Y. P. 2000. A fresh look at grammatical relations in Indo-Aryan. Lingua, 110.343-73.
BUTT, MIRIAM; KING, TRACY HOLLOWAY and RAMCHAND, GILLIAN (eds.) 1994. Theoretical perspectives on word order in South Asian languages. vol. 50. CSLI Lecture Notes. Stanford: CSLI.
FRAZIER, LYNN. 1987. Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory, 5.519-60.
GIBSON, EDWARD. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68.1-76.
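GIBSON, EDWARD. 2000. The Dependency Locality Theory: A distance-based theory of linguistic complexity. Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, ed. by Alec Marantz, Yasushi Miyashita and Wayne O'Neil, 95-126. Cambridge, MA: MIT Press.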
GIBSON, EDWARD. submitted. The interaction of top-down and bottom-up statistics in syntactic ambiguity resolution.
HALE, JOHN. 2001. A Probabilistic Earley Parser as a Psycholinguistic Model. Paper presented at the Second Meeting of the North American Chapter of the Association for Computational Linguistics.
HAWKINS, J. A. 1994. A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.
HAWKINS, J. A. 2004. Efficiency and Complexity in Grammars. Oxford: Oxford University Press.
KONIECZNY, LARS. 2000. Locality and parsing complexity. Journal of Psycholinguistic Research, 29.627-45.
LEVY, ROGER. 2005. Processing difficulty in verb-final clauses matches syntactic expectations. Paper presented at the Annual Meeting of the Linguistic Society of America.
LEWIS, RICHARD L. 1998. Interference in Working Memory: Retroactive and proactive interference in parsing. Paper presented at CUNY Sentence Processing Conference.
LEWIS, RICHARD L. 2001. Language. Berkeley Springs, West Virginia
MOHANAN, TARA. 1994. Case OCP: A Constraint on Word Order in Hindi. Theoretical Perspectives on Word Order in South Asian Languages, ed. by Miriam Butt, Tracy Holloway King and Gillian Ramchand. Stanford: CSLI.
WARREN, TESSA and GIBSON, EDWARD. 2002. The influence of referential processing on sentence complexity. Cognition, 85.79-112.
ABOUT THE REVIEWER:
Florian Jaeger is a Ph.D. student at the Linguistics Department, Stanford University, supposedly in the process of writing his thesis on production-driven variation. His current research interests include English prosody (phrasing as well as post-nuclear prominences) and processing-based models of linguistic variation. This includes work (more often than not with hordes of other researchers) on wh-phrase ordering (and Superiority), on relativizer and complementizer omission, on the choice of linguistic expressions (the distribution of anaphors vs. pronouns), and on constructional choice (existential vs. canonical subject constructions).