Ravin, Yael and Leacock, Claudia, eds. (2002, paperback ed., 1st ed. 2000) Polysemy: Theoretical and Computational Approaches. Oxford University Press.
Eleni Koutsomitopoulou, Georgetown University, Washington DC, and LexisNexis Butterworths Tolley London, UK.
DESCRIPTION OF THE BOOK
This book is a broad survey of the issue of polysemy in theoretical and computational linguistics. It is a collection of 11 papers including an overview of the subject by Ravin & Leacock.
What each paper is about: Each paper in this edition sheds light on a different aspect of this multifarious issue. The theoretical approaches treat polysemy as part of semantics (see the papers by Pustejovsky and Dowty), cognitive semantics (Fillmore & Atkins; Goddard), discourse (Cruse) and grammar (Fellbaum). The computational approaches cover almost the entire spectrum of computational methodologies: from lexical solutions à la WordNet to NLP and connectionism.
Ravin & Leacock's overview is a thorough survey of the issue and a preliminary introduction to the various approaches that are presented in the book. For instance, in the editors' review, polysemy is discussed vis-à-vis homonymy and indeterminacy. Also discussed is the role of context in sense disambiguation, as well as the various underlying (formal as well as cognitive semantic) theories of meaning and computational practices for word sense disambiguation.
Cruse's paper focuses on the role of context in polysemy, on the degree to which words depend on context, and on semantic discontinuity and "distinctness". Words that are semantically self-standing (enough to be relatively unaffected by context) are called "discrete words", whereas words of lower "semantic density" are more easily affected and defined by context. Polysemy is explored here from a lexical-semantic point of view as the result of a "wide spectrum of possibilities for context-dependency" for individual words. The paper is a rich typology of context-word relationships with plenty of examples. An interesting ramification of this lexical-semantic perspective is that antonyms and hyponyms cannot assert context-independent meaning or, worse, that there is no such thing as an absolute hyponymic or antonymic sense or term. Cruse jokingly calls this new realization about word meaning "soft semantics", which is definitely on a par with structuralism and perhaps even formalism. At the same time, Cruse appears pessimistic about prototype theories of meaning, since prototypes are again representations of a chaotically behaving system of word meaning.
Fellbaum discusses "autotroponymy", a kind of polysemy (the term derives from the Greek "tropos", meaning "manner"), in the English verb and noun systems. She argues that in English some verbs refer to specific ways or manners of performing the actions denoted by other verbs ("stammering" for "talking", "sneaking" for "walking", etc.). She points out that the "manner" relation between verbs is highly polysemous in the English verb system when compared, for instance, with the semantic relation of causative verbs to the corresponding inchoatives (John opened the door. The door opened.). The paper is a typology of changes in syntactic behavior that align with the various meanings of polysemous verb and noun forms (The kids behaved. vs. The kids behaved badly.). An interesting aspect of this study is that it examines polysemy/autotroponymy as the conflation of a "semantically specified sense" with its "more general superordinate". The troponyms (i.e. the polysemous terms) differ from their homophonous superordinates in their syntactic arguments. They also differ from their co-troponyms in their syntactic properties, in the way they are lexicalized, or both. For instance, compare "behave" (semantically specific sense) with "behave well/badly/etc." (superordinate/troponym) and "be a good/bad/etc. boy" (co-troponym).
Pustejovsky zooms in on argument structure vis-à-vis polysemy within his Generative Lexicon theory. He argues that the known phenomenon of lexical shadowing, which typically occurs with cognate-object verbs such as "butter" (butter the bread) and "dance" (dance a dance), also shows up in other classes of verbs, such as those noted in Fillmore and Atkins (1992), where the expression of one argument completely shadows the expression of another argument of the verb (risk my health/life vs. risk illness/death).
Pustejovsky also discusses various types of relations denoted by verb argument structures, such as the "containment relation" ((in a) book, (on a) disc, etc.) and the "complex relation" (read the book, read the articles, read the articles in the book, read the book of articles). Since polysemy from this point of view refers to the semantic nuances that arise from the presence and various configurations of arguments in the argument structure, Pustejovsky also proposes a typology of argument "optionality", which defines the types of arguments that may be optionally expressed in a predicate. The article includes a general overview of the basic premises of the author's theoretical framework. Although this is a highly technical discussion that presupposes a fair amount of familiarity with Pustejovsky's theory of the Generative Lexicon (1991, 1995), the article is conceptually relatively simple and its points are well known in the literature.
Fillmore and Atkins provide a lexicographic analysis of word sense variety by examining the contents and structure of four British English dictionaries (CIDE, COBUILD, LDOCE, OALD). They make the point that the number of distinct senses of a single term found in actual corpora far exceeds the number of sense variations pinpointed in the dictionaries. According to Fillmore & Atkins, the dictionaries also omit metaphorical senses of terms. The study also includes crosslinguistic data, examining the "matching senses" of a term through its equivalents in bilingual corpora. They close by criticizing traditional lexical-semantic approaches to word sense disambiguation and proposing the word sense analysis methods of the Berkeley FrameNet project (see general information about the project at: http://www.icsi.berkeley.edu/~framenet/book/FNIntro.html).
Dowty casts doubt on a traditional view that he calls the "fallacy of argument alternation". According to this view, differing constructions (syntactic forms) may express identical intended meanings and correspond to identical propositions, which is taken as an argument for the universal nature of semantic structure in natural language. Dowty instead points out that syntactic permutations convey significant semantic or conceptual variations, and hence they should not be discounted in the name of propositional equivalence. To support his point he examines a number of argument permutation phenomena such as passivization, the tough-construction, the middle construction, raising, etc. In particular, he compares the intransitive "swarm"-alternation (Bees swarm in the garden. The garden swarms with bees.) with the transitive "spray-load"-alternation (Mary sprayed paint on the wall. Mary sprayed the wall with paint. Mary loaded hay onto the truck. Mary loaded the truck with hay.) from Fillmore 1968. After presenting the superficial commonality between these two types of constructions, Dowty argues that they are fundamentally different and focuses on the former. The author goes as far as claiming that the intransitive "swarm"-alternation is a phenomenon of semantic extension, and he offers some pertinent historical linguistic evidence from German and French.
Goddard is a proponent of Wierzbicka's Natural Semantic Metalanguage (NSM) theory. He points out the capacity of NSM theory to tackle both word-level and syntactic-level polysemy. The entire theory is based on the notion of semantic primes, which supposedly safeguard the lexicon from obscurity and circularity in lexical sense definitions. Substitution is one of the tests of the validity of the periphrases used to express alternate meanings corresponding to a single term in the lexicon. The paper claims to offer a "semantic methodology" for lexical definition and consequently for polysemy. It makes the interesting point that grammatical constructions may also manifest polysemy, and it proposes a treatment of figurative language (within the same NSM framework) in relation to polysemy. A drawback of the NSM approach seems to be that meaning is treated as a tractable phenomenon and hence considered "accessible, concrete, and determinate", a view that classical meaning typologies have repeatedly failed to prove true.
In computational linguistics, polysemy is treated among the issues tackled under the heading of "word sense disambiguation". Unlike their theoretical counterparts, the computational approaches are more interested in developing efficient methods for word sense disambiguation than in accounting for the various historical, stylistic and theoretical issues surrounding polysemy.
Miller & Leacock focus on lexical representations for sentence processing. They argue that what is missing from dictionaries and semantic theories is a "satisfactory treatment of the lexical aspects of sentence processing". They reduce this problem to an examination of various methods for representing context more efficiently. "Local context" is defined primarily by the syntactic categories of a term, i.e. the noun category of contexts, the verb category of contexts, etc. Some terms may belong to more than one syntactic category and hence to more than one local context; simple rule-based systems may address this issue. Miller & Leacock recognize the role of semantic information in identifying a term's sense from its local context, as well as the fact that semantic information is not always present in the local context. For this reason they define a broader "topical context": the general topic of a text or discourse. The same term may mean different things when the topic differs.
For instance, consider the different meanings of "shot" in marksmanship, in a chat with a bartender or a photographer, in a hospital, or in a game of golf or basketball. The authors' basic hypothesis is that if the linguistic context provides a clue about the primary discourse topic, we can easily decide on the intended meaning of "shot" in that context. They then describe how people determine the topic of a discourse, and present some theories that determine the topic from a statistical classification of the vocabularies and sub-vocabularies that accompany a polysemous word in a discourse (although initial attempts were applied to homonymous terms such as "crane" and "bass"). The problem is that polysemy involves finer distinctions between senses than homonymy does (for instance, "bass" distinguishes not only between a fish and a deep voice but also between a deep voice and the man who has it, the lowest frequencies in musical harmony, a bass horn, a bass violin and so on). In other words, in the case of polysemy, topical context alone may not always be sufficient.
An additional experimental comparison of three statistical classifiers (a Bayesian classifier, a content-vector classifier and a back-propagation neural network) showed that as the number of senses of a term increases, so does the difficulty of making accurate distinctions between them. In addition, some contexts seem to be inherently harder to identify than others. Compared to humans, the three tested classifiers performed at about the same level of accuracy. Topical information also proved useful when the polysemous terms were presented in sentences rather than in the context of co-occurring terms. Methods combining local and topical information may yield better results, though still not as good as those obtained in human comprehension tests. The authors suggest that research in sentence processing, in particular on argument structure and coreference, would help elucidate sense disambiguation issues.
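To make the idea of classifying senses by topical context concrete, here is a minimal naive Bayes sketch. It is not the classifier used in the experiments described above: the training contexts, the sense labels "fish" and "music", the function names and the add-one smoothing are all assumptions chosen for illustration. Each sense is scored by its log prior plus the log-probabilities of the co-occurring words, and the highest-scoring sense wins.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: words co-occurring with "bass", labelled by sense.
TRAINING = [
    (["river", "fishing", "caught", "lake"], "fish"),
    (["boat", "fishing", "lure", "lake"], "fish"),
    (["guitar", "band", "played", "music"], "music"),
    (["singer", "choir", "low", "music"], "music"),
]

def train_naive_bayes(examples):
    """Count senses and, per sense, the context words seen with it."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(model, context_words):
    """Return the sense with the highest log prior + log likelihood (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context_words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train_naive_bayes(TRAINING)
print(classify(model, ["lake", "lure"]))   # fish
print(classify(model, ["band", "music"]))  # music
```

Because the only features are topical words, such a classifier has nothing to separate closely related senses that share a vocabulary, which is exactly the limitation noted above for polysemy as opposed to homonymy.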
Stevenson and Wilks are concerned with polysemy (or Word Sense Disambiguation, WSD) in large corpora. They point out in particular that WSD evaluation is usually based on small trial selections of text rather than on large corpora, which makes the generality of the reported results and performance dubious. Another problem with current approaches that Stevenson and Wilks point out is the increased chance of encountering novel word senses in large corpora, senses not yet lexicalized in existing dictionaries. Finally, the authors recognize that most research in NLP may use different ways of encoding or conceptualizing information, but in the case of WSD the variety of tools and techniques applied seems to be taken as representing different types of WSD information in themselves. These three issues make WSD a hard problem to solve.
For their experiments the authors used the machine-readable version of the LDOCE dictionary in order to draw on both a large-scale inventory of senses and a broad knowledge base for sense disambiguation. In analyzing the lexical knowledge sources they faced the question of what counts as context, which they answered by selecting "larger linguistic structures", such as sentences and/or entire discourses, that offer the pertinent topics. They also focused on the issue of combining various knowledge sources: they used a "memory-based learning algorithm" that provided a filter removing senses from consideration, thereby simplifying the WSD task, and they also made use of various partial taggers, each of which uses different knowledge sources from the lexicon to suggest a set of possible senses for an ambiguous term. To evaluate their results they merged a WordNet list of manually tagged content words with the ontological hierarchy of the LDOCE dictionary, which they used as a "gold standard" of texts.
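The general architecture described above, a filter that prunes candidate senses followed by several partial taggers that each propose senses, can be sketched as follows. This is only an illustration under stated assumptions: the sense inventory for "bank", the sense ids, the part-of-speech filter and the tagger outputs are hypothetical, and a simple majority vote stands in for the authors' learned combination, which is not reproduced here.

```python
from collections import Counter

# Hypothetical sense inventory for "bank"; ids and glosses are illustrative only.
SENSES = [
    {"id": "bank_n_1", "pos": "n", "gloss": "sloping land beside a river"},
    {"id": "bank_n_2", "pos": "n", "gloss": "financial institution"},
    {"id": "bank_v_1", "pos": "v", "gloss": "to rely on"},
]

def pos_filter(senses, observed_pos):
    """Filter step: drop senses whose part of speech does not match the observed tag."""
    return [s for s in senses if s["pos"] == observed_pos]

def combine(senses, tagger_outputs):
    """Count how many partial taggers suggest each surviving sense; return the top one."""
    allowed = {s["id"] for s in senses}
    votes = Counter()
    for suggested in tagger_outputs:
        for sense_id in suggested & allowed:
            votes[sense_id] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical outputs of three partial taggers for one occurrence of "bank".
TAGGERS = [
    {"bank_n_2"},              # hypothetical tagger A
    {"bank_n_1", "bank_n_2"},  # hypothetical tagger B
    {"bank_n_2", "bank_v_1"},  # hypothetical tagger C
]

candidates = pos_filter(SENSES, "n")
print(combine(candidates, TAGGERS))  # bank_n_2
```

The filter is what simplifies the task: once the verb sense is removed, the suggestion for it from tagger C no longer counts, and the taggers only have to choose among the two noun senses.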
The authors conclude that both high-level (word-level) and fine-grained (sense-level) WSD reach an accuracy of over 90%, with the high-level tests obtaining the higher accuracy of the two.
Dolan, Vanderwende and Richardson present MindNet, part of MS-NLP, a "broad-coverage, application-agnostic" NLP system developed by Microsoft Research. MindNet "provides the representation capabilities needed to capture sense modulation", and it can also acquire new senses and new words. Context is crucial in MindNet, and understanding the meaning of a term amounts to "producing a response that has been tied to linguistically similar occurrences of that word." The system learns by example (it is characterized as a "highly processed example base"). Inferencing via structured representations is also possible. These representations are "directed labelled graphs" that help overcome the limitations of word order and take advantage of hierarchical relationships outside the realm of syntactic relations (e.g. to show the indirect relationship between "car" and "truck" they use the graph car -Hypernym-> vehicle <-Hypernym- truck, where "car" and "truck" are connected by virtue of their relationship to the same hypernym "vehicle"). The paths between terms are weighted to reflect their salience. A known weakness of MindNet is that it is a static representation of relations, with fixed weights that depend on the associations the system currently contains. This means that anything beyond the level of individual words, or at best sentences (for instance, inter-sentential relations and hence context and discourse), lies beyond MindNet's capabilities.
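The car/truck example can be illustrated with a toy directed labelled graph. The sketch below is an assumption-laden illustration, not MindNet's data format or weighting scheme: the edges and weights are invented, and scoring a path by multiplying its two edge weights is a simplification chosen only to show how weighted paths through a shared node could rank connections.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, relation, target, weight).
EDGES = [
    ("car", "Hypernym", "vehicle", 0.9),
    ("truck", "Hypernym", "vehicle", 0.8),
    ("car", "Part", "wheel", 0.6),
    ("truck", "Part", "wheel", 0.5),
    ("rose", "Hypernym", "flower", 0.9),
]

def build_graph(edges):
    """Index outgoing labelled edges by source word."""
    out = defaultdict(list)
    for src, rel, tgt, w in edges:
        out[src].append((rel, tgt, w))
    return out

def shared_paths(graph, a, b, relation="Hypernym"):
    """Find paths a -relation-> node <-relation- b, scored by the product of edge weights."""
    from_a = {tgt: w for rel, tgt, w in graph[a] if rel == relation}
    from_b = {tgt: w for rel, tgt, w in graph[b] if rel == relation}
    paths = [(node, from_a[node] * from_b[node]) for node in from_a.keys() & from_b.keys()]
    return sorted(paths, key=lambda p: p[1], reverse=True)

graph = build_graph(EDGES)
print(shared_paths(graph, "car", "truck"))          # one path, via "vehicle", weight about 0.72
print(shared_paths(graph, "car", "truck", "Part"))  # one path, via "wheel"
print(shared_paths(graph, "car", "rose"))           # []
```

Note that the weights here are fixed constants, which mirrors the static character of the representation criticized above: nothing in the graph changes in response to the surrounding discourse.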
Schütze's paper offers a glimpse of polysemy from a connectionist point of view. The author reminds us that models such as those of Rumelhart et al. 1986 and McClelland et al. 1986 aim first to design a disambiguation algorithm that is psychologically plausible and also applicable at a large scale. The author explains the notion of semantic priming ("flower" is read more quickly after the sentence "They held the rose" than after a sentence like "They all rose.", which contains a homonymous term) for sentence processing, as well as connectionist methods for disambiguation. He then explains the word vectors, context vectors and sense vectors of activations in his proposed algorithm. He concludes that similarity of contexts is a crucial factor in determining word-level similarity and hence a reliable guide for grouping (clustering) and disambiguating word senses. The paper presents promising work on polysemy in a psychologically plausible manner, but it fails to view polysemy as a generalized phenomenon affecting natural language not only at the word level but also at the level of sentences and discourse.
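The word-vector, context-vector and sense-vector pipeline summarized above can be sketched in a few lines. This is a simplified illustration under explicit assumptions: the two-dimensional word vectors are invented co-occurrence counts, a plain k-means with cosine similarity and a naive initialization (the first k contexts) stands in for Schütze's own clustering, and dimensionality reduction is omitted entirely.

```python
import math

# Hypothetical word vectors: invented co-occurrence counts on two feature dimensions.
WORD_VECTORS = {
    "petal": [4, 0], "garden": [5, 1], "thorn": [3, 0],
    "sun": [1, 3], "early": [0, 4], "chair": [0, 5],
}

# Hypothetical contexts of "rose" (flower vs. past tense of "rise").
CONTEXTS = [["petal", "garden"], ["early", "chair"], ["thorn", "garden"], ["early", "sun"]]

def context_vector(words):
    """Context vector: sum of the word vectors of the known words in the context."""
    dims = len(next(iter(WORD_VECTORS.values())))
    vec = [0.0] * dims
    for w in words:
        for i, x in enumerate(WORD_VECTORS.get(w, [0.0] * dims)):
            vec[i] += x
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(vec, centroids):
    """Index of the most similar centroid (sense vector)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))

def cluster_senses(contexts, k, iterations=10):
    """Group context vectors with k-means; the centroids serve as sense vectors."""
    vectors = [context_vector(c) for c in contexts]
    centroids = [list(v) for v in vectors[:k]]  # naive init: first k contexts
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

senses = cluster_senses(CONTEXTS, k=2)
print(nearest(context_vector(["thorn", "petal"]), senses))  # 0
print(nearest(context_vector(["chair"]), senses))           # 1
```

The point the sketch shares with the paper is that senses are never listed in advance: they emerge as clusters of similar contexts, and a new occurrence is disambiguated by the sense vector its context most resembles.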
Finally, each paper comes with a wealth of bibliographical references pertinent to the particular model and strategy of analysis.
The book contains a wealth of useful information (principles, data configurations, methods, strategies and viewpoints) on the manifold problem of word sense ambiguity.
The theoretical linguistics papers focus on offering a typology of linguistic data, which they examine and then group in order to pinpoint the apparent regularities in them. At times (as in Pustejovsky's work) the theoretical approaches additionally offer a formal descriptive representation of those regularities. The undoubted merit of such an approach to polysemy lies in the precision with which it describes and analyzes an otherwise not-so-systematic and not-so-homogeneous phenomenon of natural language. Its obvious defect stems from the nature of polysemy itself: rules cannot adequately describe future behavior or presently undetected patterns in the data, since new rules must be invented to encompass new data. In addition, rules have no explanatory power, and describing natural language phenomena such as polysemy in formal rules offers no real understanding of the way language works.
The computational linguistics papers in this book focus on the applicability of various methods and tools drawn from theoretical and computational sources, and they bring issues of performance and evaluation to the fore. Word sense disambiguation has traditionally been examined within the realm of word-level analysis: lexica, corpora, thesauri, knowledge bases and related tools and representations for paradigmatic relations. Most researchers in polysemy point out the obvious inadequacy of this computational approach, i.e. that it fails to take into account crucial factors such as (linguistic and pragmatic) context and is instead tied to a partial, word-level solution of the problem. For this reason, computational systems and theories that incorporate disambiguation as one part of a broader tool-set are usually more successful: in such systems, broader linguistic context is taken into account during the disambiguation process.
ABOUT THE REVIEWER:
Eleni Koutsomitopoulou is a PhD candidate in Computational Linguistics at Georgetown University (Washington DC) and a senior Indexing Analyst at LexisNexis Butterworths Tolley in London, Great Britain, where she currently lives. Her main research interests include neural network models for natural language processing, cognitive linguistics, and indexing and pattern recognition applications for natural language.