“The Handbook of Computational Linguistics and Natural Language Processing,” edited by Alexander Clark, Chris Fox and Shalom Lappin, is a large collection of 22 chapters covering the field of Computational Linguistics (CL) and Natural Language Processing (NLP), from theoretical aspects (formal language theory and language models, among others) to the most concrete applications (machine translation, question answering). Its coverage is broad enough for the volume to be considered a fundamental, comprehensive survey of the applications, methodologies and underlying theories of CL and NLP.
For the same reason, it can serve as a reference manual for a wide audience, without assuming that every reader is interested in, or a specialist in, every area: each chapter provides the theoretical basis, the historical background and an overview of the state of the art for its topic. In what follows, the organization of the handbook is described chapter by chapter.
In the introduction, the structure of the manual is presented in order to offer the reader a clear map (section by section and chapter by chapter) of its contents. The introduction ends by presenting the goals of the manual and explaining the reasons for its particular organization, the choice of topics and the development of each chapter.
Chapter 1: “Formal Language Theory” (by Shuly Wintner).
The chapter starts with a basic introduction to formal language theory that does not assume familiarity with the topic; some familiarity with basic mathematical and logical notation and operations is nonetheless advisable in order to follow it. The chapter then covers Regular Languages, Finite State Automata, Transducers, Context Free Languages and the Chomsky Hierarchy.
Chapter 2: “Computational Complexity in Natural Language” (by Ian Pratt-Hartmann).
Pratt-Hartmann starts with a review of complexity theory, stating its goals and presenting its basic methodology. A solid knowledge of mathematics and logic is advisable, even though the chapter provides a step-by-step introduction to the topic. Turing machines, decision problems, parsing and recognition, and semantics are presented through theorems and definitions, each accompanied by detailed examples.
Chapter 3: “Statistical Language Modeling” (by Ciprian Chelba).
The third chapter starts from the basics of Language Modeling (LM), presenting the chain rule and n-grams and then discussing perplexity, before moving smoothly toward the Structured Language Model and its applications in Speech Recognition.
Chapter 4: “Theory of Parsing” (by Mark-Jan Nederhof and Giorgio Satta).
The fourth chapter presents the theoretical bases of Parsing (phrase structure and dependency structure), Probabilistic Parsing and (Lexicalized) Context Free Grammars, leading into a detailed, example-rich discussion of the basics of some of the most common applications, such as Translation.
Chapter 5: “Maximum Entropy Models” (by Robert Malouf).
Before turning to practical applications, Robert Malouf presents the theoretical basis of the Maximum Entropy (MaxEnt) model. The chapter develops the theory of MaxEnt step by step, moving from Shannon through probability theory to bring the reader to the applications: Parameter Estimation, Regularization, Classification and Parsing, among others.
Chapter 6: “Memory-Based Learning” (by Walter Daelemans and Antal van den Bosch).
The sixth chapter presents Memory-Based Learning (MBL) alongside other methods for supervised, classification-based learning (MaxEnt, Decision Trees, Artificial Neural Networks). It then discusses several NLP applications: Morpho-phonology, Syntacto-semantics, Text analysis, Translation, and Computational Psycholinguistics.
Chapter 7: “Decision Trees” (by Helmut Schmid).
Schmid presents another method for annotating linguistic entities through classification: decision trees. The chapter explains through examples how Decision Trees are induced from training data, and then moves to applications such as Grapheme-to-phoneme conversion and POS-tagging. The chapter ends with a discussion of the advantages and disadvantages of Decision Trees.
Chapter 8: “Unsupervised Learning and Grammar Induction” (by Alexander Clark and Shalom Lappin).
This chapter addresses two main aspects of Unsupervised Learning: the advantages and disadvantages of applying unsupervised learning to large corpora, and the possible relevance of unsupervised learning to the debate about the cognitive basis of human language acquisition. The topics are presented carefully, with a comparison between supervised and unsupervised learning and examples from classification tasks and parsing. The last section of the first part of the chapter compares supervised, unsupervised, and semi-supervised learning in terms of the “accuracy vs. cost” trade-off, and discusses possible future developments of the field.
The second part of the chapter discusses the new insights that unsupervised learning has brought to the study of human language acquisition, and offers a broad view of the state of the art in that field.
Chapter 9: “Artificial Neural Networks” (by James B. Henderson).
The chapter starts with an introductory background section on statistical modeling and Artificial Neural Networks (ANN), in particular Multi-Layered Perceptrons (MLP), the type of ANN most commonly used in NLP. It then moves on to contemporary NLP research, such as improving large n-gram models, parsing (constituency, dependency, functional and semantic role parsing), and tagging, discussing the advantages and disadvantages of ANN and SLM.
Chapter 10: “Linguistic Annotation” (by Martha Palmer and Nianwen Xue).
This chapter presents Linguistic Annotation from the early days of the Penn Treebank and SemCor, through the British National Corpus, to present-day work; the discussion also covers different annotation schemes, presenting “a representative set of widely used resources” (p. 239) such as Syntactic structure, Independent semantic classification, Semantic relation labeling, Discourse relation, Temporal relation, Coreference, and Opinion tagging. The second part of the chapter deals with the annotation process, analyzing it step by step: from the choice of the target corpus, to the efficiency and consistency of annotation, to the available infrastructures and tools, and concluding with evaluation and pre-processing.
Chapter 11: “Evaluation of NLP Systems” (by Philip Resnik and Jimmy Lin).
The chapter starts with a broad discussion of some fundamental concepts in the evaluation of NLP systems (Automatic/manual evaluation, Formative/summative evaluation, Intrinsic/extrinsic evaluation, Component/end-to-end evaluation, Inter-annotator agreement and upper bounds), then moves on to data partitioning and the advantages of cross-validation, and closes the section with a summary of evaluation metrics and of how performance is compared. The following part of the chapter introduces three categories of NLP evaluation, according to the kind of output (a single correct output, several possible outputs, outputs on a scale of values). The chapter ends with two well-explained, detailed case studies that give the reader a quick and concrete reference for the theory presented earlier.
Chapter 12: “Speech Recognition” (by Steve Renals and Thomas Hain).
The chapter deals with Automatic Speech Transcription, starting from statistical frameworks and the use of corpora for developing and evaluating the algorithms. After the statistical section, the authors focus on generative acoustic modeling of p(X|W) using Hidden Markov Models. The last section deals with the decoding (search) problem, i.e. maximizing the computed probability through the Viterbi algorithm. The chapter ends with a case study and an analysis of the performance of present-day systems, their strengths and their weaknesses.
Chapter 13: “Statistical Parsing” (by Stephen Clark).
The chapter starts by introducing, from a theoretical point of view, some basic questions about the grammar, the algorithm, the model and the choice of the best parses. It then presents a historical review of the topic, from the very first attempts (Sampson 1986) to present-day work.
The author next focuses on Generative parsing models (with special attention to the Collins models) and on Discriminative ones. He then analyzes Transition-based approaches in detail, presenting various examples from the literature, and concludes the study of Statistical Parsing with Combinatory Categorial Grammar.
Chapter 14: “Segmentation and Morphology” (by John A. Goldsmith).
Goldsmith starts with a brief overview presenting the basic definitions of morphophonology, morphosyntax, and morphological decomposition. The chapter goes on with more technical NLP insights, discussing Unsupervised Learning of Words and “four major approaches” (p. 373), namely Olivier, MK10, Sequitur and MDL. The following section presents Unsupervised Learning of Morphology, from Zellig Harris's early work in the 1950s to the present day. The chapter ends with a discussion of the Implementation of Computational Morphologies, the use of Finite State Transducers and the case of morphophonology.
Chapter 15: “Computational Semantics” (by Chris Fox).
After stating the difference between formal semantics and computational semantics, the chapter moves on to formal theory and logical grammar in order to provide background on the computability of semantics and on different approaches. The author goes on to present the state of the art as preparatory material for the next section, on research issues such as intensionality, non-indicatives, and expressiveness, among others. The chapter ends with a less theoretical topic, namely corpus-based and Machine Learning methods in computational semantics, thus moving away from the more classical, strictly formal-logic approach.
Chapter 16: “Computational Models of Dialogue” (by Jonathan Ginzburg and Raquel Fernández).
This chapter starts with a discussion of the basic characteristics of dialogue and its structural peculiarities, in order to define the methodological challenges of modeling dialogue computationally. Once the theoretical questions are settled, the authors present approaches to Dialogue System Design and to evaluation through comparison (query and assertion, meta-communication, fragment understanding benchmarks). The second part of the chapter is dedicated to Interaction and Meaning (Coherence, Cohesion, Illocutionary interaction, query and assertion, etc.) and to models for the automatic learning of dialogue management (based on Markov Decision Processes). It presents “the underlying logical framework [...] [that] provides the formalism to build a semantic ontology and write conversational and grammar rules” (p. 453). The chapter ends with “Extensions”, offering suggestions for further development of topics that could not find space in the manual.
Chapter 17: “Computational Psycholinguistics” (by Matthew W. Crocker).
The chapter opens with an introduction that establishes the scope and limits of the very term “computational psycholinguistics”, laying the basis for the entire chapter. A discussion of Symbolic Models follows, starting from the first computational parsing models of the 1980s, followed by a section on Probabilistic Models (touching on lexical and semantic ambiguity, syntactic processing, and disambiguation, among other issues). The Sentence Processing section presents the application of Artificial Neural Networks (here called Connectionist networks), discussing their advantages and the criticism they have received, before presenting Hybrid Models.
Chapter 18: “Information Extraction” (by Ralph Grishman).
The first chapter of the “Applications” part deals with Information Extraction (IE) and, after a short historical overview, presents its four main tasks: name extraction, entity extraction, relation extraction, and event extraction. For each of the four tasks, the discussion starts from early approaches to IE, based on hand-written rules and on supervised learning from Named Entity tagged corpora, and arrives at state-of-the-art methodological approaches.
Chapter 19: “Machine Translation” (by Andy Way).
As stated in the introductory remarks, the chapter is divided into two parts, the first presenting the “state of the art in Machine Translation (MT)” and the second presenting research in hybrid MT (p. 531). The first part in fact skips the historical background and jumps directly into current MT, addressing Phrase-Based Statistical Machine Translation (PB-SMT) and presenting all the steps in developing a corpus-based system (data pre-processing, clean-up, segmentation, tokenization, word/phrase alignment, language models, decoding, among others). This thorough section ends by discussing various approaches to MT evaluation. The next section discusses some alternatives to PB-SMT that are currently in use or under development, such as Hierarchical Models, Tree-based Models, Example-based MT, Rule-based MT and hybrid methods. The second part of the chapter details MT research at Dublin City University (DCU), presenting work in many directions: syntax-driven SMT, hybrid statistical and example-based MT, tree-based MT, rule-based MT and much more.
Chapter 20: “Natural Language Generation” (by Ehud Reiter).
The twentieth chapter starts with a brief introduction to Natural Language Generation (NLG) and choice making. The subsequent section discusses the problem through the analysis of two NLG systems: SunTime and SkillSum. After a review of some alternatives to these two, the chapter breaks the NLG task down into its basic steps: document planning (choice making issues), microplanning (lexical choice, reference, syntactic choice, aggregation) and realization. The chapter ends with a detailed discussion of the evaluation of NLG systems and an overview of research topics currently under development (statistical NLG, affective NLG). The closing section lists some of the resources available in NLG, such as software, data resources and further readings.
Chapter 21: “Discourse Processing” (by Ruslan Mitkov).
Mitkov starts with a practical approach to the basic notion of discourse, presenting an example-based discussion of the coherence-cohesion dichotomy and of the different types of discourse. The second section deals with Discourse Structure: organization and segmentation algorithms (TextTiling). The subsequent part of the chapter goes into detail, analyzing Hobbs' theory of coherence, Mann and Thompson's Rhetorical Structure Theory (Mann & Thompson 1988) and Centering (Grosz et al. 1995). The fourth section deals with anaphora resolution, starting from the basic definitions of anaphora and reference, then moving to the computational problem of anaphora resolution and the related algorithms (based on full parsing or partial parsing, and a comparison of the two). The chapter ends with a panoramic view of applications in discourse processing (discourse segmentation, discourse coherence and anaphora resolution). A “further reading” section closes the chapter with a rich set of suggestions for extending and developing the topics, from both a statistical and a corpus-based point of view.
Chapter 22: “Question Answering” (by Bonnie Webber and Nick Webb).
The authors start with a review of Question Answering (QA) systems, from the earliest systems to state-of-the-art implementations. The discussion analyzes, through examples, the different steps of question typing, query construction, text retrieval, text processing for answer candidates, and evaluation; it then develops these topics theoretically. The second part of the chapter considers the current developments QA is now addressing. One of these is corpus-related research aimed at improving “understanding the question”; the subsequent sections focus on improving the choice of answers through user analysis, either by analyzing how different users might judge different answers as correct, or by resolving the semantic ambiguity of questions. The chapter closes with a discussion of QA system evaluation, concentrating on the possible need for new and better evaluation methods.
The book is a wonderful work in terms of both content and form. Compared to other manuals, it probably covers the broadest panorama of state-of-the-art NLP and CL, making it (one of) the most complete manuals in these areas.
Because it aims to cover a field as broad as NLP and CL, some chapters may seem somewhat loosely related to one another. This is inevitable in a work of 22 chapters covering something as vast as (almost all of) NLP.
Even though the topics of the chapters range from Formal Language Theory to Machine Learning, to Morphophonology, to Parsing, the structure of the manual itself is solid and the work is well organized. In some cases, more cross-references would have improved usability, although the main topics are always clearly linked across chapters.
Among the book’s qualities, besides its completeness and the wide range of topics treated, other strengths deserve mention: the chapters consistently develop theory and practice in parallel, which is definitely a plus, and for most topics this produces a smooth “crescendo”. Nonetheless, in some chapters the jump from theory to applications may feel somewhat rough or abrupt to readers unfamiliar with the topic.
It is difficult to find negative points in the work; one observation, more from an editorial point of view, concerns the single reference section at the end of the book, a somewhat cumbersome 86 double-column pages. Given the broad coverage of the 22 chapters, it would be easier for readers to search the references if they appeared at the end of each chapter, limiting the search to the topic of interest. It is nonetheless understandable that this choice would have led to considerable redundancy in some cases, so it may simply be a matter of space.
For the same reason, the editors omit some potentially useful tools, like a list of formulas and equations (possibly also containing algorithms), and even a list of acronyms, which might have greatly increased the usability of the manual. It should be kept in mind that the handbook will probably serve as a reference manual rather than a book to be read from beginning to end. On the plus side, the manual provides a complete List of Figures, a List of Tables, an Author Index and an even more useful Subject Index, which compensate for the unavoidable density of the chapters.
The last observation concerns differences among the individual chapters, whose structure sometimes varies slightly. A somewhat firmer template, with overview, state-of-the-art and further-reading sections in every chapter, could have given a more uniform micro-structure, improving usability and reducing search time. Again, this is not a content issue, since each chapter presents all this information; it is just organized (or named) in a slightly different way.
The overall evaluation is therefore very good: the work is solid, complete and definitely an important reference for NLP and CL.
Grosz, Barbara J., Aravind K. Joshi & Scott Weinstein. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21(2): 203-225.
Mann, William C. & Sandra A. Thompson. 1988. Rhetorical Structure Theory: Toward a functional theory of text organization. Text 8(3): 243-281.
Sampson, Geoffrey. 1986. A stochastic approach to parsing. In Proceedings of the 11th International Conference on Computational Linguistics (COLING '86), 151-155.
ABOUT THE REVIEWER:
Mauro Costantino is an invited professor at the Universidad Mayor de San Andrés (UMSA) in La Paz and at the Universidad Pública de El Alto (UPEA). His main interests range from Second Language Acquisition (comparing the acquisition of the Italian verb system by speakers of different languages) to Translation Studies and corpus linguistics (focusing on learner corpora). He teaches Italian, a translation seminar and an introduction to computational and corpus linguistics at UMSA, and organizes new introductory experimental seminars in computational and corpus linguistics at UPEA. Besides actively cooperating in the VALICO (www.valico.org) and VALERE (www.valere.org) projects of the University of Torino (Italy), he is working on various projects (one in translation and one in corpus implementation) with the Literature Department at UMSA. In his “free time” he is a translator and the general secretary of the Società Dante Alighieri of La Paz, Bolivia.