Date: Fri, 6 May 2005 09:15:56 +1000 (EST)
From: Baden Hughes
Subject: Visualizing Document Processing

Baden Hughes, Department of Computer Science and Software Engineering,
University of Melbourne


This book adopts as its core the idea that text processing, either
cognitive or computational, is in fact the linguistic realisation of more
abstract information management and processing. By its own admission, this
volume is intended for a specific audience, namely information specialists
whose interests are in the area of document reception and production,
where accuracy and reliability are a crucial factor in information
analysis. This book is likely to be of interest to researchers in the
areas of text linguistics, semiotics, document processing, dialogue
modelling, pragmatics, natural language generation and cognitive science.

In the first chapter, the theoretical background of a new approach towards
language and information is presented: new terminology is introduced and
concepts are defined. A polysynthetic, interdisciplinary approach is used in
defining both scientific and linguistic paradigms through which
interpretation can be carried out. The main paradigms incorporating
intelligence in machines (knowledge-based systems, artificial neural
networks, evolutionary computing, fuzzy logic and artificial agents) are

In the second chapter, the implications of rethinking language and text,
illustrated through the theories of text comprehension and text
compression, form the body of the work.

The third chapter illustrates in greater detail this new perspective on
communication by familiarizing the reader with a complex visual system for
interpretation of qualitatively different components in natural language,
particularly textual documents. In the analysis of the physical
manifestation, a stratified observation framework is adopted, allowing
focus on different aspects of interpretation at both the macroscopic and
microscopic levels. New (cognitive) tools for observing, describing and
explaining qualitatively different phenomena in natural language are

The fourth chapter illustrates further evolution of the model, in
particular its final output: CTML, a systematic but informal markup
language used for strategic document annotation. This markup language and
its corresponding document model represent the climax of the research.

Concluding, the fifth chapter contains theoretical reflections about the
requirements for metaphor creation in modern information science, together
with practical suggestions for verifying and augmenting the consistency
and relevance of analogical reasoning.

The main line of argument throughout the volume is as follows. The focus
is on the representation of text procedures in terms of definitions and
their visualizations (Chapter 3). The understanding of these constructions
is prepared by the introduction of the conceptual system and the
discussion of the various scientific paradigms which have an impact on
them (Chapter 1) and the development of a particular theory of language
and text (Chapter 2). The representation of the system itself is followed
by a discussion of the application of the visual system as a kind of
markup language (annotation language for documents) and its role in the
work of information analysts (Chapter 4). The theoretical framework which
starts with Chapters 1 and 2 is completed by a final chapter in which the
crucial role of metaphors and analogies for scientific exploration
receives additional emphasis (Chapter 5).


Much of the motivation for this book appears to have been drawn from
previous research by Tonfoni, who developed the CPP-TRS theory of text
comprehension in the early 1990s.

For researchers who seek a computationally tractable representation with
formal grounding, this text will be found wanting despite the apparently
short distance between the abstractions discussed and such a representation. At a
theoretical level, the "machines" which form the framework for Chapter 3
could equally well be expressed in alternative formalisms - graph theory
and finite state machines being two which come to mind but which are not
even mentioned in passing in the text. At a practical level, the obvious
affinities with hypertext theory are never explored, yet are immediately
apparent when discussing internal linkage and annotations within a
document interpretation instance.

The components of the document annotation language (CTML) appear to be
disengaged from naturally aligned theories which will be familiar to
linguists (for example Rhetorical Structure Theory). While this in itself
is not a major shortcoming, it does contribute to the overall feeling that
this work is sufficiently removed from the core of the linguistics
discipline that its contribution may not be as great as its potential.

At its core, the book proposes CTML, a document markup language.
Disappointingly, CTML is presented as little more than a series of
character-based annotations and is not reduced to a computationally
tractable representation, despite this apparently being quite trivial.

At an editorial level, a number of distracting features appear. Aside from
the regular typographical errors, this volume features introverted
citation - referencing itself as a manuscript on a number of occasions
when cross-references to the relevant sections would have been more
appropriate. Much of the primary locus of the book, found in Chapter 3,
has a distinctly recycled feel with scarce concern for editorial
contributions. Certain points, such as "CPP-TRS is a methodology and a
language", are unnecessarily repeated throughout. Such oversights are
unfortunate since the overall contribution of the book is unique in its
field and will doubtless be of value to researchers in the areas of text
linguistics, semiotics, document processing, dialogue modelling,
pragmatics, natural language generation and cognitive science.


Baden Hughes is a Research Fellow in the Department of Computer Science
and Software Engineering at the University of Melbourne and a Research
Engineer in the Victoria Laboratory of NICTA, Australia's Centre of
Excellence in Information Technology. His research interests are in the
areas of formal and computational models of human language; statistical
natural language processing; digital libraries; web data mining;
documentary linguistics; computer-assisted language learning and
information security.

