LINGUIST List 4.82

Mon 08 Feb 1993

Misc: That's history, Lexical text analysis

Editor for this issue: <>


  1. Elise Morse-Gagne, Genitive THAT(')S and WHOSE (4 screens)
  2. Stephen P Spackman, Re: 4.51.1 Lexical Text Analysis

Message 1: Genitive THAT(')S and WHOSE (4 screens)

Date: Thu, 4 Feb 93 11:02:53 EST
From: Elise Morse-Gagne <>
Subject: Genitive THAT(')S and WHOSE (4 screens)

 I agree with Dick Hudson that the grammatical import of possessive
"that(')s" (The pencil thats lead is broken) is separate from whether it
is spelled with an apostrophe or not. He goes on to say "My suspicion
is that `that's' is in fact an ancient form, which hasn't yet been
supplanted by the foreign WH form, `whose'." Heres* some more info on
these words. *Caveat: I am eschewing all non-citation apostrophes.
 Forms of WHOSE (and other WH- (then HW-) words) are attested in
English since the 9th c. The OED gives early Middle English instances
of this as a genitive relative pronoun with a non-human (but animate)
antecedent, e.g. in the Trinity College Homilies of ca. 1200: "...he
tedh fordh geres hwile after fox. hwile after wulue. hwile after leun.
hwile after odhre. and on ech of hise deden is iefned to the deore WUAS
[=whose] geres he fordhteodh." Morris (Early English Text Society no.
53 p. 34-36) translates "he practises the wiles, sometimes of a fox, at
other times of a wolf, sometimes of a lion, and at other times that of
other animals, and in each of his deeds he is compared to the animal
WHOSE tricks he exhibits."
 The OED (s.v. WHOSE) further says this word is used "In reference
to a thing or things (inanimate or abstract). Originally the genitive
of the neuter WHAT...; in later use serving as the genitive of WHICH...
and usually replaced by _of which_, except where the latter would
produce an intolerably clumsy form." Example of inanimate WHOSE:
Wyclif, 1382, "The loond of oyle and of hony;..WHOS stones ben yren"
(the land of oil and honey...WHOSE stones are iron).
 Moore and Marckwardts 1969 _Historical Outlines of English Sounds
and Inflections_, p. 154-155, fn. 161: "Modern English [hu] and [hum]
are used only of persons, but [huz] is used also as a singular and
plural neuter genitive form; this use goes back to the Middle English
period, though it does not occur in Chaucer."
 Now for THATS. This gets tricky, very. The OED (s.v. THAT,
relative pronoun) considers that the use of THAT as a relative ("he came
to a river that was broad and deep") is a development from THAT as a
demonstrative ("he came to a river; that was broad and deep.") Whether
this is right or not, they point out that it can be very difficult to
distinguish between the demonstrative and relative uses of this form in
Old English, and they illustrate this with an example from Bede: "Hi
waeron Wihtgylses suna. thaes faeder waes Witta haten. thaes faeder waes
Wihta haten. and thaes Wihta faeder waes Woden nemned." This, according
to the OED, could mean "They were sons of Wihtgyls; _his_ father [lit.
_that's_ father (where _that_ is demonstrative: that ones--EEM-G)] was
called Witta; _his_ father was called Wihta; and this Wihta's father was
named Woden." *OR* it could mean "They were sons of Wihtgyls _whose_
father was called Witta, _whose_ father was called Wihta, and _whose_
(Wihta's) father was named Woden." OED again: "Baeda's Latin has
_cuius_ in all three places, so that the translator apparently used
_thaes_ as a relative. See also Wu"lfing _Syntax Alfreds des Grossen_
I. 275." I find them more convincing as genitive relatives than as
demonstratives.
 So it seems that in Old English the genitive relative pronoun THAES
was possible (human antecedent); it was the usual genitive inflection of
the demonstrative pronoun/definite article, also used as the relative
pronoun. I presume that any gender and number of this pronoun could
also be used in the genitive in the same way (the neuter sg. would also
have been THAES, the fem. sg. THAERE). The animate/inanimate
distinction we perceive nowadays would not have been operative; instead
the relative pronoun would agree with the grammatical gender of its
antecedent. Case inflections in this paradigm were lost in Middle
English along with other nominal inflections, and the demonstrative/
relative pronoun became invariant. Whether for that reason or not, in
Middle English WHOSE is an acceptable animate/inanimate genitive
relative whereas I cant find examples using a form of THAT (someone else
may be able to). I would guess that modern possessive THAT(')S is not
directly descended from OE usage of THAES through subterranean channels,
but is a re-invention of an equivalent form for use with a nonhuman or
inanimate antecedent.
 Jespersen (_Modern Eng. Grammar on Historical Principles_), Visser
(_An Historical Syntax..._), and the above-mentioned Wu"lfing (OED
quote) might well have more examples and a fuller discussion. Mitchell
may have written on the problem. Fisiaks _Bibliography of Writings for
the History of the English Language_ lists a number of articles which
might shed more light on this, including:
Andrew, S. O. 1936. "Relative and Demonstrative Pronouns in Old
 English." _Language_ 12:283-93.
Anklam, E. 1908. _Das englische Relativ im 11. und 12. Jahrhundert._
 Berlin: Mayer and Mu"ller.
Dowsing, A. 1979. "Some Syntactic Structures Relating to the Use of
 Relative and Demonstrative _thaet_ and _se_ in Late Old English
 Prose." _Neuphilologische Mitteilungen_ 80:289-303.
Caldwell, S. J. G. 1974. _The Relative Pronoun in Early Scots_.
 Helsinki: Societe Neophilologique.
Heltveit, T. 1953. _Studies in English Demonstrative Pronouns..._
 Oslo: Akademisk Forlag.
Jack, G. B. 1975. "Relative Pronouns in Language AB." _English
 Studies_ 56:100-107.
Jones, C. 1972. "Syntactic Change in Genetically Related
 Forms...Determiners, Personal and Relative Pronouns in Early Middle
 English." _Edinburgh Working Papers in Linguistics_ 1:116-28.
By the way, Quirk, Greenbaum, et al. (section 6.34) are positive that
only the wh- relative pronouns have case contrast, and that if you want
to avoid the use of WHOSE for a neuter antecedent the only option is "of
which."

Elise Morse-Gagne

Message 2: Re: 4.51.1 Lexical Text Analysis

Date: Mon, 01 Feb 93 18:35:19 +0
From: Stephen P Spackman <>
Subject: Re: 4.51.1 Lexical Text Analysis

M. Maron of the Institute of Russian Language in Moscow discusses issues
of extracting bibliometric data and actual occurrence lists for
classes of words, for works on the scale of full novels:

|The problem is to perform these activities effectively: concordance
|word crunchers I know need up to analysed_text_volume *10..30 for some
|service indexes, which makes the search process not practical for real

These estimates are not at all commensurate with my experience at the
ARTFL project (American and French Research on the Treasury of the
French Language) at The University of Chicago(*). That database consists
of several thousand works of French literature totalling on the order of
a gigabyte of uncompressed ASCII text. The process of inverting the
database to produce exact (not page-level) indices for all words in the
corpus took on the order of two weeks and consumed only two gigabytes of
temporary storage; after preprocessing was complete, compression enabled
us to fit the text, the indices, additional indices for sentence,
paragraph and page boundaries, and word frequency summaries by work, by
author and by year into less total space than the original input data.
The whole is searchable by arbitrary word patterns (which is not quite
as good as morpheme indexing, but we had licensing difficulties with the
morphological analyser we were using), and for cooccurrences. I estimate
that full parses could be fit into the 1:1 space budget as well, given a
broad-band automatic parser with human post-processing. Of course, this
work was all performed with Unix tools or with software that we wrote
for the purpose; not with off-the-shelf applications.
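
As a rough illustration of what "inverting" a text to get exact
(word-position) indices involves, here is a minimal Python sketch. It is
not the ARTFL software, and it ignores compression entirely. The names
`build_index`, `search_pattern` and `cooccurrences`, the tokenizing
regular expression, and the window size are all assumptions made for the
example.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: runs of letters, lowercased (an assumption,
# not the rule any particular project used).
TOKEN = re.compile(r"[^\W\d_]+")

def build_index(works):
    """Map each word to a list of (work_id, word_position) postings."""
    index = defaultdict(list)
    for work_id, text in works.items():
        for pos, match in enumerate(TOKEN.finditer(text.lower())):
            index[match.group()].append((work_id, pos))
    return index

def search_pattern(index, pattern):
    """Return postings for every indexed word that fully matches a regex."""
    rx = re.compile(pattern)
    return {word: posts for word, posts in index.items() if rx.fullmatch(word)}

def cooccurrences(index, a, b, window=5):
    """List (work_id, pos_a, pos_b) where a and b lie within `window` words."""
    b_by_work = defaultdict(list)
    for work_id, pos in index.get(b, []):
        b_by_work[work_id].append(pos)
    hits = []
    for work_id, pos_a in index.get(a, []):
        for pos_b in b_by_work.get(work_id, []):
            if pos_a != pos_b and abs(pos_a - pos_b) <= window:
                hits.append((work_id, pos_a, pos_b))
    return hits

# Toy corpus standing in for a collection of works.
works = {
    "w1": "Le chat noir dort. Le chien regarde le chat.",
    "w2": "Un chien noir.",
}
index = build_index(works)
print(sorted(search_pattern(index, r"ch.*")))
print(cooccurrences(index, "chat", "chien", window=3))
```

The postings record word positions rather than pages, which is what
makes co-occurrence queries a matter of comparing integers. A real
system would store the postings in a compressed on-disk form; the
in-memory dictionary here only shows the shape of the data.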

The implication is that the concordance software that M. Maron is
familiar with has been written completely without regard to space
consumption (a gigabyte of disk space represents to the US market only a
$1500 investment, after all, and will house several novels even at a
3000% overhead). If disk space is at a premium, it is not surprising
that he finds he is better off using standard tools like grep (which
take effort to master but are quite versatile); such small corpora as
single novels are well within the reach of brute force methods, and
commercial concordance software seems primarily to provide a convenient
interface rather than any particular added functionality. Managing
medium scale textual databases, meanwhile, is still an area of active
research, but overhead is (or can be made) much lower than Maron's
experience would suggest.
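
The brute-force approach for a single novel can be sketched in a few
lines of Python: a linear, grep-like scan that needs no index and no
extra disk space. The function name `kwic` and the context width are
assumptions made for the example, and a regular expression stands in
for whatever definition of a word class is wanted.

```python
import re

def kwic(text, pattern, width=30):
    """Yield (left, match, right) context for each regex match in text."""
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        yield left, m.group(), right

sample = "The fox ran. A wolf saw the fox."
for left, word, right in kwic(sample, r"\bfox\b", width=4):
    print(repr(left), word, repr(right))
```

Every query rescans the whole text, so the cost grows with corpus size;
that is acceptable for one novel and is exactly what an inverted index
avoids on larger collections.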

stephen p spackman +49 681 302 5288(o) 5282(sec)
 dfki / stuhlsatzenhausweg 3 / d-w-6600 saarbruecken 11 / germany

(*) For information relating to the ARTFL project contact Mark Olsen