LINGUIST List 5.1159

Fri 21 Oct 1994

Disc: Comparative Method

Editor for this issue: <>


  1. Alexis Manaster Ramer, Comparative Method (Alleged Ceiling on, and Related Questions)
  2. Alexander Vovin, Re: 5.1139 Comparative method
  3. Don Ringe, chance similarities (experiment)

Message 1: Comparative Method (Alleged Ceiling on, and Related Questions)

Date: Wed, 19 Oct 94 00:53:43 EDT
From: <>
Subject: Comparative Method (Alleged Ceiling on, and Related Questions)
Very funny, Jacques! Thanks for injecting the first bit of INTENTIONAL
humor into the discussion.
(1) The famous Bergsland reference claimed that some languages (notably,
Icelandic) show a LOWER rate of loss for the Swadesh list (a certain list
of 100 meanings which is claimed to exhibit a 14% per millennium
rate of loss), but did not, as
far as I can recall, give any examples with a HIGHER rate. Likewise,
Jacques Guy's reference to Muyuw is still unclear. The 20% rate
of loss in one generation is said to refer to the everyday vocabulary
or some such thing. That is irrelevant for our purposes. We need to
use the same list, otherwise we are comparing apples and oranges. M. L.
Bender found that just by changing the Swadesh list a little, he got
27% for a number of Ethiopian languages. So we really must stick to
the Swadesh list, the only one which has been tested on lots of
languages. The question then is simple: is there
a language for which the rate of loss from the Swadesh list is more
than 14% per millennium?
(2) Alice Faber says much that is right, again, but does not really address
the central issue, which is NOT whether there are all kinds of hypothetical
problems with hypothetical language classifications that may or may not have
been proposed, but
 (a) whether there is a specific technical argument for a specific
 limitation on what the comparative method can do
 (b) more specifically, whether there is any basis for the claim
 that this limitation involves a time limit of somewhere between
 6000 and 10000 years.
As I have said, if you assume a particular rate of loss, a particular
number of languages, and a particular branching, you can mechanically
compute the chances that at least 2 languages will have preserved
a given percentage of the items in the Swadesh list. If anybody cares,
I can post some specific numbers, but in general they indicate that
10000 years is much too stringent a ceiling for any reasonable size
language family.
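The computation described in (2) can be sketched as follows. This is a minimal illustration only, not the author's own figures: it assumes a constant 86% retention per millennium (the 14% loss rate cited above), independent losses in each language, and a "star" branching in which all daughters split from the proto-language at the same time. The function names and parameters are hypothetical.

```python
from math import comb

RETENTION_PER_MILLENNIUM = 0.86  # assumption: the 14% loss rate cited above


def retained_fraction(millennia, rate=RETENTION_PER_MILLENNIUM):
    """Expected share of list items one language keeps after `millennia`."""
    return rate ** millennia


def prob_item_in_at_least_two(n_languages, millennia, rate=RETENTION_PER_MILLENNIUM):
    """Chance that a given item survives in two or more of n independent daughters."""
    p = retained_fraction(millennia, rate)
    none = (1 - p) ** n_languages
    exactly_one = n_languages * p * (1 - p) ** (n_languages - 1)
    return 1 - none - exactly_one


def prob_pair_shares_at_least(k, millennia, list_size=100, rate=RETENTION_PER_MILLENNIUM):
    """Chance that two daughters both keep at least k of list_size items."""
    q = retained_fraction(millennia, rate) ** 2  # both languages keep the item
    return sum(comb(list_size, i) * q**i * (1 - q) ** (list_size - i)
               for i in range(k, list_size + 1))
```

Under these assumptions, prob_item_in_at_least_two(10, 10) gives the chance that an item survives in at least two of ten daughters after 10000 years, and prob_pair_shares_at_least(5, 10) the chance that one particular pair still shares five or more items. A different branching or rate would change the numbers.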
(3) Alexander Vovin, Larry Trask, and Jacques Guy are asking some
good questions about the possibility of finding apparent regular
correspondences between languages that are unrelated (or at least
not related by the correspondences in question). This is not as
hard as one might imagine. For one thing, it all depends on what
you mean by a regular correspondence. If you don't constrain that
notion, you can indeed have a regular set of correspondences between
any two sets of morphemes in any two languages. As with transformational
grammar (Peters and Ritchie) and compositional semantics (Zadrozny and me),
it is easy to see that regular correspondence is a vacuous notion (unless
something more is required, that is, some well-defined notion of naturalness,
as in the Dolgopol'skij probabilistic method from the 1960s).
As for the references which Larry is asking for, the late Wick Miller and
Catherine Callaghan published a paper of this kind years ago, showing a
spurious set of relationships between English and some Amerindian group as
a way of critiquing Swadesh.
A more recent reference is Ringe's book on statistical methods in
comp. ling, in which he develops a method that is supposed to
distinguish related from unrelated languages and which is supposed
to find regular correspondences in the former, but when he tested
it on English and Turkish, he got some (admittedly only some)
regular correspondences which are clearly spurious (even if English
and Turkish are related).
It is also apparent when comparing Illich-Svitych's Nostratic
theory with Bomhard's that spurious correspondences are easy to
find in practice since they propose different correspondences
between Indo-European and Afroasiatic, so that at most one of
them can be right (Illich-Svitych being the better bet, in my view).
Alexis Manaster Ramer

Message 2: Re: 5.1139 Comparative method

Date: Wed, 19 Oct 94 12:25:53 EST
Subject: Re: 5.1139 Comparative method
In his posting, Larry Trask wrote:
> In his recent posting on the comparative method, Alexander Vovin raises
> an interesting point: how easy is it to find random similarities between
> two unrelated languages? Vovin suggests that it is, in general, very
> difficult, citing English and Mandarin as a case in point.
> Now if Vovin is right, it would seem that we are obliged to take seriously
> the innumerable demonstrations, many of them recent and prominent, that
> certain languages must be distantly related because they exhibit some dozens
> of random similarities -- even though there are no systematic correspondences.
 I believe that this is a misunderstanding of my posting. We are NOT
obliged to accept such theories as Dene-Caucasian (of which I am
sceptical, too) just because they demonstrate a bunch of look-alikes.
Quite the opposite, I believe we are obliged NOT TO ACCEPT such
theories, because they demonstrate a bunch of look-alikes only, with NO
REGULAR CORRESPONDENCES. I intended my posting to say that these
 Larry Trask's example with 65 similarities between Basque and Hungarian
sounds interesting, but I would like to ask him on what basis he found
these 65. Basic vocabulary only? How long was the list?
 Larry Trask is certainly right that the more similar the phonological
systems, the more possibility there will be to find look-alikes. I gave
English and Mandarin as an extreme case, but it is still a case, just as
Basque and Hungarian are.
 I do not claim that there are no two languages, like Basque and Hungarian,
where you actually can find a number of similarities. What I do claim is
that this is not true for ANY two languages taken at random.
 I experimented myself with Japanese and Hopi. On the basis of Swadesh's
100-word list I got about 8-9 suspects, but it was impossible to establish
any REGULAR correspondences. Then I compared the Proto-Japanese 100-word list
with the Proto-Uto-Aztecan 100-word list, and interestingly enough I found
only about 5 look-alikes, but
most of them were not the same as the "cognates" in the first list. Moreover,
there were certainly no regular correspondences again. Japanese and Hopi
phonologies are not as dissimilar as those of English and Mandarin.
This stuff is in my forthcoming article "Some notes on Language Comparison"
to be published in "Studies in Nostratic", ed. by V. Shevoroshkin. I can send
computer printouts to anybody who's interested.
 In answer to Jacques Guy:
 I allow semantic shifts, but I do not think that it is possible to
allow them for 100% of cases. There must be a certain number of comparisons
with identical semantics, otherwise the whole hypothesis will be very
suspicious: if this is your point, then I wholeheartedly agree with it. I usually
try to have at least 10-20% of etymologies without any semantic
 Also, Jacques Guy wrote:
> I had mentioned Muyuw which had this distinction, worthy of the
> Guinness Book of Records, of having innovated 20% of its everyday
> vocabulary in one single generation...
 Well, what might be true of EVERYDAY vocabulary may not necessarily
be true of BASIC vocabulary, as it is only a tiny portion of basic vocabulary
which is used in everyday vocabulary.
Alexander Vovin

Message 3: chance similarities (experiment)

Date: Thu, 20 Oct 1994 12:09:09 EDT
From: Don Ringe <>
Subject: chance similarities (experiment)
Dear Colleagues-- This is in reply to Prof. Trask's query. Toward the end of
1992 I did the same sort of experiment as Prof. Trask and got the same sort of
results. Here are the details.
I compiled Swadesh 100-word lists for nine North American languages
classifiable into seven families which, ACCORDING TO RELIABLE SPECIALISTS, are
not demonstrably related to one another. The seven groups were the following.
1) Tzotzil (of Zinacantán).
2) Zoque (of Francisco León, before the volcano made them move).
3) Western Highlands Mixtec (of San Miguel el Grande).
4) Pipil (Cuisnahuat dialect) and Hopi (Third Mesa dialect); these Uto-Aztecan
 languages are distantly related, but the relationship can be
 demonstrated even with crude probabilistic methods if one is clever enough.
 (The trick is to look for initial cons. correspondences that fall
 toward the top of the respective ranges you'd expect by chance, look
 for cons. correspp. after the 1st vowel that do likewise, observe how
 often the two classes appear *in the same words*, and calculate how
 likely that is by chance; result is a strong proof.)
5) Karuk.
6) Ojibwa and Menominee, two Algonkian languages whose relationship is close
 and obvious.
7) Greenlandic Inuit of the mid-19th c.
NOTE the inclusion of Greenlandic, a language which is not supposed to belong
to the proposed Amerind superstock; this will be important for those who don't
believe that probabilistic testing has anything to do with reality.
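The "trick" described under group 4 above depends on knowing what range of matches chance alone would produce. A minimal sketch of one way to estimate that range, assuming a simple binomial model (an assumption made here for illustration, not necessarily the exact calculation used in the experiment): if consonant x begins count_a words in one 100-word list and consonant y begins count_b words in the other, each meaning slot pairs them by chance with probability (count_a/100)(count_b/100). All function and parameter names are hypothetical.

```python
from math import comb


def chance_match_probability(count_a, count_b, list_size=100):
    """Chance that one meaning slot pairs consonant x (list A) with y (list B)."""
    return (count_a / list_size) * (count_b / list_size)


def upper_tail(observed, count_a, count_b, list_size=100):
    """P(at least `observed` x:y pairings) under the binomial chance model."""
    p = chance_match_probability(count_a, count_b, list_size)
    return sum(comb(list_size, k) * p**k * (1 - p) ** (list_size - k)
               for k in range(observed, list_size + 1))
```

A small value of upper_tail(observed, count_a, count_b) means the observed pairing count sits toward the top of its chance range. The same calculation can be repeated for consonants after the first vowel.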
Counting Ojibwa-Menominee cognates as single items, and counting Pipil-Hopi
cognates as single items, I found no fewer than *115* sets of forms similar in
sound and meaning in various subsets of the seven groups--this in a 100-word
list! A few were patently anti-historical (if you know a little about the
languages), but since my purpose was to find *similar* forms (not provable
cognates), of course they had to be counted. Of course most of the sets were
binary (as expected for short lists in a small no. of groups), but 12 three-
member sets and 2 four-member sets also appeared. Like Prof. Trask, I could
have increased the number of sets (and the size of some sets) appreciably by
being less restrictive about what constitutes "similarity". A fair number of
the sets appear (often *in part*) in Greenberg's "Amerind Etymological
Dictionary", but I could not find the majority of them in that work. (This is
not because, as has been claimed, G insisted on using only comparanda that were
truly representative of their lower-order subgroups; specialists have warned
repeatedly that he did not do so effectively, and since they know the data, we
are not at liberty to doubt them.)
The numbers of similarities found between various pairs of these groups seem to
show no particular pattern; I found
 16 between Zoque and Uto-Aztecan,
 15 between UA and Karuk,
 13 between Tzotzil and Zoque,
 13 between UA and Algonkian,
 11 between Tzotzil and Mixtec,
 10 between UA and Inuit (n.b.!),
 9 between Mixtec and UA,
 8 between Tzotzil and UA,
 8 between Karuk and Algonkian,
and at least one between every other possible pair. The degree to which
individual lgg. participate in the similarity-sets also gives no clear pattern:
 Zoque 36 sets
 Tzotzil 34 sets
 Pipil 34 sets
 Hopi 32 sets
 Karuk 32 sets
 Inuit 28 sets (n.b.!)
 Mixtec 26 sets
 Ojibwa 24 sets
 Menominee 22 sets
Of the two-member groups compared, Algonkian participates in 29 sets--not many
more than Ojibwa or Menominee, because those lgg. are so closely related that
they mostly participate in the same sets; Uto-Aztecan, though, participates in
60--for exactly the opposite reason, as might be expected.
NOTE that, in both listings, Inuit is right in there with the rest of 'em, EVEN
to show that we really are dealing with random similarities.
As you might expect, I wasn't content to guess at that; I ran a bunch of
primitive probabilistic tests on these lgg., both pairwise and across the whole
set of seven groups. I found ABSOLUTELY NOTHING SIGNIFICANT; the similarities
ALL fall comfortably within their expected chance ranges. Personally I was
disappointed. I'd thought I saw a degree of similarity between Tzotzil and
Zoque, which (after all) are not only supposed to be members of a "Mexican
Penutian" stock, but are even in contact (and have been for centuries); but
when I ran the numbers I couldn't get even one significant sound
correspondence. Presently I thought I saw some sort of similarity between Karuk and the
Algonkian lgg. (and that was much more exciting, because there's not supposed
to be any close or special relationship between them by *anyone's* scenario);
but that, too, failed to pan out. So far as I can see, these really are random
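One generic way to check whether a count of look-alikes exceeds its chance range is a permutation test, sketched below. This is not necessarily the kind of test run in the experiment described here; it is a swapped-in illustration. The similarity criterion (identical first segment) and all names are hypothetical, and the two lists are assumed to be aligned by meaning.

```python
import random


def same_initial(form_a, form_b):
    """Hypothetical similarity criterion: identical first segment."""
    return form_a[:1] == form_b[:1]


def count_matches(list_a, list_b, similar=same_initial):
    """Number of meaning slots whose two forms count as similar."""
    return sum(1 for a, b in zip(list_a, list_b) if similar(a, b))


def permutation_p_value(list_a, list_b, similar=same_initial, trials=2000, seed=0):
    """Share of random re-pairings with at least as many matches as observed."""
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b, similar)
    shuffled = list(list_b)  # copy, so the caller's list is left untouched
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the link between form and meaning
        if count_matches(list_a, shuffled, similar) >= observed:
            at_least += 1
    return at_least / trials
```

A large returned value means that scrambling the meanings produces as many "similar" pairs as the real alignment does, which is what one would expect of random similarities.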
Finally, I think Prof. Trask may be right about typology having some bearing on
this (though I can't say I've got evidence for it; the sample is far too
small). Look again at that second table above. Down to about the middle are
the lgg. which have *relatively* short lexemes, on the average--except Mixtec,
which is typologically much more like an East Asian lg.; toward the bottom are
Mixtec and the lgg. with conspicuously long lexemes. That might be an
accident, and then again it might not be. (If it isn't, it makes the strong
representation of Inuit in the similar sets even more remarkable.) One
reservation about drawing inferences from this too quickly results from another
bit of info I happen to have. I recall seeing, years ago, a "similarist"
comparison of English and Vietnamese which turned up an astonishing number of
sound/meaning similarities in spite of the obvious differences in typology
(based on whole dictionaries of the lgg., to be sure). I don't remember whose
work it was--he was an amateur, but obviously painstaking.
Of course it might be that how many "similar" forms one finds depends heavily
on who's looking. But we all know what sort of inferences to draw from that
Hope this is of some use. Cheers! --Don Ringe