LINGUIST List 5.521

Fri 06 May 1994

Disc: Greenberg - Simulation with semantic shift

Editor for this issue: <>


  1. Jacques Guy, Greenberg: simulation of chance resemblance

Message 1: Greenberg: simulation of chance resemblance

Date: Thu, 5 May 1994 13:59:23
From: Jacques Guy <>
Subject: Greenberg: simulation of chance resemblance

(This is a follow-up to my previous message)
I had to implement semantic shifts in the simulation,
curse my curiosity. That involved rewriting the
program from scratch. I will not post the source
code this time, because it is quite intricate,
and one cannot clearly see how the simulation works
without detailed explanations. But here is the
main principle:

Note that Greenberg allows these semantic shifts:

to suck, breast, udder, milk, to milk, to chew,
throat, to swallow, cheek, neck, to drink,
nape of the neck.

That is, the semantic shifts cover 12 words.
Call the number of extra meanings allowed the
fudge factor. No semantic shifts allowed is
fudge factor = zero. Here, the fudge factor
is strictly 11. Grant that
equating breast, udder, milk, and to milk
is not a fudge, ditto for neck and nape of
neck. We are left with:

to suck, breast etc., to chew, throat, to
swallow, cheek, to drink, neck. Eight
meanings: fudge factor 7.
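
The counting above can be checked with a few lines of Python. This is
only an illustrative sketch: the function name fudge_factor and the
representation of merged meanings as sets are assumptions made for the
example, not part of the simulation program described in this message.

```python
# Meanings allowed for the one etymology, as listed in the message.
meanings = [
    "to suck", "breast", "udder", "milk", "to milk", "to chew",
    "throat", "to swallow", "cheek", "neck", "to drink",
    "nape of the neck",
]

# Groups the message is willing to count as a single meaning.
merged = [
    {"breast", "udder", "milk", "to milk"},
    {"neck", "nape of the neck"},
]

def fudge_factor(meanings, merged=()):
    """Distinct meanings minus one, after collapsing each merged group.

    Hypothetical helper: 'fudge factor' here is read as the number of
    extra meanings beyond the first.
    """
    distinct = set()
    for m in meanings:
        # Use the merged group containing m, or m on its own.
        group = next((frozenset(g) for g in merged if m in g), frozenset([m]))
        distinct.add(group)
    return len(distinct) - 1

strict = fudge_factor(meanings)           # no merging at all
lenient = fudge_factor(meanings, merged)  # with the two groups collapsed
print(strict, lenient)  # prints: 11 7
```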

All right. I ran my simulation 130 times
on 20 languages each represented by 300
words, a 1/200 chance of accidental resemblance
for every word, and a fudge factor of 5.
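
Since the source code is not posted, here is a minimal sketch of one
way such a simulation could be set up. It is NOT the program described
in this message, and its counts need not agree with the figures that
follow. Everything about the model is an assumption made for the
example: each word form is reduced to one of 200 integer "resemblance
classes" (so two random forms resemble each other with probability
1/200), language 0 serves as the reference list, and a fudge factor f
lets a reference word for meaning m match a word for any of the
meanings m to m+f in another language. The names simulate and
mean_counts are hypothetical.

```python
import random
from collections import Counter

def simulate(n_langs=20, n_words=300, n_classes=200, fudge=5, rng=None):
    """One run: tally reference words by how many languages share them.

    Assumed model (not from the message): forms are integer classes,
    language 0 is the reference, and the fudge window is m .. m+fudge.
    """
    if rng is None:
        rng = random.Random()
    # forms[l][m] is the resemblance class of language l's word for meaning m.
    forms = [[rng.randrange(n_classes) for _ in range(n_words)]
             for _ in range(n_langs)]
    tally = Counter()
    for m in range(n_words):
        target = forms[0][m]
        window = range(m, min(m + fudge + 1, n_words))
        k = 1  # the reference language itself
        for lang in forms[1:]:
            if any(lang[j] == target for j in window):
                k += 1
        tally[k] += 1  # this word is "found in k languages"
    return tally

def mean_counts(runs=130, seed=0, **params):
    """Average the per-run tallies over a number of independent runs."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(runs):
        total.update(simulate(rng=rng, **params))
    return {k: total[k] / runs for k in sorted(total)}

if __name__ == "__main__":
    for k, mean in mean_counts().items():
        print(f"{k} languages: {mean:.2f} words per run")
```

The defaults mirror the parameters quoted above (20 languages, 300
words, 1/200, fudge factor 5). A different matching rule, for example
comparing every pair of languages instead of using a single reference
list, would give different counts.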

Out of those 130 experiments there were:

2605 cases of 3 languages with the same word
 i.e. on the average, you had 20 items
 which show up as identical by pure
 accident in 3 languages

 642 cases of 4 languages. So, 4.9 items
 showing up as identical in 4 languages
 by accident every time.

 121 cases of 5 languages, an average of
 0.93 items.

 23 cases of 6 languages

 2 cases of 7 languages

 1 case of 8

So you should expect to see the same word in
6 languages out of 20, by pure accident,
23 times out of 130, under conditions about
as stringent as those used by Greenberg.

That is almost one chance in five, a far
cry from the one chance in 10 billion
calculated by Greenberg.
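
The per-run averages and the "one chance in five" figure follow
directly from the totals listed above. A short Python check, using only
the numbers quoted in this message (the variable names are chosen for
the example):

```python
runs = 130

# Total cases over all runs, keyed by number of languages sharing a word.
cases = {3: 2605, 4: 642, 5: 121, 6: 23, 7: 2, 8: 1}

# Mean number of cases per run for each language count.
means = {k: n / runs for k, n in cases.items()}

for k, mean in means.items():
    print(f"{k} languages: {mean:.2f} per run")

# 6-language matches: 23 cases in 130 runs.
print(f"23/130 = {means[6]:.3f}, about one in {runs / cases[6]:.1f}")
```

This gives 20.04, 4.94 and 0.93 for 3, 4 and 5 languages, and
23/130 = 0.177, about one in 5.7.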

And to think that I have wasted a whole
afternoon to demonstrate a point that ought
to be intuitively obvious.

I know, you are getting sick of it. Well, complain to Jane Edwards,
she's the one responsible for starting me on this.

Results of 200 simulations of 500 words in 50 unrelated languages,
with a fudge factor of 7 (same as Greenberg's Proto-World *milk),
chance of accidental resemblance 1/250 (same as Greenberg's figure).

38.45 words found in 6 languages (that is the mean per simulation, not
 the total over the 200 simulations. In other words, every time,
 you are likely to find 38 words looking like cognates
 between 50 unrelated languages each represented by a
 500-item list)

Found in 7 languages: 21.02
Found in 8 languages: 11.40
Found in 9 languages: 4.95
Found in 10 languages OR MORE: 3.35

"The way the protoworld crumbles," as James Hadley Chase
might write if he were still of this world.