**Editor for this issue:** <>

- Jacques Guy, Protolanguage undecidability and retention rates
- , Re: 5.677 Protolanguage
- Trey Jones, T

Mark Durie <mark_duriemuwayf.unimelb.edu.au> writes:

>By Jacques Guy's method, even one of the daughter languages could be a proto
>language, with 100% retention of vocabulary.

No. The daughter language is evidently not the protolanguage. However, it is lexically _indistinguishable_ from the protolanguage. Therefore, it is as if it were the protolanguage. Thus, the protolanguage, the root of the tree, is the terminal node occupied by that hypothetical 100% retentive language.

>Guy's 'proof' that the root can
>be placed in infinitely many places only works on the assumption of
>infinitely arbitrary variations in vocabulary replacement rates.
 ^^^^^^^^^^

No, definitely not. This is a misuse, again, of "infinite". It is also a misuse of "arbitrary". I doubt very much that lexical innovations are arbitrary. But, whatever their causes, we can be pretty sure that they vary, and that we cannot predict them with any useful degree of certainty. The outcome of the roll of a die is neither arbitrary nor, strictly speaking, random. We might, possibly, predict it if we were in possession of all the necessary information. But we are not. So it appears random. Ditto lexical innovations. And the variation cannot be infinitely arbitrary since it is necessarily confined within the range 0 to 1, or if you prefer, 0% to 100%.

>It is bizarre to suggest
>that a reconstruction that assumes retention rates ranging from 100% to 20%
                                    ^^^^^^^^^^^^^^^
>in the one family is as equally plausible as one which assumes a range of
>55%-65% retention rates across the family.

The proof makes no mention of retention rates but of _retentions_. A retention rate is, pardon this Lapalissade, a rate of retention. There is nothing particularly strange about a retention rate of, say, 95% per generation. If it persists for a thousand years or so, more than 30 generations, the retention, 1000 years later, is a paltry 20%.
At the other end of the scale, Bergsland and Vogt (1962, in Current Anthropology, look it up), have observed the following retention rates per 1000 years:

                              200-item list   100-item list
Icelandic   rural dialect         97.6%           99%
            urban dialect         96.2%           98%
Georgian                          89.9%           96.5%
Armenian                          94%             97.8%

David Lithgow (pers. com. circa 1970) has observed a replacement of some 20% of the basic vocabulary in Muyuw (Woodlark island) in one generation. Raise 0.8 to the 33rd power, and that gives you the retention rate of Muyuw per 1000 years should it continue to evolve at that rate: 0.06%.

So there is nothing bizarre, then, in expecting even such apparently unbelievable figures: from 0 to 100% retention per thousand years!

All this, in my view, explains why the cradle of Indo-European has been shuffled about so widely, from the Baltic to the Middle-East and I know not where else so that the poor baby must be feeling thoroughly sick by now.
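
The compounding behind these figures can be checked in a few lines of Python. This is only a sketch: it assumes a constant retention rate per generation and about 30 to 33 generations per 1000 years, as in the message above. The function name `retention_after` is made up for illustration.

```python
def retention_after(rate_per_generation: float, generations: int) -> float:
    """Fraction of the original vocabulary still retained after the given
    number of generations, assuming a constant per-generation rate."""
    return rate_per_generation ** generations


# 95% retention per generation, compounded over 30 generations.
print(f"{retention_after(0.95, 30):.1%}")   # prints 21.5%

# 80% retention per generation (the Muyuw observation), over 33 generations.
print(f"{retention_after(0.80, 33):.2%}")   # prints 0.06%
```

A modest-looking rate per generation compounds into a very low retention per millennium, which is the distinction drawn above between a retention rate and a retention.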

Mark Durie <mark_duriemuwayf.unimelb.edu.au> writes:

>By Jacques Guy's method, even one of the daughter languages could be a proto
>language, with 100% retention of vocabulary. Guy's 'proof' that the root can
>be placed in infinitely many places only works on the assumption of
>infinitely arbitrary variations in vocabulary replacement rates. Few
>advocates of lexico-statistics (I am not one) would share the covert
>assumption of Jacques' proof that replacement rates are maximally and
>arbitrarily variable. Even if one does not hold the controversial opposite
>assumption that vocab replacement rates are universally constant across time
>and space, most of us would not wish to go so far the other way as to assume
>that they vary completely freely and arbitrarily. It is bizarre to suggest
>that a reconstruction that assumes retention rates ranging from 100% to 20%
>in the one family is as equally plausible as one which assumes a range of
>55%-65% retention rates across the family.

The proof was not that we could never figure out a root to the tree, but merely that cognate proportions alone are insufficient for locating the root. His proof is mathematically valid. There are, however, other ways to locate it (he mentions some in GLOTTO.DOC, though I don't recall if he did so in the post).

Kind of ironic, someone interested in glottochronology like me coming to Jacques' defence.

-Pat Crowe, SUNY at Buffalo

Reply to: 5.677 Protolanguage proof

Mark Durie, writing about Jacques Guy's proof of protolanguage uncertainty in lexico-statistics, makes a lot of assumptions about other people's assumptions himself. It strikes me as ironic, given the recent flurry of postings concerning the popular view and misunderstandings of linguistics, that Jacques Guy seems to have to fight so hard against linguists' misunderstandings of mathematics and statistics. When I was first developing an interest in linguistics, I came across a book entitled _Everything_Linguists_Ever_Wanted_To_Know_About_Logic_But_Were_Ashamed_To_Ask_, which I thought was foolish, considering the predominating "formal" bent of linguistics. I was woefully wrong (and slightly biased by a formal training in mathematics and computer science..), and my depression deepens every time I see "refutations" of basic mathematical reasoning. But I digress.

To the point, the main false assumption that Mark Durie is making about Jacques Guy's proof is that Jacques has any assumptions at all. The proof concerns what you can mathematically and statistically determine (that means FOR SURE) from lexico-statistical data. The answer, concerning the position of the protolanguage, is NOTHING!

Mark Durie states:

>By Jacques Guy's method, even one of the daughter languages could be a proto
>language, with 100% retention of vocabulary.

That is entirely correct... and not a flaw at all, as we shall see.. Durie continues:

> Guy's 'proof' that the root can
>be placed in infinitely many places only works on the assumption of
>infinitely arbitrary variations in vocabulary replacement rates.

Also true, and still not a flaw! Let us consider an example. Suppose you had a data set for languages A, B, and C, and you constructed a relational tree such as this:

A-----:--------:--  <..and you were tempted to put the protolanguage here.
B-----;        |
               |
C--------------;

Well, I hate to say it, but that could be rather foolish of you, particularly if the languages in question were Spanish(A), Portuguese(B) and Latin(C). In this case one of the "daughter" languages is in fact the protolanguage, Latin! (minor quibbling about Classical vs Vulgar Latin aside.. this is for illustrative purposes only, do not attempt this reconstruction at home, I am a trained professional. --Sorry, I am occasionally possessed by Dave Barry.)

The point is that Durie has already made a HUGE (GIGANTIC) assumption that all the data in question comes from the same time period, which is by no means the case. Jacques tackles the more general case of all sorts of data from various time periods. The fact is, YOU JUST CAN'T TELL. In fact, in the example above, the protolanguage is actually most likely

        Spanish -----:--------:
     Portuguese -----;        |
                              |
Classical Latin--------------;
               ^^-here, at the place where Classical and Vulgar Latin split..

very close to one of the "daughter" languages.. but that comes in part from all sorts of extra- and para-linguistic evidence, like age of written records, the fact that all the Romans are dead, general knowledge of world history, stuff like that.

Durie writes:

> It is bizarre to suggest
>that a reconstruction that assumes retention rates ranging from 100% to 20%
>in the one family is as equally plausible as one which assumes a range of
>55%-65% retention rates across the family.

To me, it is bizarre to introduce such real world knowledge as likely retention rates into a mathematical discussion. (As anyone who has studied college algebra can attest, mathematics has nothing to do with the real world [if it can help it].) We are talking about what the math can tell you, not what likely guess you can make based on real world knowledge that doesn't factor into the equations.
I keep reiterating the main point here in hopes that it will sink in, somewhere, for someone who doesn't get it yet (but they are small hopes, as Jacques understands..): You cannot DETERMINE the position of a protolanguage in a lexico-statistically derived relation tree. You can use outside evidence to help you narrow down the range of likely possibilities, but that is outside the realm of what lexico-statistics can do for you.

Okay.. take your best shot..

-Trey Jones, part-time Math Geek, part-time Ling Geek, full-time Computer Nerd.
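
The A/B/C example can be made concrete with a small Python sketch. It rests on a deliberately simple model that is an assumption of this illustration, not a restatement of Guy's own proof: vocabulary is replaced independently on each branch, so the expected proportion of cognates shared by two languages is the product of the retentions along the path joining them. The retention values (0.9, 0.8, 0.5), the node labels "X" and "R", and the function names are all invented for the example.

```python
import math
from itertools import combinations

LEAVES = ("A", "B", "C")


def shared(tree, a, b):
    """Expected proportion of cognates shared by a and b: the product of the
    retentions on the path joining them via their latest common ancestor."""
    def cumulative(node):
        # Maps the node and each of its ancestors (nearest first) to the
        # retention accumulated between that ancestor and the node.
        acc, out = 1.0, {node: 1.0}
        while node in tree:
            node, r = tree[node]
            acc *= r
            out[node] = acc
        return out

    up_a, up_b = cumulative(a), cumulative(b)
    for node, ret_a in up_a.items():
        if node in up_b:          # first hit is the latest common ancestor
            return ret_a * up_b[node]
    raise ValueError("no common ancestor")


def cognate_table(tree):
    return {pair: shared(tree, *pair) for pair in combinations(LEAVES, 2)}


# Each tree maps child -> (parent, retention on that edge).
# "X" is the junction of A and B; "R" is a hypothetical root placed on a branch.
rootings = {
    "root at the A/B junction":
        {"A": ("X", 0.9), "B": ("X", 0.8), "C": ("X", 0.5)},
    "C itself is the protolanguage":
        {"A": ("X", 0.9), "B": ("X", 0.8), "X": ("C", 0.5)},
    "root partway along the C branch":
        {"A": ("X", 0.9), "B": ("X", 0.8), "X": ("R", 0.625), "C": ("R", 0.8)},
    "root partway along the A branch":
        {"A": ("R", 0.95), "X": ("R", 0.9 / 0.95), "B": ("X", 0.8), "C": ("X", 0.5)},
}

tables = {name: cognate_table(tree) for name, tree in rootings.items()}
for name, table in tables.items():
    print(name, {"-".join(pair): round(value, 3) for pair, value in table.items()})
```

Under this model all four rootings give the same cognate proportions for A-B, A-C and B-C, so the proportions by themselves cannot single out any one of them. Choosing among them takes the kind of outside evidence mentioned above.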