**Editor for this issue:** Brett Churchill <brettlinguistlist.org>

- Mario Cal Varela, Sum: GoldVarb (addendum)

Dear all,

After posting my summary of responses to a query on "GoldVarb" a week ago, I received a message from Robert Sigley commenting on some of the points raised there. I thought I might as well forward his message to LINGUIST, to be appended to the summary posted on October 13th.

> 1) Preston is absolutely right about the procedure, and about the need for linguistic motivation when collapsing factors. You test the significance of collapsing factors, or of removing entire factor groups, by comparing the log-likelihood values and calculating a value for chi-squared (actually, G2) based on twice the difference in log-likelihood. (The 'step-up/step-down' runs automatically perform the series of tests required to add/remove entire factor groups; but you need to do the comparison yourself when trying to simplify factor groups by collapsing factors, or when trying to test significance of interaction effects.)
>
> 2) Avila is also correct in her description of the procedure. My own account differs in one minor detail -- I have assumed that the "input weight" counts as an estimated parameter, so that I list the number of degrees of freedom for each model as "(number of factors) - (number of factor groups) + 1". I may be wrong (and if somebody *knows* I'm wrong, please tell me!). Still, it doesn't matter, as you take the *difference* in degrees of freedom between the two models you compare, so the "+1"s cancel out.
>
>> Finally, Ron Smyth calls my attention to two limitations of the variable rule applications that could perhaps be commented on by more statistically-oriented researchers than myself. The first one has to do with the fact that when the design has several factors, the output of the program does not give any information about some of the interactions. The other is that the program seems to handle nicely data with very few subjects per cell, where other applications would not give out anything significant.
>> That is, GoldVarb does not keep track of subjects and seems to disregard individual differences.
>
> Let's take this a point at a time:
>
> 1) INTERACTION EFFECTS
>
> GoldVarb does not freely give information about *any* interaction effects. However, you can still use the GoldVarb output to test for their presence -- and my PhD chapter describes two such methods (quite apart from the tedious and often unilluminating procedure of looking for high values of G2 in individual cells).
>
> First: if you construct a model containing just 2 factor groups, and this model provides a poor fit to the data (choose "Show model fit" under the Cells menu before doing the analysis), then this means that a model treating these factor groups as independent provides a poor fit -- and therefore there is some dependence between them. However, this could result from several sources --
>
> a) some third significant factor (which you may or may not have encoded!) is inequitably distributed across the cells of the two-group model you're testing;
>
> b) there really is an interaction effect.
>
> so this is only a rough indication of whether you should look for an interaction effect. What we *can* say is that if the model is a good fit, there's probably no interaction effect.
>
> It's very easy to carry out the entire set of such 2-group comparisons by running them as part of a step-up/step-down analysis. And if you do this first, then you only need to apply the second, more difficult test to the much smaller set of 2-group comparisons that give significant results.
>
> Second, and more definitive (but more difficult):
> * Take the model containing all the groups you've encoded (the "full-groups model").
> * Note the log-likelihood, and the number of degrees of freedom.
> * Then replace two factor groups with a new single factor group containing their crossproduct (ie, every possible combination of factors is represented).
> * Note the new log-likelihood, and number of degrees of freedom.
> * Conduct the chi-squared test as Avila describes.
>
> If this test result is significant, that tells you that the crossproduct of factors is more informative than treating the factor groups independently -- from which you can infer that there is a significant interaction effect. (Assuming that you have encoded all significant influences!!!)
>
> It is possible to use this method to incorporate several interaction effects into the model -- but it quickly becomes rather cumbersome, as you will often have to collapse distinctions in order to include the crossproduct factor group, and things get really messy when you need to consider several interactions involving the same factor group. (I think the best way to treat these is stepwise: if the most significant interaction is between groups 1 and 2, and you suspect there's also an interaction between groups 1 and 3, you can only approach it indirectly by comparing models containing 1*2, 3, 4,...n and 1*2*3, 4,...n. By contrast, if you try constructing a model containing 1*2, 1*3, 4,...n then you've effectively encoded the distinctions from group 1 twice, which means your model has redundant parameters and could produce unreliable results.)
>
> 2) INDIVIDUAL BEHAVIOUR
>
> Smyth's second point actually combines two problems:
>
> i) GoldVarb cannot be expected to take into account any factors (such as behaviour of individual respondents) which are not encoded in the model. If you want to test for significance of individual behaviour, you have to have this as a factor group. (This may be impractical if you've got a large number of individuals in your dataset.)
>
> ii) chi-squared tests (whether performed by GoldVarb or any other application) assume that every data point is independent. In other words, we assume that a speaker's choice on one occasion is not influenced by their choice on other occasions.
> Hence the 'unit of variation' is the single token. In datasets where many tokens have been drawn from the speech of one individual in one interaction, this assumption may be false, which will lead to significance being exaggerated. In extreme cases, it may be better to treat the 'unit of variation' as being each *speaker*, or even each *conversation*. Provided that you have a large number of tokens (at least 10, and preferably at least 30) per speaker, you can use nonparametric tests such as Mann-Whitney U or Kruskal-Wallis H to test for significant differences among speakers from different social groups. Such tests provide a useful 'sanity check' for token-based significance tests.

Mario Cal Varela
Departamento de Filoloxia Inglesa e Alemana, despacho 307
Facultade de Filoloxia
Universidade de Santiago de Compostela
c/ Burgo das Nacions s/n
Santiago 15705
ESPANA
tlf (981) 563100 ext. 11858
fax (981) 574646
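
Appended note: for readers who want to do the log-likelihood comparison by hand, here is a minimal Python sketch of the crossproduct test Sigley describes under "INTERACTION EFFECTS". It is not GoldVarb's own code. The function names (`model_df`, `chi2_sf`, `g2_test`) are hypothetical, and the log-likelihood values and factor-group sizes in the example are invented purely for illustration. Degrees of freedom follow Sigley's convention of "(number of factors) - (number of factor groups) + 1", and the chi-squared tail probability uses the standard closed form for whole-number degrees of freedom.

```python
import math


def model_df(factors_per_group):
    """Degrees of freedom under Sigley's convention:
    (number of factors) - (number of factor groups) + 1 for the input weight."""
    return sum(factors_per_group) - len(factors_per_group) + 1


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    t = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(t))   # tail for df = 1
    else:
        a, q = 1.0, math.exp(-t)              # tail for df = 2
    # Recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        if t > 0:
            q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


def g2_test(loglik_simple, df_simple, loglik_complex, df_complex):
    """Compare two nested models: G2 = twice the difference in log-likelihood."""
    ddf = df_complex - df_simple
    if ddf < 1:
        raise ValueError("the complex model must have more parameters")
    g2 = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    return g2, ddf, chi2_sf(g2, ddf)


# Invented example: three factor groups with 3, 2 and 4 factors, versus a
# model where the first two groups are replaced by their 6-factor crossproduct.
df_independent = model_df([3, 2, 4])   # 9 - 3 + 1 = 7
df_crossed = model_df([6, 4])          # 10 - 2 + 1 = 9
g2, ddf, p = g2_test(-512.40, df_independent, -508.10, df_crossed)
print(f"G2 = {g2:.2f} on {ddf} df, p = {p:.4f}")
```

With these made-up numbers the crossproduct model improves the fit by G2 of about 8.6 on 2 degrees of freedom, which would count as significant at the 0.05 level. As Sigley notes, the "+1" for the input weight cancels when the two models are compared, so leaving it out of `model_df` would give the same difference in degrees of freedom.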