**Editor for this issue:** Scott Fults <scottlinguistlist.org>

- Robert Sigley, GoldVarb (addendum)

In writing to Mario, I referred to (and included a copy of) Ch.7 of my PhD thesis [Sigley, R. 1997. Choosing Your Relatives: Relative Clauses in New Zealand English. PhD thesis, Victoria University of Wellington, New Zealand.] This chapter compares logistic/Varbrul analysis with more ordinary chi-squared tests on crosstabulated data; it's intended as a practical guide to interpreting the GoldVarb output. My email to Marco was a summary of that material, with additional speculations, one of which was certainly wrong as stated (see below). I write now so that anyone wishing to discuss details with me can do so directly (email: Sigleyic.daito.ac.jp).

(i) The number of degrees of freedom in a logistic or loglinear model = (the number of independently estimated parameters - the number of fixed parameters).

Question: Is this equal to (number of factors) - (number of factor groups), as Avila states, or to (number of factors + 1) - (number of factor groups)? In other words, does the 'input weight' (which is also iteratively estimated) count?

(ii) The comment I made in parentheses below is inaccurate.

>It is possible to use this method to incorporate several interaction
>effects into the model -- but it quickly becomes rather cumbersome, as you
>will often have to collapse distinctions in order to include the
>crossproduct factor group, and things get really messy when you need to
>consider several interactions involving the same factor group. (I think the
>best way to treat these is stepwise: if the most significant interaction is
>between groups 1 and 2, and you suspect there's also an interaction between
>groups 1 and 3, you can only approach it indirectly by comparing models
>containing 1*2, 3, 4,...n and 1*2*3, 4,...n. By contrast, if you try
>constructing a model containing 1*2, 1*3, 4,...n then you've effectively
>encoded the distinctions from group 1 twice, which means your model has
>redundant parameters and could produce unreliable results.)
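A minimal Python sketch of the parameter bookkeeping behind (i), assuming the usual likelihood-ratio (G-squared) test for comparing nested models; the function names n_params, chi2_sf and lr_test are hypothetical, and nothing here is GoldVarb's own code. Each factor group contributes (number of factors - 1) free weights, and the input weight adds one more if it is counted. Because the input weight appears in both of two nested models, it cancels out of the difference in parameters: the choice in (i) changes the absolute count, but not the degrees of freedom for comparing, say, 1*2, 3, 4 with 1*2*3, 4. The group sizes passed in are assumed to be the numbers of factors actually retained after knockouts have been excluded or collapsed.

```python
import math


def n_params(group_sizes, count_input=True):
    """Independently estimated parameters in a Varbrul-style model (sketch).

    group_sizes: factors retained in each factor group; for a crossproduct
    group, the number of combined cells surviving after knockouts.
    Each group contributes (size - 1) free weights; the input weight adds 1
    if count_input is True.
    """
    free = sum(size - 1 for size in group_sizes)
    return free + (1 if count_input else 0)


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    x = stat / 2.0
    # Regularized upper incomplete gamma Q(df/2, x), built up with the
    # recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)             # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))  # Q(1/2, x)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1.0
    return q


def lr_test(loglik_simple, loglik_complex, params_simple, params_complex):
    """Likelihood-ratio test of a simpler model nested in a more complex one."""
    df = params_complex - params_simple
    if df < 1:
        raise ValueError("the complex model must have more free parameters")
    # Clamp guards against tiny negative values caused by rounding.
    g2 = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    return g2, df, chi2_sf(g2, df)


if __name__ == "__main__":
    # Hypothetical sizes: group 1 has 3 factors; groups 2, 3 and 4 have 2 each,
    # and no crossproduct cells are knocked out.
    model_a = [3 * 2, 2, 2]   # 1*2, 3, 4
    model_b = [3 * 2 * 2, 2]  # 1*2*3, 4
    for with_input in (False, True):
        pa, pb = n_params(model_a, with_input), n_params(model_b, with_input)
        print(f"input counted={with_input}: {pa} vs {pb}, df for comparison = {pb - pa}")
```

Run as a script, this reports 7 vs 12 parameters without the input weight and 8 vs 13 with it: a difference of 5 either way. To carry out the comparison itself, pass the log likelihoods reported for the two runs to lr_test along with the two parameter counts.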
In that parenthetical comment I was trying to reconcile differences between what I know in theory and what seems to work in practice, and managed a rather garbled account; a fuller explanation follows.

Suppose we're comparing the models:

(a) 1*2, 3, 4, ... , n (a model containing the interaction effect between groups 1 and 2, but treating every other factor group as independent)
(b) 1*2, 1*3, 4, ... , n (a model containing independent interactions between groups 1 and 2, and groups 1 and 3)
(c) 1*2, 1*3, 2*3, 4, ... , n (containing independent 2-way interactions for groups 1 and 2, 1 and 3, 2 and 3)
(d) 1*2*3, 4, ... , n (containing the 3-way interaction for groups 1, 2 and 3)

In theory: To test the significance of adding the 1*3 interaction to a model containing the 1*2 interaction, you should compare models (a) and (b). To test the significance of further adding the 2*3 interaction, you should compare models (b) and (c). To test the significance of the 3-way 1*2*3 interaction, you should compare models (c) and (d). These models show increasing complexity, and an increasing number of independently-estimated parameters, in the order (a) < (b) < (c) < (d).

In practice: this doesn't always work, for several reasons.

* Crossproducts often contain many apparently categorical environments ('knockouts') -- mostly because of low cell occupancy, but also because of systematic gaps -- which must be excluded or collapsed for analysis. Performing these simplifications sometimes produces nonsensical results. I've often found that a model containing a 3-way interaction contains *fewer* independently-estimated parameters than the supposedly 'simpler' model containing the three 2-way interactions -- once knockouts are excluded. Thus *in some cases* you won't be able to use the recommended model test, and some more indirect approach will be necessary.

* Crossproducts often contain a large number of factors.
This may mean that the overall model has a higher number of parameters than is justified by the number of tokens in the dataset. Thus, accidental redundancy (where several combinations of factors describe the same set of tokens) may result. This is particularly likely when you include two factor groups based partly on the same distinctions (e.g. the 1*2, 1*3 crossproducts, which will both partition the dataset along the divisions from the original group 1). I must emphasise that including such crossproducts of shared factor groups does not necessarily result in redundancy (in contrast to what my original statement implied) -- but it does make it more likely.

Cheers,
Robert Sigley.

+-----------------------------------------------+
| Robert Sigley, Foreign Languages Dept         |
| (English Division), Daito Bunka University,   |
| 1-9-1 Takashimadaira, Itabashi-ku, Tokyo 175  |
+-----------------------------------------------+