In writing to Mario, I referred to (and included a copy of) Ch.7 of my
PhD thesis [Sigley, R. 1997. Choosing Your Relatives: Relative Clauses
in New Zealand English. PhD thesis, Victoria University of Wellington,
New Zealand.] This chapter compares logistic/Varbrul analysis with
more ordinary chi-squared tests on crosstabulated data; it's intended
as a practical guide to interpreting the GoldVarb output.
My email to Marco was a summary of that material, with additional
speculations, one of which was certainly wrong as stated (see below).
I write now so that anyone wishing to discuss details with me can do so
directly (email: Sigley@ic.daito.ac.jp).
(i) The number of degrees of freedom in a logistic or loglinear model =
(the number of independently estimated parameters - the number of fixed
Question: Is this equal to (number of factors) - (number of factor groups),
as Avila states, or to (number of factors + 1) - (number of factor groups)?
In other words, does the 'input weight' (which is also iteratively
(ii) The comment I made in parentheses below is inaccurate.
>It is possible to use this method to incorporate several interaction
>effects into the model -- but it quickly becomes rather cumbersome, as you
>will often have to collapse distinctions in order to include the
>crossproduct factor group, and things get really messy when you need to
>consider several interactions involving the same factor group. (I think the
>best way to treat these is stepwise: if the most significant interaction is
>between groups 1 and 2, and you suspect there's also an interaction between
>groups 1 and 3, you can only approach it indirectly by comparing models
>containing 1*2, 3, 4,...n and 1*2*3, 4,...n. By contrast, if you try
>constructing a model containing 1*2, 1*3, 4,...n then you've effectively
>encoded the distinctions from group 1 twice, which means your model has
>redundant parameters and could produce unreliable results.)
Here I was trying to reconcile differences between what I know in theory
and what seems to work in practice, and managed a rather garbled account; a
fuller explanation follows.
Suppose we're comparing the models:
(a) 1*2, 3, 4, ... , n (a model containing the interaction effect between
groups 1 and 2, but treating every other factor group as independent)
(b) 1*2, 1*3, 4, ... , n (a model containing independent interactions
between groups 1 and 2, and groups 1 and 3)
(c) 1*2, 1*3, 2*3, 4, ... , n (containing independent 2-way interactions
for groups 1 and 2, 1 and 3, 2 and 3)
(d) 1*2*3, 4, ... , n (containing the 3-way interaction for groups 1, 2 and 3)
To test the significance of adding the 1*3 interaction to a model
containing the 1*2 interaction, you should compare models (a) and (b).
To test the significance of further adding the 2*3 interaction, you should
compare models (b) and (c).
To test the significance of the 3-way 1*2*3 interaction, you should compare
models (c) and (d).
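
As a rough illustration of what "compare models" involves, the sketch below
assumes the comparison is a likelihood-ratio test on the log likelihoods
reported for two nested models. The function names, the log-likelihood
values and the parameter counts are all hypothetical; only the standard
library is used, so the chi-squared tail probability is computed by hand.

```python
from math import erfc, exp, lgamma, log, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, exp(-t)
    else:
        a, q = 0.5, erfc(sqrt(t))
    # Recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0:
        q += exp(a * log(t) - t - lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(ll_simple, k_simple, ll_complex, k_complex):
    """Return (G2, df, p) for two nested models.

    ll_* are log likelihoods; k_* are counts of independently estimated
    parameters AFTER any knockouts have been excluded or collapsed.
    """
    df = k_complex - k_simple
    if df <= 0:
        # The 'more complex' model has no extra parameters, so the
        # test cannot be applied directly.
        raise ValueError("complex model does not add parameters")
    g2 = 2.0 * (ll_complex - ll_simple)
    return g2, df, chi2_sf(g2, df)

# Hypothetical figures for models (a) and (b):
g2, df, p = likelihood_ratio_test(-512.3, 9, -509.1, 10)
print(f"G2 = {g2:.2f}, df = {df}, p = {p:.4f}")
```

The explicit check on df is deliberate: the test only makes sense when the
supposedly more complex model really does estimate more parameters.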
These models show increasing complexity, and an increasing number of
independently estimated parameters, in the order (a) < (b) < (c) < (d).
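
The ordering can be checked with a small counting sketch. It makes several
assumptions: models are read hierarchically (an interaction term implies all
lower-order terms among the same groups), no cells are empty or knocked out,
and the input weight is counted as one estimated parameter (which is exactly
the point at issue in (i); it does not affect the differences between
models). The group sizes are hypothetical.

```python
from itertools import combinations
from math import prod

def n_parameters(levels, interactions):
    """Count independently estimated parameters in a hierarchical model.

    levels       -- dict mapping factor group -> number of factors in it
    interactions -- list of tuples of groups, e.g. [(1, 2), (1, 3)]
    """
    terms = {()}                       # the input weight (intercept)
    for group in levels:
        terms.add((group,))            # main effect of every group
    for term in interactions:
        for size in range(2, len(term) + 1):
            for sub in combinations(sorted(term), size):
                terms.add(sub)         # the interaction and its sub-terms
    # Each term contributes the product of (factors - 1) over its groups;
    # the empty product for the input weight is 1.
    return sum(prod(levels[g] - 1 for g in t) for t in terms)

# Hypothetical sizes: groups 1 and 3 have 2 factors, groups 2 and 4 have 3.
levels = {1: 2, 2: 3, 3: 2, 4: 3}
models = {
    "a": [(1, 2)],
    "b": [(1, 2), (1, 3)],
    "c": [(1, 2), (1, 3), (2, 3)],
    "d": [(1, 2, 3)],
}
for name, inter in models.items():
    print(name, n_parameters(levels, inter))
```

With full cell occupancy the counts rise strictly from (a) to (d); the
difference between two adjacent models is the degrees of freedom for the
corresponding test.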
In practice, this doesn't always work, for several reasons.
* Crossproducts often contain many apparently categorical environments
('knockouts') -- mostly because of low cell occupancy, but also because
of systematic gaps -- which must be excluded or collapsed for analysis.
Performing these simplifications sometimes produces nonsensical results.
I've often found that a model containing a 3-way interaction contains
*fewer* independently estimated parameters than the supposedly
'simpler' model containing the three 2-way interactions -- once
knockouts are excluded. Thus *in some cases* you won't be able to use
the recommended model test, and some more indirect approach will be
needed.
* Crossproducts often contain a large number of factors. This may mean that
the overall model has a higher number of parameters than is justified by
the number of tokens in the dataset. Thus, accidental redundancy (where
several combinations of factors describe the same set of tokens) may
result. This is particularly likely when you include two factor groups
based partly on the same distinctions (e.g. the 1*2, 1*3 crossproducts,
which will both partition the dataset along the divisions from the
original group 1). I must emphasise that including such crossproducts of
shared factor groups does not necessarily result in redundancy (in contrast
to what my original statement implied) -- but it does make it more likely.
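
One simple symptom of accidental redundancy can be looked for directly:
factors in two different crossproduct groups that pick out exactly the same
tokens. The sketch below is only an illustration; the token coding, group
names and helper functions are hypothetical, and a complete check would
examine the rank of the whole design matrix rather than pairs of factors.

```python
from collections import defaultdict

def cells(tokens, groups):
    """Map each crossproduct factor to the set of token indices it covers."""
    covered = defaultdict(set)
    for index, token in enumerate(tokens):
        covered[tuple(token[g] for g in groups)].add(index)
    return covered

def coincident_factors(tokens, groups_a, groups_b):
    """List pairs of factors (one per crossproduct) covering identical tokens."""
    cells_a = cells(tokens, groups_a)
    cells_b = cells(tokens, groups_b)
    return [(fa, fb)
            for fa, set_a in cells_a.items()
            for fb, set_b in cells_b.items()
            if set_a == set_b]

# Hypothetical sparse dataset: six tokens coded for groups g1, g2, g3.
tokens = [
    {"g1": "a", "g2": "x", "g3": "p"},
    {"g1": "a", "g2": "x", "g3": "q"},
    {"g1": "a", "g2": "y", "g3": "q"},
    {"g1": "b", "g2": "x", "g3": "p"},
    {"g1": "b", "g2": "x", "g3": "p"},
    {"g1": "b", "g2": "y", "g3": "q"},
]
for pair in coincident_factors(tokens, ("g1", "g2"), ("g1", "g3")):
    print(pair)
```

In this toy dataset the overlap arises only within factor 'b' of group g1,
where groups g2 and g3 happen to divide the tokens identically; within
factor 'a' the two crossproducts remain distinct.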
| Robert Sigley, Foreign Languages Dept |
| (English Division), Daito Bunka University, |
| 1-9-1 Takashimadaira, Itabashi-ku, Tokyo 175 |