AUTHOR: Rasinger, Sebastian M.
TITLE: Quantitative Research in Linguistics
SUBTITLE: An Introduction
SERIES: Research Methods in Linguistics
PUBLISHER: Continuum International Publishing Group Ltd
YEAR: 2008
Thomas Hoffmann, English Linguistics, University of Regensburg (Germany)
SUMMARY
In many areas of modern linguistics, quantitative data play an increasingly important role, a fact which obviously leads to a demand for textbooks introducing students and junior researchers to the topic. Sebastian Rasinger's ''Quantitative Research in Linguistics'' explicitly tries to meet this demand by providing ''an introduction to quantitative research methods [...] aimed at those with a minimum of prior knowledge'' (p. 1). The book consists of ten chapters, grouped into two parts: part I (chapters 2-4) gives a first overview of basic issues in quantitative research as well as research and questionnaire design. Part II (chapters 5-9) then deals with descriptive and exploratory statistical data analysis using Microsoft Excel. Most chapters contain exercises, the answers to which can be found in chapter 10 ''Appendix and Solutions'', which also provides a list of the Excel functions used in the book and several statistical significance tables.
The book opens with a brief introductory chapter (pp. 1-5), in which Rasinger stresses the importance of a basic knowledge of quantitative research methods for students and researchers and outlines the structure of the book.
The first chapter of part I (ch. 2 ''Quantitative Research: Some Basic Issues''; pp. 7-34) discusses the difference between qualitative and quantitative data. While Rasinger notes that the former data give rise to questions of ''how something is'' (p. 11), the latter are said to be investigated by questions such as ''how much or how many there is/are of whatever we are interested in'' (p. 10). On top of that, qualitative research is characterised as inductive and hypothesis-generating, while quantitative approaches are claimed to be deductive and hypothesis-testing (pp. 11-2). Next, various issues concerning ''variables'' are addressed (pp. 18-27), including amongst others their measurement, definition, operationalisation and levels of measurement (using the familiar categorical, ordinal, interval, ratio scale distinction). The chapter closes with a discussion of reliability/validity (pp. 28-31) and the relationship between hypotheses, laws and theories (pp. 31-4).
In chapter 3, Rasinger then turns to ''Research Design and Sampling'' (pp. 35-55). In essence he divides the various research designs into those which structure research in terms of temporal order (longitudinal vs. cross-sectional designs) and those which allow for ''explicit and deliberate manipulation of variables'' (p. 36; i.e. experimental and quasi-experimental designs). Giving the example of vocabulary growth in first language acquisition, he points out that longitudinal studies require measurements ''on several (at least two) occasions'' (p. 38). In contrast to this, cross-sectional designs such as Labov's (1972) rhoticity study in New York entail data collection at one point in time (pp. 36-8). Furthermore, he shows how the sociolinguistic notion of apparent time can be used to interpret the results from synchronic cross-sectional studies as evidence for linguistic change (p. 41). Following a review of central issues of experimental (experimental vs. control group / between-subject vs. within-subject design) and quasi-experimental design (pp. 42-5), Rasinger then turns to the question of sampling: he first of all sketches the relationship between population and sample, and after that discusses the pros and cons of several sampling techniques (random/probabilistic vs. non-random/non-probabilistic samples, the latter being differentiated into opportunity and convenience samples; pp. 45-52). The final section of the chapter deals with ethical guidelines in research (such as seeking participants' consent or allowing subjects to withdraw at any time; pp. 52-5).
''Questionnaire Design and Coding'' (ch. 4; pp. 56-83) is the topic of the next section. As the title suggests, in addition to ''general guidelines on how to design a questionnaire'' (p. 57) the chapter also gives information on ''how to prepare questionnaire-based data for [statistical] analysis'' (p. 57). First, Rasinger points out that the basis for any good questionnaire is a clear and precise research question. He then goes on to describe multiple choice/item questions (pp. 59-61), the measurement of attitudes and beliefs (focusing on elicitation instruments such as semantic differentials and Likert scales; pp. 61-3), pitfalls and problems in question phrasing (pp. 63-7), and the role of piloting (pp. 67-9), layout (pp. 69-71) and the number and sequence of questions (pp. 70-1). Finally, a sample questionnaire (pp. 76-82) is provided and discussed.
Part II of the book moves on to statistical data analysis, with chapter 5 (''A First Glimpse at Data''; pp. 87-109) dealing with descriptive statistical concepts such as absolute and relative frequencies (pp. 89-93) and ''classes, width and cumulative frequencies'' (pp. 93-8). In addition, particular emphasis is placed on the visualisation of data by graphs (pp. 98-109). For this, Rasinger draws on data from various linguistic studies to illustrate the use of bar and pie charts (namely Labov's 1972 study on cluster simplification as well as Wolfram's 1969 and Trudgill's 1974 studies on non-standard features in Detroit and Norwich, respectively; pp. 100-3), line graphs (based on data from Hirsh-Pasek and Golinkoff's 1996 fixation time study investigating children's processing of verb argument structure; pp. 105-7) and scatter plots (this time using a ''fictive'' data set; pp. 107-9). (In fact, it should be pointed out that most data sets discussed in the book are from actual linguistic studies.)
After this, Rasinger turns to measures of central tendency and dispersion (ch. 6 ''Describing Data Properly -- Central Location and Dispersion''; pp. 110-36). He explains crucial notions such as mean, median and mode together with quartiles, quintiles and percentiles (pp. 113-23). Subsequently, he moves on to ''measures of dispersion'' (p. 123), introducing range, variance, standard deviation, standard error and z-scores (pp. 123-9, 133-6). The normal distribution with its special properties is discussed in subsection 6.4 (pp. 129-32).
Despite its simple-sounding title ''Analysing Data -- A Few Steps Further'' (pp. 137-74), chapter 7 takes the reader with a minimum of prior knowledge on quite a statistical ride from probability theory to multiple regression. The first two sections explore probability issues, i.e. simple, conditional and joint probabilities (pp. 138-44). Following this, statistical tests and concepts such as chi-square tests, Pearson correlation, partial correlation, causality, significance, simple and multiple regression, and correlation and reliability are introduced (pp. 144-74).
Finally, chapters 8 (''Testing Hypotheses''; pp. 175-94) and 9 (''Analysing Dodgy Data: When Things Are Not Quite Normal''; pp. 195-205) complete the discussion of statistical tests. Chapter 8 mainly focuses on the various types of t-tests (for dependent and independent samples; pp. 178-91), but also illustrates the use of chi-square tests for hypothesis testing (pp. 191-4). In contrast to this, chapter 9 presents non-parametric tests for data which do not follow a normal distribution, namely the Spearman correlation test (pp. 196-9), Kendall's tau (p. 200), the Wilcoxon signed-rank test (pp. 200-3) and the Mann-Whitney U test (pp. 203-5). (As mentioned above, the last chapter of the book, chapter 10 (pp. 206-23), only contains the Appendix.)
EVALUATION
Since the majority of my own students have a strong humanities but limited mathematical background, I know how difficult it is to find accessible introductory texts on quantitative linguistics for an audience that is easily intimidated by mathematical formulae, let alone statistical tests. Therefore, Rasinger's ''Quantitative Research in Linguistics'' with its reader-friendly and hands-on approach is a welcome contribution to the field. Unfortunately, however, due to reasons mainly relating to the statistics part of the book (for some of which Rasinger can't be held responsible at all), I would be more than hesitant to adopt it as the textbook for any of my classes.
With any textbook an author has to make difficult decisions as to which aspects should be focussed upon and which ignored. I think it is fair to say that in this respect Rasinger has done a good job. Part I is an extremely accessible introduction to the basics of quantitative research in linguistics and covers most of the central concepts. (Though considering the prominence of quantitative research in experimental psycholinguistics, I personally would have liked the book to have included a chapter on experiment design, covering issues such as stimuli and filler design, randomisation of stimuli, etc.; e.g. Cowart 1997. Furthermore, a section on quantitative corpus linguistic research would definitely also have been an asset; cf. Gries 2009: 173-217.) The same applies to part II of the book. It surveys most of the basic (and even some of the more advanced) statistical tests that any beginning researcher might need. Moreover, throughout the book, all these topics are presented in a way that should be easily accessible for the intended readership.
Strange as it may sound, the book's biggest problem is its date of publication. Rasinger wrote ''Quantitative Research in Linguistics'' at a time when no hands-on statistical textbook for linguists was available. In the same year as it was published though, three excellent statistical textbooks appeared on the market (Baayen 2008; Gries 2008; Johnson 2008), all of which work with the free, open source R software (http://www.r-project.org/). This makes Rasinger's use of Excel for statistical analysis a somewhat anachronistic choice, for several reasons.
Before going into details of these reasons, let me note that I strongly believe that it doesn't matter which software researchers perform their statistical analyses with, as long as the analysis is carried out in a sound and careful way. For an introductory textbook to statistics for students, however, I feel that Excel is now an unfortunate choice because:
1) It does not allow all kinds of statistical analyses: for multiple regression, for example, even Rasinger himself suggests ''changing to a different software package'' (p. 169), and for some non-parametric tests like Kendall's tau he admits that ''there is no simple way of calculating it in Excel'' (p. 200). In R, however, all of these (and many more) tests can easily be carried out (e.g. Baayen 2008: 165-240; Gries 2008: 150). Since I would not recommend confusing students by first teaching them how to do statistics in Excel and then in R, I think it makes much more sense to start with R straightaway (especially since Excel's syntax is not really that much simpler than R's).
2) Moreover, all three textbooks using R (Baayen 2008; Gries 2008; Johnson 2008) are also written in a very accessible style (though Baayen's book is a somewhat more demanding read and an English version of Gries's book will not appear until later this year) and provide an even more thorough statistical introduction than Rasinger's book (which is the only linguistic introduction to statistics using Excel that I am aware of).
3) Another reason why I would prefer R over Excel (or SPSS) is the fact that the data do not have to be recoded. Unlike in Excel (or SPSS; cf. pp. 71-5), in R factors such as ''gender'' with levels ''male'' and ''female'' do not have to be recoded into numbers (such as ''1'' and ''2''). This minimises the danger of beginners erroneously treating factors as numerical variables (which would invalidate their statistical analysis).
4) Finally, Excel and SPSS are commercial software packages, while students and researchers can download R for free (for further advantages of R; cf. e.g. Baayen 2008: x-xiii).
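The recoding problem raised in point 3) can be made concrete with a small sketch. It is written in Python rather than R purely for illustration; the five responses and the 1/2 coding scheme are invented here and are not taken from Rasinger's book.

```python
from collections import Counter
from statistics import mean

# Hypothetical questionnaire responses for the factor "gender".
genders = ["female", "male", "female", "female", "male"]

# Hypothetical numeric recoding of the kind a spreadsheet workflow invites:
# male -> 1, female -> 2.
coded = [1 if g == "male" else 2 for g in genders]

# Nothing stops a beginner from averaging the codes, although the result
# has no interpretation: the numbers are labels, not measurements.
meaningless_mean = mean(coded)

# Keeping the labels makes the only sensible summary, a frequency count,
# the natural operation.
counts = Counter(genders)
```

Here the coded column happily yields a "mean gender" of 1.6, whereas the labelled column can only be tabulated (3 female, 2 male), which is the appropriate treatment for a categorical variable.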
As pointed out above, since none of the R textbooks were available to Rasinger, he can obviously not be blamed for his choice of Excel as his statistical software package. However, from an instructor's point of view, these reasons would lead me not to adopt ''Quantitative Research in Linguistics'' as a textbook for any of my courses (since I wouldn't be able to use half of the book).
Besides the software issue, however, the statistical section of the book also contains a couple of slips and mistakes, which would make me question its use as a textbook:
a) In his discussion of data coding in chapter 4.8, for example, Rasinger suggests filling in ''999'' for a missing ''age'' value (pp. 73-4). This is not only unorthodox, but simply wrong (since it changes the mean age of the sample from 25 to 219.8).
b) The presentation of the chi-square test (pp. 144-9) is flawed in several respects: first, Rasinger claims that ''the chi-square test only works reliably when the minimum count in each cell is 5'' (p. 148). Yet, it is not the observed frequencies but the expected ones that must meet this criterion (Gries 2008: 157). On top of that, he does not mention that the 2x2 table data he presents actually require a Yates-corrected version of the chi-square test (something that R automatically adjusts for), and he fails to point out that the p-value of such tests crucially depends on sample size (i.e. that, for the same proportions, larger data sets automatically yield more significant results, so that the p-value cannot be interpreted as the size of an effect; cf. Baayen 2008: 114-6; Gries 2008: 178).
c) In the section on multiple regression, coefficients and probability values are again presented as indicators of effect size (pp. 170-1), with no indication that only z-scaled coefficients (Gries 2008: 260-1) allow a comparison of effect size (since the size of a coefficient crucially depends on the scale of the independent variable in question) and that the p-values are dependent on sample size (cf. above).
d) While it is mentioned that data which do not follow a normal distribution require nonparametric tests (p. 195), the author doesn't really explain how one can test data with respect to this criterion. Again, however, the required tests, i.e. the Shapiro-Wilk test for normality or the Kolmogorov-Smirnov one-sample test, can easily be performed in R (using the functions shapiro.test() and ks.test(), respectively; cf. Baayen 2008: 73). The omission of these tests is particularly unfortunate since the validity of multiple regression analysis crucially depends on the assumption that the residuals are normally distributed and have constant variance -- again something that is not mentioned by Rasinger.
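Points a) and b) above can be checked with a few lines of arithmetic. The following is only a sketch in Python (not R, and not code from any of the books discussed). All data are invented: the four ages are chosen so that a five-person sample reproduces the means of 25 and 219.8 mentioned in a), and the 2x2 tables are hypothetical. The helper names expected_counts and chi_square are likewise made up for this illustration, and the continuity correction is implemented in the capped form (subtracting at most 0.5 from each absolute deviation).

```python
from statistics import mean

# a) Coding a missing age as 999 (hypothetical sample of five participants).
known_ages = [22, 24, 26, 28]      # invented ages with a mean of 25
coded_ages = known_ages + [999]    # fifth, missing age entered as 999
mean_known = mean(known_ages)
mean_coded = mean(coded_ages)


def expected_counts(table):
    """Expected cell frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Pearson chi-square statistic; yates=True applies a continuity correction."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff -= min(0.5, diff)  # never push the deviation below zero
            total += diff ** 2 / e
    return total


# b1) Every observed count is at least 5, yet one expected count is far below 5.
sparse = [[5, 5], [5, 100]]
smallest_expected = min(min(row) for row in expected_counts(sparse))

# b2) Same proportions, ten times the data: the statistic grows tenfold.
small = [[12, 8], [8, 12]]
large = [[120, 80], [80, 120]]
chi_small = chi_square(small)
chi_large = chi_square(large)
```

With these invented figures, the mean age jumps from 25 to 219.8 once the 999 is included. The table called sparse passes a check on observed counts but has an expected count of roughly 0.87. The uncorrected statistic rises from 1.6 to 16.0 when the same proportions are observed in a sample ten times as large, which moves it from below to above the critical value of 3.84 for one degree of freedom at the 5% level, although the strength of the association has not changed.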
Let me conclude by pointing out again that despite the largely negative tone of the above comments, ''Quantitative Research in Linguistics'' is in fact a solid, easy-to-read introduction to quantitative linguistic research. However, mainly because of the recent publication of so many excellent statistics textbooks, I do not think this is going to become one of the main textbooks in the field.
REFERENCES
Baayen, R. H. 2008. Analyzing Linguistic Variation: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.
Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
Gries, St. Th. 2008. Statistik fuer Sprachwissenschaftler. (Studienbuch zur Linguistik 13). Goettingen: Vandenhoeck & Ruprecht.
Gries, St. Th. 2009. Quantitative Corpus Linguistics with R: A Practical Introduction. New York: Routledge.
Hirsh-Pasek, K. and R. M. Golinkoff. 1996. The Origins of Grammar: Evidence from Early Language Comprehension. Cambridge, MA: MIT Press.
Johnson, K. 2008. Quantitative Methods in Linguistics. Malden, MA and Oxford: Blackwell.
Labov, W. 1972. Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Trudgill, P. 1974. The Social Differentiation of English in Norwich. London: Cambridge University Press.
Wolfram, W. 1969. A Sociolinguistic Description of Detroit Negro Speech. Washington, DC: Center for Applied Linguistics.