correla test

Upload: gerlg

Post on 05-Apr-2018

  • 8/2/2019 Correla Test

    1/19

    SPSS for Psychologists Contents vii

    Contents

    Dedications v

    Preface xi

    Acknowledgements xv

    Chapter One Introduction 1

    1 Psychological research and SPSS 2

    2 Some basic statistical concepts 4

    3 Working with SPSS 18

    4 Starting SPSS 20

    5 How to exit from SPSS 24

    6 Some useful option settings in SPSS 25

    Chapter Two Data entry in SPSS 27

    1 The Data Editor window 28

    2 Defining a variable in SPSS 30

    3 Entering data 42

    4 Saving a data file 45

    5 Opening a data file 48

    6 Data entry exercises 50

    7 Answers to data entry exercises 54

    8 Summary descriptive statistics and the Viewer window 56

    Chapter Three Tests of difference for two sample designs 69

    1 An introduction to the t-test 70

    2 The independent t-test 71

    3 The paired t-test 79

    4 An introduction to the nonparametric equivalents of the t-test 85

    5 The Mann-Whitney test 86

    6 The Wilcoxon test 89

    Chapter Four Tests of correlation 93

    1 An introduction to tests of correlation 94

    2 Descriptive statistics in correlation 95

    3 Pearson's r: parametric test of correlation 102

    4 Spearman's rs: nonparametric test of correlation 106

    Chapter Five Tests for nominal data 109

    1 Nominal data and dichotomous variables 110

    2 Chi-square tests versus the chi-square distribution 112

    3 The goodness-of-fit chi-square 113

    4 The multi-dimensional chi-square 114

    5 The McNemar test for repeated measures 127

    Chapter Six Data handling 131

    1 An introduction to data handling 132

    2 Sorting a file 133

    3 Splitting a file 135

    4 Selecting cases 137

    5 Recoding values 141

    6 Computing new variables 146

    7 Counting values 149

    8 Ranking cases 152

    9 Other useful functions 155

    10 Data file for scales or questionnaires 157

    Chapter Seven Analysis of variance 161

    1 An introduction to analysis of variance (ANOVA) 162

    2 One-way between-subjects ANOVA 175

    3 Two-way between-subjects ANOVA 182

    4 One-way within-subjects ANOVA 188

    5 Two-way within-subjects ANOVA 194

    6 Mixed ANOVA 204

    7 Some additional points 210

    8 Planned and unplanned comparisons 213

    9 Nonparametric equivalents to ANOVA: Kruskal-Wallis and Friedman 221

    Chapter Eight Multiple regression 227

    1 An introduction to multiple regression 228

    2 Performing a multiple regression on SPSS 235

    Chapter Nine Analysis of covariance and multivariate analysis of variance 245

    1 An introduction to analysis of covariance 246

    2 Performing analysis of covariance on SPSS 250

    3 An introduction to multivariate analysis of variance 263

    4 Performing multivariate analysis of variance on SPSS 267

    Chapter Ten Discriminant analysis and logistic regression 273

    1 Discriminant analysis and logistic regression 274

    2 An introduction to discriminant analysis 276

    3 Performing discriminant analysis on SPSS 280

    4 An introduction to logistic regression 293

    5 Performing logistic regression on SPSS 294

    Chapter Eleven Factor analysis, and reliability and dimensionality of scales 301

    1 An introduction to factor analysis 302

    2 Performing a basic factor analysis on SPSS 313

    3 Other aspects of factor analysis 326

    4 Reliability analysis for scales and questionnaires 331

    5 Dimensionality of scales and questionnaires 337

    Chapter Twelve Beyond the basics 341

    1 The syntax window 342

    2 Option settings in SPSS 350

    3 Getting help in SPSS 352

    4 Printing from SPSS 355

    5 Incorporating SPSS output into other documents 358

    6 Graphing tips 359

    7 Interactive charts 365

    Glossary 367

    References 387

    Appendix I: Data files 391

    Appendix II: Defining a variable in SPSS versions 8 and 9 423

    Appendix III: Adding regression lines to scattergrams before Version 12 433

    1 Simple scattergram (Chapter 4, Section 2) 434

    2 Scattergram with multiple groups (Chapter 9, Section 2) 438

    Index 441

    Chapter Four

    Tests of correlation

    An introduction to tests of correlation

    Descriptive statistics in correlation

    Pearson's r: parametric test of correlation

    Spearman's rs: nonparametric test of correlation

    Section 1: An introduction to tests of correlation

    Researchers often wish to measure the degree of relationship between two variables.

    For example, there is likely to be a relationship between age and reading ability in

    children. Such an investigation is not a true experiment, for the same reason that a

    natural independent groups design (e.g., when age group or sex is selected as the

    grouping variable) is not a true experiment. In both, the experimenter does not

    manipulate the independent variable, and no statement about causation can be

    made. In a natural independent groups design, the experimenter chooses the levels

    of the independent variable from natural characteristics, and then looks for

    differences between the groups. In a correlation there is no independent variable:

    you simply measure two variables. So, if someone wished to investigate the effect

    of smoking on respiratory function, then, in a natural independent groups design,

    you could choose to measure and then compare respiratory function in smokers

    with that in non-smokers. A more common design, however, would be for

    researchers to measure both how many cigarettes people smoke and their

    respiratory function, and then test for a correlation.

    An important point to remember is that correlation does not imply causation. In any

    correlation, there could be a third variable which explains the association between the

    two variables that you measured. For example, there may be a correlation between the

    number of ice creams sold and the number of people who drown. Here temperature is

    the third variable, which could explain the relationship between the measured

    variables. Even when there seems to be a clear cause and effect relationship, a

    correlation alone is not sufficient evidence for a causal relationship. Only if one

    variable has been manipulated can one draw such conclusions.

    Francis Galton carried out early work on correlation, and one of his colleagues,

    Pearson, developed a method of calculating correlation coefficients for parametric

    data: Pearson's Product Moment Correlation Coefficient (Pearson's r). When one or

    both of the scales is neither interval nor ratio, or if the data do not meet the other two

    assumptions for using parametric statistical tests, then a nonparametric test of

    correlation such as Spearman's rs should be used. The s is to distinguish it from

    Pearson's r. This test was originally called Spearman's ρ (the Greek letter rho).

    Note that for a correlation to be acceptable one should normally test at least 100

    participants; otherwise a small number of participants with extreme scores could

    skew the data and either prevent a correlation from being revealed when it does

    exist or cause an apparent correlation that does not really exist. The scattergram is a

    useful tool for checking such eventualities.
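
    The effect of a single extreme participant in a small sample can be sketched in a few

    lines of Python. This is only an illustration under stated assumptions: the scores are

    invented, and pearson_r is a hypothetical helper written for this sketch, not an SPSS

    function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented scores for five participants: no linear relationship at all.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_small = pearson_r(x, y)

# The same data plus one participant with extreme scores on both variables.
r_with_outlier = pearson_r(x + [20], y + [20])

print(f"without the extreme case: r = {r_small:.3f}")
print(f"with the extreme case:    r = {r_with_outlier:.3f}")
```

    With so few cases, one extreme point is enough to turn a coefficient of about zero

    into a very strong apparent correlation, which is why plotting the data first matters.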

    Section 2: Descriptive statistics in correlation

    One of the easiest ways to tell if two items are related and to spot trends is to plot

    scattergrams or scatterplots. Figure 4.1 shows a hypothetical example. Each point on

    the scattergram represents the age and the reading ability of one child. The line running

    through the data points is called a regression line. It represents the best fit of a

    straight line to the data points. The line in Figure 4.1 slopes upwards from left to right:

    as one variable increases in value, the other variable also increases in value and this is

    called a positive correlation. The closer the points are to being on the line itself, the

    stronger the correlation. If all the points fall along the straight line, then it is said to be a

    perfect correlation. The scattergram will also show you any outliers.

    Figure 4.1. Scattergram illustrating a positive correlation: hypothetical data for the

    relationship between age and reading ability in children.
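
    The best-fit line described above is usually found by least squares. The sketch

    below assumes that method and uses invented age and reading scores; fit_line is a

    hypothetical helper, not part of SPSS.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: age in years and a reading score for six children.
ages = [5, 6, 7, 8, 9, 10]
reading = [12, 15, 19, 20, 26, 28]

slope, intercept = fit_line(ages, reading)
print(f"reading = {intercept:.2f} + {slope:.2f} * age")
```

    A positive slope corresponds to a line rising from left to right, as in Figure 4.1; a

    negative slope would correspond to a negative correlation.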

    In the scattergram shown in Figure 4.2, the dots are scattered randomly, all over the

    graph. It is not possible to draw any meaningful best fit line at all, and the

    correlation would be close to zero: that is, there is no relationship between the two

    variables.

    Figure 4.2. Scattergram showing two variables with zero relationship.

    It is often the case that as one variable increases in value, the other variable decreases in

    value: this is called a negative correlation. In the following example of how to produce

    a scattergram with SPSS, we are going to use data that give a negative correlation.

    EXAMPLE STUDY: RELATIONSHIP BETWEEN AGE AND CFF

    A paper by Mason, Snelgar, Foster, Heron and Jones (1982) described an

    investigation of (among other things) whether the negative correlation between age

    and CFF (explained below) is different for people with Multiple Sclerosis than for

    control participants. For this example, we have created a data file that will

    reproduce some of the findings for the control participants. CFF can be described

    briefly and somewhat simplistically as follows. If a light is flickering on and off at a

    low frequency, then most people can detect the flicker. If the frequency of flicker is

    increased then eventually it looks like a steady light. The frequency at which

    someone can no longer perceive flicker is called his or her critical flicker frequency

    (CFF). (These data are available in Appendix I or from the web address listed

    there.)

    How to obtain a scattergram

    Click on Graphs on the menu bar, and then from the menu select Scatter. In the

    Scatter/Dot dialogue box, shown below, click on the Simple Scatter display, then

    click on the Define button. (Note: in Version 12 and earlier, the dialogue box is called

    Scatterplot.)

    The other options in the Scatter/Dot (Scatterplot) dialogue box produce other types of

    graph, which you can explore in the future. We will only be describing the Simple

    Scatter command. After you have clicked on the Define button, the Simple

    Scatterplot dialogue box will appear.

    In the Simple Scatterplot dialogue box, shown below, move the variable names,

    one into the box labelled X Axis, and one into the Y Axis box. You can use the

    Titles button and the Options button if you wish.

    When you have finished, click on OK. The Output Window will open, containing

    the scattergram: a part of that window is shown on the next page.

    TIP The Panel by facility, introduced in Version 13, allows you to plot a scattergram

    for two different groups at the same time. For example, if we had recorded the gender of

    the participants then we could plot a scattergram for men and for women separately by

    moving the grouping variable name into Columns. A second grouping variable (e.g.,

    patient or control participant) could be used in Rows to produce 4 separate scattergrams in all.

    How to add a regression line to the scattergram

    To add the regression line, you have to edit the graph: start by double-clicking in

    the scattergram, and the SPSS Chart Editor window, shown at the bottom of this

    page, will appear.

    From this point in the procedure, changes were made between SPSS Versions 11

    and 12, and another change between Versions 12 and 13. Here we show you how to

    produce the regression line for Version 12 and also for Version 13. The procedure

    for Version 10 or 11 is shown in Appendix III.

    You can copy the scattergram and paste it into a Word document for a report, adding a

    suitable figure legend. For example, see Figure 4.3 on the next page.

    TIP Figure legends should be suitable for the work into which you are incorporating

    the figure. The legend to Figure 4.3 might be suitable for a report about the study into

    age and CFF. The legends to Figures 4.1 and 4.2, however, are intended to help you

    follow the explanation in this book, and would not be suitable for a report.

    In addition to adding the regression line you can edit other elements of the chart, to

    improve appearance. For example, SPSS charts are usually rather large. If you leave

    them large, then the report will be spread over more pages than necessary which can

    hinder the ease with which the reader follows your argument. You can shrink charts

    easily in Word, but it is best to change the size in Chart Editor as then the font and

    symbol size will automatically be adjusted for legibility. Editing would also be useful

    when a number of cases all fall at the same point. The data that we use to illustrate use

    of Spearman's rs (Section 4) demonstrate that situation. To clearly illustrate the data

    you can edit the data symbols in Chart Editor, so that they vary in size according to

    the number of cases at each point. Guidelines on the appearance of Figures are

    given in APA (2001).

    Figure 4.3. Critical flicker frequency (in Hz) plotted against participants' age (in

    years).

    A scattergram is a descriptive statistic that illustrates the data, and can be used to

    check the data. For example, there may be some extreme outliers that strongly

    influence the regression line, or there may be a non-linear relationship. If there does

    appear to be a linear relationship (as Pearson's r makes the assumption that any

    relationship will be linear) we can find out whether or not it is significant with an

    inferential statistical test of correlation. A test of correlation will give both the

    significance value and the strength of the correlation. The strength of correlation is

    indicated by the size of the correlation coefficient, whose magnitude varies between 0 and 1.

    A perfect negative correlation would have a coefficient of -1, and a perfect positive

    correlation would have a coefficient of +1. In psychology perfect correlations (in

    which all the points fall exactly on the regression line) are extremely rare and rather

    suspect.

    Note the R Sq Linear value that appears in the scattergram (Versions 12 and 13).

    This is not the correlation coefficient itself; it is the square of Pearson's r (which we

    demonstrate in Section 3). r² is itself a useful statistic that we will return to in

    Section 3. You can remove the R Sq legend if you wish: in the Chart Editor window

    double-click on the legend, so that it is selected, then press the Delete key.

    Section 3: Pearson's r: parametric test of correlation

    To illustrate how to carry out this parametric test of correlation, we will use the

    same data as we used to obtain the scattergram and regression line.

    The hypothesis tested was that there would be a negative correlation between CFF

    and age.

    The study employed a correlational design. Two variables were measured. The first

    was age, operationalised by asking participants who ranged in age from 25 to 66 to

    participate. The second variable was CFF, operationalised by using a flicker

    generator to measure CFF for each participant: six measures were made, and the

    mean taken to give a single CFF score for each participant.

    HOW TO PERFORM A PEARSON'S R

    TIP SPSS will correlate each variable that you include with every other variable that

    you include. Thus, if you included three variables A, B and C, it will calculate the

    correlation coefficient for A * B, A * C and B * C. In the Pearson's r example we have just

    two variables, but in the Spearman's rs example we include three variables so that you

    can see what a larger correlation matrix looks like.
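
    The pairing described in this tip can be sketched as follows. The variable names A,

    B and C come from the tip; the scores and the pearson_r helper are assumptions made

    for illustration and do not reproduce SPSS output.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented scores on three variables for the same six participants.
data = {
    "A": [2, 4, 5, 7, 8, 9],
    "B": [1, 3, 6, 6, 9, 8],
    "C": [9, 7, 8, 4, 3, 2],
}

# Every variable is paired with every other variable exactly once.
matrix = {
    (first, second): pearson_r(data[first], data[second])
    for first, second in combinations(data, 2)
}

for (first, second), r in matrix.items():
    print(f"{first} * {second}: r = {r:.3f}")
```

    Three variables give three distinct pairs; in general k variables give k(k - 1)/2

    pairs, so the matrix grows quickly as variables are added.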

    TIP In the Bivariate Correlations dialogue box, you have the option of choosing

    either a one- or two-tailed test, and SPSS will then print the appropriate value of p. In the

    statistical tests that we have covered previously, SPSS prints the two-tailed p value, and

    if you have a one-tailed hypothesis you halve that value to give the one-tailed p value.
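
    The halving rule can be written as a small function. This is a sketch, not SPSS

    behaviour: one_tailed_p is a hypothetical name, and the second branch is an added

    assumption that applies to symmetric test statistics when the observed effect lies in

    the direction opposite to the one predicted.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction=True):
    """Convert a two-tailed p value to a one-tailed p value.

    Assumes a symmetric test statistic (for example t or r).
    """
    half = two_tailed_p / 2
    # Halving is only appropriate when the effect is in the predicted direction.
    return half if in_predicted_direction else 1 - half


print(one_tailed_p(0.04))
print(one_tailed_p(0.04, in_predicted_direction=False))
```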

    The annotated output for Pearson's r is shown on the next page.

    SPSS OUTPUT FOR PEARSON'S R

    Obtained Using Menu Item: Correlate > Bivariate

    What you might write in a report is given below, after we tell you about effect sizes

    in correlation.

    TIP For correlations, the sign of the coefficient indicates whether the correlation is

    positive or negative, so you must report it (unlike the sign in a t-test analysis).

    EFFECT SIZES IN CORRELATION

    The value of r indicates the strength of the correlation, and it is a measure of effect

    size (see Chapter 1, Section 2). As a rule of thumb, r values of 0 to .2 are generally

    considered weak, .3 to .6 moderate, and .7 to 1 strong. The strength of the correlation

    alone is not necessarily an indication of whether it is an important correlation: the

    significance value should normally also be considered. With small sample sizes this

    is crucial, as strong correlations may easily occur by chance. With large to very

    large sample sizes, however, even a small correlation can be highly statistically

    significant. To illustrate that, look at a table of the critical values of r (in the back of

    most statistics textbooks). For example, if you carry out a correlation study with a

    sample of 100 and obtain r of .2, it is significant at the .05 level, two-tailed. Yet .2 is

    only a weak correlation. In some survey studies sample sizes may be in the

    thousands, so significance alone cannot be used as a guide. Instead the effect size and

    the proportion of variation explained may be more important.
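
    The example of r = .2 with a sample of 100 can be checked by converting r to a t

    statistic with N - 2 degrees of freedom, a standard identity for Pearson's r. In this

    sketch the critical value 1.984 is an assumption taken from a standard t table (two-

    tailed, .05 level, 98 degrees of freedom), and t_from_r is a hypothetical helper.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing Pearson's r against zero, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Assumption: rounded critical t from a standard table
# (two-tailed, alpha = .05, df = 98).
T_CRITICAL_DF98 = 1.984

t = t_from_r(0.2, 100)
significant = abs(t) > T_CRITICAL_DF98

print(f"t(98) = {t:.3f}, significant at .05 two-tailed: {significant}")
```

    The t value only just exceeds the critical value, which matches the point made in the

    text: with 100 participants a weak correlation of .2 is already significant.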

    The concept of proportion of variance explained is described in Chapter 7, Section

    1. Briefly, a correlation coefficient allows us to estimate the proportion of variation

    within our data that is explained by the relationship between the two variables. (The

    remaining variation is down to extraneous variables, both situational and

    participant.) The proportion of variation explained is given by r². Thus, for the age

    and CFF example in which r = -.78, r² = .6084 and we can say that 60% of the

    variation in the CFF data can be attributed to age. Note that, logically, we can just

    as easily say that 60% of the variation in the age data can be attributed to CFF. The

    latter statement should make it clear that we are not implying a causal relationship:

    we cannot do so with correlation. The important practical point is that the two

    variables have quite a lot of variation in common, and one could use a person's age

    to predict what their CFF might be. If their measured CFF is outside the lower

    confidence limit for their age, then we could investigate further.
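
    Two points from this passage can be checked numerically: squaring the reported r

    gives the proportion of variation explained, and the coefficient is the same whichever

    variable is listed first. The ages and CFF scores below are invented for illustration and

    are not the data from the study; pearson_r is a hypothetical helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# The coefficient reported in the text for age and CFF.
reported_r = -0.78
print(f"r squared = {reported_r ** 2:.4f}")

# Invented data: age in years and CFF in Hz for seven participants.
ages = [25, 31, 38, 44, 52, 60, 66]
cff = [41.0, 40.2, 38.5, 37.9, 36.0, 34.8, 33.1]

r_age_cff = pearson_r(ages, cff)
r_cff_age = pearson_r(cff, ages)
print(f"age with CFF: {r_age_cff:.3f}; CFF with age: {r_cff_age:.3f}")
```

    Because the coefficient is symmetric, r² says nothing about which variable influences

    the other; it only describes how much variation the two share.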

    Note that the proportion of variation explained does not have to be large to be

    important. How important it is may depend on the purpose of the study (see Howell,

    2002, pp. 304-305). Proportion of variance explained in correlational designs will

    be returned to in Chapter 8 on multiple regression.

    Reporting the results

    In a report you might write: There was a significant negative correlation between

    age and CFF (r = -.780, N = 20, p < .0005, one-tailed). It is a fairly strong

    correlation: 60.8% of the variation is explained. The scattergram (Figure 4.3) shows

    that the data points are reasonably well distributed along the regression line, in a

    linear relationship with no outliers.

    Section 4: Spearman's rs: nonparametric test of correlation

    If either (or both) of the two variables involved in a correlational design are

    nonparametric (because they do not meet the assumptions for parametric data, see

    Chapter 1, Section 2), then we use a nonparametric measure of correlation. Here,

    we describe two such tests, Spearman's rs and Kendall's tau-b.
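
    Spearman's rs can be understood as Pearson's r calculated on ranks, with tied scores

    sharing the average of the ranks they occupy. The sketch below takes that approach;

    the ratings are invented 1 to 7 scores, and the function names are hypothetical rather

    than SPSS commands.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rs(xs, ys):
    """Spearman's rs: Pearson's r applied to the ranked scores."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Invented ratings on a 1 to 7 scale from eight participants.
confidence = [5, 6, 4, 7, 3, 5, 6, 2]
believability = [4, 6, 5, 7, 3, 4, 5, 3]
print(f"rs = {spearman_rs(confidence, believability):.3f}")
```

    Working on ranks means that only the order of the scores matters, which is why the

    test suits ordinal data such as rating scales.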

    EXAMPLE STUDY: THE RELATIONSHIPS BETWEEN ATTRACTIVENESS,

    BELIEVABILITY AND CONFIDENCE

    Previous research using mock juries has shown that attractive defendants are less

    likely to be found guilty than unattractive defendants, and that attractive individuals

    are frequently rated more highly on other desirable traits, such as intelligence. In a

    study undertaken by one of our students, participants saw the testimony of a woman

    in a real case of alleged rape. They were asked to rate her, on a scale of 1 to 7, in

    terms of how much confidence they placed in her testimony, how believable she

    was and how attractive she was. (These data are available in Appendix I or from the

    web address listed there.)

    The design employed was correlational, with three variables each measured on a

    7-point scale. Although it is often accepted that such data could be considered interval

    in nature (see Chapter 1, Section 2), for the purpose of this Section we will consider

    it as ordinal data. The hypotheses tested were that:

    1. There would be a positive relationship between attractiveness and confidence

    placed in testimony.

    2. There would be a positive relationship between attractiveness and believability.

    3. There would be a positive relationship between confidence placed in testimony and

    believability.

    TIP We are using this study to illustrate use of Spearman's rs and some other aspects

    of correlation. However, multiple regression (Chapter 8) would usually be more

    appropriate for 3 or more variables in a correlational design.

    HOW TO PERFORM SPEARMAN'S RS

    Carry out steps 1 to 5 as for the Pearson's r (previous Section). At step 6 select

    Spearman instead of Pearson (see Bivariate Correlations dialogue box below).

    This example also illustrates the fact that you can carry out more than one

    correlation at once. There are three variables, and we want to investigate the

    relationship between each variable and each of the other two. To do this you

    simply highlight all three variable names and move them all into the Variables box.

    The SPSS output for Spearman's rs is shown below.

    SPSS OUTPUT FOR SPEARMAN'S RS

    Obtained Using Menu Item: Correlate > Bivariate

    REPORTING THE RESULTS

    When reporting the outcome for each correlation, you would write at the

    appropriate points:

    There was a significant positive correlation between confidence in testimony and

    believability (rs = .372, N = 89, p < .0005, two-tailed).

    There was no significant correlation between confidence in testimony and

    attractiveness (rs = .157, N = 89, p = .143, two-tailed).

    There was a significant positive correlation between attractiveness and believability

    (rs = .359, N = 89, p = .001, two-tailed).

    You could illustrate each pair of variables in a scattergram (see Section 2). These

    data illustrate an aspect of scattergrams mentioned in Section 2. Many cases have

    the same values on both variables and it is unclear where all the cases are. To

    clearly illustrate the data you can edit the data symbols in Chart Editor, so that they

    vary in size according to the number of cases at each position.

    Note that the R Sq Linear value, given in the scattergram when you add a regression

    line, is the square of Pearson's r (r²) and not the square of Spearman's rs. As

    described in Section 3, r² indicates the proportion of variation explained. You will

    see that it is rather small for each of these three relationships; the largest is 18.4%.

    As this research deals with possible influences on jury decisions, a small amount of

    variance explained might nonetheless be important.

    HOW TO PERFORM KENDALL'S TAU-B

    Some researchers prefer to use Kendall's tau instead of Spearman's rs. To undertake

    a Kendall's tau, follow the same steps as for Pearson's r, but at step 6 select

    Kendall's tau-b. The output takes the same form as that for Spearman's rs.

    Kendall's tau-b takes ties into account. Kendall's tau-c, which ignores ties, is

    available in Crosstabs (see Chapter 5, Section 4).