Skills Lab 07: Correlation and Chi-square

Author

MS

Google doc: bit.ly/skills-lab-07

Setup

Packages and data

Load the necessary packages:

library(tidyverse)
library(ggrain)

Data

Load the data:

games_tib <- readr::read_csv("data/video_games_data.csv")

Variables in the dataset:

  • id: Participant’s ID
  • age: Participant’s age
  • game: Name of the video game
  • game_type: Game classification as “Shooter”, “Sports game”, “RPG” or “Animal crossing”
  • affect: Level of emotional affect, measured from -6 (most negative) to +6 (most positive)
  • affect_cat: Categorical version of the affect variable, with values “Negative” or “Positive”
  • life_sat: Life satisfaction
  • experience: Experience of playing video games (0-100)
  • hours: Hours spent playing video games per week

Correlation

Task 1: Create a correlation matrix

  • A good first step in an analysis is to explore the associations between variables

  • Select the continuous (numeric) variables in the dataset and generate a correlation matrix. Save this selection of columns in a new object called games_tib_cor

  • Create a visualisation of the correlations

games_tib_cor <- games_tib |> 
  dplyr::select(affect, life_sat, experience, age, hours)
games_tib_cor |> GGally::ggscatmat()
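The scatterplot matrix shows the correlations visually. As a minimal sketch of how the same coefficients could be obtained as a plain numeric matrix, base R's cor() can be applied to a data frame of numeric columns. The data below (toy_tib) are simulated for illustration only and are not the lab dataset; in the lab, games_tib_cor would take their place.

```r
# Simulated toy data for illustration only (NOT the video games dataset)
set.seed(42)
toy_tib <- data.frame(hours = runif(50, min = 0, max = 40))
toy_tib$affect <- rnorm(50)
toy_tib$life_sat <- toy_tib$affect + rnorm(50)

# Pearson correlation matrix; pairwise deletion handles any missing values
cor_mat <- cor(toy_tib, method = "pearson", use = "pairwise.complete.obs")

# Round for easier reading
round(cor_mat, 2)
```

The result is a symmetric matrix with 1s on the diagonal, because every variable correlates perfectly with itself.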

Task 2: Run correlation tests

  • Run correlation tests on the numeric variables

  • Find the variables relevant to the hypothesis below

    • What is the relationship between these two variables?

    • Is the relationship strong?

    • Is it statistically significant?

    • Can we reject the null hypothesis?

Hypothesis

There will be a negative relationship between time spent playing video games and emotional affect.

correlation::correlation(games_tib_cor)
# Correlation Matrix (pearson-method)

Parameter1 | Parameter2 |        r |        95% CI | t(16977) |         p
-------------------------------------------------------------------------
affect     |   life_sat |     0.63 | [ 0.62, 0.64] |   106.24 | < .001***
affect     | experience |     0.04 | [ 0.03, 0.06] |     5.38 | < .001***
affect     |        age |     0.15 | [ 0.13, 0.16] |    19.29 | < .001***
affect     |      hours |     0.02 | [ 0.00, 0.03] |     2.42 | 0.047*   
life_sat   | experience |     0.04 | [ 0.02, 0.05] |     4.75 | < .001***
life_sat   |        age |     0.11 | [ 0.10, 0.13] |    14.65 | < .001***
life_sat   |      hours | 5.61e-03 | [-0.01, 0.02] |     0.73 | 0.464    
experience |        age |     0.49 | [ 0.48, 0.50] |    73.44 | < .001***
experience |      hours |    -0.01 | [-0.03, 0.00] |    -1.63 | 0.207    
age        |      hours |     0.17 | [ 0.16, 0.19] |    22.64 | < .001***

p-value adjustment method: Holm (1979)
Observations: 16979
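The output above reports Holm-adjusted p-values. As a rough sketch of what that adjustment does, base R's p.adjust() can be applied to a small set of p-values. The values in p_raw are made up for illustration and are not taken from the dataset. The second part squares the r reported above for affect and hours to show the proportion of variance the two variables share.

```r
# Hypothetical unadjusted p-values from three tests (made up for illustration)
p_raw <- c(0.01, 0.04, 0.03)

# Holm adjustment: smallest p is multiplied by 3, the next by 2, the largest by 1,
# and the adjusted values are kept non-decreasing
p_holm <- p.adjust(p_raw, method = "holm")
p_holm

# r for affect and hours, copied from the correlation output above
r_affect_hours <- 0.02

# Proportion of shared variance (r squared)
r_squared <- r_affect_hours^2
r_squared
```

With close to 17,000 observations, even a very small correlation can be statistically significant, so the size of r matters as much as the p-value. Note also that the observed r is positive, whereas the hypothesis predicted a negative relationship.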

Chi-square

Hypothesis

There will be an association between type of game and experiences of positive or negative affect.

Task 3: Quick data cleaning

  • We’re interested in comparing the game “Animal crossing” against games classified as “Sports game” - keep only the rows for these two game types and save the new dataset into an object called games_tib_chi

  • Make a prediction! Who do you think is going to be more likely to experience positive affect? Players of Animal Crossing or players of sports games (car racing)?

games_tib_chi <- games_tib |> 
  dplyr::filter(game_type %in% c("Animal crossing", "Sports game"))
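The filter above uses %in% rather than ==. A small sketch of why, using a made-up character vector (game_type_toy is hypothetical, not the real column): == compares element by element and silently recycles the shorter vector, so it can miss rows that %in% keeps.

```r
# Made-up vector of game types for illustration only
game_type_toy <- c("Shooter", "Sports game", "Animal crossing",
                   "RPG", "Sports game", "Animal crossing")
wanted <- c("Animal crossing", "Sports game")

# %in% asks, for each element, whether it matches ANY of the wanted values
keep_in <- game_type_toy %in% wanted

# == recycles `wanted` along the vector and compares position by position
keep_eq <- game_type_toy == wanted

sum(keep_in)  # number of rows kept with %in%
sum(keep_eq)  # number of rows kept with ==
```

Here %in% keeps all four matching elements, while == keeps only the two that happen to line up with the recycled vector.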

Task 4: Plotting!

  • Create a bar plot showing the counts of participants across the two game types split by affect valence

  • Change the default colours

  • Adjust axis labels

  • Interpret the plot - does it support your prediction?

games_tib_chi |> 
  ggplot2::ggplot(aes(x = game_type, fill = affect_cat)) + 
  geom_bar(position = "dodge", alpha = 0.6) +
  scale_fill_manual(values = c("darkmagenta", "lightseagreen")) +
  labs(x = "Game type", y = "Frequency", fill = "Affect") +
  theme_light() 

# if time, this is useful: proportions within each game type
games_tib_chi |> 
  ggplot2::ggplot(aes(x = game_type, fill = affect_cat)) + 
  geom_bar(position = "fill", alpha = 0.75) +
  scale_fill_manual(values = c("darkmagenta", "lightseagreen")) +
  labs(x = "Game type", y = "Proportion", fill = "Affect") +
  theme_light() 

Task 5: Run Chi-square test

  • Run the test of association between type of game and affect category

  • Interpret the results - does the statistical test support your prediction?

  • Can we reject the null hypothesis?

chi_test <- chisq.test(games_tib_chi$game_type, games_tib_chi$affect_cat)
chi_test

    Pearson's Chi-squared test with Yates' continuity correction

data:  games_tib_chi$game_type and games_tib_chi$affect_cat
X-squared = 147.94, df = 1, p-value < 2.2e-16
chi_test$expected
                       games_tib_chi$affect_cat
games_tib_chi$game_type  Negative Positive
        Animal crossing  950.5018 5580.498
        Sports game     1361.4982 7993.502
chi_test$observed
                       games_tib_chi$affect_cat
games_tib_chi$game_type Negative Positive
        Animal crossing     1217     5314
        Sports game         1095     8260
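To see where the expected counts come from, the sketch below rebuilds the observed table from the output above and computes each expected count as (row total × column total) / grand total. The object names (observed, expected, prop_positive, chi_from_table) are chosen here for illustration. It also computes the proportion of positive affect within each game type, which helps with interpreting the direction of the association.

```r
# Observed counts copied from chi_test$observed above
observed <- matrix(
  c(1217, 5314,
    1095, 8260),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    game_type = c("Animal crossing", "Sports game"),
    affect_cat = c("Negative", "Positive")
  )
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Proportion of players reporting positive affect within each game type
prop_positive <- observed[, "Positive"] / rowSums(observed)
round(prop_positive, 3)

# chisq.test() also accepts a table of counts directly
# (Yates' continuity correction is applied by default for 2 x 2 tables)
chi_from_table <- chisq.test(observed)
chi_from_table
```

Comparing observed with expected shows which cells drive the result: Animal Crossing has more negative-affect players than expected under independence, and sports games have fewer.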

References:

Videogames and well-being pre-print paper (source of the dataset):

  • https://osf.io/preprints/psyarxiv/8cxyh