Tutorial 08: Linear model (1)

Setting Up

In this tutorial, you’ll get started working with the linear model. The goal of this tutorial is to get a feel for the equation of a line and the basic structure of the lm() function. We’ll be spending a lot of time on linear models, so let’s just focus on getting the basics down for now.

Task 1

Load packages and data

Load the tidyverse package.
Read in the gensex data from the file data/gensex.csv. Use the assignment operator to store it in an object called gensex.

Hint

Use the functions library() and readr::read_csv().

Solution

library(tidyverse)

# load gensex data:
gensex <- readr::read_csv("data/gensex.csv")

The `lm()` function

Let’s have another look at the gensex data we’ve been working with a bit already. We’ve only had a look at a few variables thus far, but there’s lots more interesting info here!

This time we’ll use some different variables. I’ll be using romantic_freq and sexual_freq, which are ratings of how frequently the participants experience romantic and sexual attraction respectively. A higher score indicates higher frequency.

Task 4

What do you think the relationship between your two variables will be? Take some notes in your Quarto document.

Consider This

In answering the above question, consider:

What direction will the relationship be in?
- Will it be a positive or negative relationship?
- What would this look like in the linear model?
How strong will the relationship be?
- What will it look like for the relationship to be stronger or weaker?

Running the Model

The lm() function, which stands for “Linear Model”, can be used in much the same way as the t.test() function. That is, we need to specify:

The formula, in the form of outcome ~ predictor
The dataset where we can find the variables specified in the formula.
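
To make these two ingredients concrete, here is a minimal sketch on a made-up dataset. The data frame toy and its variables hours and score are invented purely for illustration and have nothing to do with the gensex data. The sketch also shows that the order of the formula matters: swapping the outcome and the predictor gives a different model.

```r
# Hypothetical toy data (not part of the tutorial's dataset)
toy <- data.frame(
  hours = c(1, 2, 3, 4, 5),
  score = c(52, 55, 61, 64, 70)
)

# outcome ~ predictor: score is predicted from hours
score_lm <- lm(score ~ hours, data = toy)

# Swapping the two sides fits a different model: hours predicted from score
hours_lm <- lm(hours ~ score, data = toy)

# The slope is named after whichever variable is on the right of the ~
coef(score_lm)
coef(hours_lm)
```

Reading the formula aloud as “outcome predicted by predictor” is one way to keep the two sides straight.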

Task 5

Use the lm() function to create a linear model.

Use frequency of romantic attraction as the predictor and frequency of sexual attraction as the outcome. Save the model as freq_lm.

Hint

The general form of the function is:

lm(formula, data = some_dataset)

Adapt this code based on the instructions.

Solution

# use lm to create a linear model 
freq_lm <- gensex |> 
  lm(sexual_freq ~ romantic_freq, data = _)
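
The solution pipes the dataset into lm() using the _ placeholder, which needs R 4.2 or later. If the placeholder looks unfamiliar, the sketch below may help: it fits the same model twice on a small made-up data frame (toy, x and y are hypothetical names), once with the pipe and once without, and checks that the estimates agree.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  x = c(0, 1, 2, 3, 4),
  y = c(1, 3, 2, 5, 4)
)

# Piped version: the _ marks where the piped data frame goes
piped_lm <- toy |>
  lm(y ~ x, data = _)

# Direct version: the data frame is passed to the data argument by hand
direct_lm <- lm(y ~ x, data = toy)

# Both calls should give the same intercept and slope
all.equal(coef(piped_lm), coef(direct_lm))
```

Either style is fine; the placeholder is needed with the pipe because data is not the first argument of lm().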

Interpreting the Model

Task 6

Explore the model and interpret the results.

Call the freq_lm object and use the output to answer the quiz questions. Round all your answers to 2 decimal places.
In your Quarto notebook, write the equation of the linear model for this analysis.
In your Quarto notebook, write down your interpretation of the b₁ value you obtained. What does it tell you?
Move the sliders in the interactive visualisation above to set the visualisation to the same values from your model. Is this the strength and direction of the relationship you predicted? Write down your thoughts in your Quarto notebook.

Question 7

What is the value of b₀ in this analysis?

What is the value of b₁ in this analysis?

Question 8

What is the direction of the relationship between these variables?

- Positive
- Exponential
- Negative

Hint

$b_{0}$ is labelled “(Intercept)” in the output.
$b_{1}$ will be labelled with the name of the predictor.
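
As a rough illustration of where these labels appear, the sketch below fits a model to made-up data that sit exactly on the line y = 1 + 2x, so we know in advance what the two values should be. The names toy, x, y and toy_lm are hypothetical, and using coef() to pull out the values is just one convenient option.

```r
# Hypothetical data constructed to lie exactly on the line y = 1 + 2x
toy <- data.frame(x = 0:5)
toy$y <- 1 + 2 * toy$x

toy_lm <- lm(y ~ x, data = toy)

# Printing the model shows one value under "(Intercept)" and one under "x"
toy_lm

# coef() returns the same values as a named vector
b0 <- coef(toy_lm)[["(Intercept)"]]  # the intercept
b1 <- coef(toy_lm)[["x"]]            # the slope, named after the predictor

# By construction, these round to 1 and 2
round(c(b0 = b0, b1 = b1), 2)
```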

Visualisation

Task 7

Create a scatterplot that shows the modelled variables.

Set up the base layer with romantic_freq on the x axis and sexual_freq on the y axis.
Add + geom_point() to the base layer to create the scatterplot.
Tweak the formatting to make the plot look nice and professional (see the Skills Lab!).

Hint

The base layer of a ggplot has a form of:

some_dataset |> ggplot2::ggplot(aes(x = some_predictor, y = some_outcome)) + 
   ... some other code

Solution

# create a scatterplot:
gensex |>  
  ggplot(aes(x = romantic_freq, y = sexual_freq)) +
  geom_point(position = "jitter", alpha = .1) + # the jitter position is explained in the note below
  scale_x_continuous(breaks = 0:9) +
  scale_y_continuous(breaks = 0:9) +
  labs(x = "Frequency of Romantic Attraction", y = "Frequency of Sexual Attraction") + 
  theme_bw()

If you’ve just used the alpha argument in geom_point(), that’s great. We also added the position = "jitter" argument here. The “jitter” position moves each point by a small random amount left, right, up or down. This makes it easier to see a pattern in a dataset where both variables are on a limited integer scale. Notice that if you remove the position argument, the points are just organised in a grid, because each variable can only take the values 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9, and nothing in between.

We need to be careful here. By using the “jitter” position, we are, in essence, misrepresenting the actual values that occur in the data. If we wanted to write this up in a report, it might be a good idea to show the “non-jittered” and “jittered” versions side by side, so that our readers can also see the original data and make up their own minds.
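
One possible way to compare the two versions is sketched below. It uses made-up ratings rather than the gensex data (toy, x and y are hypothetical), and it swaps position = "jitter" for ggplot2::position_jitter(), which lets us limit how far the points move and set a seed so the plot looks the same every time. The width, height and seed values are arbitrary choices for illustration.

```r
library(ggplot2)

# Made-up ratings on a 0-9 integer scale (hypothetical, not the gensex data)
set.seed(8)
toy <- data.frame(
  x = sample(0:9, size = 300, replace = TRUE),
  y = sample(0:9, size = 300, replace = TRUE)
)

# Original values: the points stack on top of each other in a grid
plot_raw <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = .1) +
  labs(title = "Non-jittered")

# Jittered values: width and height cap how far each point can move,
# and seed makes the random movement reproducible
plot_jittered <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(
    position = position_jitter(width = .2, height = .2, seed = 8),
    alpha = .1
  ) +
  labs(title = "Jittered")

# Print each plot in turn to compare them
plot_raw
plot_jittered
```

Keeping the jitter small means each point stays visibly closer to its true grid position than to any neighbouring one.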

Using the Model

Task 8

Question 9

Imagine you have a friend who would rate their frequency of romantic attraction as a 2. What does your model predict their frequency of sexual attraction would be? Use the equation to help you out, with the b values rounded to two decimal places. Give your answer to two decimal places as well.

Overall, what have we discovered about the relationship between the frequency of sexual and romantic attraction? What don’t we know yet? Write down your thoughts in your Quarto doc.
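
If you’d like to check the logic of a prediction like the one in Question 9, here is a sketch on made-up data, so it won’t give away the answer. The data lie exactly on the line y = 1 + 2x, and all object names are hypothetical. It calculates a prediction by hand from the rounded b values, then compares it with predict(), which is base R’s built-in way to get predictions from a model.

```r
# Hypothetical data constructed to lie exactly on the line y = 1 + 2x
toy <- data.frame(x = 0:5)
toy$y <- 1 + 2 * toy$x

toy_lm <- lm(y ~ x, data = toy)

# Round the b values to two decimal places, as the question asks
b0 <- round(coef(toy_lm)[["(Intercept)"]], 2)
b1 <- round(coef(toy_lm)[["x"]], 2)

# Prediction by hand: plug the new predictor value into the equation
new_x <- 3
by_hand <- b0 + b1 * new_x
by_hand  # 1 + 2 * 3 = 7

# The same prediction using predict(); newdata must use the predictor's name
by_predict <- predict(toy_lm, newdata = data.frame(x = new_x))
by_predict
```

With real data the two results can differ slightly, because predict() uses the unrounded b values.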

Recap

Well done on all of your hard work! Make sure you work on these ideas and get them down clearly; they will be very important for the rest of the module (and next year). You should now be able to do the following:

Understand how the b₀ (intercept) and b₁ (slope) values specify the line
- Explain why a b₁ value of 0 represents no relationship between the predictor and the outcome
Create a linear model using lm()
Write an equation for the linear model using lm() output
Use the equation to calculate a predicted value for the outcome, given a value of the predictor.
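
As a quick reminder of why a slope of zero means no relationship, we can substitute $b_{1} = 0$ into the equation of the line. The symbols $\hat{y}$ for the predicted outcome and $x$ for the predictor are assumptions about notation here, so swap in whichever symbols you have been using:

$$\hat{y} = b_{0} + b_{1}x = b_{0} + 0 \times x = b_{0}$$

Whatever value the predictor takes, the predicted outcome stays at $b_{0}$, so knowing the predictor tells us nothing about the outcome.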

That’s all for today. See you soon!

ChallengR

ChallengR Time!

This task is a ChallengR. ChallengRs are always optional and will never be assessed; they’re only there to inspire you to try new things! If you solve this task successfully, you can earn a bonus 2500 Kahoot Points. You can use those points to earn bragging rights and, more importantly, shiny stickers. See the Games and Awards page on Canvas.

There are no solutions in this document for this ChallengR task. If you get stuck, ask us for help in your practicals or at the Help Desk, and we’ll be happy to point you in the right direction.

In your data folder there is a new dataset, many_data.csv. The file contains two variables, x and y, organised into subsets by dataset.

many_data <- readr::read_csv("data/many_data.csv")

Task 9

There is something strange going on in this set of datasets. Using what you have learned so far, can you work out what it is? You may want to compare summaries, visualisations, and analyses of the different dataset subsets to figure it out.

Hint

Make sure you use x as the predictor/x-axis variable, and y as the outcome/y-axis variable.

One way you could look at a specific subset of the data is by filter()ing for only one subset at a time.

Look into the ggplot2::facet_wrap() function for making lots of plots at once.

When you’ve thoroughly explored the different subsets of the dataset, complete the Week 8 ChallengR quiz on Canvas to claim your Kahoot! points.
