Tutorial 08: Linear model (1)

Setting Up

In this tutorial, you’ll get started working with the linear model. The goal of this tutorial is to get a feel for the equation of a line and the basic structure of the lm() function. We’ll be spending a lot of time on linear models, so just focus on getting the basics down for now.

Task 1

Load packages and data

  1. Load the tidyverse package.
  2. Read in the gensex data from the file data/gensex_2022.csv. Store this object using the assignment operator as gensex.

Use the functions library() and readr::read_csv().

library(tidyverse)

# load gensex data:
gensex <- here::here("data/gensex_2022.csv") |> readr::read_csv()

Understanding the Equation

Let’s start with a refresher to get warmed up. In the lecture we learned about the equation of a line, a very important equation that we will be using again and again.

\[ y_{i} = b_{0} + b_{1}x_{1i} \]

This equation has the following elements:

  • y: The value of the outcome

  • b0: the intercept, or value of the outcome when the predictor is 0

  • b1: the slope, or change in the outcome for each unit change in the predictor

  • x1: the predictor

These definitions are important for you to know, but if they don’t make a lot of sense at the moment, don’t worry - that’s what this tutorial is for. We’ll start by getting a handle on what the b-values do first, and then we’ll move on to creating a linear model with the gensex dataset to get some practice using these values.
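
To make the equation concrete, here is a minimal sketch in R that turns it into a small function. The function name predict_line and the b-values are made up for illustration; they are not taken from the gensex data.

```r
# Hypothetical helper: the equation of a line, y = b0 + b1 * x
predict_line <- function(x, b0, b1) {
  b0 + b1 * x
}

# Made-up b-values, purely for illustration
b0 <- 2    # intercept: predicted outcome when the predictor is 0
b1 <- 0.5  # slope: change in the outcome per unit change in the predictor

# Predicted outcome for predictor values 0, 1, 2 and 3
y_hat <- predict_line(x = 0:3, b0 = b0, b1 = b1)
y_hat
```

When x is 0 the prediction equals b0, and each one-unit step in x adds b1 to the prediction.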

Using the Equation

Before we jump into the data, let’s practise with the model itself to get the hang of how it works. To do this, we’ll use the beautiful interactive visualisation below, courtesy of Dr Milan Valasek.

Using the Visualisation

The yellow line, our linear model, is determined by two values:

  • b0, the intercept, in purple

  • b1, the slope, in teal

You can change the values of each by moving the sliders. The numbers in the coloured circles correspond to the line above. You can reset the sliders to 0 by double clicking on them.

As you move the teal b1 slider, notice the solid horizontal black line that moves up and down. Where this line connects with the y axis is the predicted value of y when x = 1.

Task 2

Explore the equation of a line.

  1. Spend a minute playing with the visualisation and moving both sliders to get the hang of how it works.
  2. Answer the quiz questions below.

Question 1

What happens to the line as you change the value of b0 (purple slider)?


Question 2

What happens to the line as you change the value of b1 (teal slider)?


Question 3

Set the sliders so that the line slopes down from left to right. What is the direction of the relationship quantified by b1?


Question 4

Set b0 to -0.66 and b1 to 4.65. What’s the predicted value of y when x = 1, to the nearest whole number?

Well done so far. You can keep playing with the visualisation as much as you like - it will help a lot if you can understand how these values work in the linear model equation.


Flatlining

Before we move on, let’s take a look at a special scenario.

Task 3

Move or reset the sliders so that the line is perfectly horizontal, then answer the quiz questions.

Question 5

When the linear model is perfectly horizontal, what is the value of b1?


Question 6

When b1 is 0, what is the relationship between x and y?


Important

This is an important point, so make sure you think it through. Remember that the slope of the line captures the change in y for each unit change in x. If the slope is 0, that means that no matter how much x changes, y doesn’t change at all. So, in other words, there is no relationship between x and y.

Why is this so important? You might recognise “no relationship between x and y” as a way we often state the null hypothesis. We’ll talk more next week about hypothesis testing for linear models, but right now, the key is to realise that a b1 of 0 means no relationship.
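
The same idea can be checked with a few lines of R. This is only a sketch with a made-up intercept: the slope is set to 0 and the equation is applied to some very different values of x.

```r
# Made-up intercept; the slope is fixed at 0 to mimic a horizontal line
b0 <- 3
b1 <- 0

x <- c(0, 1, 5, 10, 100)   # very different predictor values
y_flat <- b0 + b1 * x      # equation of a line with a slope of 0
y_flat

# Every prediction is identical, so changing x never changes y
length(unique(y_flat)) == 1
```

Whatever value x takes, the prediction stays at b0, which is what "no relationship" looks like in the model.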

Now that we have a sense of how the b-values in the linear model work, let’s move on to looking at some data. Remember that you can always come back to the visualisation if you’d like to practice with the equation more.

The lm() function

Let’s have another look at the gensex data we’ve been working with a bit already. We’ve only had a look at a few variables thus far, but there’s lots more interesting info here!

This time we’ll use some different variables. I’ll be using romantic_freq and sexual_freq, which are ratings of how frequently the participants experience romantic and sexual attraction respectively. A higher score indicates higher frequency.

Task 4

What do you think the relationship between your two variables will be? Take some notes in your Quarto document.

Consider This

In answering the above question, consider:

  • What direction will the relationship be in?

    • Will it be a positive or negative relationship?

    • What would this look like in the linear model?

  • How strong will the relationship be?

    • What will it look like for the relationship to be stronger or weaker?

Running the Model

The lm() function, which stands for “Linear Model”, can be used in a very similar way to the t.test() function. That is, we need to specify:

  • The formula, in the form of outcome ~ predictor
  • The dataset where we can find the variables specified in the formula.
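
As a rough illustration of these two arguments, the sketch below fits a model to a small invented dataset (toy_data, with variables x and y, is not part of the tutorial data). Because y is built to follow a known line exactly, you can check that lm() recovers the b-values that were built in.

```r
# Toy data (invented for illustration): y is built to be exactly 2 + 3 * x
toy_data <- data.frame(x = 1:6)
toy_data$y <- 2 + 3 * toy_data$x

# formula: outcome ~ predictor; data: where to find those variables
toy_lm <- lm(y ~ x, data = toy_data)

# The estimated b-values recover the intercept (2) and slope (3) we built in
coef(toy_lm)
```

Real data will never sit perfectly on a line like this, but the call has the same shape.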

Task 5

Use the lm() function to create a linear model.

  • Use frequency of romantic attraction as the predictor and frequency of sexual attraction as the outcome. Save the model as freq_lm.

The general form of the function is:

lm(formula, data = some_dataset)

Adapt this code based on the instructions.

# use lm to create a linear model 
freq_lm <- gensex |> 
  lm(sexual_freq ~ romantic_freq, data = _)

Interpreting the Model

Task 6

Explore the model and interpret the results.

  1. Call the freq_lm object and use the output to answer the quiz questions. Round all your answers to 2 decimal places.
  2. In your Quarto notebook, write the equation of the linear model for this analysis.
  3. In your Quarto notebook, write down your interpretation of the b1 value you obtained. What does it tell you?
  4. Move the sliders in the interactive visualisation above to set the visualisation to the same values from your model. Is this the strength and direction of the relationship you predicted? Write down your thoughts in your Quarto notebook.

Question 7

What is the value of b0 in this analysis?

What is the value of b1 in this analysis?

Question 8

What is the direction of the relationship between these variables?


  • \(b_0\) is labelled as “(Intercept)” in the output.

  • \(b_1\) will be labelled with the name of the predictor.
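
If you would like to see how these labels look before reading your own output, here is a small sketch using mtcars, a dataset that ships with R. It is only a stand-in: the object name car_lm and the choice of variables (mpg as the outcome, wt as the predictor) are assumptions for illustration and have nothing to do with the gensex data.

```r
# mtcars ships with R; it stands in here for the tutorial data
car_lm <- lm(mpg ~ wt, data = mtcars)

# Calling the object prints the formula and the estimated b-values
car_lm

# The same b-values as a named vector: "(Intercept)" is b0, "wt" is b1
b_values <- coef(car_lm)
names(b_values)

# Rounded to two decimal places
round(b_values, 2)
```

The coef() function is handy when you want the b-values as numbers you can reuse, rather than reading them off the printed output.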

Visualisation

Task 7

Create a scatterplot that shows the modelled variables

  1. Set up the base layer with romantic_freq on the x axis and sexual_freq on the y axis.
  2. Add + geom_point() to the base layer to create the scatterplot.
  3. Tweak the formatting to make the plot look nice and professional (see the Skills Lab!)

The base layer of a ggplot has the form:

some_dataset |> ggplot2::ggplot(aes(x = some_predictor, y = some_outcome)) +
   ... some other code

# create a scatterplot:
gensex |>  
  ggplot(aes(x = romantic_freq, y = sexual_freq)) +
  geom_point(position = "jitter", alpha = .4) +
  scale_x_continuous(name = "Frequency of Romantic Attraction",
                     breaks = c(0:9)) +
  scale_y_continuous(name = "Frequency of Sexual Attraction",
                     breaks = c(0:9)) +
  theme_bw()

Using the Model

Task 8

Question 9

Imagine you have a friend who would rate their frequency of romantic attraction as a 2. What does your model predict their frequency of sexual attraction would be? Use the equation to help you out, using b-values rounded to two decimal places. Give your answer to two decimal places as well.

Overall, what have we discovered about the relationship between the frequency of sexual and romantic attraction? What don’t we know yet? Write down your thoughts in your Quarto doc.
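
Predictions like the one in Question 9 can also be checked in R. The sketch below again uses the built-in mtcars data as a stand-in (car_lm and the predictor value of 3 are assumptions for illustration), comparing a prediction worked out by hand from the equation with the one returned by predict().

```r
# mtcars ships with R; it stands in here for the tutorial data
car_lm <- lm(mpg ~ wt, data = mtcars)
b <- coef(car_lm)

# Prediction by hand: b0 + b1 * x, for a car with wt = 3
by_hand <- b[["(Intercept)"]] + b[["wt"]] * 3

# The same prediction from predict(); newdata must use the predictor's name
by_predict <- predict(car_lm, newdata = data.frame(wt = 3))

# Both routes agree
all.equal(by_hand, unname(by_predict))
```

If you round the b-values before plugging them into the equation, as Question 9 asks, expect your hand calculation to differ slightly from what predict() returns.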

Recap

Well done on all of your hard work! Make sure you work on these ideas and get them down clearly; they will be very important for the rest of the module (and next year). You should now be able to do the following:

  • Understand how the b0 (intercept) and b1 (slope) values specify the line

    • Explain why a b1 value of 0 represents no relationship between the predictor and the outcome
  • Create a linear model using lm()

  • Write an equation for the linear model using lm() output

  • Use the equation to calculate a predicted value for the outcome, given a value of the predictor.

That’s all for today. See you soon!

ChallengR

ChallengR Time!

This task is a ChallengR. ChallengRs are always optional and will never be assessed - they’re only there to inspire you to try new things! If you solve this task successfully, you can earn a bonus 2500 Kahoot Points. You can use those points to earn bragging rights and, more importantly, shiny stickers. (See the Games and Awards page on Canvas.)

There are no solutions in this document for this ChallengR task. If you get stuck, ask us for help in your practicals or at the Help Desk, and we’ll be happy to point you in the right direction.

In your data folder there is a new dataset, many_data.csv. The file contains two variables, x and y, organised into subsets by dataset.

many_data <- readr::read_csv("data/many_data.csv")

Task 9

There is something strange going on in this set of datasets. Using what you have learned so far, can you work out what it is? You may want to compare summaries, visualisations, and analyses of the different dataset subsets to figure it out.

Make sure you use x as the predictor/x-axis variable, and y as the outcome/y-axis variable.




One way you could look at a specific subset of the data is by filter()ing for only one subset at a time.




Look into the ggplot2::facet_wrap() function for making lots of plots at once.


When you’ve thoroughly explored the different subsets of the dataset, complete the Week 8 ChallengR quiz on Canvas to claim your Kahoot! points.