library(tidyverse)
# load gensex data:
gensex <- here::here("data/gensex_2022.csv") |> readr::read_csv()
Tutorial 08: Linear model (1)
Setting Up
In this tutorial, you’ll get started working with the linear model. The goal of this tutorial is to get a feel for the equation of a line and the basic structure of the lm() function. We’ll be spending a lot of time on linear models, so just focus on getting the basics down for now.
Task 1
Load packages and data
- Load the tidyverse package.
- Read in the gensex data located at data/gensex.csv. Store this object as gensex using the assignment operator.
Understanding the Equation
Let’s start with a refresher to get warmed up. In the lecture we learned about the equation of a line, a very important equation that we will be using again and again.
\[ y_{i} = b_{0} + b_{1}x_{1i} \]
This equation has the following elements:
y: The value of the outcome
b0: the intercept, or value of the outcome when the predictor is 0
b1: the slope, or change in the outcome for each unit change in the predictor
x1: the predictor
These definitions are important for you to know, but if they don’t make a lot of sense at the moment, don’t worry - that’s what this tutorial is for. We’ll start by getting a handle on what the b-values do first, and then we’ll move on to creating a linear model with the gensex dataset to get some practice using these values.
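To make the equation concrete, here is a minimal sketch in R. The values of b0 and b1 and the predictor values are made up purely for illustration; they do not come from the gensex data.

```r
# Hypothetical values chosen for illustration only
b0 <- 2    # intercept: predicted outcome when the predictor is 0
b1 <- 0.5  # slope: change in the outcome per unit change in the predictor

x <- c(0, 1, 2, 3)  # made-up predictor values

# Apply the equation of a line to every value of x
y_hat <- b0 + b1 * x
y_hat
# 2.0 2.5 3.0 3.5
```

When x is 0, the predicted value is exactly b0, and each one-unit step in x adds b1 to the prediction.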
Using the Equation
Before we jump into the data, let’s have a go practicing with the model itself and getting the hang of how it works. To do this, we’ll use the beautiful interactive visualisation below, courtesy of Dr Milan Valasek.
The yellow line, our linear model, is determined by two values:
b0, the intercept, in purple
b1, the slope, in teal
You can change the values of each by moving the sliders. The numbers in the coloured circles correspond to the line above. You can reset the sliders to 0 by double clicking on them.
As you move the teal b1 slider, notice the solid horizontal black line that moves up and down. Where this line connects with the y axis is the predicted value of y when x = 1.
Task 2
Explore the equation of a line.
- Spend a minute playing with the visualisation and moving both sliders to get the hang of how it works.
- Answer the quiz questions below.
Flatlining
Before we move on, let’s take a look at a special scenario.
Task 3
Move or reset the sliders so that the line is perfectly horizontal, then answer the quiz questions.
This is an important point, so make sure you think it through. Remember that the slope of the line captures the change in y for each unit change in x. If the slope is 0, that means that no matter how much x changes, y doesn’t change at all. So, in other words, there is no relationship between x and y.
Why is this so important? You might recognise “no relationship between x and y” as a way we often state the null hypothesis. We’ll talk more next week about hypothesis testing for linear models, but right now, the key is to realise that a b1 of 0 means no relationship.
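The same point can be checked numerically. In this small sketch, the intercept and the predictor values are arbitrary, made-up numbers; the only thing that matters is that b1 is set to 0.

```r
# Hypothetical values chosen for illustration only
b0 <- 3  # intercept
b1 <- 0  # a slope of 0 gives a perfectly horizontal line

x <- c(-10, 0, 1, 50)  # made-up predictor values, deliberately spread out

# The predicted outcome is the same for every value of x
y_hat <- b0 + b1 * x
y_hat
# 3 3 3 3
```

However far apart the x values are, the prediction never moves away from b0.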
Now that we have a sense of how the b-values in the linear model work, let’s move on to looking at some data. Remember that you can always come back to the visualisation if you’d like to practice with the equation more.
The lm() function
Let’s have another look at the gensex data we’ve been working with a bit already. We’ve only had a look at a few variables thus far, but there’s lots more interesting info here!
This time we’ll use some different variables. I’ll be using romantic_freq and sexual_freq, which are ratings of how frequently the participants experience romantic and sexual attraction respectively. A higher score indicates higher frequency.
Task 4
What do you think the relationship between your two variables will be? Take some notes in your Quarto document.
In answering the above question, consider:
What direction will the relationship be in?
Will it be a positive or negative relationship?
What would this look like in the linear model?
How strong will the relationship be?
- What will it look like for the relationship to be stronger or weaker?
Running the Model
The lm() function, which stands for “Linear Model”, can be used in a very similar way to the t.test() function. That is, we need to specify:
- The formula, in the form of outcome ~ predictor
- The dataset where we can find the variables specified in the formula.
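As a minimal sketch of this structure, the example below uses R’s built-in cars dataset rather than gensex, so that it runs without any data files. The model name cars_lm is an arbitrary choice for this illustration.

```r
# cars is a dataset built into R: speed (mph) and stopping distance (ft)
# Formula: outcome ~ predictor; data: where to find those variables
cars_lm <- lm(dist ~ speed, data = cars)

# Calling the object prints the formula and the two b-values
cars_lm
```

Here dist is the outcome and speed is the predictor, so the printed "(Intercept)" value is b0 and the value labelled speed is b1.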
Task 5
Use the lm() function to create a linear model.
- Use frequency of romantic attraction as the predictor and frequency of sexual attraction as the outcome. Save the model as freq_lm.
Interpreting the Model
Task 6
Explore the model and interpret the results.
- Call the freq_lm object and use the output to answer the quiz questions. Round all your answers to 2 decimal places.
- In your Quarto notebook, write the equation of the linear model for this analysis.
- In your Quarto notebook, write down your interpretation of the b1 value you obtained. What does it tell you?
- Move the sliders in the interactive visualisation above to set the visualisation to the same values from your model. Is this the strength and direction of the relationship you predicted? Write down your thoughts in your Quarto notebook.
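One possible way to go from model output to a written equation is sketched below. It again uses the built-in cars dataset as a stand-in, not the gensex model, and the object names are chosen only for this illustration.

```r
# Fit an illustrative model on the built-in cars dataset
cars_lm <- lm(dist ~ speed, data = cars)

# coef() returns a named vector: "(Intercept)" is b0, "speed" is b1
b <- coef(cars_lm)
b0 <- round(b[["(Intercept)"]], 2)
b1 <- round(b[["speed"]], 2)

# Paste the rounded values into the equation of the line
cat("dist =", b0, "+", b1, "* speed\n")
```

The slope is read as the change in the outcome (dist) for each one-unit increase in the predictor (speed); the same reading applies to b1 in any model with one predictor.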
Visualisation
Task 7
Create a scatterplot that shows the modelled variables.
- Set up the base layer with romantic_freq on the x axis and sexual_freq on the y axis.
- Add + geom_point() to the base layer to create the scatterplot.
- Tweak the formatting to make the plot look nice and professional (see the Skills Lab!)
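The steps above follow a general pattern, sketched here with the built-in cars dataset as a stand-in for gensex. The axis labels and theme are just one possible formatting choice, not a required style.

```r
library(ggplot2)  # loaded as part of the tidyverse

# Base layer: predictor on the x axis, outcome on the y axis
cars_plot <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +  # one point per observation
  labs(x = "Speed (mph)", y = "Stopping distance (ft)") +
  theme_minimal()

cars_plot
```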
Using the Model
Task 8
Overall, what have we discovered about the relationship between the frequency of sexual and romantic attraction? What don’t we know yet? Write down your thoughts in your Quarto doc.
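One way to use a fitted model is to calculate a predicted outcome for a given predictor value. The sketch below does this two ways on the built-in cars dataset (a stand-in for gensex); the speed of 10 is an arbitrary value picked for illustration.

```r
# Fit an illustrative model on the built-in cars dataset
cars_lm <- lm(dist ~ speed, data = cars)
b <- coef(cars_lm)

new_speed <- 10  # arbitrary predictor value for illustration

# Option 1: plug the value into the equation by hand
by_hand <- b[["(Intercept)"]] + b[["speed"]] * new_speed

# Option 2: let predict() do the same calculation
by_predict <- predict(cars_lm, newdata = data.frame(speed = new_speed))

# Both routes give the same predicted outcome
all.equal(by_hand, unname(by_predict))
```

The column name in newdata must match the predictor name used in the formula, otherwise predict() cannot find the value to plug in.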
Recap
Well done on all of your hard work! Make sure you work on these ideas and get them down clearly; they will be very important for the rest of the module (and next year). You should now be able to do the following:
- Understand how the b0 (intercept) and b1 (slope) values specify the line
- Explain why a b1 value of 0 represents no relationship between the predictor and the outcome
- Create a linear model using lm()
- Write an equation for the linear model using lm() output
- Use the equation to calculate a predicted value for the outcome, given a value of the predictor.
That’s all for today. See you soon!
ChallengR
This task is a ChallengR. ChallengRs are always optional and will never be assessed - they’re only there to inspire you to try new things! If you solve this task successfully, you can earn a bonus 2500 Kahoot Points. You can use those points to earn bragging rights and, more importantly, shiny stickers. (See the Games and Awards page on Canvas.)
There are no solutions in this document for this ChallengR task. If you get stuck, ask us for help in your practicals or at the Help Desk, and we’ll be happy to point you in the right direction.
In your data folder there is a new dataset, many_data.csv. The file contains two variables, x and y, organised into subsets by dataset.
many_data <- readr::read_csv("data/many_data.csv")
Task 9
There is something strange going on in this set of datasets. Using what you have learned so far, can you work out what it is? You may want to compare summaries, visualisations, and analyses of the different dataset subsets to figure it out.
When you’ve thoroughly explored the different subsets of dataset, complete the Week 8 ChallengR quiz on Canvas to claim your Kahoot! points.