Skills Lab 01: Selecting variables, logical assertions and filtering
You can access the Skills Lab project for Week 1 on Posit Cloud.
Also check the Analysing Data Panopto page for recordings, or the main Posit Cloud page for other materials.
Setup
Packages
library(tidyverse)
Data
Run this code chunk to read the data into your Environment.
smarvus_tib <- readr::read_csv("data/smarvus_data.csv")
Codebook
Run this code chunk to open the Codebook in the Viewer tab.
ricomisc::rstudio_viewer("smarvus_codebook.html", "data")
Logical Assertions
Claims or statements that can be either TRUE or FALSE: they return logical data.
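As a minimal sketch of what "logical data" means in practice, the result of an assertion can be stored and inspected like any other value. The object name `result` is made up for illustration.

```r
## Store the outcome of an assertion in an object (hypothetical name)
result <- 38 > 8

## Print it: the assertion is true
result
## Check what kind of data it is
class(result)
is.logical(result)
```

The same holds for every assertion in this lab: whatever is being compared, what comes back is TRUE, FALSE, or (as covered later) NA.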
Single Assertions
Numbers
## Single equals is reserved for assignment (similar to <-), so it does not test equality
2 = 2
## Use two equals for "exactly equals"
2 == 2
## Single symbols for less than/greater than
38 > 8
90 < -50
## Use two symbols for less/greater than or equal to
3068345098 <= 1
## Order of symbols matters!
3068345098 =< 1
Error: <text>:15:13: unexpected '<'
14: ## Order of symbols matters!
15: 3068345098 =<
^
Strings
## Exact text matching
"black" != "white"
[1] TRUE
## Case-sensitive!
"Hello!" == "hello!"
[1] FALSE
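Because matching is case-sensitive, one possible workaround is to convert both sides to the same case before comparing. The sketch below uses base R's tolower(); the object names are made up for illustration, and this is only one of several ways to do it.

```r
## Two strings that differ only in case (hypothetical objects)
greeting_a <- "Hello!"
greeting_b <- "hello!"

## Exact matching treats them as different
greeting_a == greeting_b

## Converting both to lower case first makes them match
tolower(greeting_a) == tolower(greeting_b)
```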
Vectorised Assertions
## Create and store a vector
scores <- 95:105
scores
[1] 95 96 97 98 99 100 101 102 103 104 105
## Make an assertion about every element
scores >= 100
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
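A vectorised assertion returns one TRUE or FALSE per element, and that logical vector can itself be used. The sketch below shows three common uses, assuming the same `scores` vector as above; `at_least_100` is a made-up name.

```r
## Same vector as in the example above
scores <- 95:105

## Store the result of the assertion (hypothetical name)
at_least_100 <- scores >= 100

## TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
sum(at_least_100)
## mean() gives the proportion of TRUEs
mean(at_least_100)
## Use the logical vector to keep only the elements that are TRUE
scores[at_least_100]
```

Keeping only the elements where an assertion is TRUE is the same idea that filter() applies to the rows of a dataset.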
Matching Multiple Values
## Matching to a vector of values with %in%
c("cat", "rat", "dragon", "dinosaur", "opossum", "honeybee", "dog") %in% c("dog", "dinosaur")
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE
The convenience function dplyr::between() checks which values fall between two values, including the two endpoints.
dplyr::between(scores, 98, 102)
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
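To see what "between and including" means, the same check can be written by hand with two comparisons joined by & (AND, introduced under Combining Assertions). This is a sketch for comparison only, assuming the `scores` vector from above and that the dplyr package is installed; `by_hand` and `with_between` are made-up names.

```r
## Same vector as in the example above
scores <- 95:105

## Written out by hand: at least 98 AND at most 102
by_hand <- scores >= 98 & scores <= 102

## The convenience function
with_between <- dplyr::between(scores, 98, 102)

## Both approaches give the same logical vector
identical(by_hand, with_between)
```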
Missing Values
There is a family of is.*() functions that check for particular types of data. (The * is a placeholder for lots of different things, like is.numeric(), is.logical(), etc.)
Missing values are NA (not available) in R. The function is.na() returns TRUE if a piece of data IS NA (that is, it IS missing) and FALSE for anything else.
## Single assertions
is.na("My will to live")
[1] FALSE
is.na(NA)
[1] TRUE
## Vectorised assertion
missing_things <- c("My faith in humanity", "My love for R", NA, "My obsession with hex stickers")
is.na(missing_things)
[1] FALSE FALSE TRUE FALSE
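A common slip is to test for missing values with == NA. The sketch below, which reuses the `missing_things` vector, shows why is.na() is needed: comparing anything to NA returns NA rather than TRUE or FALSE.

```r
## Same vector as in the example above
missing_things <- c("My faith in humanity", "My love for R", NA, "My obsession with hex stickers")

## Comparing to NA with == returns NA for every element
missing_things == NA

## is.na() gives a usable TRUE/FALSE answer
is.na(missing_things)
## Count the missing values
sum(is.na(missing_things))
## Put ! in front to ask the opposite: which values are NOT missing?
!is.na(missing_things)
```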
Combining Assertions
- & AND
- | OR
## Combine with & (AND) - all must be TRUE to return TRUE
0 < 1 & "dragons" == "real"
[1] FALSE
## Combine with | (OR) - at least one must be TRUE to return TRUE
0 < 1 | "dragons" == "real"
[1] TRUE
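The operators & and | are also vectorised, so they combine assertions element by element. The sketch below assumes the `scores` vector from earlier; the stored names are made up for illustration.

```r
## Same vector as in the Vectorised Assertions example
scores <- 95:105

## AND: TRUE only where both assertions are TRUE
in_middle <- scores >= 98 & scores <= 102
in_middle

## OR: TRUE where at least one assertion is TRUE
at_extremes <- scores < 97 | scores > 103
at_extremes

## ! flips every TRUE to FALSE and every FALSE to TRUE
below_100 <- !(scores >= 100)
below_100
```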
Using filter()
View the Data
## Print out the dataset in the document
smarvus_tib
# A tibble: 2,776 × 34
unique_id country language university degree_major degree_year age gender
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 X8V0T6 Netherla… English Universit… Psychology 1st Year 18-21 Femal…
2 J3W3Y7 England English Universit… Psychology 1st Year 18-21 Femal…
3 S7C2L2 England English Universit… Psychology 1st Year 22-25 Femal…
4 Y4Z6A6 Scotland English Universit… Psychology 1st Year 26+ Femal…
5 L2O9Z1 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
6 B5I6O0 Austria German Universit… Psychology 1st Year 18-21 Femal…
7 N8H9D1 England English Loughboro… Psychology 1st Year 18-21 Male/…
8 F2J7V4 England English Bournemou… Psychology 1st Year 18-21 Femal…
9 N9M3V8 Germany German Universit… Psychology 1st Year 18-21 Femal…
10 O3F8F8 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
# ℹ 2,766 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
# in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
# cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
# ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
# r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
# r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
## Print a vector of variable names
names(smarvus_tib)
[1] "unique_id" "country" "language"
[4] "university" "degree_major" "degree_year"
[7] "age" "gender" "spld"
[10] "in_person_lectures" "in_person_practicals" "atms_per"
[13] "belief" "bfne" "cas_cre"
[16] "cas_non" "crt" "ius_sf_inh"
[19] "ius_sf_pro" "lsas_sr_per" "lsas_sr_soc"
[22] "ngse" "r_mars_course" "r_mars_num"
[25] "r_mars_test" "r_tas_bod" "r_tas_ten"
[28] "r_tas_tes" "r_tas_worry" "stars_ask"
[31] "stars_int" "stars_test" "sticsa_cog"
[34] "sticsa_som"
## Open in new tab in RStudio
View(smarvus_tib)
Assertions about Variables
Use the name of the variable in the dataset to write the assertions.
Keep only cases whose language was NOT English.
- Check the Codebook.
  - Variable name: language
    - Possible values: English, German, Spanish, French, Bahasa Indonesia, Italian, Dutch, Turkish, Estonian, Hungarian, Romanian, Hebrew
- Write the statement using the variable name and values
  - language != "English"
    - ‘Which values in the variable language are NOT equal to the value “English”?’
- Use in filter() command
- Check output!
dplyr::filter(
  smarvus_tib,
  language != "English"
)
# A tibble: 1,265 × 34
unique_id country language university degree_major degree_year age gender
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 B5I6O0 Austria German Universit… Psychology 1st Year 18-21 Femal…
2 N9M3V8 Germany German Universit… Psychology 1st Year 18-21 Femal…
3 V9K7F4 Estonia Estonian Universit… Psychology 1st Year 26+ Male/…
4 E2D7J2 Indonesia Bahasa … Padjadjar… Psychology 1st Year 18-21 Femal…
5 P6C1F2 Estonia Estonian Universit… Psychology 1st Year <NA> Femal…
6 N8E5D3 Germany German Universit… Psychology 1st Year 18-21 Femal…
7 Z0L8D0 Indonesia Bahasa … Bina Nusa… Psychology 1st Year 18-21 Femal…
8 U4K5B9 Netherla… Dutch Tilburg U… Psychology 1st Year 18-21 Femal…
9 V2N2M5 Netherla… Dutch Tilburg U… Psychology 1st Year 18-21 Femal…
10 V5C8I4 Indonesia Bahasa … Bina Nusa… Psychology 1st Year 18-21 Femal…
# ℹ 1,255 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
# in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
# cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
# ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
# r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
# r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
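One thing to watch for when checking the output: filter() keeps only rows where the assertion is TRUE, so rows where it returns NA are dropped along with the FALSE ones. The sketch below uses a small made-up tibble, `toy_tib` (not the SMARVUS data), to show this, and assumes the dplyr and tibble packages are installed.

```r
## A small made-up dataset with one missing language
toy_tib <- tibble::tibble(
  unique_id = c("A1", "B2", "C3", "D4"),
  language = c("English", "German", NA, "Dutch")
)

## The assertion on its own returns NA for the missing value
toy_tib$language != "English"

## filter() drops both the FALSE row and the NA row
kept_default <- dplyr::filter(toy_tib, language != "English")
kept_default

## To keep the missing case as well, say so explicitly
kept_with_missing <- dplyr::filter(toy_tib, language != "English" | is.na(language))
kept_with_missing
```

Whether cases with a missing value should be kept depends on the question being asked; the point is only that the default behaviour drops them.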
Keep only cases that had online or hybrid lectures.
- Check the Codebook.
  - Variable name: in_person_lectures
    - Possible values: “Online”, “In-person”, “Hybrid”, “Other (please specify)”
- Write the statement using the variable name and values
  - in_person_lectures %in% c("Online", "Hybrid")
    - ‘Which values in the variable in_person_lectures are in the possible matches “Online”, “Hybrid”?’
- Use in filter() command
- Check output!
dplyr::filter(
  smarvus_tib,
  in_person_lectures %in% c("Online", "Hybrid")
)
# A tibble: 1,459 × 34
unique_id country language university degree_major degree_year age gender
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 L2O9Z1 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
2 B5I6O0 Austria German Universit… Psychology 1st Year 18-21 Femal…
3 N8H9D1 England English Loughboro… Psychology 1st Year 18-21 Male/…
4 F2J7V4 England English Bournemou… Psychology 1st Year 18-21 Femal…
5 O3F8F8 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
6 V9K7F4 Estonia Estonian Universit… Psychology 1st Year 26+ Male/…
7 G9R8D5 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
8 Q9C7L5 England English Universit… Psychology 1st Year 18-21 Femal…
9 Q4E2V4 England English Universit… Psychology 1st Year 18-21 Male/…
10 C9Q9E3 Northern… English Queen's U… Psychology 1st Year 18-21 Femal…
# ℹ 1,449 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
# in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
# cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
# ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
# r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
# r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
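For comparison, the same question could be written as two == assertions joined with |. The sketch below uses a made-up vector, `lectures`, to show that the two versions agree except for missing values: %in% returns FALSE for an NA, whereas == returns NA.

```r
## A made-up vector of responses, including one missing value
lectures <- c("Online", "In-person", "Hybrid", NA)

## %in% never returns NA: a missing value simply does not match
with_in <- lectures %in% c("Online", "Hybrid")
with_in

## The long-hand version returns NA for the missing value
with_or <- lectures == "Online" | lectures == "Hybrid"
with_or

## Put ! in front to ask for everything that is NOT a match
not_matched <- !(lectures %in% c("Online", "Hybrid"))
not_matched
```

Inside filter() both versions keep the same rows, because filter() drops NA results anyway, but %in% is shorter and scales better as the list of matches grows.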
Keep only cases that were NOT missing a value for SPLD, and whose ATMS persistence scores were between 2 and 4.
- Check the Codebook.
  - Variable name: spld
    - Possible values: “Yes”, “No”
  - Variable name: atms_per
    - Possible values: numerical score between 1 and 5
- Write the two statements using the variable name and values
  - !is.na(spld)
    - “Which values in the spld variable are NOT NAs?”
  - dplyr::between(atms_per, 2, 4)
    - “Which values in the atms_per variable fall between 2 and 4?”
- Choose an operator to combine them
  - !is.na(spld) & dplyr::between(atms_per, 2, 4)
    - “Which cases have values in the spld variable that are NOT NAs AND values in the atms_per variable between 2 and 4?”
- Use in filter() command
- Check output!
dplyr::filter(
  smarvus_tib,
  !is.na(spld) & dplyr::between(atms_per, 2, 4)
)
# A tibble: 2,065 × 34
unique_id country language university degree_major degree_year age gender
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 X8V0T6 Netherla… English Universit… Psychology 1st Year 18-21 Femal…
2 J3W3Y7 England English Universit… Psychology 1st Year 18-21 Femal…
3 S7C2L2 England English Universit… Psychology 1st Year 22-25 Femal…
4 Y4Z6A6 Scotland English Universit… Psychology 1st Year 26+ Femal…
5 L2O9Z1 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
6 N8H9D1 England English Loughboro… Psychology 1st Year 18-21 Male/…
7 F2J7V4 England English Bournemou… Psychology 1st Year 18-21 Femal…
8 O3F8F8 Australia English Macquarie… Psychology 1st Year 18-21 Femal…
9 V9K7F4 Estonia Estonian Universit… Psychology 1st Year 26+ Male/…
10 Q9C7L5 England English Universit… Psychology 1st Year 18-21 Femal…
# ℹ 2,055 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
# in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
# cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
# ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
# r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
# r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
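As an aside, filter() also accepts several assertions separated by commas, and it treats them as if they were joined with &. The sketch below checks this on a small made-up tibble, `toy_tib` (not the SMARVUS data), assuming the dplyr and tibble packages are installed.

```r
## A small made-up dataset with one missing spld value
toy_tib <- tibble::tibble(
  spld = c("Yes", NA, "No", "No"),
  atms_per = c(3.5, 2.5, 4.5, 2)
)

## Version 1: one combined assertion using &
with_and <- dplyr::filter(
  toy_tib,
  !is.na(spld) & dplyr::between(atms_per, 2, 4)
)

## Version 2: two separate assertions, separated by a comma
with_comma <- dplyr::filter(
  toy_tib,
  !is.na(spld),
  dplyr::between(atms_per, 2, 4)
)

## Both keep the same rows: the first and the last
with_and
all.equal(with_and, with_comma)
```

The comma only ever means AND; for OR, the | operator still has to be written out inside a single assertion.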