Skills Lab 01: Selecting variables, logical assertions and filtering

Author

JM

You can access the Skills Lab project for Week 1 on Posit Cloud.

Also check the Analysing Data Panopto page for recordings or the main Posit Cloud page for other materials.

Setup

Packages

library(tidyverse)

Data

Run this code chunk to read the data into your Environment.

smarvus_tib <- readr::read_csv("data/smarvus_data.csv")

Codebook

Run this code chunk to open the Codebook in the Viewer tab.

ricomisc::rstudio_viewer("smarvus_codebook.html", "data")

Logical Assertions

Logical assertions are claims or statements that can be either TRUE or FALSE. In R, they return logical data.

Single Assertions

Numbers

## Single equals is reserved for assignment (similar to <-), so this throws an error
2 = 2

## Use two equals for "exactly equals"
2 == 2

## Single symbols for less than/greater than
38 > 8
90 < -50

## Use two symbols for less/greater than or equal to
3068345098 <= 1

## Order of symbols matters!
3068345098 =< 1
Error: <text>:15:13: unexpected '<'
14: ## Order of symbols matters!
15: 3068345098 =<
                ^

Strings

## Exact text matching
"black" != "white"
[1] TRUE
## Case-sensitive!
"Hello!" == "hello!"
[1] FALSE

Vectorised Assertions

## Create and store a vector
scores <- 95:105
scores
 [1]  95  96  97  98  99 100 101 102 103 104 105
## Make an assertion about every element
scores >= 100
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
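
Because the assertion returns one TRUE or FALSE per element, the result can be used to pick out the matching elements. This is the same idea that filter() applies to the rows of a dataset later on. A minimal sketch in base R, reusing the scores vector from above; the names is_high, high_scores and n_high are made up for illustration.

```r
## Recreate the vector from above
scores <- 95:105

## Store the result of the assertion (one TRUE/FALSE per element)
is_high <- scores >= 100

## Keep only the elements where the assertion is TRUE
high_scores <- scores[is_high]
high_scores

## Count how many elements passed (TRUE counts as 1, FALSE as 0)
n_high <- sum(is_high)
n_high
```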

Matching Multiple Values

## Matching to a vector of values with %in%
c("cat", "rat", "dragon", "dinosaur", "opossum", "honeybee", "dog") %in% c("dog", "dinosaur") 
[1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
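
It can be tempting to use == here instead of %in%, but == compares the two vectors position by position (recycling the shorter one) rather than checking each value against the whole set. The sketch below uses two hypothetical vectors, animals and targets, chosen so that the difference shows up clearly.

```r
## Hypothetical example vectors
animals <- c("dinosaur", "dog", "cat", "dog")
targets <- c("dog", "dinosaur")

## == compares position by position, recycling the shorter vector:
## "dinosaur" vs "dog", "dog" vs "dinosaur", "cat" vs "dog", "dog" vs "dinosaur"
with_equals <- animals == targets
with_equals

## %in% asks, for each animal, whether it appears anywhere in targets
with_in <- animals %in% targets
with_in
```

With these vectors, == returns FALSE every time even though three of the animals are in the target set, so %in% is the safer choice when matching against several values.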

The convenience function dplyr::between() checks whether each value falls between two bounds, including the bounds themselves.

dplyr::between(scores, 98, 102)
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
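
To see what "including" means in practice, the sketch below compares between() with the same check written by hand as two assertions joined with & (AND). It assumes the dplyr package is installed; with_between and by_hand are illustrative names only.

```r
## Recreate the vector from above
scores <- 95:105

## between() includes both end points...
with_between <- dplyr::between(scores, 98, 102)

## ...so it should match two assertions joined with & (AND)
by_hand <- scores >= 98 & scores <= 102

## Do the two approaches agree?
identical(with_between, by_hand)
```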

Missing Values

The is.*() family of functions checks for particular types of data. (The * is a placeholder for lots of different endings, as in is.numeric(), is.logical(), etc.)

Missing values are NA (not available) in R. The function is.na() returns TRUE if a piece of data IS NA (that is, it IS missing) and FALSE for anything else.

## Single assertions
is.na("My will to live")
[1] FALSE
is.na(NA)
[1] TRUE
## Vectorised assertion
missing_things <- c("My faith in humanity", "My love for R", NA, "My obsession with hex stickers")
is.na(missing_things)
[1] FALSE FALSE  TRUE FALSE
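
The reason for a dedicated function is that comparing anything to NA with == returns NA rather than TRUE or FALSE. A short base R sketch, using a shortened version of the vector above (compared and checked are illustrative names):

```r
missing_things <- c("My faith in humanity", "My love for R", NA)

## Comparing with == gives NA for every element, not TRUE/FALSE
compared <- missing_things == NA
compared

## is.na() gives a usable logical vector
checked <- is.na(missing_things)
checked

## How many values are missing?
sum(checked)
```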

Combining Assertions

  • & means AND

  • | means OR

## Combine with & (AND) - all must be TRUE to return TRUE
0 < 1 & "dragons" == "real"
[1] FALSE
## Combine with | (OR) - at least one must be TRUE to return TRUE
0 < 1 | "dragons" == "real"
[1] TRUE
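
Combined assertions are vectorised in the same way as single ones, and any assertion can be reversed by putting ! (NOT) in front of it. A minimal base R sketch reusing the scores vector from earlier; extreme and not_high are made-up names.

```r
## Recreate the vector from earlier
scores <- 95:105

## OR: scores at either extreme
extreme <- scores < 97 | scores > 103
scores[extreme]

## NOT: flip every TRUE to FALSE and vice versa
not_high <- !(scores >= 100)
scores[not_high]
```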

Using filter()

View the Data

## Print out the dataset in the document
smarvus_tib
# A tibble: 2,776 × 34
   unique_id country   language university degree_major degree_year age   gender
   <chr>     <chr>     <chr>    <chr>      <chr>        <chr>       <chr> <chr> 
 1 X8V0T6    Netherla… English  Universit… Psychology   1st Year    18-21 Femal…
 2 J3W3Y7    England   English  Universit… Psychology   1st Year    18-21 Femal…
 3 S7C2L2    England   English  Universit… Psychology   1st Year    22-25 Femal…
 4 Y4Z6A6    Scotland  English  Universit… Psychology   1st Year    26+   Femal…
 5 L2O9Z1    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 6 B5I6O0    Austria   German   Universit… Psychology   1st Year    18-21 Femal…
 7 N8H9D1    England   English  Loughboro… Psychology   1st Year    18-21 Male/…
 8 F2J7V4    England   English  Bournemou… Psychology   1st Year    18-21 Femal…
 9 N9M3V8    Germany   German   Universit… Psychology   1st Year    18-21 Femal…
10 O3F8F8    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
# ℹ 2,766 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
#   in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
#   cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
#   ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
#   r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
#   r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
## Print a vector of variable names
names(smarvus_tib)
 [1] "unique_id"            "country"              "language"            
 [4] "university"           "degree_major"         "degree_year"         
 [7] "age"                  "gender"               "spld"                
[10] "in_person_lectures"   "in_person_practicals" "atms_per"            
[13] "belief"               "bfne"                 "cas_cre"             
[16] "cas_non"              "crt"                  "ius_sf_inh"          
[19] "ius_sf_pro"           "lsas_sr_per"          "lsas_sr_soc"         
[22] "ngse"                 "r_mars_course"        "r_mars_num"          
[25] "r_mars_test"          "r_tas_bod"            "r_tas_ten"           
[28] "r_tas_tes"            "r_tas_worry"          "stars_ask"           
[31] "stars_int"            "stars_test"           "sticsa_cog"          
[34] "sticsa_som"          
## Open in new tab in RStudio
View(smarvus_tib)

Assertions about Variables

Use the name of the variable in the dataset to write the assertions.

Keep only cases whose language was NOT English.

  1. Check the Codebook.
    1. Variable name: language
    2. Possible values: English, German, Spanish, French, Bahasa Indonesia, Italian, Dutch, Turkish, Estonian, Hungarian, Romanian, Hebrew
  2. Write the statement using the variable name and values
    1. language != "English"
    2. ‘Which values in the variable language are NOT equal to the value “English”?’
  3. Use in filter() command
  4. Check output!
dplyr::filter(
  smarvus_tib,
  language != "English"
)
# A tibble: 1,265 × 34
   unique_id country   language university degree_major degree_year age   gender
   <chr>     <chr>     <chr>    <chr>      <chr>        <chr>       <chr> <chr> 
 1 B5I6O0    Austria   German   Universit… Psychology   1st Year    18-21 Femal…
 2 N9M3V8    Germany   German   Universit… Psychology   1st Year    18-21 Femal…
 3 V9K7F4    Estonia   Estonian Universit… Psychology   1st Year    26+   Male/…
 4 E2D7J2    Indonesia Bahasa … Padjadjar… Psychology   1st Year    18-21 Femal…
 5 P6C1F2    Estonia   Estonian Universit… Psychology   1st Year    <NA>  Femal…
 6 N8E5D3    Germany   German   Universit… Psychology   1st Year    18-21 Femal…
 7 Z0L8D0    Indonesia Bahasa … Bina Nusa… Psychology   1st Year    18-21 Femal…
 8 U4K5B9    Netherla… Dutch    Tilburg U… Psychology   1st Year    18-21 Femal…
 9 V2N2M5    Netherla… Dutch    Tilburg U… Psychology   1st Year    18-21 Femal…
10 V5C8I4    Indonesia Bahasa … Bina Nusa… Psychology   1st Year    18-21 Femal…
# ℹ 1,255 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
#   in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
#   cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
#   ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
#   r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
#   r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
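
One thing to look for when checking the output: filter() keeps only the rows where the assertion is TRUE, so any rows where the variable is NA are dropped as well. The sketch below illustrates this with a small made-up tibble (toy_tib is hypothetical and is not the SMARVUS data); it assumes the dplyr and tibble packages are installed.

```r
## Hypothetical toy data, not the real SMARVUS values
toy_tib <- tibble::tibble(
  id = 1:4,
  language = c("English", "German", NA, "Dutch")
)

## NA != "English" is NA, so row 3 is dropped along with row 1
not_english <- dplyr::filter(toy_tib, language != "English")
not_english

## To keep the missing values as well, say so explicitly
not_english_or_missing <- dplyr::filter(
  toy_tib,
  language != "English" | is.na(language)
)
not_english_or_missing
```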

Keep only cases that had online or hybrid lectures.

  1. Check the Codebook.
    1. Variable name: in_person_lectures
    2. Possible values: “Online”, “In-person”, “Hybrid”, “Other (please specify)”
  2. Write the statement using the variable name and values
    1. in_person_lectures %in% c("Online", "Hybrid")
    2. ‘Which values in the variable in_person_lectures are in the possible matches “Online”, “Hybrid”?’
  3. Use in filter() command
  4. Check output!
dplyr::filter(
  smarvus_tib,
  in_person_lectures %in% c("Online", "Hybrid")
)
# A tibble: 1,459 × 34
   unique_id country   language university degree_major degree_year age   gender
   <chr>     <chr>     <chr>    <chr>      <chr>        <chr>       <chr> <chr> 
 1 L2O9Z1    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 2 B5I6O0    Austria   German   Universit… Psychology   1st Year    18-21 Femal…
 3 N8H9D1    England   English  Loughboro… Psychology   1st Year    18-21 Male/…
 4 F2J7V4    England   English  Bournemou… Psychology   1st Year    18-21 Femal…
 5 O3F8F8    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 6 V9K7F4    Estonia   Estonian Universit… Psychology   1st Year    26+   Male/…
 7 G9R8D5    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 8 Q9C7L5    England   English  Universit… Psychology   1st Year    18-21 Femal…
 9 Q4E2V4    England   English  Universit… Psychology   1st Year    18-21 Male/…
10 C9Q9E3    Northern… English  Queen's U… Psychology   1st Year    18-21 Femal…
# ℹ 1,449 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
#   in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
#   cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
#   ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
#   r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
#   r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
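
One possible way to do the "Check output!" step is to tally which values are left after filtering. The sketch below uses a small made-up tibble (toy_tib and online_hybrid are hypothetical names, not part of the lab data) and assumes dplyr and tibble are installed.

```r
## Hypothetical toy data, not the real SMARVUS values
toy_tib <- tibble::tibble(
  id = 1:5,
  in_person_lectures = c("Online", "In-person", "Hybrid", NA, "Online")
)

## Same assertion as above, applied to the toy data
online_hybrid <- dplyr::filter(
  toy_tib,
  in_person_lectures %in% c("Online", "Hybrid")
)

## Tally the values left in the filtered data
dplyr::count(online_hybrid, in_person_lectures)

## Or list the distinct values that remain
unique(online_hybrid$in_person_lectures)
```

If only "Online" and "Hybrid" appear, the filter has done what was intended.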

Keep only cases that were NOT missing a value for SPLD, and whose ATMS persistence scores were between 2 and 4 (inclusive).

  1. Check the Codebook.
    1. Variable name: spld
      1. Possible values: “Yes”, “No”
    2. Variable name: atms_per
      1. Possible values: numerical score between 1 and 5
  2. Write the two statements using the variable name and values
    1. !is.na(spld)
      1. “Which values in the spld variable are NOT NAs?”
    2. dplyr::between(atms_per, 2, 4)
      1. “Which values in the atms_per variable fall between 2 and 4?”
  3. Choose an operator to combine them
    1. !is.na(spld) & dplyr::between(atms_per, 2, 4)
      1. “Which cases have values in the spld variable that are NOT NAs AND values in the atms_per variable between 2 and 4?”
  4. Use in filter() command
  5. Check output!
dplyr::filter(
  smarvus_tib,
  !is.na(spld) & dplyr::between(atms_per, 2, 4)
)
# A tibble: 2,065 × 34
   unique_id country   language university degree_major degree_year age   gender
   <chr>     <chr>     <chr>    <chr>      <chr>        <chr>       <chr> <chr> 
 1 X8V0T6    Netherla… English  Universit… Psychology   1st Year    18-21 Femal…
 2 J3W3Y7    England   English  Universit… Psychology   1st Year    18-21 Femal…
 3 S7C2L2    England   English  Universit… Psychology   1st Year    22-25 Femal…
 4 Y4Z6A6    Scotland  English  Universit… Psychology   1st Year    26+   Femal…
 5 L2O9Z1    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 6 N8H9D1    England   English  Loughboro… Psychology   1st Year    18-21 Male/…
 7 F2J7V4    England   English  Bournemou… Psychology   1st Year    18-21 Femal…
 8 O3F8F8    Australia English  Macquarie… Psychology   1st Year    18-21 Femal…
 9 V9K7F4    Estonia   Estonian Universit… Psychology   1st Year    26+   Male/…
10 Q9C7L5    England   English  Universit… Psychology   1st Year    18-21 Femal…
# ℹ 2,055 more rows
# ℹ 26 more variables: spld <chr>, in_person_lectures <chr>,
#   in_person_practicals <chr>, atms_per <dbl>, belief <dbl>, bfne <dbl>,
#   cas_cre <dbl>, cas_non <dbl>, crt <dbl>, ius_sf_inh <dbl>,
#   ius_sf_pro <dbl>, lsas_sr_per <dbl>, lsas_sr_soc <dbl>, ngse <dbl>,
#   r_mars_course <dbl>, r_mars_num <dbl>, r_mars_test <dbl>, r_tas_bod <dbl>,
#   r_tas_ten <dbl>, r_tas_tes <dbl>, r_tas_worry <dbl>, stars_ask <dbl>, …
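
As an aside, filter() also accepts several assertions separated by commas and combines them with AND, so the & version above can be written either way. A minimal sketch on a made-up tibble (toy_tib is hypothetical, not the SMARVUS data), assuming dplyr and tibble are installed:

```r
## Hypothetical toy data, not the real SMARVUS values
toy_tib <- tibble::tibble(
  id = 1:5,
  spld = c("No", NA, "Yes", "No", "No"),
  atms_per = c(3, 2.5, 4.5, 2, 1)
)

## Assertions joined with &
with_and <- dplyr::filter(
  toy_tib,
  !is.na(spld) & dplyr::between(atms_per, 2, 4)
)

## The same assertions separated by a comma
with_comma <- dplyr::filter(
  toy_tib,
  !is.na(spld),
  dplyr::between(atms_per, 2, 4)
)

## Both versions keep the same cases
identical(with_and$id, with_comma$id)
with_and
```

OR has no comma shortcut, so assertions joined with | always need to be written out in a single statement.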

Kahoot! Time