Skills Lab 03: Logic of piping |>

Author

MS

Functions and arguments

Task 1

  1. Run the chunk below to create an object called my_numbers
  2. Look up the help documentation for the round() function
  3. Round my_numbers to 2 decimal places.
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

Look up documentation:

# run this in the console: 
help(round)

Round to 2 decimal places

round(x = my_numbers, digits = 2)
[1] 2.68 1.85 1.00 1.96 0.55

Using the pipe

Think of piped code like a production line in a factory, where each function is a step in the process and the pipe is the conveyor belt that moves the product along from step to step.
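To make the conveyor belt idea concrete, here is a minimal sketch comparing one function call written without and with the pipe. It assumes R 4.1 or later (needed for the native pipe |>) and uses mean() purely for illustration; the object names without_pipe and with_pipe are made up for this example.

```r
# Same vector that Task 1 creates
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# Without the pipe: the data sits inside the function call
without_pipe <- mean(my_numbers)

# With the pipe: the data comes first and is then moved into the function
with_pipe <- my_numbers |> mean()

# Both versions do the same job
identical(without_pipe, with_pipe)
```

Reading the piped line left to right as "take my_numbers, and then take the mean" matches the order in which things actually happen.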

Task 2

  1. Round my_numbers to 2 decimal places by piping my_numbers into the round() function.
  2. Round my_numbers to 2 decimal places by piping the number 2 into the round() function.
my_numbers |> round(????)
2 |> round(????)
my_numbers |> round(x = _, digits = 2)
2 |> round(x = my_numbers, digits = _)
[1] 2.68 1.85 1.00 1.96 0.55
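The underscore in the solutions above is the pipe's placeholder: it marks where the piped value should go. The sketch below compares three ways of writing the same command. It assumes R 4.2 or later, where the placeholder is available and has to be used with a named argument; the object names res_default, res_x and res_digits are made up for this example.

```r
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# No placeholder: the piped value becomes the first argument, x
res_default <- my_numbers |> round(digits = 2)

# Placeholder on x: same destination as above, just spelled out
res_x <- my_numbers |> round(x = _, digits = 2)

# Placeholder on digits: the piped value 2 is sent to digits instead
res_digits <- 2 |> round(x = my_numbers, digits = _)

# All three give the same rounded numbers
identical(res_default, res_x)
identical(res_default, res_digits)
```

In practice the placeholder is mostly useful for the third case, when the piped value should go somewhere other than the first argument.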

Task 3

What will the following code produce? Why? Have a guess BEFORE you run it.

my_numbers |> round()
[1] 3 2 1 2 1

It may seem strange that this command runs without any problem. After all, running round() with no arguments doesn’t:

round()
Error in eval(expr, envir, enclos): 0 arguments passed to 'round' which requires 1 or 2 arguments

Look again at the help documentation for round()’s two arguments:

  • x
  • digits = 0

Notice that the second, digits, is already set to be equal to something, namely 0. This means that 0 is the default value for this argument. In other words, if we don’t explicitly change this argument in our code, R will use the default value. If we’re happy for round() to use that default, we only need to supply x - which has no default.

So, in the code my_numbers |> round(), what is happening is:

  • The pipe passes the object on its left, my_numbers, to the first argument of round(), which is x.
  • digits is not changed, so R uses the default value, which is 0.

So, what we get is the object my_numbers rounded to 0 decimal places, exactly as if we had written round(my_numbers).
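As a quick check of this reasoning, the sketch below compares the piped command with a version where both arguments are written out in full. The object names piped and explicit are made up for this example.

```r
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# Piped version: x comes from the pipe, digits is left at its default
piped <- my_numbers |> round()

# Fully spelled-out version: both arguments are named explicitly
explicit <- round(x = my_numbers, digits = 0)

# The two results are the same
identical(piped, explicit)
```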

ExtRa Task!

Write a single command that does all of the following operations, without using the pipe:

  • Find the square root of my_numbers (Hint: try the sqrt() function)
  • Then take the mean of those square roots
  • Then round that mean to three decimal places

What are the benefits and drawbacks to this style? Which do you prefer?

round(mean(sqrt(my_numbers)), digits = 3)
[1] 1.228

This “nested” style of functions inside functions (i.e. round(mean(sqrt()))) is very common in coding. It’s also “base R” style. “Base R” just means the default way R works, without any extra bits or fancy packages, like {tidyverse}.

However, as commands become more complex, this nested style becomes - at least for me! - harder and harder to read. In particular, keeping track of the right number and location of brackets and arguments gets complicated quickly. Notice that mean(sqrt(my_numbers)) is all inside the x argument of round(), which is why there’s a comma directly after it. It’s very easy to lose track of which bits go where, or which functions they belong to.

If you want an additional challenge, try converting this nested command into a single pipe command instead. You’ll know it’s right when you get the same result as above. Let us know if you want any help!
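One way to see how R reads the nested command is to split it into separate steps, saving each result in its own object. This is only a sketch to show the inside-out order of evaluation, not the pipe version asked for in the challenge; the object names roots, roots_mean, rounded and nested are made up for this example.

```r
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# Step 1: the innermost function runs first
roots <- sqrt(my_numbers)

# Step 2: the result of step 1 goes into the next function out
roots_mean <- mean(roots)

# Step 3: the outermost function runs last
rounded <- round(roots_mean, digits = 3)

# The nested one-liner does the same three steps in a single command
nested <- round(mean(sqrt(my_numbers)), digits = 3)
identical(rounded, nested)
```

The step-by-step version is easier to follow but leaves extra objects in the environment, which is one of the trade-offs to weigh when comparing the styles.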

Connecting functions with a pipe

Task 4

  1. Run the code below to read in spotify_data.
  2. Without using the pipe, change the dataset so that:
    • It only contains rows with tracks in the F Major key
    • It only contains columns with track name, artist name(s), and year of release
    • Save the amended dataset into an object called spotify_edited using the assignment operator
spotify_data <- readr::read_csv("data/tutorial_03_data.csv")
spotify_edited <- dplyr::filter(spotify_data, key == "F" & mode == "Major")
spotify_edited <- dplyr::select(spotify_edited, track_name, `artist(s)_name`, released_year)

Task 5

Change the code from the previous task into a pipeline.

spotify_edited <- spotify_data |> 
  dplyr::filter(.data = _, key == "F" & mode == "Major") |> 
  dplyr::select(.data = _, track_name, `artist(s)_name`, released_year)

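This will also work, because the pipe passes the dataset to the first argument of each function, which is .data for both filter() and select():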

spotify_edited <- spotify_data |> 
  dplyr::filter(key == "F" & mode == "Major") |> 
  dplyr::select(track_name, `artist(s)_name`, released_year)
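To check that the two pipelines behave the same way, here is a small sketch on a made-up dataset. The data frame toy and its values are invented for illustration and are not taken from spotify_data; only the column names mirror the task. It assumes the {dplyr} package is installed and R is version 4.2 or later.

```r
# Made-up toy data (an assumption for this example, not the real dataset)
toy <- data.frame(
  track_name = c("Alpha", "Bravo", "Charlie"),
  key = c("F", "F", "C"),
  mode = c("Major", "Minor", "Major")
)

# Placeholder version: the piped data is sent to .data explicitly
with_placeholder <- toy |>
  dplyr::filter(.data = _, key == "F" & mode == "Major")

# Short version: the piped data goes to the first argument, .data, anyway
without_placeholder <- toy |>
  dplyr::filter(key == "F" & mode == "Major")

# Both keep the same single row (the track in F Major)
identical(with_placeholder, without_placeholder)
```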

Task 6

Using the pipe, complete the following tasks.

  1. Amend spotify_data so that:
    • It only contains columns with track name, release year, and number of Spotify charts
    • Create a new column in spotify_data called score, computed by dividing the number of Spotify charts by the number of playlists.
    • Save this new version of the dataset into an object called spotify_new

If you try to follow the instructions step by step, you’ll get an error:

spotify_new <- spotify_data |> 
  dplyr::select(track_name, released_year, in_spotify_charts) |> 
  dplyr::mutate(
    score = in_spotify_charts/in_spotify_playlists
  )
Error in `dplyr::mutate()`:
ℹ In argument: `score = in_spotify_charts/in_spotify_playlists`.
Caused by error:
! object 'in_spotify_playlists' not found

This is because in Step 1, we did not select in_spotify_playlists. This means that we cannot use it in Step 2.

If we wanted to complete this task, we would need to either create the new variable first or select in_spotify_playlists before using mutate(), as in the code below:

spotify_new <- spotify_data |> 
  dplyr::select(track_name, released_year, in_spotify_charts, in_spotify_playlists) |> 
  dplyr::mutate(
    score = in_spotify_charts/in_spotify_playlists
  )
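The other option mentioned above, creating the new variable first, could look something like the sketch below. It runs on a made-up dataset so that it works on its own: the data frame toy and its values are invented for illustration, and only the column names mirror spotify_data. It assumes the {dplyr} package is installed.

```r
# Made-up toy data (an assumption for this example, not the real dataset)
toy <- data.frame(
  track_name = c("Alpha", "Bravo"),
  released_year = c(2020, 2021),
  in_spotify_charts = c(10, 30),
  in_spotify_playlists = c(100, 600)
)

toy_new <- toy |>
  # Create score while in_spotify_playlists is still in the dataset
  dplyr::mutate(
    score = in_spotify_charts / in_spotify_playlists
  ) |>
  # Then keep only the columns we want, including the new one
  dplyr::select(track_name, released_year, in_spotify_charts, score)

names(toy_new)
```

Note that the two approaches do not give identical datasets: selecting first keeps in_spotify_playlists in the result, whereas mutating first lets us drop it afterwards.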

ExtRa SupeR Bonus Challenge Task

This one has no solution. Give it your best shot and let us know if you want help.

Starting with spotify_data, write a single pipe that does the following:

  • Take spotify_data, and then
  • Create a new variable called playlist_per_artist computed as the number of playlists a song is in divided by the number of artists
  • Keep only the track name, playlists per artist, key, and mode
  • Create a new variable called song_type with the following characteristics:
    • Tracks that are in a major key and have more than 5000 playlists per artist are “bright”
    • Tracks that have a key that is in the first three letters of the alphabet (no sharps or flats) are “first”
    • Everything else is “boring”
  • Keep only tracks that aren’t “boring”
  • Arrange the dataset alphabetically by track name in descending order.

What is the very first track name in the final version of the dataset?

Hint: We haven’t covered how to arrange rows in a dataset. You can do this manually in View mode by clicking on the name of the variable to reorder it. It will not be useful or relevant for any assessment, but check out the dplyr::arrange() function if you want to do this with code.
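If you do want to try dplyr::arrange(), the sketch below shows how it sorts rows in descending alphabetical order with the help of dplyr::desc(). It uses a made-up dataset rather than spotify_data, so it is not a solution to the challenge; the data frame toy and its track names are invented for illustration. It assumes the {dplyr} package is installed.

```r
# Made-up toy data (an assumption for this example, not the real dataset)
toy <- data.frame(
  track_name = c("Bravo", "Charlie", "Alpha")
)

# arrange() sorts rows; wrapping a column in desc() reverses the order
toy_sorted <- toy |>
  dplyr::arrange(dplyr::desc(track_name))

toy_sorted$track_name
# "Charlie" "Bravo" "Alpha"
```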