Skills Lab 03: Logic of piping |>
Functions and arguments
Task 1
- Run the chunk below to create an object called my_numbers
- Look up the help documentation for the round() function
- Round my_numbers to 2 decimal places.
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)
Look up documentation:
# run this in the console:
help(round)
Round to 2 decimal places
round(x = my_numbers, digits = 2)
[1] 2.68 1.85 1.00 1.96 0.55
Using the pipe
Think of piped code like a production line in a factory, where each function is a step in the process and the pipe is the conveyor belt that moves the product along from step to step.
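As a minimal sketch of the production line idea, the example below sends a made-up vector (not part of the lab data) through two base R functions. It assumes R 4.1 or later, where the native pipe |> is available.

```r
# Hypothetical example values, chosen so the square roots are whole numbers
raw_material <- c(4, 9, 16)

# Step 1: sqrt() takes the square roots
# Step 2: sum() adds those square roots together
finished_product <- raw_material |>
  sqrt() |>
  sum()

finished_product
```

Each function receives whatever came off the previous step, so this gives the same answer as sum(sqrt(raw_material)).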
Task 2
- Round my_numbers to 2 decimal places by piping my_numbers into the round() function.
- Round my_numbers to 2 decimal places by piping the number 2 into the round() function.
my_numbers |> round(????)
2 |> round(????)
my_numbers |> round(x = _, digits = 2)
2 |> round(x = my_numbers, digits = _)
[1] 2.68 1.85 1.00 1.96 0.55
Task 3
What will the following code produce? Why? Have a guess BEFORE you run it.
my_numbers |> round()
[1] 3 2 1 2 1
It may seem strange that this command works without a problem. After all, running round() with no arguments doesn’t:
round()
Error in eval(expr, envir, enclos): 0 arguments passed to 'round' which requires 1 or 2 arguments
Look again at the help documentation for the two arguments of round():
x
digits = 0
Notice that the second, digits, is already set to be equal to something, namely 0. This means that 0 is the default value for this argument. In other words, if we don’t explicitly change this argument in our code, R will use the default value. If we’re happy for round() to use that default, we only need to supply x, which has no default.
So, in the code my_numbers |> round(), what is happening is:
- No argument is named, so the pipe automatically passes the object my_numbers into the first argument, x.
- digits is not changed, so R uses the default value, which is 0.
So, what we get is the object my_numbers rounded to 0 decimal places, exactly as if we had written round(my_numbers).
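To check this reasoning, a short sketch comparing the piped call with a call that spells out both arguments. It recreates my_numbers so it can run on its own, and assumes R 4.1 or later for the native pipe.

```r
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# Piped version: my_numbers goes into x, digits keeps its default
piped_default <- my_numbers |> round()

# Same call with both arguments written out explicitly
explicit_digits <- round(x = my_numbers, digits = 0)

# TRUE means the two calls produce exactly the same values
identical(piped_default, explicit_digits)
```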
ExtRa Task!
Write a single command that does all of the following operations, without using the pipe:
- Find the square root of my_numbers (Hint: try the sqrt() function)
- Then take the mean of those square roots
- Then round that mean to three decimal places
What are the benefits and drawbacks to this style? Which do you prefer?
round(mean(sqrt(my_numbers)), digits = 3)
[1] 1.228
This “nested” style of functions inside functions (i.e. round(mean(sqrt()))) is very common in coding. It’s also “base R” style. “Base R” just means the default way R works, without any extra bits or fancy packages, like {tidyverse}.
However, as commands become more complex, this nested style becomes - at least for me! - harder and harder to read. In particular, keeping track of the right number and location of brackets and arguments gets complicated quickly. Notice that mean(sqrt(my_numbers)) is all inside the x argument of round(), which is why there’s a comma directly after it. It’s very easy to lose track of which bits go where, or which functions they belong to.
If you want an additional challenge, try converting this nested command into a single pipe command instead. You’ll know it’s right when you get the same result as above. Let us know if you want any help!
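One possible conversion, offered as a sketch rather than the only correct answer: each nested function becomes one step of the pipe, read from the inside out. The name piped_result is just an illustrative label, and the native pipe assumes R 4.1 or later.

```r
my_numbers <- c(2.6822919, 1.8485851, 1.0039014, 1.9612068, 0.5475432)

# Innermost function first: sqrt(), then mean(), then round()
piped_result <- my_numbers |>
  sqrt() |>
  mean() |>
  round(digits = 3)

piped_result
```

The order of the steps now matches the order of the instructions, whereas the nested version has to be read from the innermost brackets outwards.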
Connecting functions with a pipe
Task 4
- Run the code below to read in spotify_data.
- Without using the pipe, change the dataset so that:
  - It only contains rows with tracks in the F Major key
  - It only contains columns with track name, artist name, and year of release
- Save the amended dataset into an object called spotify_edited using the assignment operator
spotify_data <- readr::read_csv("data/tutorial_03_data.csv")
spotify_edited <- dplyr::filter(spotify_data, key == "F" & mode == "Major")
spotify_edited <- dplyr::select(spotify_edited, track_name, `artist(s)_name`, released_year)
Task 5
Change the code from the previous task into a pipeline.
spotify_edited <- spotify_data |>
  dplyr::filter(.data = _, key == "F" & mode == "Major") |>
  dplyr::select(.data = _, track_name, `artist(s)_name`, released_year)
and this will also work:
spotify_edited <- spotify_data |>
  dplyr::filter(key == "F" & mode == "Major") |>
  dplyr::select(track_name, `artist(s)_name`, released_year)
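Both versions give the same result because, by default, the pipe places whatever is on its left into the first argument of the function on its right, and .data is the first argument of both dplyr::filter() and dplyr::select(). The sketch below shows the same behaviour with stand-ins chosen purely for illustration: base R's built-in mtcars dataset and head(), whose first argument is x. The _ placeholder assumes R 4.2 or later.

```r
# Placeholder version: say explicitly which argument receives the data
with_placeholder <- mtcars |> head(x = _, n = 3)

# Default version: the data goes into the first argument, x
without_placeholder <- mtcars |> head(n = 3)

# TRUE means both spellings return the same three rows
identical(with_placeholder, without_placeholder)
```

The placeholder is only needed when the piped object should go somewhere other than the first argument.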
Task 6
Using the pipe, complete the following tasks.
- Amend spotify_data so that:
  - It only contains columns with track name, release year, and number of Spotify charts
- Create a new column in spotify_data called score, computed by dividing the number of Spotify charts by the number of playlists.
- Save this new version of the dataset into an object called spotify_new
If you try to follow the instructions step by step, you’ll get an error:
spotify_new <- spotify_data |>
  dplyr::select(track_name, released_year, in_spotify_charts) |>
  dplyr::mutate(
    score = in_spotify_charts/in_spotify_playlists
  )
Error in `dplyr::mutate()`:
ℹ In argument: `score = in_spotify_charts/in_spotify_playlists`.
Caused by error:
! object 'in_spotify_playlists' not found
This is because in Step 1, we did not select in_spotify_playlists. This means that we cannot use it in Step 2.
If we wanted to complete this task, we would need to either create the new variable first OR select in_spotify_playlists before using mutate():
spotify_new <- spotify_data |>
  dplyr::select(track_name, released_year, in_spotify_charts, in_spotify_playlists) |>
  dplyr::mutate(
    score = in_spotify_charts/in_spotify_playlists
  )
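The other option, creating the new variable first, could look like the sketch below. To keep it runnable on its own it uses a tiny made-up data frame called toy_spotify: the column names match spotify_data, but the values are invented for illustration. It assumes the {dplyr} package is installed.

```r
# Made-up toy data: real column names, invented values
toy_spotify <- data.frame(
  track_name = c("Song A", "Song B"),
  released_year = c(2020, 2023),
  in_spotify_charts = c(10, 4),
  in_spotify_playlists = c(500, 200)
)

# mutate() runs first, while in_spotify_playlists still exists,
# and select() then keeps only the columns we want, including score
toy_new <- toy_spotify |>
  dplyr::mutate(score = in_spotify_charts / in_spotify_playlists) |>
  dplyr::select(track_name, released_year, in_spotify_charts, score)

toy_new
```

With this ordering, in_spotify_playlists is used to compute score and then dropped, so the final dataset contains only the requested columns plus score.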
ExtRa SupeR Bonus Challenge Task
This one has no solution. Give it your best shot and let us know if you want help.
Starting with spotify_data, write a single pipe that does the following:
- Take spotify_data, and then
- Create a new variable called playlist_per_artist computed as the number of playlists a song is in divided by the number of artists
- Keep only the track name, playlists per artist, key, and mode
- Create a new variable called song_type with the following characteristics:
  - Tracks that are in a major key and have more than 5000 playlists per artist are “bright”
  - Tracks that have a key that is in the first three letters of the alphabet (no sharps or flats) are “first”
  - Everything else is “boring”
- Keep only tracks that aren’t “boring”
- Arrange the dataset alphabetically by track name in descending order.
What is the very first track name in the final version of the dataset?
Hint: We haven’t covered how to arrange rows in a dataset. You can do this manually in View mode by clicking on the name of the variable to reorder it. It will not be useful or relevant for any assessment, but check out the dplyr::arrange() function if you want to do this with code.
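For anyone curious about the code route, here is a minimal sketch of dplyr::arrange() on a made-up data frame. The object toy_tracks and its track names are invented for illustration and have nothing to do with spotify_data. It assumes the {dplyr} package is installed.

```r
# Made-up toy data, not taken from the Spotify dataset
toy_tracks <- data.frame(track_name = c("Banana", "Cherry", "Apple"))

# dplyr::desc() reverses the sort, giving descending alphabetical order
toy_sorted <- toy_tracks |>
  dplyr::arrange(dplyr::desc(track_name))

toy_sorted$track_name
```

Leaving out dplyr::desc() would sort the rows in ascending order instead.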