BEGIN ASSIGNMENT init_cell: false export_cell: true

# code packages
library("palmerpenguins")
library("tidyverse")

str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

HW1

In this assignment, we will build a custom function to compute sample statistics and then apply it to the palmerpenguins data set. Pay attention to the usage of the double curly braces {{...}} around the function arguments.
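As a warm-up for the curly braces, here is a minimal sketch of how {{...}} lets a function argument stand in for a column name inside dplyr verbs. The toy data frame and the helper name count_non_missing are made up for illustration and are not part of the assignment.

```r
library(dplyr)

# Hypothetical toy data (not the penguins data)
toy <- tibble(group = c("a", "a", "b", "b"),
              value = c(1, 3, 10, NA))

# Illustrative helper: counts the non-missing values per group.
# {{ }} forwards the unquoted column names given by the caller.
count_non_missing <- function(data_frame, grouping_variable, numerical_variable) {
  data_frame |>
    group_by({{ grouping_variable }}) |>
    summarize(n_obs = sum(!is.na({{ numerical_variable }})))
}

result <- count_non_missing(toy, group, value)
result
```

Without the braces, group_by(grouping_variable) would look for a column literally named grouping_variable instead of the column passed in by the caller.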

  1. Write a custom function called summary_stats that takes three inputs:

    • data_frame
    • grouping_variable
    • numerical_variable

and returns the result of summarize() with the following sample statistics computed for each group: minimum, mean, median, standard deviation, and maximum. Please follow the given stencil.

# BEGIN SOLUTION NO PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable){
  data_frame |>
    filter(!is.na({{grouping_variable}})) |>
    group_by({{grouping_variable}}) |>
    summarize(min = min({{numerical_variable}}, na.rm = TRUE),
              xbar = mean({{numerical_variable}}, na.rm = TRUE),
              med = median({{numerical_variable}}, na.rm = TRUE),
              s = sd({{numerical_variable}}, na.rm = TRUE),
              max = max({{numerical_variable}}, na.rm = TRUE))
}
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable){
  data_frame |>
    filter(!is.na({{grouping_variable}})) |>
    group_by({{grouping_variable}}) |>
    summarize(min = min({{numerical_variable}}, na.rm = TRUE),
              xbar = _____,
              med = _____,
              s = _____,
              max = _____)
}
" # END PROMPT
  2. Use your summary_stats function with the penguins data frame, grouped by the species categorical variable, on the bill_length_mm numerical variable.
# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, species, bill_length_mm)
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats(penguins, _____, _____)
" # END PROMPT
A tibble: 3 × 6
  species     min     xbar   med        s   max
  <fct>     <dbl>    <dbl> <dbl>    <dbl> <dbl>
  Adelie     32.1 38.79139 38.80 2.663405  46.0
  Chinstrap  40.9 48.83382 49.55 3.339256  58.0
  Gentoo     40.9 47.50488 47.30 3.081857  59.6
  3. Use your summary_stats function with the penguins data frame, grouped by the island categorical variable, on the body_mass_g numerical variable.
# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, island, body_mass_g)
# END SOLUTION
. = " # BEGIN PROMPT

" # END PROMPT
A tibble: 3 × 6
  island      min     xbar    med        s   max
  <fct>     <int>    <dbl>  <dbl>    <dbl> <int>
  Biscoe     2850 4716.018 4775.0 782.8557  6300
  Dream      2700 3712.903 3687.5 416.6441  4800
  Torgersen  2900 3706.373 3700.0 445.1079  4700
# If you want to check your code right now, uncomment the following line of code and run it.
# testthat::expect_equal(summary_stats(penguins, island, body_mass_g)$s[1], 782.8557, tolerance = 0.01)
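
For reference, here is a small sketch of how this kind of check behaves, assuming the testthat package is installed. The toy vector below is made up for illustration; expect_equal() returns silently when the two numbers agree within the tolerance and raises an error otherwise.

```r
library(testthat)

# Hypothetical toy vector (not from the penguins data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation: sqrt(32 / 7), roughly 2.138
computed <- sd(x)

# Passes silently because 2.14 is within the tolerance of the computed value
expect_equal(computed, 2.14, tolerance = 0.01)
```

A check like this only confirms one number, so it is still worth comparing your full output against the tables shown above.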

Submission

Once you are done with the tasks above,

  • Go to “File”
  • Click “Download as”
  • Download as “Notebook (.ipynb)”

This downloads a copy of the notebook to your computer (probably into your Downloads folder). Please upload the .ipynb file to our CatCourses site.