BEGIN ASSIGNMENT init_cell: false export_cell: true

# code packages
library("palmerpenguins")
library("tidyverse")

str(penguins)

tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

HW1

In this assignment, we will build a custom function that computes sample statistics and then apply it to the palmerpenguins data set. Pay attention to the use of the double curly braces {{...}} around the function's column arguments.
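As a minimal sketch of what the double curly braces do (group_sizes is a hypothetical helper for illustration, not part of this assignment): wrapping a function argument in {{...}} lets the caller pass a bare column name, which dplyr verbs such as group_by then look up inside the data frame.

```r
library("palmerpenguins")
library("dplyr")

# Hypothetical helper for illustration only: count the rows in each group.
group_sizes <- function(data_frame, grouping_variable) {
  data_frame |>
    # {{ }} forwards the bare column name supplied by the caller to group_by
    group_by({{ grouping_variable }}) |>
    summarize(n = n())
}

# The column is passed unquoted, just as in a direct group_by(species) call
group_sizes(penguins, species)
```

Without the braces, group_by(grouping_variable) would not resolve to the intended column and would typically stop with an error.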

Write a custom function called summary_stats that takes 3 inputs:
- data_frame
- grouping_variable
- numerical_variable

and returns the result of a summarize call that computes the following sample statistics for each group: minimum, mean, median, standard deviation, and maximum. Please follow the given stencil.

# BEGIN SOLUTION NO PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable){
  data_frame |>
    filter(!is.na({{grouping_variable}})) |>
    group_by({{grouping_variable}}) |>
    summarize(min = min({{numerical_variable}}, na.rm = TRUE),
              xbar = mean({{numerical_variable}}, na.rm = TRUE),
              med = median({{numerical_variable}}, na.rm = TRUE),
              s = sd({{numerical_variable}}, na.rm = TRUE),
              max = max({{numerical_variable}}, na.rm = TRUE))
}
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable){
  data_frame |>
    filter(!is.na({{grouping_variable}})) |>
    group_by({{grouping_variable}}) |>
    summarize(min = min({{numerical_variable}}, na.rm = TRUE),
              xbar = _____,
              med = _____,
              s = _____,
              max = _____)
}
" # END PROMPT
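The na.rm = TRUE argument in the stencil matters because the penguins measurements contain missing values (see the NA entries in the str output above). A small base R sketch of the difference, using a made-up vector rather than the penguins data:

```r
# Made-up values for illustration; NA marks a missing measurement
values <- c(3, NA, 1)

# By default, a single NA makes the whole summary NA
min(values)

# na.rm = TRUE drops the missing value before computing the statistic
min(values, na.rm = TRUE)
```

The same argument is accepted by mean, median, sd, and max.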

Use your summary_stats function with the penguins data frame, grouped by the species categorical variable, on the bill_length_mm numerical variable.

# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, species, bill_length_mm)
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats(penguins, _____, _____)
" # END PROMPT

A tibble: 3 × 6
species	min	xbar	med	s	max
<fct>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
Adelie	32.1	38.79139	38.80	2.663405	46.0
Chinstrap	40.9	48.83382	49.55	3.339256	58.0
Gentoo	40.9	47.50488	47.30	3.081857	59.6

Use your summary_stats function with the penguins data frame, grouped by the island categorical variable, on the body_mass_g numerical variable.

# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, island, body_mass_g)
# END SOLUTION
. = " # BEGIN PROMPT

" # END PROMPT

A tibble: 3 × 6
island	min	xbar	med	s	max
<fct>	<int>	<dbl>	<dbl>	<dbl>	<int>
Biscoe	2850	4716.018	4775.0	782.8557	6300
Dream	2700	3712.903	3687.5	416.6441	4800
Torgersen	2900	3706.373	3700.0	445.1079	4700

# If you want to check your code right now, uncomment the following line of code and run it
# testthat::expect_equal(summary_stats(penguins, island, body_mass_g)$s[1], 782.8557, tolerance = 0.01)

Submission

Once you are done with the tasks above:

- Go to “File”
- Click “Download as”
- Download as “Notebook (.ipynb)”

This downloads a copy of the notebook onto your computer (probably into your Downloads folder). Please upload the .ipynb file to our CatCourses site.