tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
$ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
$ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
$ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
$ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
$ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
HW1
In this assignment, we will build a custom function to compute sample statistics and then apply it to the palmerpenguins data set. Pay attention to the usage of the curly braces {{...}}.
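As a brief illustration of the curly-brace ("embracing") syntax, here is a minimal sketch that is separate from the assignment itself. The helper name count_missing is hypothetical, and the sketch assumes the dplyr and palmerpenguins packages are installed.

```r
library(dplyr)
library(palmerpenguins)

# Hypothetical helper for illustration only: counts the NA values in one column.
# {{ column }} forwards the bare column name supplied by the caller into
# summarize(), so the function can be called as count_missing(penguins, sex).
count_missing <- function(data_frame, column) {
  data_frame |>
    summarize(n_missing = sum(is.na({{ column }})))
}

count_missing(penguins, bill_length_mm)
```

Without the braces, summarize() would look for a column literally named column rather than the one passed in by the caller.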
Write a custom function called summary_stats that takes three inputs:
data_frame
grouping_variable
numerical_variable
The function should output the result of a summarize command with the following sample statistics: minimum, mean, median, standard deviation, and maximum. Please follow the given stencil.
# BEGIN SOLUTION NO PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable) {
  data_frame |>
    filter(!is.na({{ grouping_variable }})) |>
    group_by({{ grouping_variable }}) |>
    summarize(
      min  = min({{ numerical_variable }}, na.rm = TRUE),
      xbar = mean({{ numerical_variable }}, na.rm = TRUE),
      med  = median({{ numerical_variable }}, na.rm = TRUE),
      s    = sd({{ numerical_variable }}, na.rm = TRUE),
      max  = max({{ numerical_variable }}, na.rm = TRUE)
    )
}
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats <- function(data_frame, grouping_variable, numerical_variable) {
  data_frame |>
    filter(!is.na({{ grouping_variable }})) |>
    group_by({{ grouping_variable }}) |>
    summarize(
      min  = min({{ numerical_variable }}, na.rm = TRUE),
      xbar = _____,
      med  = _____,
      s    = _____,
      max  = _____
    )
}
" # END PROMPT
Use your summary_stats function with the penguins data frame, grouped by the species categorical variable, on the bill_length_mm numerical variable.
# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, species, bill_length_mm)
# END SOLUTION
. = " # BEGIN PROMPT
summary_stats(penguins, _____, _____)
" # END PROMPT
A tibble: 3 × 6
species    min    xbar      med    s         max
<fct>      <dbl>  <dbl>     <dbl>  <dbl>     <dbl>
Adelie     32.1   38.79139  38.80  2.663405  46.0
Chinstrap  40.9   48.83382  49.55  3.339256  58.0
Gentoo     40.9   47.50488  47.30  3.081857  59.6
Use your summary_stats function with the penguins data frame, grouped by the island categorical variable, on the body_mass_g numerical variable.
# BEGIN SOLUTION NO PROMPT
summary_stats(penguins, island, body_mass_g)
# END SOLUTION
. = " # BEGIN PROMPT

" # END PROMPT
A tibble: 3 × 6
island     min    xbar      med     s         max
<fct>      <int>  <dbl>     <dbl>   <dbl>     <int>
Biscoe     2850   4716.018  4775.0  782.8557  6300
Dream      2700   3712.903  3687.5  416.6441  4800
Torgersen  2900   3706.373  3700.0  445.1079  4700
# If you want to check your code right now, uncomment the following line of code and run it
# testthat::expect_equal(summary_stats(penguins, island, body_mass_g)$s[1], 782.8557, tolerance = 0.01)
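To get a feel for what a tolerance of 0.01 allows, here is a minimal sketch using base R's all.equal(), which compares numbers by relative difference. It assumes the testthat check behaves similarly. The two comparison values (782.86 and 800) are hypothetical and chosen only for illustration.

```r
# Standard deviation for Biscoe, as printed in the table above.
expected <- 782.8557

# A hypothetical result rounded to two decimals differs by far less than 1%.
close_enough <- isTRUE(all.equal(expected, 782.86, tolerance = 0.01))

# A hypothetical result that is off by roughly 2% exceeds the 1% tolerance.
too_far <- isTRUE(all.equal(expected, 800, tolerance = 0.01))

print(c(close_enough = close_enough, too_far = too_far))
```

Under this relative comparison, 0.01 means a difference of about 1% of the expected value, not 0.01 units.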
Submission
Once you are done with the tasks above,
Go to “File”
Click “Download as”
Download as “Notebook (.ipynb)”
That will download a copy of this notebook onto your computer (probably into your Downloads folder). Please upload the .ipynb file back into our CatCourses site.