R dplyr summarize percent

11/15/2023

I'm trying to use summarise() from the plyr package to calculate the percentage of occurrences of each level in a factor.

EDIT: The Puromycin data is in the base R installation. My data look like this:

    library(plyr)
    data.p <- as.data.frame(Puromycin[, 3])
    # names(data. … (the rest of this call is truncated in the source)

The examples below pair each scoped variant of summarise() with its across() equivalent, marked with `# ->`.

    by_species <- iris %>% group_by(Species)

    # If you want to apply multiple transformations, pass a list of
    # functions. When there are multiple functions, they create new
    # variables instead of modifying the variables in place:
    by_species %>% summarise_all(list(min, max))
    #> # A tibble: 3 × 9
    #>   Species    Sepal.Length_fn1 Sepal.Width_fn1 Petal.Length_fn1
    #> 1 setosa                  4.3             2.3              1
    #> 2 versicolor              4.9             2                3
    #> 3 virginica               4.9             2.2              4.5
    #> # ℹ 5 more variables: Petal.Width_fn1, Sepal.Length_fn2,
    #> #   Sepal.Width_fn2, Petal.Length_fn2, Petal.Width_fn2
    # ->
    by_species %>% summarise(across(everything(), list(min = min, max = max)))
    #> # A tibble: 3 × 9
    #>   Species    Sepal.Length_min Sepal.Length_max Sepal.Width_min
    #> 1 setosa                  4.3              5.8             2.3
    #> 2 versicolor              4.9              7               2
    #> 3 virginica               4.9              7.9             2.2
    #> # ℹ 5 more variables: Sepal.Width_max, Petal.Length_min,
    #> #   Petal.Length_max, Petal.Width_min, Petal.…

across() also works directly inside summarise(), with or without grouping:

    starwars %>% summarise(across(where(is.character), n_distinct))
    #> # A tibble: 1 × 8
    #>    name hair_color skin_color eye_color   sex gender homeworld species
    #> 1    87         13         31        15     5      3        49      38

    starwars %>%
      group_by(species) %>%
      filter(n() > 1) %>%
      summarise(across(c(sex, gender, homeworld), n_distinct))
    #> # A tibble: 9 × 4
    #>   species    sex gender homeworld
    #> 1 Droid        1      2         3
    #> 2 Gungan       1      1         1
    #> 3 Human        2      2        16
    #> 4 Kaminoan     2      2         1
    #> # ℹ 5 more rows

    starwars %>%
      group_by(homeworld) %>%
      filter(n() > 1) %>%
      summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
    #> # A tibble: 10 × 4
    #>   homeworld height  mass birth_year
    #> 1 Alderaan    176.
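Returning to the question that opens this post (the percentage of occurrences of each level of a factor), here is a minimal sketch of one way to get there. It uses dplyr's count() and mutate() rather than the plyr summarise() the question mentions, and it works on the `state` column of Puromycin directly. The names `state_pct` and `percent` are made up for illustration.

```r
library(dplyr)

# Puromycin ships with base R; its third column, `state`, is a factor.
state_pct <- Puromycin %>%
  count(state) %>%                      # one row per factor level, count in `n`
  mutate(percent = 100 * n / sum(n))    # each level's share of all rows

print(state_pct)
```

Because the percentages are computed from the counts in a single ungrouped table, they sum to 100 by construction. With a grouped table, `sum(n)` would be taken within each group instead, so the denominator depends on how the data is grouped at that point.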
You want to summarize your data (with mean, standard deviation, etc.).

conf.interval: the percent range of the confidence interval (default is …

dplyr functions work with pipes and expect tidy data. In tidy data, each variable is in its own column and each observation, or case, is in its own row. With pipes, `x %>% f(y)` becomes `f(x, y)`.

Summarize cases: apply summary functions to columns to create a new table of summary statistics.

count() lets you quickly count the unique values of one or more variables: `df %>% count(a, b)` is roughly equivalent to `df %>% group_by(a, b) %>% summarise(n = n())`. count() is paired with tally(), a lower-level helper that is equivalent to `df %>% summarise(n = n())`.

    # The _at() variants directly support strings:
    starwars %>% summarise_at(c("height", "mass"), mean, na.rm = TRUE)
    #> # A tibble: 1 × 2
    #>   height  mass
    #> 1   174.  97.3
    # ->
    starwars %>% summarise(across(c("height", "mass"), ~ mean(.x, na.rm = TRUE)))

    # You can also supply selection helpers to _at() functions but you have
    # to quote them with vars():
    starwars %>% summarise_at(vars(height:mass), mean, na.rm = TRUE)
    #> # A tibble: 1 × 2
    #>   height  mass
    #> 1   174.  97.3
    # ->
    starwars %>% summarise(across(height:mass, ~ mean(.x, na.rm = TRUE)))

    # The _if() variants apply a predicate function (a function that
    # returns TRUE or FALSE) to determine the relevant subset of
    # columns. Here we apply mean() to the numeric columns:
    starwars %>% summarise_if(is.numeric, mean, na.rm = TRUE)
    #> # A tibble: 1 × 3
    #>   height  mass birth_year
    #> 1   174.  97.3       87.6
    # ->
    starwars %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

vars() accepts named and unnamed arguments. If a variable in .vars is named, a new column by that name will be created. Name collisions in the new columns are disambiguated using a unique suffix.
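The equivalences stated earlier for count() and tally() can be checked directly. The sketch below assumes the starwars data that ships with dplyr; the object names are illustrative, and `.groups = "drop"` is an addition here so that the spelled-out version returns an ungrouped table, as count() does.

```r
library(dplyr)

# count() on two variables ...
via_count <- starwars %>% count(sex, gender)

# ... and the spelled-out group_by() + summarise() form.
via_summarise <- starwars %>%
  group_by(sex, gender) %>%
  summarise(n = n(), .groups = "drop")

# tally() on an ungrouped table gives a single row holding the row count.
via_tally <- starwars %>% tally()
total     <- starwars %>% summarise(n = n())

# Compare the two pairs of results.
print(identical(via_count$n, via_summarise$n))
print(identical(via_tally$n, total$n))
```

The text says "roughly equivalent" because count() also takes care of ungrouping and offers extras such as `sort` and `wt`, which the plain group_by() + summarise() form does not.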
The scoped variants of summarise() make it easy to apply the same transformation to multiple variables. Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb.

The names of the new columns are derived from the names of the input variables and the names of the functions.

- If there is only one unnamed function (i.e. if .funs is an unnamed list of length one), the names of the input variables are used to name the new columns.
- For _at functions, if there is only one unnamed variable (i.e., if .vars is of the form vars(a_single_column)) and .funs has length greater than one, the names of the functions are used to name the new columns.
- Otherwise, the new names are created by concatenating the names of the input variables and the names of the functions, separated with an underscore "_".

The .funs argument can be a named or unnamed list. If a function is unnamed and the name cannot be derived automatically, a name of the form "fn#" is used.
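A minimal sketch of the three naming cases, using the starwars data from dplyr. The object names are illustrative only, and the comments restate the rules above rather than quoting printed output. The last call shows an across() version with the naming pattern spelled out through `.names`.

```r
library(dplyr)

# Case 1: one unnamed function, so the input variable names are kept.
one_fn <- starwars %>%
  summarise_at(vars(height, mass), mean, na.rm = TRUE)

# Case 2: a single unnamed variable and several functions,
# so the function names become the column names.
one_var <- starwars %>%
  summarise_at(vars(height), list(min = min, max = max), na.rm = TRUE)

# Case 3: several variables and several named functions,
# so variable and function names are joined with "_".
many <- starwars %>%
  summarise_at(vars(height, mass), list(min = min, max = max), na.rm = TRUE)

# across() equivalent of case 3 with the pattern made explicit.
many_across <- starwars %>%
  summarise(across(
    c(height, mass),
    list(min = ~ min(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE)),
    .names = "{.col}_{.fn}"
  ))

print(names(one_fn))
print(names(one_var))
print(names(many))
print(names(many_across))
```

The scoped call and the across() call produce the same set of column names in case 3, but not necessarily in the same order, so code that relies on column position should select by name.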