groupr

Groups with Inapplicable Values




The groupr package is designed to work with tibbles and dplyr. It provides replacements for tidyverse grouping and pivoting operations, and uses richer data structures to make these operations easier to think about.

Motivation

There are two main ideas behind this package. The first is inapplicable data. We often use NA as a placeholder for information that is unknown but still matters, yet R provides no way to mark data that should be ignored entirely. groupr provides an inapplicable value for this purpose (printed as <I>).

The second main idea is that pivoting is just a way to rearrange groups of data. Some kinds of pivots cannot be expressed by a single tidyr pivot statement and require two or even three consecutive pivot calls. Inapplicable groups can be used to describe some of these more complex operations in a very straightforward way.

Usage

Install

devtools::install_github("ngriffiths21/groupr")

Easier pivots using groups

library(groupr)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

Make columns out of row groups:

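# example data, reconstructed here from the printed output
p_df2 <- tibble(
  grp1 = c("A", "A", "B", "B", "C"),
  grp2 = c(1, 2, 2, 1, NA),
  val  = c(1.9, 10.1, 3.1, 4.7, 4.9)
)
p_df2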
#> # A tibble: 5 × 3
#>   grp1   grp2   val
#>   <chr> <dbl> <dbl>
#> 1 A         1   1.9
#> 2 A         2  10.1
#> 3 B         2   3.1
#> 4 B         1   4.7
#> 5 C        NA   4.9

# group and make the NA an inapplicable grouping
p_df2 <- group_by2(p_df2, grp1, grp2 = NA)

group_data(p_df2)
#> # A tibble: 5 × 3
#>         grp1       grp2       .rows
#>   <polymiss> <polymiss> <list<int>>
#> 1          A          1         [1]
#> 2          A          2         [1]
#> 3          B          1         [1]
#> 4          B          2         [1]
#> 5          C        <I>         [1]

# groups version of pivot
pivot_grps(p_df2, cols = "grp1")
#> # A tibble:    2 × 2
#> # Row indices: grp2 [2]
#> # Col index:   grp1
#>    grp2 val$A    $B    $C
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1   1.9   4.7   4.9
#> 2     2  10.1   3.1   4.9

# tidyr version
pivot_wider(p_df2, names_from = grp1, values_from = val)
#> # A tibble: 3 × 4
#>    grp2     A     B     C
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1   1.9   4.7  NA  
#> 2     2  10.1   3.1  NA  
#> 3    NA  NA    NA     4.9

Note that with this inapplicable grouping, the value from the “C” group is applied to both subgroups. A single tidyr pivot_wider() call cannot do this: as shown above, it leaves the “C” value in its own row with grp2 set to NA.
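
For comparison, one way to reach the same result with plain tidyr and dplyr takes two extra steps after the pivot. This is only a sketch: the data frame `raw` is rebuilt from the printed output above, and using fill() to copy the “C” value into both rows is an assumption about how one might emulate the inapplicable grouping, not a description of what groupr does internally.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# ungrouped copy of the example data, rebuilt from the printed output
raw <- tibble(
  grp1 = c("A", "A", "B", "B", "C"),
  grp2 = c(1, 2, 2, 1, NA),
  val  = c(1.9, 10.1, 3.1, 4.7, 4.9)
)

# step 1: pivot; the "C" value lands in its own row where grp2 is NA
wide <- pivot_wider(raw, names_from = grp1, values_from = val)

# step 2: copy the single "C" value into the rows for grp2 = 1 and grp2 = 2
filled <- fill(wide, C, .direction = "downup")

# step 3: drop the leftover row that only carried the "C" value
emulated <- filter(filled, !is.na(grp2))
emulated
```

The fill step only works here because “C” has exactly one value; with several inapplicable rows the emulation would need more care.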

Make row groups out of columns (pivot longer):

p_df <- group_by2(iris, Species)

# groups version: make column grouping, then pivot
colgrouped <- sep_colgrp(p_df, ".", index_name = "Measurement")
colgrouped
#> # A tibble:    150 × 3
#> # Row indices: Species [3]
#> # Col index:   Measurement
#>    Species Sepal$Length $Width Petal$Length $Width
#>    <fct>          <dbl>  <dbl>        <dbl>  <dbl>
#>  1 setosa           5.1    3.5          1.4    0.2
#>  2 setosa           4.9    3            1.4    0.2
#>  3 setosa           4.7    3.2          1.3    0.2
#>  4 setosa           4.6    3.1          1.5    0.2
#>  5 setosa           5      3.6          1.4    0.2
#>  6 setosa           5.4    3.9          1.7    0.4
#>  7 setosa           4.6    3.4          1.4    0.3
#>  8 setosa           5      3.4          1.5    0.2
#>  9 setosa           4.4    2.9          1.4    0.2
#> 10 setosa           4.9    3.1          1.5    0.1
#> # … with 140 more rows
pivot_grps(colgrouped, rows = "Measurement")
#> # A tibble:    300 × 4
#> # Row indices: Species, Measurement [6]
#>    Species Measurement Sepal Petal
#>    <fct>   <chr>       <dbl> <dbl>
#>  1 setosa  Length        5.1   1.4
#>  2 setosa  Length        4.9   1.4
#>  3 setosa  Length        4.7   1.3
#>  4 setosa  Length        4.6   1.5
#>  5 setosa  Length        5     1.4
#>  6 setosa  Length        5.4   1.7
#>  7 setosa  Length        4.6   1.4
#>  8 setosa  Length        5     1.5
#>  9 setosa  Length        4.4   1.4
#> 10 setosa  Length        4.9   1.5
#> # … with 290 more rows

# tidyr version
pivot_longer(iris, cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
             values_to = "value")
#> # A tibble: 600 × 3
#>    Species name         value
#>    <fct>   <chr>        <dbl>
#>  1 setosa  Sepal.Length   5.1
#>  2 setosa  Sepal.Width    3.5
#>  3 setosa  Petal.Length   1.4
#>  4 setosa  Petal.Width    0.2
#>  5 setosa  Sepal.Length   4.9
#>  6 setosa  Sepal.Width    3  
#>  7 setosa  Petal.Length   1.4
#>  8 setosa  Petal.Width    0.2
#>  9 setosa  Sepal.Length   4.7
#> 10 setosa  Sepal.Width    3.2
#> # … with 590 more rows

Using this approach we can preserve separate columns for each flower part rather than combining them into one.
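
For reference, tidyr can also keep one column per flower part when the column names follow a regular pattern, using the ".value" sentinel in names_to. The sketch below assumes the names split cleanly at the "."; the key name "Measurement" is chosen only to match the groupr example above.

```r
library(tidyr)

# split each column name at the ".": the first piece names the output
# column (".value"), the second piece becomes the Measurement key
iris_long <- pivot_longer(
  iris,
  cols = -Species,
  names_to = c(".value", "Measurement"),
  names_sep = "\\."
)

dim(iris_long)    # 300 rows, 4 columns
names(iris_long)  # Species, Measurement, Sepal, Petal
```

The groupr version expresses the same reshaping as two separate ideas, a column grouping followed by a pivot, rather than as a naming pattern passed to one call.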

Pivot both rows and columns:

p_df3 <- pivot_grps(p_df2, cols = "grp1")

p_df3
#> # A tibble:    2 × 2
#> # Row indices: grp2 [2]
#> # Col index:   grp1
#>    grp2 val$A    $B    $C
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1   1.9   4.7   4.9
#> 2     2  10.1   3.1   4.9

# groups version
pivot_grps(p_df3, rows = "grp1",
           cols = "grp2")
#> # A tibble:    3 × 2
#> # Row indices: grp1 [3]
#> # Col index:   grp2
#>   grp1  val$`1`  $`2`
#>   <chr>   <dbl> <dbl>
#> 1 A         1.9  10.1
#> 2 B         4.7   3.1
#> 3 C         4.9   4.9
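
With plain tidyr, swapping the row and column indices like this is one of the cases that takes two consecutive pivot calls. A minimal sketch, assuming a flat data frame `wide_df` (a hypothetical name) holding the same values as p_df3 with ordinary columns A, B and C:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# flat stand-in for p_df3, rebuilt from the printed output
wide_df <- tibble(
  grp2 = c(1, 2),
  A = c(1.9, 10.1),
  B = c(4.7, 3.1),
  C = c(4.9, 4.9)
)

# call 1: move the grp1 column index back into rows
long_df <- pivot_longer(wide_df, cols = c(A, B, C),
                        names_to = "grp1", values_to = "val")

# call 2: move grp2 from the rows into the columns
swapped <- pivot_wider(long_df, names_from = grp2, values_from = val)
swapped
```

The result has one row per grp1 value and flat columns named `1` and `2`, rather than the nested val column that pivot_grps prints.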

Lifecycle

This package is currently experimental and has had limited testing. The API is likely to change.

License

MIT