tidymodels/broom


Convert statistical analysis objects from R into tidy format

https://broom.tidyverse.org

License: Other

Language: R

Keywords: modeling, r, tidy-data


broom

Overview

broom summarizes key information about models in tidy tibble()s. broom provides three verbs to make it convenient to interact with model objects:

  • tidy() summarizes information about model components
  • glance() reports information about the entire model
  • augment() adds information about observations to a dataset

For a detailed introduction, please see vignette("broom").

broom tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R. vignette("available-methods") lists method availability.

If you aren’t familiar with tidy data structures and want to know how they can make your life easier, we highly recommend reading Hadley Wickham’s Tidy Data.

Installation

# we recommend installing the entire tidyverse modeling set, which includes broom:
install.packages("tidymodels")

# alternatively, to install just broom:
install.packages("broom")

# to get the development version from GitHub:
install.packages("devtools")
devtools::install_github("tidymodels/broom")

If you find a bug, please file a minimal reproducible example in the issues.

Usage

tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This can be useful if you want to inspect a model or create custom visualizations.

library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)
tidy(fit)
#> # A tibble: 3 x 5
#>   term         estimate std.error statistic  p.value
#>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     3.59     0.0937     38.3  2.51e-78
#> 2 Petal.Length   -0.257    0.0669     -3.84 1.80e- 4
#> 3 Petal.Width     0.364    0.155       2.35 2.01e- 2
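
As a small sketch of the inspection use case: the tidy() method for lm objects accepts conf.int = TRUE, which adds conf.low and conf.high columns to the result. The 0.95 level and the names coefs and slopes are illustrative choices, not part of the original example.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)

# Ask tidy() for 95% confidence intervals alongside the estimates
coefs <- tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Drop the intercept so only the slope terms remain, e.g. for a coefficient plot
slopes <- coefs[coefs$term != "(Intercept)", ]
slopes[, c("term", "estimate", "conf.low", "conf.high")]
```

Because the result is an ordinary tibble, it can be passed straight to a plotting package of your choice.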

glance() returns a tibble with exactly one row of goodness-of-fit measures and related statistics. This is useful to check for model misspecification and to compare many models.

glance(fit)
#> # A tibble: 1 x 11
#>   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
#> *     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
#> 1     0.213         0.202 0.389      19.9 2.24e-8     3  -69.8  148.  160.
#> # ... with 2 more variables: deviance <dbl>, df.residual <int>
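
One possible way to compare many models, sketched under a few assumptions: the three formulas are arbitrary candidates chosen for illustration, and dplyr (which broom depends on) is assumed to be installed so that bind_rows() can stack the one-row results.

```r
library(broom)

# Three candidate models for the same response (illustrative choices)
formulas <- list(
  length_only = Sepal.Width ~ Petal.Length,
  width_only  = Sepal.Width ~ Petal.Width,
  both        = Sepal.Width ~ Petal.Length + Petal.Width
)
fits <- lapply(formulas, function(f) lm(f, data = iris))

# One glance() row per model, stacked with the list names in a `model` column
comparison <- dplyr::bind_rows(lapply(fits, glance), .id = "model")
comparison[, c("model", "r.squared", "adj.r.squared", "AIC", "BIC")]
```

Since every model contributes exactly one row, the stacked tibble can be sorted or filtered like any other data frame.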

augment() adds columns to a dataset, containing information such as fitted values, residuals or cluster assignments. All columns added to a dataset have a . prefix to prevent existing columns from being overwritten.

augment(fit, data = iris)
#> # A tibble: 150 x 12
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species .fitted
#>  *        <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa     3.30
#>  2          4.9         3            1.4         0.2 setosa     3.30
#>  3          4.7         3.2          1.3         0.2 setosa     3.33
#>  4          4.6         3.1          1.5         0.2 setosa     3.27
#>  5          5           3.6          1.4         0.2 setosa     3.30
#>  6          5.4         3.9          1.7         0.4 setosa     3.30
#>  7          4.6         3.4          1.4         0.3 setosa     3.34
#>  8          5           3.4          1.5         0.2 setosa     3.27
#>  9          4.4         2.9          1.4         0.2 setosa     3.30
#> 10          4.9         3.1          1.5         0.1 setosa     3.24
#> # ... with 140 more rows, and 6 more variables: .se.fit <dbl>,
#> #   .resid <dbl>, .hat <dbl>, .sigma <dbl>, .cooksd <dbl>,
#> #   .std.resid <dbl>
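
augment() for lm objects also takes a newdata argument, which may be handy for scoring observations that were not used to fit the model. The sketch below assumes a made-up data frame, new_flowers, whose values are purely illustrative.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)

# Hypothetical new flowers to score; values are made up for illustration
new_flowers <- data.frame(
  Petal.Length = c(1.5, 4.5, 6.0),
  Petal.Width  = c(0.2, 1.5, 2.2)
)

# augment() returns new_flowers with a .fitted column of predictions appended
predictions <- augment(fit, newdata = new_flowers)
predictions
```

The .fitted values should match what predict(fit, newdata = new_flowers) returns, with the input columns kept alongside them.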

Contributing

We welcome contributions of all types!

If you have never made a pull request to an R package before, broom is an excellent place to start. Find an issue with the Beginner Friendly tag and comment that you’d like to take it on and we’ll help you get started.

We encourage typo corrections, bug reports, bug fixes and feature requests. Feedback on the clarity of the documentation is especially valuable.

If you are interested in adding new tidier methods to broom, please read vignette("adding-tidiers").

We have a Contributor Code of Conduct. By participating in broom you agree to abide by its terms.

Project Statistics

Sourcerank 16
Repository Size 18.2 MB
Stars 842
Forks 232
Watchers 60
Open issues 147
Dependencies 89
Contributors 99
Tags 11

Top Contributors

alex hayes David Robinson Derek Chiu Matthew Lincoln David Hugh-Jones Jonah Gabry Ben Bolker Matthieu Benjamin Lily Medina Michael Kuehn Nic Joseph Indrajeet Patil Cory Brunson Michal Bojanowski François Briatte Håkon Malmedal Jay Hesselberth cwang23

Packages Referencing this Repo

broom
Convert Statistical Analysis Objects into Tidy Tibbles
Latest release 0.5.2 - 842 stars

Recent Tags

0.5.1 December 05, 2018
v0.5.0 July 17, 2018
v0.4.0 November 30, 2015
v0.3.7 May 06, 2015
v0.3.6 February 18, 2015
v0.3.5 January 05, 2015
v0.3.4 November 24, 2014
0.3.4 November 22, 2014
v0.3 October 19, 2014
v0.2 September 16, 2014
v0.1 September 11, 2014

Last synced: 2019-04-07 20:00:39 UTC