diffdfs

Compute the Difference Between Data Frames


License
MIT

Documentation

A small R package to compute the difference between data frames.

Install

It's not on CRAN, so install it from GitHub with devtools.

devtools::install_github("riazarbi/diffdfs")

Use

The package has just two functions: checkkey and diffdfs.

checkkey is a helper used by diffdfs, but you can also call it directly if it suits your purposes.
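As a rough illustration of what a key check involves, here is a minimal base R sketch. It is not the package's implementation: is_unique_key is a hypothetical name, and it assumes that a valid key means no two rows share the same combination of key column values.

```r
# Hypothetical helper, not part of diffdfs: TRUE when the key columns
# identify every row uniquely.
is_unique_key <- function(df, key_cols) {
  # drop = FALSE keeps a data frame even for a single key column
  anyDuplicated(df[, key_cols, drop = FALSE]) == 0
}

df <- data.frame(id = 1:4, group = c("a", "a", "b", "b"))
is_unique_key(df, "id")              # TRUE
is_unique_key(df, "group")           # FALSE
is_unique_key(df, c("id", "group"))  # TRUE
```

A column such as group fails the check because several rows share a value, which is the same reason "Species" cannot serve as a key for iris.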

Here are some examples you can run in your R session:

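library(diffdfs)
# Add a unique key column so rows can be matched between data frames
iris$key <- 1:nrow(iris)

# old_df holds rows 1 to 100, with one value changed in row 75
old_df <- iris[1:100,]
old_df[75,1] <- 100
# new_df holds rows 50 to 150, so it overlaps old_df on rows 50 to 100
new_df <- iris[50:150,]
> diffdfs(new_df, old_df, key_cols = "key")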
    operation Sepal.Length Sepal.Width Petal.Length Petal.Width    Species key
1         new          6.3         3.3          6.0         2.5  virginica 101
2         new          5.8         2.7          5.1         1.9  virginica 102
3         new          7.1         3.0          5.9         2.1  virginica 103
4         new          6.3         2.9          5.6         1.8  virginica 104
5         new          6.5         3.0          5.8         2.2  virginica 105
6         new          7.6         3.0          6.6         2.1  virginica 106
...
...
irisint <- iris
irisint$rownum <- 1:nrow(irisint)
key_cols <- c("rownum")
> checkkey(irisint, key_cols, TRUE)
Checking that key column rows are unique
[1] TRUE
> checkkey(irisint, "Species", TRUE)
Checking that key column rows are unique
[1] FALSE
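
To make the idea of a keyed diff concrete, the sketch below classifies keys using base R only. It is an illustration under stated assumptions, not the diffdfs implementation: key_diff and the names added, removed and changed are hypothetical, and it assumes a single unique key column, identical columns in both data frames, and no missing values.

```r
# Hypothetical sketch, not the diffdfs implementation: classify keys by
# comparing two data frames that share a single unique key column.
key_diff <- function(new_df, old_df, key_col) {
  new_keys <- new_df[[key_col]]
  old_keys <- old_df[[key_col]]
  shared <- intersect(new_keys, old_keys)

  # Align the shared rows by key so they can be compared row by row
  new_shared <- new_df[match(shared, new_keys), , drop = FALSE]
  old_shared <- old_df[match(shared, old_keys), , drop = FALSE]
  # A shared row counts as changed when any column differs
  differs <- rowSums(new_shared != old_shared) > 0

  list(
    added   = setdiff(new_keys, old_keys),  # only in new_df
    removed = setdiff(old_keys, new_keys),  # only in old_df
    changed = shared[differs]               # in both, values differ
  )
}

old_df <- data.frame(key = 1:4, value = c(10, 20, 30, 40))
new_df <- data.frame(key = 3:6, value = c(30, 99, 50, 60))
key_diff(new_df, old_df, "key")
```

In this toy data, keys 5 and 6 appear only in new_df, keys 1 and 2 appear only in old_df, and key 4 is present in both with a different value.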

More detail

If you'd like more detail on the rationale behind this package, and a toy implementation of a diffdfs-driven data versioning strategy, see my blog post on the subject.

Contributing

Riaz Arbi is the maintainer of this package. If you'd like to point out a bug or make a suggestion, create an issue in this repo.