data.table provides a high-performance version of base R's
data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.
- concise syntax: fast to type, fast to read
- fast speed
- memory efficient
- careful API lifecycle management
- feature rich
- fast and friendly delimited file reader: ?fread, see also convenience features for small data
- fast and feature rich delimited file writer: ?fwrite
- low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
- fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
- fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
- fast add/update/delete columns by reference by group using no copies at all
- fast and feature rich reshaping data: ?dcast and ?melt
- any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
- has no dependencies at all other than base R itself, for simpler production/maintenance
- the R dependency is as old as possible for as long as possible and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
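Several of the features listed above can be sketched on toy data. This is a minimal illustration, not a benchmark: the tables `quotes`, `trades`, `ranges`, `vals`, `sales` and `rates` and their columns are hypothetical and invented for the example.

```r
library(data.table)

# Number of threads data.table will use for its parallelized operations.
threads <- getDTthreads()

# Rolling join: for each trade, take the most recent quote at or before it.
quotes <- data.table(time = c(1, 5, 10), price = c(100, 101, 103))
trades <- data.table(time = c(2, 7, 12), qty = c(10, 20, 30))
rolled <- quotes[trades, on = "time", roll = TRUE]

# Non-equi join: find the [lo, hi] range that contains each value of x.
ranges <- data.table(lo = c(0, 10), hi = c(9, 19), label = c("low", "high"))
vals <- data.table(x = c(3, 12))
matched <- ranges[vals, on = .(lo <= x, hi >= x)]

# Add a column by reference, computed within each group.
sales <- data.table(region = c("a", "a", "b"), amount = c(1, 2, 10))
sales[, share := amount / sum(amount), by = region]

# Update on join: copy rate from rates into sales, matching on region.
rates <- data.table(region = c("a", "b"), rate = c(0.1, 0.2))
sales[rates, on = "region", rate := i.rate]

# Aggregate on join: one summary row per row of rates.
totals <- sales[rates, on = "region", .(total = sum(amount)), by = .EACHI]

# Round trip through a delimited file with fwrite and fread.
tmp <- tempfile(fileext = ".csv")
fwrite(sales, tmp)
back <- fread(tmp)
```

The `:=` calls modify `sales` in place, so no assignment back to `sales` is needed.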
Install the released version from CRAN, or update the development version only if a newer revision is available.
See the Installation wiki for more details.
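As a minimal sketch of the CRAN route only, the snippet below installs the package when it is missing and then loads it. The guard with `requireNamespace()` is a choice made for this example, and it assumes a CRAN mirror is configured; the development-version workflow is not shown here.

```r
# Install the CRAN release only when the package is not already available.
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}

library(data.table)

# Report which version ended up on the library path.
installed_version <- packageVersion("data.table")
print(installed_version)
```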
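Use the data.table subset [ operator the same way you would use the data.frame one, but...
- no need to prefix columns with DT$ (like subset() and with(), but built in)
- any valid expression is allowed in the j argument, not just column names
- the extra argument by computes the j expression by group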
```r
library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i, j, by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
```
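The same query form extends to several named summaries at once. The sketch below is illustrative: the output column names `n`, `mean_len` and `max_len` are made up for the example, and the base R version is included only to show the column prefixing that the data.table form avoids.

```r
library(data.table)
DT <- as.data.table(iris)

# i filters rows, j builds a list of named summaries, by sets the groups.
res <- DT[Petal.Width > 1.0,
          .(n = .N,
            mean_len = mean(Petal.Length),
            max_len = max(Petal.Length)),
          by = Species]

# Roughly equivalent group means in base R, with explicit column prefixes.
sub <- iris[iris$Petal.Width > 1.0, ]
base_means <- tapply(sub$Petal.Length, droplevels(sub$Species), mean)
```

Here `.()` is data.table's alias for `list()` and `.N` is the number of rows in each group.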
data.table is widely used by the R community. As of July 2019, it was used by over 680 CRAN and Bioconductor packages and was the 9th most starred R package on GitHub. If you need help, the data.table community is active on StackOverflow, with nearly 9,000 questions.
- click the Watch button at the top right of the GitHub project page
- read the NEWS file
- follow #rdatatable on Twitter
- watch recent Presentations
- read recent Articles
Guidelines for filing issues / pull requests: Contribution Guidelines.