quanteda on CRAN

About

quanteda is an R package for managing and analyzing text, created and maintained by Kenneth Benoit and Kohei Watanabe. Its creation was funded by the European Research Council grant ERC-2011-StG 283794-QUANTESS and its continued development is supported by the Quanteda Initiative CIC.

For more details, see https://quanteda.io.

quanteda version 4

quanteda 4.0 is a major release that improves functionality and performance, and further improves function consistency by removing previously deprecated functions. It also includes significant new tokeniser rules that make the default tokeniser smarter than ever before, with new Unicode and ICU-compliant rules that enable it to work more consistently with even more languages.

We describe these significant changes more fully in:

an article about the new external pointer tokens objects;
an article showing performance benchmarks for the new external pointer tokens objects, as well as some of the tokeniser improvements in v4; and
the changelog for v4, for a full listing of the changes, improvements, and deprecations in v4.

The quanteda family of packages

We completed the trend of splitting quanteda into modular packages with the release of v3. The quanteda family of packages includes the following:

quanteda: contains all of the core natural language processing and textual data management functions
quanteda.textmodels: contains all of the text models and supporting functions, namely the textmodel_*() functions. This was split from the main package with the v2 release
quanteda.textstats: statistics for textual data, namely the textstat_*() functions, split with the v3 release
quanteda.textplots: plots for textual data, namely the textplot_*() functions, split with the v3 release
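
As a minimal sketch of how the split works in practice, the example below builds a document-feature matrix with core quanteda and then calls a textstat_*() function from quanteda.textstats. It assumes both packages are already installed; the two example texts and the object names (toks, dfmat, freq) are invented for illustration.

```r
library(quanteda)
library(quanteda.textstats)  # separate package; assumed to be installed

# Toy texts invented for this example
toks <- tokens(c(d1 = "text analysis with quanteda",
                 d2 = "more text, more analysis"),
               remove_punct = TRUE)
dfmat <- dfm(toks)

# textstat_frequency() comes from quanteda.textstats, not from core quanteda
freq <- textstat_frequency(dfmat)
head(freq)
```

Without library(quanteda.textstats), the same call would have to be written as quanteda.textstats::textstat_frequency(dfmat).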

We are working on additional package releases, available in the meantime from our GitHub pages:

quanteda.sentiment: Functions and lexicons for sentiment analysis using dictionaries
quanteda.tidy: Extensions for manipulating document variables in core quanteda objects using your favourite tidyverse functions

and more to come.

How To…

Install

Install the released version from CRAN in the normal way, using your R GUI or:

install.packages("quanteda")

Or for the latest development version:

# remotes package required to install quanteda from GitHub
remotes::install_github("quanteda/quanteda")

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers to build the development version.
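
One possible way to check for a working toolchain before building from source, and to confirm which version ended up installed, is sketched below. It relies on the pkgbuild package, which is not part of quanteda and is assumed here to be installed from CRAN. Note that this check targets the general R build tools and may not confirm a Fortran compiler specifically.

```r
# pkgbuild is a separate CRAN package (an assumption of this sketch)
# Returns TRUE if R can find the tools needed to compile packages
ok <- pkgbuild::has_build_tools(debug = TRUE)
ok

# After installation, report the installed quanteda version
packageVersion("quanteda")
```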

Use

See the quick start guide to learn how to use quanteda.
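
For orientation before reading the guide, here is a minimal sketch of a typical workflow: build a corpus, tokenise it, remove stopwords, and create a document-feature matrix. The two texts and the object names (txt, corp, toks, dfmat) are invented for this example; the quick start guide remains the authoritative introduction.

```r
library(quanteda)

# Two toy documents, invented for this example
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "A quick brown dog outpaces a quick fox.")

corp  <- corpus(txt)                          # corpus from a named character vector
toks  <- tokens(corp, remove_punct = TRUE)    # tokenise and drop punctuation
toks  <- tokens_remove(toks, stopwords("en")) # remove English stopwords
dfmat <- dfm(toks)                            # document-feature matrix

# Most frequent features across both documents
topfeatures(dfmat, n = 3)
```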

Get Help

Read our documentation at https://quanteda.io.
Submit a question on the quanteda channel on Stack Overflow.
See our tutorial site.

Cite the package

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
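
A short sketch of retrieving that entry from an R session, assuming quanteda is installed. The object name cit is arbitrary; toBibtex() comes from the utils package that ships with R.

```r
# Citation information as recorded by the installed package
cit <- citation(package = "quanteda")
print(cit)

# The same information formatted as BibTeX
toBibtex(cit)
```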

Leave Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contribute

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Fork the source code, modify, and issue a pull request through the project GitHub page. See our Contributor Code of Conduct and the all-important quanteda Style Guide.
Issues, bug reports, and wish lists: File a GitHub issue.
Contact the maintainer by email.