A fresh approach to string manipulation in R


License: GPL-2.0

Language: R

Keywords: r, regular-expression, strings




Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the chapter on strings in R for Data Science.

stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use.
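
As a rough illustration of the shared conventions, the sketch below calls two stringr functions next to what are assumed here to be their closest stringi counterparts. The pairing is an assumption made for illustration, not something this README states. stri_reverse() stands in for an operation that is not among the stringr functions shown in this README.

```r
library(stringr)
library(stringi)

x <- c("why", "video", "cross")

# stringr functions start with str_, stringi functions with stri_
str_length(x)
stri_length(x)
#> [1] 3 5 5

# stringi names the matching engine in the function name
str_detect(x, "[aeiou]")
stri_detect_regex(x, "[aeiou]")
#> [1] FALSE  TRUE  TRUE

# An operation taken from stringi directly
stri_reverse(x)
#> [1] "yhw"   "oediv" "ssorc"
```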


# Install the released version from CRAN:
install.packages("stringr")

# Install the cutting edge development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/stringr")



All functions in stringr start with str_ and take a vector of strings as the first argument.

library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x)
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"

Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4

There are eight main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there’s any match to the pattern.

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
  • str_count(x, pattern) counts the number of matches.

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
  • str_subset(x, pattern) extracts the matching components.

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
  • str_locate(x, pattern) gives the position of the match.

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
  • str_extract(x, pattern) extracts the text of the match.

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
  • str_match(x, pattern) extracts parts of the match defined by parentheses.

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
  • str_replace(x, pattern, replacement) replaces the first match in each string with new text.

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
  • str_split(x, pattern) splits up a string into multiple pieces.

    str_split(c("a,b", "c,d,e"), ",")
    #> [[1]]
    #> [1] "a" "b"
    #> [[2]]
    #> [1] "c" "d" "e"
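
Several of these verbs act only on the first match in each string: in the outputs above, str_replace() changed a single vowel per word and str_extract() returned one vowel. stringr also provides _all variants that act on every match. A minimal sketch, reusing the same x:

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# Replace every vowel, not just the first one in each string
str_replace_all(x, "[aeiou]", "?")
#> [1] "why"       "v?d??"     "cr?ss"     "?xtr?"     "d??l"      "??th?r?ty"

# Extract every vowel; the result is a list with one element per input string
str_extract_all(x[2], "[aeiou]")
#> [[1]]
#> [1] "i" "e" "o"
```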

As well as regular expressions (the default), there are three other pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries
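
The three engines are applied by wrapping the pattern. The inputs in the sketch below are made up for illustration and are not from the original README:

```r
library(stringr)

# fixed(): "." is a literal full stop, not the regex wildcard
str_detect(c("a.b", "axb"), fixed("."))
#> [1]  TRUE FALSE
str_detect(c("a.b", "axb"), ".")
#> [1] TRUE TRUE

# coll(): locale-aware comparison, here also ignoring case
str_detect("I", coll("i", ignore_case = TRUE))
#> [1] TRUE

# boundary(): match on boundaries, for example between words
str_count("The quick brown fox", boundary("word"))
#> [1] 4
```

Using fixed() where no regular expression features are needed also avoids having to escape characters such as "." by hand.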

RStudio Addin

The RegExplain RStudio addin provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.

This addin can easily be installed with devtools:

# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")

Compared to base R

R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R.

In comparison, stringr:

  • Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>%
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
  • Simplifies string operations by eliminating options that you don’t need 95% of the time.

  • Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero-length inputs result in zero-length outputs.
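
A small sketch of the last point, with base R's paste0() shown for contrast. The inputs are chosen for illustration only:

```r
library(stringr)

# Missing input gives missing output
str_length(c("a", NA))
#> [1]  1 NA
str_c("a", NA)
#> [1] NA

# Base R's paste0() instead treats NA as the text "NA"
paste0("a", NA)
#> [1] "aNA"

# Zero-length input gives zero-length output
str_length(character(0))
#> integer(0)
str_detect(character(0), "a")
#> logical(0)
```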
