Installing the package
Install cyphr from GitHub:
libsodium is needed to compile the package. On Linux, this is not in many mainstream repos, so see the installation instructions in the sodium package. On macOS, get libsodium from Homebrew or install it from source.
It provides high level functions to:
- Encrypt and decrypt
- User-friendly wrappers (encrypt and decrypt) around R's file reading and writing functions that enable transparent encryption (support included for
The package aims to make encrypting and decrypting as easy as
cyphr::encrypt(write.csv(dat, "file.csv"), key)
dat <- cyphr::decrypt(read.csv("file.csv", stringsAsFactors=FALSE), key)
In addition, the package implements a workflow that allows a group to securely share data by encrypting it with a shared ("symmetric") key that is in turn encrypted with each user's SSH key. The use case is a group of researchers who are collaborating on a dataset that cannot be made public, for example because it contains sensitive data. However, they have decided, or need, to store it somewhere whose security they are not 100% confident in. So the data is encrypted at each write and decrypted at each read.
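The idea of a shared key that is itself encrypted per user can be sketched directly with the sodium package. This is a minimal illustration only, not cyphr's implementation: it uses a sodium keypair as a stand-in for a user's SSH key, and the variable names (alice_key, data_key, sealed_for_alice) are hypothetical.

```r
library(sodium)

# Each collaborator has a keypair (a stand-in here for their SSH key).
alice_key <- keygen()
alice_pub <- pubkey(alice_key)

# One symmetric key protects the data set.
data_key <- keygen()

# The data key is never stored in the clear: only a copy sealed with
# Alice's public key is kept alongside the data.
sealed_for_alice <- simple_encrypt(data_key, alice_pub)

# Encrypt the data itself with the shared symmetric key.
nonce <- random(24)
cipher <- data_encrypt(serialize(iris, NULL), data_key, nonce)

# Alice recovers the data key with her private key, then the data.
recovered_key <- simple_decrypt(sealed_for_alice, alice_key)
dat <- unserialize(data_decrypt(cipher, recovered_key, nonce))
identical(dat, iris)
```

Authorising another user then amounts to sealing a further copy of data_key with that user's public key; the data itself does not need to be re-encrypted.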
Objects to handle keys:
Decide on a style of encryption and create a configuration object:
- config_sodium_symmetric: Symmetric encryption, using sodium -- everyone shares the same key (which must be kept secret!) and can encrypt and decrypt data with it. This is used as a building block but is inflexible because of the need to keep the key secret.
- config_sodium_public: Public key encryption -- this lets people encrypt messages using your public key that only you can read using your private key.
- config_sodium_authenticated: Public key authenticated encryption, using sodium -- this is used for secure messaging and uses a combination of two people's public and private keys to communicate.
- config_openssl: Public key authenticated encryption, using openssl (see ?encrypt_envelope in the openssl package)
To generate keys, you really should read the underlying documentation in the sodium and openssl packages! The sodium keys do not have a file format: they are simply random data. So a secret symmetric key in sodium might be:
key <- sodium::keygen()
key
##  a7 a3 aa a7 74 ab d4 62 45 37 46 08 cb 3a ee 12 f9 11 37 b0 93 8b 87
##  c3 93 25 9e 85 7b 73 90 a1
With this key we can create the configuration object:
cfg <- cyphr::config_sodium_symmetric(key)
class(cfg)
##  "cyphr_config"
cfg
## <cyphr: sodium_symmetric>
If the key were saved to a file, that would work too:
path <- tempfile()
writeBin(key, path)
cfg <- cyphr::config_sodium_symmetric(path)
cfg
## <cyphr: sodium_symmetric>
If you load a password-protected SSH key you will be prompted for your passphrase. cyphr will ensure that this is not echoed onto the console.
cfg <- cyphr::config_openssl()
cfg
## Please enter private key passphrase:
Encrypt / decrypt a file
If you have files that already exist and you want to encrypt or decrypt them, the functions cyphr::encrypt_file and cyphr::decrypt_file will do that (these are workhorse functions that are used internally throughout the package).
saveRDS(iris, "myfile")
cyphr::encrypt_file("myfile", "myfile.encrypted", cfg)
The file is encrypted now, so reading it directly fails:
readRDS("myfile.encrypted")
## Error in readRDS("myfile.encrypted"): unknown input format
Decrypt the file and read it:
cyphr::decrypt_file("myfile.encrypted", "myfile.clear", cfg)
identical(readRDS("myfile.clear"), iris)
##  TRUE
Wrappers around R's file functions
While encrypting files is nice, the aim of the package is
To encrypt the output of a file-producing command, wrap it in cyphr::encrypt:
cyphr::encrypt(saveRDS(iris, "myfile.rds"), cfg)
To decrypt a file to feed into a file-consuming command, wrap it in cyphr::decrypt:
dat <- cyphr::decrypt(readRDS("myfile.rds"), cfg)
The roundtrip preserves the data:
identical(dat, iris) # yay
##  TRUE
But without the key, it cannot be read:
readRDS("myfile.rds") # unknown format
## Error in readRDS("myfile.rds"): unknown input format
The above commands work through computing on the language, rewriting the readRDS and saveRDS commands. Commands for reading and writing tabular and plain text files (read.csv, readLines, etc.) are also supported, and the way the rewriting is done is designed to be extensible.
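To make "computing on the language" concrete, here is a toy sketch in base R of how a wrapper can capture a call, redirect its file argument to a temporary file, and then post-process the result. It is an assumption-laden illustration, not cyphr's actual code: toy_encrypt is a hypothetical name, and reversing the bytes is only a placeholder for a real encryption step.

```r
# Hypothetical helper for illustration; not part of cyphr.
toy_encrypt <- function(expr, file_arg = "file", envir = parent.frame()) {
  force(envir)
  call <- substitute(expr)                # capture the call, unevaluated
  fn <- eval(call[[1]], envir)            # look up the function being called
  call <- match.call(fn, call)            # name every argument
  dest <- eval(call[[file_arg]], envir)   # where the user wanted the file
  tmp <- tempfile()
  on.exit(unlink(tmp))
  call[[file_arg]] <- tmp                 # redirect the write to a temp file
  eval(call, envir)                       # run the rewritten command
  bytes <- readBin(tmp, raw(), file.size(tmp))
  writeBin(rev(bytes), dest)              # placeholder for real encryption
  invisible(dest)
}

path <- tempfile()
toy_encrypt(saveRDS(iris, path))
```

The name of the file argument differs between functions ("file" for saveRDS, "path" for some others), which is why a wrapper like this needs to know it for each supported function.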
With (probably) some limitations, the argument to the wrapped functions can be connection objects. In this case the actual command is written to a file and the contents of that file are encrypted and written to the connection. When reading/writing multiple objects from/to a single connection though, this is likely to go very badly.
Because cyphr::encrypt and cyphr::decrypt compute on the language, standard evaluation forms cyphr::encrypt_ and cyphr::decrypt_ are provided that take a quoted expression as their first argument.
Supporting additional functions
The functions supported so far are:
However, there are bound to be more functions that could be useful to add here (e.g., readxl::read_excel). Either pass the name of the file argument to cyphr::decrypt using file_arg:
cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key, file_arg="path")
or register the function with the package using
cyphr::rewrite_register("readxl", "read_excel", "path")
Then you can use
cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key)
to decrypt the file.
It's possible that this means there are two packages here, but I have a single use case, so they're together for now at least. The package contains support for a group of people who are working on a sensitive data set. The data will be stored encrypted with a symmetric key. However, we never store the key directly; instead we store a copy that is encrypted with each user's key. Any user with access to the data can authorise another user to access the data. This is described in more detail in the vignette (in R:
Why not a connection object?
A proper connection could be nice but there are several issues stopping this:
- sodium does not support streaming encryption/decryption. It might be possible (bindings to node and swift have it). In general this would be great and allow the sort of cool things you can do with streaming large data in curl.
- R plays pretty loose and free with creating connections when given a filename; saveRDS will open files with compression on in binary mode, write.table adds encoding information when opening the connection object. The logic around what happens is entirely within the functions themselves so is hard to capture in a general way.
- Connection objects look like a pain to write.
There are still problems with the approach I've taken:
- Appending does not work: we'd need to decrypt the file first for that to be OK. This is an issue for
write.table, but not
- Non-file arguments are going to suck (though it's possible that something could be done to detect connections)
In the end, you can always write things out however you like and use encrypt_file to encrypt the file afterwards.
Why are wrappers needed?
The low-level functions in sodium and openssl work with raw data, for generality. Few users encounter raw vectors in their typical use of R, so objects need serialising before they can be encrypted. Most of the encryption involves a little extra random data (the "nonce" in sodium and similar additional pieces with openssl). These need storing with the data, and then separating from the data when decryption happens.
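The bookkeeping described above can be sketched with sodium's symmetric functions. This is a minimal illustration under stated assumptions, not cyphr's implementation: encrypt_object and decrypt_object are hypothetical helper names, and storing the 24-byte nonce at the front of the output is just one possible layout.

```r
library(sodium)

# Hypothetical helpers for illustration; not part of cyphr.
encrypt_object <- function(x, key) {
  nonce <- random(24)                          # fresh random nonce per message
  cipher <- data_encrypt(serialize(x, NULL), key, nonce)
  c(nonce, cipher)                             # store the nonce with the data
}

decrypt_object <- function(bytes, key) {
  nonce <- bytes[seq_len(24)]                  # split the nonce back off
  cipher <- bytes[-seq_len(24)]
  unserialize(data_decrypt(cipher, key, nonce))
}

key <- keygen()
bytes <- encrypt_object(iris, key)
identical(decrypt_object(bytes, key), iris)
```

The nonce does not need to be kept secret, only unique per message, which is why it can travel alongside the ciphertext.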