ropensci/cyphr


Humane encryption

https://ropensci.github.io/cyphr

License: Other

Language: R

Keywords: encryption, openssl, peer-reviewed, r, r-package, rstats, sodium


cyphr

Installing the package

To install cyphr from GitHub:

devtools::install_github("ropensci/cyphr")

Note that libsodium is needed to compile the package. On Linux, it is missing from many mainstream repositories, so see the installation instructions in the sodium package. On macOS, get libsodium from Homebrew or install it from source.

Encryption Wrappers

Encryption wrappers, using low-level support from sodium and openssl. This package is designed to be extremely easy to use, rather than the most secure thing (you're using R, remember).

It provides high level functions to:

  • Encrypt and decrypt
    • strings: encrypt_string / decrypt_string
    • objects: encrypt_object / decrypt_object
    • raw data: encrypt_data / decrypt_data
    • files: encrypt_file / decrypt_file
  • User-friendly wrappers (encrypt and decrypt) around R's file reading and writing functions that enable transparent encryption (support included for readRDS/saveRDS, read.csv/write.csv, etc).

The package aims to make encrypting and decrypting as easy as

cyphr::encrypt(write.csv(dat, "file.csv"), key)

and

dat <- cyphr::decrypt(read.csv("file.csv", stringsAsFactors=FALSE), key)

In addition, the package implements a workflow that allows a group to securely share data by encrypting it with a shared ("symmetric") key that is in turn encrypted with each user's SSH key. The use case is a group of researchers who are collaborating on a dataset that cannot be made public, for example because it contains sensitive data. However, they have decided, or need, to store it somewhere whose security they are not 100% confident about. So the data is encrypted at each write and decrypted at each read.

Objects to handle keys:

Decide on a style of encryption and create a cyphr_config object

  • config_sodium_symmetric: Symmetric encryption, using sodium -- everyone shares the same key (which must be kept secret!) and can encrypt and decrypt data with it. This is used as a building block but is inflexible because of the need to keep the key secret.
  • config_sodium_public: Public key encryption -- this lets people encrypt messages using your public key that only you can read using your private key.
  • config_sodium_authenticated: Public key authenticated encryption, using sodium -- this is used for secure messaging and combines two people's public keys and uses a combination of public and private keys to communicate.
  • config_openssl: Public key authenticated encryption, using openssl (see ?encrypt_envelope in the openssl package).
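
As a rough illustration of the three sodium-based styles listed above, the following sketch calls the sodium package directly rather than going through cyphr. Which sodium functions each cyphr_config style maps onto is an assumption made for this example, not something stated here, and the names msg, alice_key and bob_key are invented for illustration.

```r
library(sodium)

msg <- charToRaw("sensitive data")

# Symmetric: one shared secret key both encrypts and decrypts.
key <- keygen()
cipher <- data_encrypt(msg, key)          # nonce is attached as an attribute
plain_symmetric <- data_decrypt(cipher, key)

# Public key: anyone holding Alice's public key can encrypt,
# only Alice's private key can decrypt.
alice_key <- keygen()
alice_pub <- pubkey(alice_key)
sealed <- simple_encrypt(msg, alice_pub)
plain_public <- simple_decrypt(sealed, alice_key)

# Authenticated: Bob encrypts with his private key and Alice's public key;
# Alice decrypts with her private key and Bob's public key.
bob_key <- keygen()
bob_pub <- pubkey(bob_key)
signed <- auth_encrypt(msg, bob_key, alice_pub)
plain_authenticated <- auth_decrypt(signed, alice_key, bob_pub)

rawToChar(plain_authenticated)
```

In each case the decrypted bytes should match msg; the difference is only in who needs to hold which key.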

To generate keys, you really should read the underlying documentation in the sodium or openssl packages! The sodium keys do not have a file format: they are simply random data. So a secret symmetric key in sodium might be:

key <- sodium::keygen()
key
##  [1] a7 a3 aa a7 74 ab d4 62 45 37 46 08 cb 3a ee 12 f9 11 37 b0 93 8b 87
## [24] c3 93 25 9e 85 7b 73 90 a1

With this key we can create the cyphr_config object:

cfg <- cyphr::config_sodium_symmetric(key)
class(cfg)
## [1] "cyphr_config"
cfg
## <cyphr: sodium_symmetric>

If the key was saved to file that would work too:

path <- tempfile()
writeBin(key, path)
cfg <- cyphr::config_sodium_symmetric(path)
cfg
## <cyphr: sodium_symmetric>

If you load a password protected ssh key you will be prompted for your passphrase. cyphr will ensure that this is not echoed onto the console.

cfg <- cyphr::config_openssl()
cfg
## Please enter private key passphrase:

Encrypt / decrypt a file

If you have files that already exist and you want to encrypt or decrypt, the functions cyphr::encrypt_file and cyphr::decrypt_file will do that (these are workhorse functions that are used internally throughout the package)

saveRDS(iris, "myfile")
cyphr::encrypt_file("myfile", "myfile.encrypted", cfg)

The file is encrypted now:

readRDS("myfile.encrypted")
## Error in readRDS("myfile.encrypted"): unknown input format

Decrypt the file and read it:

cyphr::decrypt_file("myfile.encrypted", "myfile.clear", cfg)
identical(readRDS("myfile.clear"), iris)
## [1] TRUE

Wrappers around R's file functions

While encrypting files is nice, the aim of the package is to make encryption part of ordinary read and write calls.

To encrypt the output of a file-producing command, wrap it in cyphr::encrypt:

cyphr::encrypt(saveRDS(iris, "myfile.rds"), cfg)

To decrypt a file to feed into a file-consuming command, wrap it in cyphr::decrypt:

dat <- cyphr::decrypt(readRDS("myfile.rds"), cfg)

The roundtrip preserves the data:

identical(dat, iris) # yay
## [1] TRUE

But without the key, it cannot be read:

readRDS("myfile.rds") # unknown format
## Error in readRDS("myfile.rds"): unknown input format

The above commands work through computing on the language, rewriting the readRDS and saveRDS commands. Commands for reading and writing tabular and plain text files (read.csv, readLines, etc) are also supported, and the way the rewriting is done is designed to be extensible.
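
To give a feel for what "computing on the language" means here, the sketch below rewrites the file argument of a captured call in base R. It is a toy written for this explanation, not cyphr's implementation: toy_encrypt, toy_decrypt, split_call and scramble are hypothetical names, and scramble only reverses bytes as a placeholder for a real cipher.

```r
# Placeholder "cipher": reverses the bytes. NOT encryption, just a stand-in.
scramble <- function(bytes) rev(bytes)

# Name the arguments of a captured call and evaluate its file argument.
split_call <- function(call, file_arg, envir) {
  fn <- eval(call[[1]], envir)       # the function being called
  call <- match.call(fn, call)       # e.g. saveRDS(object = iris, file = path)
  list(call = call, path = eval(call[[file_arg]], envir))
}

toy_encrypt <- function(expr, file_arg = "file", envir = parent.frame()) {
  force(envir)
  parts <- split_call(substitute(expr), file_arg, envir)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  call <- parts$call
  call[[file_arg]] <- tmp            # redirect the write to a temporary file
  eval(call, envir)
  bytes <- readBin(tmp, raw(), file.size(tmp))
  writeBin(scramble(bytes), parts$path)
  invisible(parts$path)
}

toy_decrypt <- function(expr, file_arg = "file", envir = parent.frame()) {
  force(envir)
  parts <- split_call(substitute(expr), file_arg, envir)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  bytes <- readBin(parts$path, raw(), file.size(parts$path))
  writeBin(scramble(bytes), tmp)     # undo the placeholder transform
  call <- parts$call
  call[[file_arg]] <- tmp            # redirect the read to the temporary file
  eval(call, envir)
}

path <- tempfile(fileext = ".rds")
toy_encrypt(saveRDS(iris, path))
dat <- toy_decrypt(readRDS(path))
identical(dat, iris)
```

The sketch relies on match.call to find the argument by name, so it only works for functions that declare the file argument explicitly.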

With (probably) some limitations, the argument to the wrapped functions can be a connection object. In this case the command's output is written to a file, and the contents of that file are encrypted and written to the connection. When reading/writing multiple objects from/to a single connection though, this is likely to go very badly.

Because cyphr::encrypt and cyphr::decrypt compute on the language, standard evaluation forms cyphr::encrypt_ and cyphr::decrypt_ are provided that take a quoted expression as their first argument.

Supporting additional functions

The functions supported so far are:

  • readLines / writeLines
  • readRDS / saveRDS
  • load / save
  • read.table / write.table
  • read.csv / read.csv2 / write.csv
  • read.delim / read.delim2

However, there are bound to be more functions that could be useful to add here (e.g., readxl::read_excel). Either pass the name of the file argument to cyphr::encrypt / cyphr::decrypt as

cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key, file_arg="path")

or register the function with the package using rewrite_register:

cyphr::rewrite_register("readxl", "read_excel", "path")

Then you can use

cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key)

to decrypt the file.

Workflow support

It's possible that this means there are two packages here, but I have a single use case so they're together for now at least. The package contains support for a group of people who are working on a sensitive data set. The data will be encrypted with a symmetric key. However, we never actually store the key directly; instead we store a copy that is encrypted with each user's key. Any user with access to the data can authorise another user to access the data. This is described in more detail in the vignette (in R: vignette("data", package="cyphr")).
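
The idea of a shared data key that is stored only in per-user encrypted copies can be sketched with sodium alone. This is an assumption-laden illustration: the package's workflow uses SSH keys, whereas the sketch substitutes sodium key pairs, and the names users, stored and carol_key are invented for the example.

```r
library(sodium)

# The symmetric key that actually encrypts the data. It is never stored as-is.
data_key <- keygen()

# Each user's private key (in practice each lives only on its owner's machine).
users <- list(alice = keygen(), bob = keygen())

# What is stored next to the data: the data key sealed for each user's public key.
stored <- lapply(users, function(k) simple_encrypt(data_key, pubkey(k)))

# The data itself, encrypted with the shared data key.
cipher <- data_encrypt(serialize(iris, NULL), data_key)

# Bob recovers the data key with his private key, then reads the data.
bob_data_key <- simple_decrypt(stored$bob, users$bob)
dat <- unserialize(data_decrypt(cipher, bob_data_key))

# Bob authorises a new user by sealing the data key for her public key.
carol_key <- keygen()
stored$carol <- simple_encrypt(bob_data_key, pubkey(carol_key))
carol_data_key <- simple_decrypt(stored$carol, carol_key)
```

Only the sealed copies in stored and the encrypted data need to live in the shared location; removing a user's sealed copy does not revoke a key they have already recovered.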

Why not a connection object?

A proper connection could be nice but there are a few issues stopping this:

  1. sodium does not support streaming encryption/decryption. It might be possible (bindings to node and swift have it). In general this would be great and allow the sort of cool things you can do with streaming large data in curl.
  2. R plays pretty loose and free with creating connections when given a filename; readRDS/saveRDS will open files with decompression on in binary mode, read.csv/write.csv don't. write.table adds encoding information when opening the connection object. The logic around what happens is entirely within the functions themselves so is hard to capture in a general way.
  3. Connection objects look like a pain to write.

There are still problems with the approach I've taken:

  • Appending does not work: we'd need to decrypt the file first for that to be OK. This is an issue for write.table, but not writeLines.
  • Non-file arguments are going to suck (though it's possible that something could be done to detect connections)

In the end, you can always write things out however you like and use encrypt_file to encrypt the file afterwards.

Why are wrappers needed?

The low level functions in sodium and openssl work with raw data, for generality. Few users encounter raw vectors in their typical use of R, so these require serialisation. Most of the encryption involves a little extra random data (the "nonce" in sodium and similar additional pieces with openssl). These need storing with the data, and then separating from the data when decryption happens.
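
A minimal sketch of that bookkeeping, using sodium directly: the object is serialised to a raw vector, encrypted with a fresh nonce, and the nonce is stored in front of the ciphertext so it can be split off again on decryption. The layout (24-byte nonce followed by ciphertext) is an assumption chosen for this example, not necessarily the format cyphr writes.

```r
library(sodium)

key <- keygen()

# R objects must be turned into raw vectors before sodium can encrypt them.
msg <- serialize(iris, NULL)

# Encrypt with a fresh random nonce (sodium's secretbox nonces are 24 bytes).
nonce <- random(24)
cipher <- data_encrypt(msg, key, nonce)

# Store the nonce together with the ciphertext as one raw vector.
payload <- c(nonce, cipher)

# On decryption, separate the nonce from the ciphertext again.
stored_nonce <- payload[1:24]
stored_cipher <- payload[-(1:24)]
restored <- unserialize(data_decrypt(stored_cipher, key, stored_nonce))

identical(restored, iris)
```

The nonce is not secret, which is why it can sit next to the ciphertext, but it must not be reused with the same key.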

Top Contributors

Rich FitzJohn Jeroen Ooms Thibaut Jombart


Last synced: 2016-05-27 17:37:32 UTC
