Securely hash CSV data with HMAC keyed authentication.


Keywords
cryptography, csv, data, encrypt, encryption, hash, hashing, HMAC, PII, secret, secure
License
MIT
Install
pip install hmac4csv==2019.12.13

Documentation

hmac4csv: conveniently apply HMAC to hash CSV files


This package uses Python's built-in hmac module (HMAC: hash-based message authentication code) to turn plain-text CSV data into hashes, combining a secret key with the SHA-256 hashing algorithm.
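Here is a minimal sketch of the underlying primitive. It is not hmac4csv's own code. It shows how Python's standard library computes an HMAC-SHA-256 hex digest for a single cell value. The helper name hmac_sha256_hex is hypothetical, and encoding cell values as UTF-8 is an assumption.

```python
import hashlib
import hmac


def hmac_sha256_hex(key: bytes, value: str) -> str:
    """Return the hex HMAC-SHA-256 digest of a string value."""
    # Assumption: cell values are UTF-8 encoded before hashing.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Any bytes can serve as the key; here a throwaway key for illustration.
key = hashlib.sha256(b"example passphrase").digest()
digest = hmac_sha256_hex(key, "Doe")
print(len(digest))  # 64 hex characters = 256 bits
```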

The included hmac4csv command-line utility hashes entire data files with a single command.

Usage

$ cat ./sample.csv
ID,SSN,FNAME,MNAME,LNAME
555555,123456789,John,Deer,Doe
10101010,111111111,Jane,Cat,Coe
346712,999999999,Cynthia,Mouse,Moe
987654,444444444,David,Fish,Foe
23232323,888888888,Susan,Duck,Hoe

$ hmac4csv sample.csv --key="a-key-shared-only-with-those-who-need-it-for-hashing" --exclude=ID
1 files to hash:
    sample.csv --> hashed/sample.csv
1 hashed files written.

$ cat ./hashed/sample.csv
ID,SSN,FNAME,MNAME,LNAME
555555,4714786a9f0ab7ef3ac6bc6faa3ab41b7ab306c386a92bd6adb4d1c24618ef0a,71049a572a1e480fbaea8e1995da39aad40ad6b81f8c0745067da4f19f30b1b1,4192cf14e31eedc899dfad95f37ac5ebbbc4e39cd3927a543bca0ea3cb162010,a59c25e6feac6e8e43ca240c39d9d0ad26c09b4167dadac17c93948df83d5709
10101010,58288e804cbf742f6c3d1fdd2df8e86e1434c9eb86f94961692c6dc7fcf589a5,9ff5bf6df33b322fb13b3029918b0d6ab837ff383ca55eb26761e96f8500a90b,d5faef47cf8ac6d4e8adb78e615501ef7d03f21f87935a42b245eda3197e43b5,be6bf5f565e94095d6094595cc96fdce80ab865966fa231ad5e708ad83f21abe
346712,71237681882d923c87358e16caafe1a98842e8aab1c9d7c30aa275afad2391a9,4f4e1ec36d273876665ac75003c153a3e010db9f3761fed7d0f04314b55ab61f,2daaab4a7ae793fd1c99a3636adc1d5d529e65a6021baeec9a2d1cff7e409550,aa6a549b8dd83563be4ef2b5de4088b2e5f6ca28648dd9b58a425eec290054ff
987654,02578a90ca1c30d130658582c2cb9c0c22817539d382ac21a35f6f3e8b2cb98e,ddaa6b44cf6d157c2a18438e50f7b9673d00073ad993553c28c3cf4b6c39dce6,a6be494f3e918808893e25323fecf765be085d5c98e414fdcd49e925ffc7f96b,91454db4e8550d1dc33113d99464d903d7ab47cde10ee82500f5d7528756b232
23232323,dcebe57e6ca62f10cbdd80957df0fcc2e16b7ee28c5938cfa8ffbd5864aac119,14c7b04b45c5cbe49f9bdbfc56580a1bf5f4c43ada2a3cdce5b66cdaf33d16d2,c73fda97afc3738bf000925ca18b81144db34c9c0a5eb6da4ba411fafdddb100,0562a66686e6df6cfd08732c49ac080a327eaade6b36b38ca318fdb844f9f41b

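Running hmac4csv again later with the same secret key (a-key-shared-only-with-those-who-need-it-for-hashing) will yield identical hashes (e.g., Doe becomes a59c25e6feac6e8e43ca240c39d9d0ad26c09b4167dadac17c93948df83d5709). Any other HMAC-SHA-256 implementation will produce the same hashes if it uses the same key bytes, which hmac4csv derives from the passphrase as described in step 1.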

Without the secret key, it is computationally infeasible to produce matching hashes for new values or to work back from these hashes to the original values.
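The reproducibility described above can be illustrated with a small sketch. It assumes the pipeline is passphrase, then SHA-256 key bytes, then HMAC-SHA-256 of the UTF-8 value. Whether it reproduces the exact hashes in the sample output depends on hmac4csv's internals, which this sketch assumes rather than confirms. The helper hash_value is hypothetical.

```python
import hashlib
import hmac


def hash_value(passphrase: str, value: str) -> str:
    # Assumed pipeline: passphrase -> SHA-256 key bytes -> HMAC-SHA-256 of the UTF-8 value.
    key = hashlib.sha256(passphrase.encode("utf-8")).digest()
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


shared = "a-key-shared-only-with-those-who-need-it-for-hashing"
first_run = hash_value(shared, "Doe")
later_run = hash_value(shared, "Doe")
other_key = hash_value("some-other-key", "Doe")

print(first_run == later_run)  # True: same key and value give the same hash
print(first_run == other_key)  # False: a different key gives an unrelated hash
```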

0. Get some data in CSV

Presumably, if you didn't have data, you wouldn't be looking at hmac4csv.

1. Create the secret key

hmac4csv lets you pass a text passphrase as the key and automatically converts it into the bytes that HMAC requires. Specifically, it first encodes the string as UTF-8 and then hashes it with SHA-256. If the key is shared across parties, a long passphrase* can be useful:

$ cat secret.ini
key = how to recognise different trees from quite a long way away

* See https://www.useapassphrase.com/.
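The passphrase-to-key conversion described above can be sketched with hashlib. This sketch assumes that the raw 32-byte SHA-256 digest becomes the HMAC key, rather than its hex string. The name key_from_passphrase is hypothetical.

```python
import hashlib


def key_from_passphrase(passphrase: str) -> bytes:
    # UTF-8 encode the passphrase, then hash it with SHA-256.
    # Assumption: the raw digest bytes (not the hex text) are used as the key.
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


key = key_from_passphrase("how to recognise different trees from quite a long way away")
print(len(key))  # 32
print(key.hex())
```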

If you'd prefer to create the key separately, pass its hexadecimal representation. hmac4csv will decode the hex into bytes and use them directly as the HMAC key. You might do this if you prefer a different hash function for deriving the key, or if you want to generate a completely random key.

$ cat secret.ini
key = 6083717701d662b94314ea9de278224c2800e854deeef1091a9b99734f46d7f7
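To generate a completely random key yourself, one option is Python's secrets module. This is a suggestion, not an hmac4csv feature. It produces a 64-character hex string like the one shown above. The second half of the sketch shows the hex-to-bytes decoding step.

```python
import secrets

# 32 random bytes (256 bits), printed as 64 hex characters for a "key = ..." line.
hex_key = secrets.token_hex(32)
print(f"key = {hex_key}")

# Decoding the hex string back into the raw bytes used as the HMAC key.
key_bytes = bytes.fromhex(hex_key)
print(len(key_bytes))  # 32
```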

2. Store the secret key

Using the hmac4csv command-line utility, you can specify the key in three ways. hmac4csv searches for the key in this order:

  1. The --key option on the command line.

  2. An environment variable called HMAC4CSV_KEY.

  3. secret.ini in the current directory.

    • hmac4csv will look for a key= line in the first section.
    • A different file may be specified using the --config option.
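The lookup order above can be approximated in a short sketch. This is not hmac4csv's implementation, and resolve_key is a hypothetical name. The sketch makes two assumptions. First, it returns the key from the first section that defines one. Second, it prepends a placeholder section header so that a headerless file like the secret.ini shown earlier still parses with configparser.

```python
import configparser
import os
from typing import Optional


def resolve_key(cli_key: Optional[str] = None, config_path: str = "secret.ini") -> Optional[str]:
    """Return the key from --key, then HMAC4CSV_KEY, then the config file."""
    # 1. An explicit command-line value wins.
    if cli_key:
        return cli_key
    # 2. Fall back to the environment variable.
    env_key = os.environ.get("HMAC4CSV_KEY")
    if env_key:
        return env_key
    # 3. Finally, look for a "key =" entry in the config file, if it exists.
    if not os.path.exists(config_path):
        return None
    with open(config_path, encoding="utf-8") as f:
        text = f.read()
    # Assumption: a placeholder section lets headerless files parse;
    # interpolation is disabled so "%" in a passphrase is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("[_top]\n" + text)
    for section in parser.sections():
        if parser.has_option(section, "key"):
            return parser.get(section, "key")
    return None
```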

3. Consider excluding columns

It's often useful to preserve one or a few columns. In the first example above, we preserved the ID column in its original form. That lets us link the hashed records back to any other information from our data source.

With the --exclude option, pass a comma-separated list of columns to leave unaltered in the hashed CSV that hmac4csv creates.
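Column exclusion can be sketched with the standard csv module. Like the usage example, this sketch keeps the header row as-is, copies excluded columns through unchanged, and replaces every other cell with its HMAC-SHA-256 hex digest. The function hash_csv_text is hypothetical. The sketch assumes well-formed CSV (every row has one value per header column) and UTF-8 cell encoding.

```python
import csv
import hashlib
import hmac
import io
from typing import Iterable


def hash_csv_text(csv_text: str, key: bytes, exclude: Iterable[str] = ()) -> str:
    """Return csv_text with every non-excluded cell replaced by its HMAC-SHA-256 hex digest."""
    keep = set(exclude)
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()  # column names are left readable
    for row in reader:
        writer.writerow({
            col: value if col in keep
            else hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
            for col, value in row.items()
        })
    return out.getvalue()


sample = "ID,SSN,FNAME\n555555,123456789,John\n"
key = hashlib.sha256(b"example passphrase").digest()
# A comma-separated exclude list, as passed to --exclude.
print(hash_csv_text(sample, key, exclude="ID".split(",")))
```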

4. Consider a dry run

With the --dry-run (or simply -n) option, hmac4csv lists the CSVs to be hashed and their output filenames but writes no files. It also notes any files that a full hmac4csv run would overwrite.
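A dry run amounts to computing the output paths without writing anything. The sketch below is hypothetical: plan_outputs is not an hmac4csv function, and hmac4csv's actual logic may differ. Following the usage example, it assumes output goes to a hashed/ directory under the same filename. It flags any destination that already exists.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_outputs(inputs: Iterable[str], out_dir: str = "hashed") -> List[Tuple[Path, Path, bool]]:
    """List (source, destination, would_overwrite) tuples without writing any files."""
    plan = []
    for name in inputs:
        src = Path(name)
        dst = Path(out_dir) / src.name
        plan.append((src, dst, dst.exists()))
    return plan


for src, dst, exists in plan_outputs(["sample.csv"]):
    note = " (would overwrite)" if exists else ""
    print(f"    {src} --> {dst}{note}")
```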

5. Engage

You've created your key and considered whether to keep any columns as they are. You've done a dry run to confirm that it's collecting and translating the files you expect. You're ready for a full run of hmac4csv to get that information securely hashed.

Caution

Bear in mind that this is a hashing algorithm. It is useful because we can compare exact values without observing any raw values.

Without the key, you can learn nothing from the hashed data and you cannot (reasonably) create comparable hashes.

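Even with the key, the original content cannot be recovered directly from the hashed value. This is hashing, not encryption. However, anyone holding the key can hash candidate values and compare them with the hashed data. That is practical for small value spaces such as 9-digit SSNs.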
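The sketch below illustrates these trade-offs with a throwaway key. Exact matches survive hashing, so hashed columns can still be compared or joined. Any difference, even in letter case, yields an unrelated hash. Anyone with the key can also test guesses against a hash. The helper h and the candidate list are illustrative only.

```python
import hashlib
import hmac

key = hashlib.sha256(b"example passphrase").digest()


def h(value: str) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Exact matches are preserved, so hashed values can still be compared.
print(h("Doe") == h("Doe"))  # True
# Any difference, even case, gives an unrelated hash: no fuzzy matching.
print(h("Doe") == h("doe"))  # False

# With the key, a small value space can be searched by guess-and-compare.
target = h("123456789")
candidates = ["111111111", "123456789", "999999999"]
print([c for c in candidates if h(c) == target])  # ['123456789']
```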

Additional reading

If you'd like to understand more about HMAC, head over to Wikipedia:

In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. It may be used to simultaneously verify both the data integrity and the authenticity of a message, as with any MAC. Any cryptographic hash function, such as SHA-256 or SHA-3, may be used in the calculation of an HMAC; the resulting MAC algorithm is termed HMAC-X, where X is the hash function used (e.g. HMAC-SHA256 or HMAC-SHA3). The cryptographic strength of the HMAC depends upon the cryptographic strength of the underlying hash function, the size of its hash output, and the size and quality of the key.
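To make the definition above concrete, the HMAC construction from RFC 2104 can be written out by hand and checked against Python's hmac module. This is a teaching sketch, and manual_hmac_sha256 is a hypothetical name. Real code should call hmac directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def manual_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 following RFC 2104: H((K ^ opad) || H((K ^ ipad) || m))."""
    # Keys longer than the block size are hashed first; all keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()


key, msg = b"secret", b"Doe"
tag = hmac.new(key, msg, hashlib.sha256).digest()
# Verifying a MAC should use a constant-time comparison.
print(hmac.compare_digest(tag, manual_hmac_sha256(key, msg)))  # True
```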

I also found the following helpful in learning about HMAC: