Redacting classified documents
This repository holds the code base for my redacted libraries and executables.
It is mainly based on my Feistel cipher for Format-Preserving Encryption (FPE), to which I added a few tools for document, database and file manipulation to ease the whole process.
In some fields (healthcare, for instance), protecting the privacy of data whilst still being able to conduct in-depth studies is both vital and mandatory. Redacting documents and databases is therefore a necessary step.
With redacted, I provide a simple yet secure tool to help redact documents, using a dictionary, a record layout or a tag to decide which parts should actually be redacted.
As of the latest version, this repository comes in five different flavours:
- Executables (for Linux, MacOS or Windows environments);
- A Go library;
- A Python library;
- A Scala library for the JVM (not yet available on the Maven Central Repository);
- A TypeScript library (also available on NPM).
You can use either a dictionary or a tag (or both) to identify the words you want to redact in a document.
The tag should be placed immediately before any word that should be redacted. The default tag is the tilde character (~).
For example, in the sentence "This is a ~tagged sentence", only the word "tagged" will be redacted.
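To make the tag rule concrete, here is a minimal Python sketch that lists the words a tag marks for redaction. It assumes words are separated by whitespace and that a tag only counts when it starts a word. The function name find_tagged_words is hypothetical and is not part of the redacted API.

```python
from __future__ import annotations


def find_tagged_words(text: str, tag: str = "~") -> list[str]:
    """Return the words prefixed by `tag`, with the tag removed.

    Hypothetical helper (not the library's code): assumes whitespace-separated
    words and ignores a lone tag that has no word attached to it.
    """
    tagged = []
    for word in text.split():
        if word.startswith(tag) and len(word) > len(tag):
            tagged.append(word[len(tag):])
    return tagged


print(find_tagged_words("This is a ~tagged sentence"))  # ['tagged']
```

A custom tag (the -t option) would simply be passed as the tag argument in this sketch.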
Usage of ./redacted:
  -b    add to use both dictionary and tag
  -d string
        the optional path to the dictionary of words to redact
  -h string
        the hash engine for the round function (default "sha-256")
  -i string
        the path to the document to be redacted
  -k string
        the optional key for the FPE scheme (leave it empty to use default)
  -o string
        the name of the output file
  -r int
        the number of rounds for the Feistel cipher (default 10)
  -t string
        the optional tag that prefixes words to redact (default "~")
  -x    add to expand a redacted document
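The -h and -r options configure the Feistel network underneath the FPE scheme. The hash engine drives the round function, and the number of rounds sets how many times the two halves of the input are mixed. The Python sketch below shows a generic Feistel cipher over bytes, using a hashlib-based round function. It is an illustration under assumptions, not my feistel library. In particular, it does not map its output to a readable character set, so it is not format-preserving in that sense. The names feistel_encrypt and feistel_decrypt are hypothetical. Note that hashlib spells the default engine "sha256" rather than "sha-256".

```python
import hashlib


def _round_mask(key: bytes, index: int, half: bytes, size: int, engine: str) -> bytes:
    # F(K, i, half): hash key || round index || half, then repeat/truncate to `size` bytes.
    # Repeating the digest is acceptable for a sketch, not for production use.
    digest = hashlib.new(engine, key + index.to_bytes(4, "big") + half).digest()
    return (digest * (size // len(digest) + 1))[:size]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(data: bytes, key: bytes, rounds: int = 10, engine: str = "sha256") -> bytes:
    left, right = data[: len(data) // 2], data[len(data) // 2 :]
    for i in range(rounds):
        # L_{i+1} = R_i ; R_{i+1} = L_i XOR F(K, i, R_i)
        left, right = right, _xor(left, _round_mask(key, i, right, len(left), engine))
    return left + right


def feistel_decrypt(data: bytes, key: bytes, rounds: int = 10, engine: str = "sha256") -> bytes:
    n = len(data)
    # The halves swap sizes at each round when n is odd, so find where encryption left the split.
    split = n // 2 if rounds % 2 == 0 else n - n // 2
    left, right = data[:split], data[split:]
    for i in reversed(range(rounds)):
        # Undo one round: R_i = L_{i+1} ; L_i = R_{i+1} XOR F(K, i, L_{i+1})
        left, right = _xor(right, _round_mask(key, i, left, len(right), engine)), left
    return left + right


if __name__ == "__main__":
    secret = b"some-secret-key"
    token = feistel_encrypt(b"tagged", secret)
    print(len(token), feistel_decrypt(token, secret))  # 6 b'tagged'
```

The round function is a one-way hash and is never inverted. Decryption only recomputes it and XORs it away, which is why any hash engine can serve in the round function.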
The dictionary file must contain a list of words separated by spaces.
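As a minimal sketch of that format, the Python snippet below parses such a file into a set of words. It assumes that any run of whitespace (spaces or newlines) separates words and that matching is case-sensitive. The names parse_dictionary and load_dictionary are hypothetical and do not reproduce the model.FileToDictionary implementation.

```python
from pathlib import Path


def parse_dictionary(content: str) -> set:
    # Split on any whitespace so trailing newlines or double spaces are harmless.
    return set(content.split())


def load_dictionary(path: str) -> set:
    return parse_dictionary(Path(path).read_text(encoding="utf-8"))


words = parse_dictionary("Alice Bob  Paris\n")
print(sorted(words))  # ['Alice', 'Bob', 'Paris']
```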
Download the version for the platform of your choice, then execute the following command:
$ ./redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b
See also the installation procedure in the Scala section below.
IMPORTANT: Do not use with input texts containing lines longer than 65536 characters.
$ java -cp path/to/redacted.jar com.cyrildever.redacted.Main -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b
$ redacted -i myFile.txt -o myRedactedFile.txt -d myDictionary.txt -b
See the installation procedure in the Python section below.
$ python3 -m redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b
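Because input lines longer than 65536 characters are not supported, you may want to check a file before redacting it. The following is a minimal Python sketch that assumes a UTF-8 text file and counts characters rather than bytes. find_long_lines is a hypothetical helper and is not part of the tool.

```python
def find_long_lines(path: str, limit: int = 65536) -> list:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    too_long = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # Ignore the line terminator when measuring the length.
            if len(line.rstrip("\r\n")) > limit:
                too_long.append(number)
    return too_long
```

For instance, find_long_lines("myFile.txt") returns an empty list when no line exceeds the limit.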
Go
$ go get github.com/cyrildever/redacted/go
import (
	"fmt"
	"log"
	"github.com/cyrildever/feistel"
	"github.com/cyrildever/redacted/go/core"
	"github.com/cyrildever/redacted/go/model"
)
// Load dictionary
dic, err := model.FileToDictionary("/path/to/dictionary.txt")
if err != nil {
	log.Fatal(err)
}
// Prepare FPE cipher (hashEngine, key and rounds are your own settings)
cipher := feistel.NewFPECipher(hashEngine, key, rounds)
// Instantiate redactor
redactor := core.NewRedactorWithDictionary(dic, cipher)
// Redact a line of text
redacted := redactor.Redact(line)
fmt.Println(redacted)
// Expand a redacted line (eg. in a test using testify's assert package)
assert.Equal(t, redactor.Expand(redacted), line)
See the Dictionary and Redactor implementations to use other kinds of dictionaries (as a slice or from a string) and/or redactors (with or without tag and dictionary).
NB: You may use any other Format-Preserving Encryption library as long as it implements the following interface:
type FPE interface {
	Decrypt(base256.Readable) (string, error)
	Encrypt(string) (base256.Readable, error)
}
See my implementation of the base256.Readable string type alias in its module.
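The contract behind this interface is a round trip: Decrypt must give back exactly what Encrypt received. Below is a Python analogue of that interface, written as a typing.Protocol, together with a deliberately insecure toy implementation that only shows the shape a replacement must have. It assumes base256.Readable can be modelled as a plain str and that errors become exceptions. The names ReversingFPE and satisfies_round_trip are hypothetical.

```python
from typing import Iterable, Protocol


class FPE(Protocol):
    """Python stand-in for the Go FPE interface (errors become exceptions)."""

    def encrypt(self, plaintext: str) -> str: ...

    def decrypt(self, ciphertext: str) -> str: ...


class ReversingFPE:
    """Toy, INSECURE placeholder: reversing a string is trivially undone."""

    def encrypt(self, plaintext: str) -> str:
        return plaintext[::-1]

    def decrypt(self, ciphertext: str) -> str:
        return ciphertext[::-1]


def satisfies_round_trip(fpe: FPE, samples: Iterable[str]) -> bool:
    # Every sample must decrypt back to exactly the original text.
    return all(fpe.decrypt(fpe.encrypt(sample)) == sample for sample in samples)


print(satisfies_round_trip(ReversingFPE(), ["tagged", "Some text"]))  # True
```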
To build 64-bit executables (after cloning the repository, and assuming you are on MacOS):
(for MacOS)
$ cd go
$ GOOS=darwin GOARCH=amd64 go build -o bin/redacted main.go
(for Linux)
$ brew install FiloSottile/musl-cross/musl-cross --with-arm
$ git clone https://github.com/cyrildever/redacted.git && cd redacted/go
$ CGO_ENABLED=1 GOOS=linux GOARCH=amd64 CC="x86_64-linux-musl-gcc" go build -o bin/redacted-linux --ldflags '-w -linkmode external -extldflags "-static"' main.go
   @see https://github.com/FiloSottile/homebrew-musl-cross
(for Windows)
$ brew install mingw-w64
$ git clone https://github.com/cyrildever/redacted.git && cd redacted/go
$ CGO_ENABLED=1 GOOS=windows GOARCH=amd64 CC="x86_64-w64-mingw32-gcc" go build -o bin/redacted.exe main.go
Python
$ pip install redacted-py
from redacted import DefaultRedactor, Dictionary
from feistel import FPECipher, SHA_256
source = "Some text ~tagged or using words in a dictionary"
# key is your own secret key for the FPE scheme
cipher = FPECipher(SHA_256, key, 10)
redactor = DefaultRedactor(cipher)
# Redact, then expand back to the original text (tags included)
redacted = redactor.redact(source)
expanded = redactor.expand(redacted)
assert expanded == source, "Original data should equal ciphered then deciphered data"
# Remove the tag marks from the expanded text
cleansed = redactor.clean(expanded)
assert cleansed == "Some text tagged or using words in a dictionary", "Cleaning should remove any tag mark"
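The redact, expand and clean sequence above can be pictured with the following Python sketch. It is not DefaultRedactor's actual implementation. It assumes that words are separated by single spaces and that every redacted word is written back with the tag in front so that expand can locate it. It also uses a reversible toy transform in place of a real FPE cipher. The class name SketchRedactor is hypothetical.

```python
from typing import Callable, Iterable


class SketchRedactor:
    def __init__(self, encrypt: Callable[[str], str], decrypt: Callable[[str], str],
                 dictionary: Iterable[str] = (), tag: str = "~"):
        self.encrypt, self.decrypt = encrypt, decrypt
        self.dictionary, self.tag = set(dictionary), tag

    def redact(self, text: str) -> str:
        out = []
        for word in text.split(" "):
            if word.startswith(self.tag):
                # Tagged word: cipher what follows the tag, keep the tag as a marker.
                out.append(self.tag + self.encrypt(word[len(self.tag):]))
            elif word in self.dictionary:
                # Dictionary word: cipher it and prepend the tag so expand() can find it.
                out.append(self.tag + self.encrypt(word))
            else:
                out.append(word)
        return " ".join(out)

    def expand(self, text: str) -> str:
        # Decipher every tagged word; the tag itself is kept.
        return " ".join(self.tag + self.decrypt(w[len(self.tag):]) if w.startswith(self.tag) else w
                        for w in text.split(" "))

    def clean(self, text: str) -> str:
        # Drop the tag mark at the start of each word.
        return " ".join(w[len(self.tag):] if w.startswith(self.tag) else w for w in text.split(" "))


def reverse(s: str) -> str:
    # Toy stand-in for FPE encryption/decryption: NOT secure.
    return s[::-1]


redactor = SketchRedactor(reverse, reverse, dictionary={"dictionary"})
redacted = redactor.redact("Some text ~tagged or using words in a dictionary")
print(redacted)  # Some text ~deggat or using words in a ~yranoitcid
print(redactor.clean(redactor.expand(redacted)))  # Some text tagged or using words in a dictionary
```

In this sketch, expand only restores the source exactly when every redacted word was tagged in the first place. Dictionary words come back with a tag, which clean then removes.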
Scala
In a Scala 2.12 project:
libraryDependencies ++= Seq(
  "com.cyrildever" %% "feistel-jar" % "1.5.6",
  "com.cyrildever" %% "redacted" % "1.0.5"
)
import com.cyrildever.feistel.common.utils.hash.Engine._
import com.cyrildever.feistel.Feistel
import com.cyrildever.redacted.core.Redactor
val source = "Some text ~tagged or using words in a dictionary"
// key, dictionary and tag must be defined beforehand
val cipher = Feistel.FPECipher(SHA_256, key, 10)
val redactor = Redactor(dictionary, tag, cipher, true)
val redacted = redactor.redact(source)
val expanded = redactor.expand(redacted)
assert(expanded == source)
NB: You might need to provide the expected BouncyCastle JAR file, e.g. bcprov-jdk15to18-1.73.jar.
TypeScript/JavaScript
$ npm install redacted-ts
import { strict as assert } from 'assert'
import { DefaultRedactor, Dictionary } from 'redacted-ts'
import { FPECipher, SHA_256 } from 'feistel-cipher'
const source = 'Some text ~tagged or using words in a dictionary'
// key is your own secret key for the FPE scheme
const cipher = new FPECipher(SHA_256, key, 10)
const redactor = DefaultRedactor(cipher)
const redacted = redactor.redact(source)
const expanded = redactor.expand(redacted)
assert(expanded === source)
const cleansed = redactor.clean(expanded)
assert(cleansed === 'Some text tagged or using words in a dictionary')
The use of the redacted libraries and executables is subject to fees for commercial purposes and to compliance with the BSD-2-Clause-Patent license.
Please contact me to get further information.
© 2021-2024 Cyril Dever. All rights reserved.