redacted-py

Redacting classified documents


Keywords
data, obfuscation, masking, redacted, classified, data-masking, documents, executables, golang, javascript, library, python, typescript
License
Other
Install
pip install redacted-py==1.0.5

Documentation

redacted



This repository holds the code base for my redacted libraries and executables. It is mainly based on my Feistel cipher for Format-Preserving Encryption, to which I added a few tools for document, database and file manipulation to make it easier to use.

Motivation

In some fields (healthcare, for instance), protecting the privacy of data whilst being able to conduct in-depth studies is both vital and mandatory. Redacting documents and databases is therefore a necessary step. With redacted, I provide a simple yet secure tool to help redact documents, using a dictionary, a record layout or a tag to decide which parts should actually be redacted.

As of the latest version, this repository comes with five different flavours:

  • Executables (to use on either Linux, MacOS or Windows environments);
  • A Go library;
  • A Python library;
  • A Scala library to use in the JVM (which is not yet available on Maven Central Repository);
  • A TypeScript library (which is also available on NPM).

Usage

You can use either a dictionary or a tag (or both) to identify the words you want to redact in a document. The tag should be placed before any word that should be redacted. The default tag is the tilde character (~).

For example, in the following sentence, only the word "tagged" will be redacted: "This is a ~tagged sentence".
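
To make the tagging rule concrete, here is a minimal Python sketch that lists the words a tag would select. The function name `tagged_words` and the whitespace-based word splitting are assumptions made for illustration; the library's actual tokenization may differ.

```python
from typing import List


def tagged_words(line: str, tag: str = "~") -> List[str]:
    """Return the words in `line` that are prefixed by `tag`, without the tag."""
    words = []
    for token in line.split():
        # A token qualifies only if it starts with the tag and has content after it.
        if token.startswith(tag) and len(token) > len(tag):
            words.append(token[len(tag):])
    return words


print(tagged_words("This is a ~tagged sentence"))  # prints ['tagged']
```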

1. Executables

Usage of ./redacted:
  -b    add to use both dictionary and tag
  -d string
        the optional path to the dictionary of words to redact
  -h string
        the hash engine for the round function (default "sha-256")
  -i string
        the path to the document to be redacted
  -k string
        the optional key for the FPE scheme (leave it empty to use default)
  -o string
        the name of the output file
  -r int
        the number of rounds for the Feistel cipher (default 10)
  -t string
        the optional tag that prefixes words to redact (default "~")
  -x    add to expand a redacted document
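
The -h and -r options set the hash engine of the round function and the number of Feistel rounds. As a rough illustration of what those two parameters control, here is a toy balanced Feistel network in Python. It is not the cipher shipped with redacted: the function names, the way the key and round index are fed to SHA-256, and the restriction to even-length inputs of at most 64 bytes are all assumptions made for this sketch, and its output is raw bytes rather than readable text.

```python
import hashlib


def _round_output(half: bytes, key: bytes, index: int, size: int) -> bytes:
    # Toy round function (an assumption): SHA-256 over key, round index and input half.
    return hashlib.sha256(key + index.to_bytes(4, "big") + half).digest()[:size]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _split(data: bytes):
    if len(data) % 2 or len(data) > 64:
        raise ValueError("this toy only handles even-length inputs of at most 64 bytes")
    half = len(data) // 2
    return data[:half], data[half:]


def toy_encrypt(data: bytes, key: bytes, rounds: int = 10) -> bytes:
    left, right = _split(data)
    for i in range(rounds):
        # Feistel step: swap the halves, mixing the old left with F(old right).
        left, right = right, _xor(left, _round_output(right, key, i, len(left)))
    return left + right


def toy_decrypt(data: bytes, key: bytes, rounds: int = 10) -> bytes:
    left, right = _split(data)
    for i in reversed(range(rounds)):
        # Undo one step: the old right half is the current left half.
        left, right = _xor(right, _round_output(left, key, i, len(left))), left
    return left + right


secret = toy_encrypt(b"classified", b"my-key")
assert len(secret) == len(b"classified")  # the length is preserved
assert toy_decrypt(secret, b"my-key") == b"classified"  # and the step is reversible
```

The same key, hash engine and number of rounds are needed to expand what was redacted, which is why those options matter when running with -x.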

The dictionary file must contain a list of words separated by spaces.
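
As an illustration of that format, the following Python sketch turns the content of a dictionary file into a set of words. The helper name `parse_dictionary` is hypothetical and the executable may parse the file differently (for example regarding newlines or duplicates).

```python
from typing import Set


def parse_dictionary(content: str) -> Set[str]:
    """Split the content of a dictionary file into a set of words to redact."""
    # str.split() with no argument splits on any run of whitespace, so single
    # spaces, repeated spaces and a trailing newline are all handled.
    return set(content.split())


words = parse_dictionary("secret password classified\n")
print(sorted(words))  # prints ['classified', 'password', 'secret']
```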

Download the version for the platform of your choice then execute the following command:

$ ./redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b

@also Installation procedure here

IMPORTANT: Do not use with input texts having lines longer than 65536 characters.
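
A quick pre-flight check can catch such lines before running the tool. The sketch below is only an assumption about how one might do it in Python: the helper name `overlong_lines` is hypothetical, and it counts characters (not bytes) excluding the line ending, which may not match exactly how the executable measures a line.

```python
from typing import Iterable, List

MAX_LINE_LENGTH = 65536  # limit stated in the warning above


def overlong_lines(lines: Iterable[str], limit: int = MAX_LINE_LENGTH) -> List[int]:
    """Return the 1-based numbers of the lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\r\n")) > limit
    ]


sample = ["short line\n", "x" * 65537 + "\n", "y" * 65536]
print(overlong_lines(sample))  # prints [2]
```

To check a real document, pass an open file object instead of the sample list, e.g. `overlong_lines(open("myFile.txt", encoding="utf-8"))`.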

Alternative using Java and the redacted JAR
$ java -cp path/to/redacted.jar com.cyrildever.redacted.Main -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b

Alternative using the TypeScript CLI

$ redacted -i myFile.txt -o myRedactedFile.txt -d myDictionary.txt -b

@see Installation procedure here

Alternative using Python

$ python3 -m redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b
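
When the command has to be launched from a script, building the argument list programmatically avoids quoting mistakes. The sketch below is a hypothetical helper, not part of the package: it assumes the Python module accepts the same -i, -o, -d, -b and -x options as the executable, written in the `-flag=value` form used above.

```python
from typing import List, Optional, Sequence


def build_command(
    input_path: str,
    output_path: str,
    dictionary_path: Optional[str] = None,
    use_both: bool = False,
    expand: bool = False,
    program: Sequence[str] = ("python3", "-m", "redacted"),
) -> List[str]:
    """Assemble the argument list for a redacted command line."""
    args = list(program) + [f"-i={input_path}", f"-o={output_path}"]
    if dictionary_path:
        args.append(f"-d={dictionary_path}")
    if use_both:
        args.append("-b")  # use both the dictionary and the tag
    if expand:
        args.append("-x")  # expand a redacted document instead of redacting
    return args


cmd = build_command("myFile.txt", "myRedactedFile.txt", "myDictionary.txt", use_both=True)
print(" ".join(cmd))
# prints python3 -m redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b
```

The resulting list can be passed as is to `subprocess.run(cmd, check=True)`.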

2. Libraries

Go

$ go get github.com/cyrildever/redacted/go

import (
    "github.com/cyrildever/feistel"
    "github.com/cyrildever/redacted/go/core"
    "github.com/cyrildever/redacted/go/model"
)

// Load dictionary
dic, err := model.FileToDictionary("/path/to/dictionary.txt")

// Prepare FPE cipher
cipher := feistel.NewFPECipher(hashEngine, key, rounds)

// Instantiate redactor
redactor := core.NewRedactorWithDictionary(dic, cipher)

// Redact a line
redacted := redactor.Redact(line)
fmt.Println(redacted)

// Expand a redacted line
assert.Equal(t, redactor.Expand(redacted), line)

See the Dictionary and the Redactor implementations to use other kinds of dictionaries (as a slice or from a string) and/or redactors (with or without tag and dictionary).

NB: You may use any other kind of Format-Preserving Encryption library as long as it respects the following interface:

type FPE interface {
    Decrypt(base256.Readable) (string, error)
    Encrypt(string) (base256.Readable, error)
}

See my implementation of the base256.Readable string type alias in its module.

To build 64-bit binaries (after cloning the repository and assuming you are building from MacOS):

(for MacOS)

$ cd go
$ GOOS=darwin GOARCH=amd64 go build -o bin/redacted main.go

(for Linux)

$ brew install FiloSottile/musl-cross/musl-cross --with-arm
$ git clone https://github.com/cyrildever/redacted.git && cd redacted/go
$ CGO_ENABLED=1 GOOS=linux GOARCH=amd64 CC="x86_64-linux-musl-gcc" go build -o bin/redacted-linux --ldflags '-w -linkmode external -extldflags "-static"' main.go

   @see https://github.com/FiloSottile/homebrew-musl-cross

(for Windows)

$ brew install mingw-w64
$ git clone https://github.com/cyrildever/redacted.git && cd redacted/go
$ CGO_ENABLED=1 GOOS=windows GOARCH=amd64 CC="x86_64-w64-mingw32-gcc" go build -o bin/redacted.exe main.go

Python

$ pip install redacted-py

from redacted import DefaultRedactor, Dictionary
from feistel import FPECipher, SHA_256

source = "Some text ~tagged or using words in a dictionary"

# `key` is your secret key for the FPE scheme, defined beforehand
cipher = FPECipher(SHA_256, key, 10)
redactor = DefaultRedactor(cipher)
redacted = redactor.redact(source)

expanded = redactor.expand(redacted)
assert expanded == source, "Original data should equal ciphered then deciphered data"

cleansed = redactor.clean(expanded)
assert cleansed == "Some text tagged or using words in a dictionary", "Cleaning should remove any tag mark"

Scala

In a Scala 2.12 project:

libraryDependencies ++= Seq(
    "com.cyrildever" %% "feistel-jar" % "1.5.6",
    "com.cyrildever" %% "redacted" % "1.0.5"
)

import com.cyrildever.feistel.common.utils.hash.Engine._
import com.cyrildever.feistel.Feistel
import com.cyrildever.redacted.core.Redactor

val source = "Some text ~tagged or using words in a dictionary"

val cipher = Feistel.FPECipher(SHA_256, key, 10)
val redactor = Redactor(dictionary, tag, cipher, true)
val redacted = redactor.redact(source)

val expanded = redactor.expand(redacted)
assert(expanded == source)

NB: You might need to provide the expected BouncyCastle JAR file, e.g. bcprov-jdk15to18-1.73.jar.

TypeScript/JavaScript

$ npm install redacted-ts

import { DefaultRedactor, Dictionary } from 'redacted-ts'
import { FPECipher, SHA_256 } from 'feistel-cipher'

const source = 'Some text ~tagged or using words in a dictionary'

const cipher = new FPECipher(SHA_256, key, 10)
const redactor = DefaultRedactor(cipher)
const redacted = redactor.redact(source)

const expanded = redactor.expand(redacted)
assert(expanded === source)

const cleansed = redactor.clean(expanded)
assert(cleansed === 'Some text tagged or using words in a dictionary')

License

Use of the redacted libraries and executables is subject to fees for commercial purposes and to compliance with the BSD-2-Clause-Patent license.
Please contact me to get further information.


© 2021-2024 Cyril Dever. All rights reserved.