hebsafeharbor

De-identification toolkit for clinical text in Hebrew


Keywords
hebrew, nlp, spacy, phi, pii
License
MIT
Install
pip install hebsafeharbor==0.1.1

Documentation

HebSafeHarbor

A de-identification toolkit for clinical text in Hebrew.

MIT license Release PyPI version Pypi Downloads

HebSafeHarbor was developed according to the requirements described in 讛爪注讛 诇诪转讜讜讛 讛转诪诪讛 砖诇 讟拽住讟 专驻讜讗讬 - 谞诪诇 诪讘讟讞讬诐 (read more here)

The toolkit integrates and uses open source libraries and assets, including HebSpacy (that runs NER model based on AlephBERT, Nemo and BMC), Presidio, Wikipedia and public lexicons.

馃摑 Demo website is available here

Contents

Installation

To install the package, run the following commands - preferably in a virtual environment

# Create conda env (optional)
conda create --name hebsafeharbor python=3.8
conda activate hebsafeharbor

# Install HebSafeHarbor
pip install hebsafeharbor

# Download the he_ner_news_trf model used by hebsafeharbor
pip install https://github.com/8400TheHealthNetwork/HebSpacy/releases/download/he_ner_news_trf-3.2.1/he_ner_news_trf-3.2.1-py3-none-any.whl

Alternatively, you may clone the repo and install all dependencies:

# Create conda env (optional)
conda create --name hebsafeharbor python=3.8
conda activate hebsafeharbor

# Install dependencies
pip install -r requirements.txt

# (Optional) Install package locally
pip install -e .

# Download the he_ner_news_trf model used by hebsafeharbor
pip install https://github.com/8400TheHealthNetwork/HebSpacy/releases/download/he_ner_news_trf-3.2.1/he_ner_news_trf-3.2.1-py3-none-any.whl

Getting started

Python

from hebsafeharbor import HebSafeHarbor

hsh = HebSafeHarbor()

text = """砖专讜谉 诇讜讬 讛转讗砖驻讝讛 讘02.02.2012 讜讙专讛 讘讗专诇讜讝专讜讘 16 专诪转 讙谉"""
doc = {"text": text}

output = hsh([doc])

print(output)

Docker Compose

The easiest way to consume HebSafeHarbor as a service with a REST API and demo application is through docker-compose setup.

Run the docker-compose command against the docker-compose.yml file in the root directory to get the latest containers from Docker Hub

docker-compose up -d --build

Navigate in the browser to http://server.localhost/docs to access the service swagger. Navigate in the browser to http://demo.localhost to test the demo application.

Development mode

To run the containers against the repo's code, run the following command:

docker-compose -f docker-compose-development.yml up -d --build

Server

Local REST endpoint

HebSafeHarbor can be consumed as a REST endpoint for the service powered by the FastAPI library. To run the service locally, you will need to first install the environment requirements as follows:

# Create conda env (optional)
conda create --name hebsafeharbor-server python=3.8
conda activate hebsafeharbor-server

# Install dependencies
pip install -r requirements.txt

# Install server dependencies
pip install -r requirements-server.txt

Once installation is done, run the following command to initiate the service:

uvicorn server:app --reload

The service should be running at http://127.0.0.1:8000. Navigate to this URL and make sure you see the following message:

{"message":"Welcome to the Hebrew Safe Harbor!"}

Now go to http://127.0.0.1:8000/docs to see interactive API documentation (Swagger).

Alternatively, you can query the service directly by send POST requests to http://127.0.0.1:8000/query with the payload as described in the API documentation.

Server Docker

To download and run the official release as a Docker container, run the following commands:

# Download image 
docker pull hebsafeharbor/hebsafeharbor

# Run container
docker run --name hsh_server -d -p 8000:8000 hebsafeharbor/hebsafeharbor:latest

Alternatively, you can build and run the Docker container:

# Build image 
docker build ./ --tag hebsafeharbor_server

# Run container
docker run --name hsh_server -d -p 8000:8000 hebsafeharbor_server

Navigate to http://localhost:8000 to validate the service is up and running. Similarly, you can go to http://localhost:8000/docs and http://localhost:8000/query.

Demo Application

For experimentation and testing purposes, we provide a Streamlit demo application. Note that the demo application functions as a (semi-)web client in the sense that it queries the remote REST endpoint and visualizes the results.

In order for the demo to work and interact with the server application, you will need to set HSH_SERVER environment variable to the REST point URL.

For example, assuming you are running the server as docker container:

[For Windows]
$env:HSH_SERVER="http://localhost:8000"

[For Linux]
export HSH_SERVER=http://localhost:8000

Run locally

To run the demo locally, first install the requirements as follows:

# Create conda env (optional)
conda create --name hebsafeharbor_demo python=3.8
conda activate hebsafeharbor_demo

# Install dependencies
pip install -r demo/requirements-streamlit.txt

Once installation is done, run the following command:

streamlit run demo/demo_app.py

Now go to http://localhost:8501 to access the application

Demo Docker

To download the official Docker container, run the following commands:

# Download image
docker pull hebsafeharbor/demo_application

# Run container - replace <SERVICE URL> with server url (e.g.,docker run -d -e HSH_SERVER=http://172.17.0.2:8000 -p 8501:8501 hebsafeharbor/demo_application:latest)
docker run --name hsh_demo -d -e "HSH_SERVER=<SERVICE URL>" -p 8501:8501 hebsafeharbor/demo_application:latest

Alternatively, you can build and run the Docker container:

# Build image 
docker build --name hsh_demo demo/. --tag hebsafeharbor_demo

# Run container - replace <SERVICE URL> with server url (e.g.,docker run -d -e HSH_SERVER=http://172.17.0.2:8000 -p 8501:8501 hebsafeharbor_demo)
docker run -d -e "HSH_SERVER=<SERVICE URL>" -p 8501:8501 hebsafeharbor_demo 

Navigate to http://localhost:8501 to access the demo application.

FAQ

Why does the server contatiner exits with 137 error code?

Note that the server container includes a trsnssformer model that consume more than 2gb of memory, increase Docker memory allocation to 6gb

Read more here.

What is the server (Docker container) IP address?

Run the network inspect bridge command to get networking information regarding running dockers

docker network inspect bridge

Your response should look like the following:

[
    {
        "Name": "bridge",
        ...
        "Containers": {
            "9611cad28701cfe0877c2bfed9ad2710202492de6d574d42a6714f439cf4f2d2": {
                "Name": "hsh_server",
                ...
                "IPv4Address": "172.17.0.2/16",
                ...
            }
        },
      ...
    }
]

The IP address is under IPv4Address associated with "Name": "hsh_server" container under Containers. In our case, HSH_SERVER = 172.17.0.2

Read more here


HebSafeHarbor is an open-source project developed by 8400 The Health Network.