pfdicom_tagSub
Table of Contents
Quick Overview
-
pfdicom_tagSub
reads/edits/saves DICOM meta information. It can be used to anonymize DICOM header data.
Overview
pfdicom_tagSub
replaces a set of <tag, value>
pairs in a DICOM header with values passed in a JSON structure. Individual DICOM tags can be explicitly referenced in the JSON structure, as well as a regular expression construct to capture all tags satisfying that expression (allowing for idiomatic bulk substitution of <tag, value>
pairs).
Tag regular expression constructs are python
string expressions and are prefixed by "re:<pythonRegex>"
. For example, "re:.*hysician"
will perform some substitution on all tags that contain the letters hysician
. The value substitution has access to a special lookup, #tag
, which is the current tag hit. It is possible to apply built in functions to the tag hit, for example md5
hashing, using "%_md5|4_#tag"
,
{
"re:.*hysician": "%_md5|4_#tag"
}
will be expanded to
{
"PerformingPhysiciansName" : "%_md5|4_PerformingPhysiciansName"
"PhysicianofRecord" : "%_md5|4_PhysicianofRecord"
"ReferringPhysiciansName" : "%_md5|4_ReferringPhysiciansName"
"RequestingPhysician" : "%_md5|4_RequestingPhysician"
}
The tag regular expression construct allows for simple and powerful bulk substition of <tag, value>
pairs.
The script accepts an <inputDir>
, and then from this point an os.walk()
is performed to extract all the subdirs. Each subdir is examined for DICOM files (in the simplest sense by a file extension mapping) are passed to a processing method that reads and replaces specified DICOM tags, saving the result in a corresponding directory and filename in the output tree.
Installation
Dependencies
The following dependencies are installed on your host system/python3 virtual env (they will also be automatically installed if pulled from pypi):
-
pfmisc
(various misc modules and classes for the pf* family of objects) -
pftree
(create a dictionary representation of a filesystem hierarchy) -
pfdicom
(handle underlying DICOM file reading)
Using PyPI
The best method of installing this script and all of its dependencies is by fetching it from PyPI
pip3 install pfdicom_tagSub
Command line arguments
-I|--inputDir <inputDir>
Input DICOM directory to examine. By default, the first file in this
directory is examined for its tag information. There is an implicit
assumption that each <inputDir> contains a single DICOM series.
[-i|--inputFile <inputFile>]
An optional <inputFile> specified relative to the <inputDir>. If
specified, then do not perform a directory walk, but convert only
this file.
[-e|--extension <DICOMextension>]
An optional extension to filter the DICOM files of interest from the
<inputDir>.
-O|--outputDir <outputDir>
The output root directory that will contain a tree structure identical
to the input directory, and each "leaf" node will contain the analysis
results.
[--outputLeafDir <outputLeafDirFormat>]
If specified, will apply the <outputLeafDirFormat> to the output
directories containing data. This is useful to blanket describe
final output directories with some descriptive text, such as
'anon' or 'preview'.
This is a formatting spec, so
--outputLeafDir 'preview-%s'
where %%s is the original leaf directory node, will prefix each
final directory containing output with the text 'preview-' which
can be useful in describing some features of the output set.
[-F|--tagFile <JSONtagFile>]
Parse the tags and their "subs" from a JSON formatted <JSONtagFile>.
[-T|--tagStruct <JSONtagStructure>]
Parse the tags and their "subs" from a JSON formatted <JSONtagStucture>
string passed directly in the command line. Note that sometimes protecting
a JSON string can be tricky, especially when used in scripts or as variable
expansions. If the JSON string is problematic, use the [--tagInfo <string>]
instead.
[--tagInfo <delimited_parameters>]
A token delimited string that is reconstructed into a JSON structure by the
script. This is often useful if the [--tagStruict] JSON string is hard to
parse in scripts and variable passing within scripts. The format of this
string is:
"<tag1><splitKeyValue><value1><split_token><tag2><splitKeyValue><value2>"
for example:
--splitToken ","
--splitKeyValue ':'
--tagInfo "PatientName:anon,PatientID:%_md5|7_PatientID"
or more complexly (esp if the ':' is part of the key):
--splitToken "++"
--splitKeyValue "="
--tagInfo "PatientBirthDate = %_strmsk|******01_PatientBirthDate ++
re:.*hysician" = %_md5|4_#tag"
[-s|--splitToken <split_token>]
The token on which to split the <delimited_parameters> string.
Default is '++'.
[-k|--splitKeyValue <keyValueSplit>]
The token on which to split the <key> <value> pair. Default is ':'
but this can be problematic if the <key> itself has a ':' (for example
in the regular expression expansion).
[-o|--outputFileStem <outputFileStem>]
The output file stem to store data. This should *not* have a file
extension, or rather, any "." chars. Dots in the name are considered
part of the stem and are *not* considered extensions.
[--threads <numThreads>]
If specified, break the innermost analysis loop into <numThreads>
threads.
[-x|--man]
Show full help.
[-y|--synopsis]
Show brief help.
[--json]
If specified, output a JSON dump of final return.
[--followLinks]
If specified, follow symbolic links.
[-v|--verbosity <level>]
Set the app verbosity level.
0: No internal output;
1: Run start / stop output notification;
2: As with level '1' but with simpleProgress bar in 'pftree';
3: As with level '2' but with list of input dirs/files in 'pftree';
5: As with level '3' but with explicit file logging for
- read
- analyze
- write
Examples
Perform a DICOM anonymization by processing specific tags:
pfdicom_tagSub \
-e dcm \
-I /var/www/html/normsmall \
-O /var/www/html/anon \
--tagStruct '
{
"PatientName": "%_name|patientID_PatientName",
"PatientID": "%_md5|7_PatientID",
"AccessionNumber": "%_md5|8_AccessionNumber",
"PatientBirthDate": "%_strmsk|******01_PatientBirthDate",
"re:.*hysician": "%_md5|4_#tag",
"re:.*stitution": "#tag",
"re:.*ddress": "#tag"
}
' --threads 0 --printElapsedTime
-- OR equivalently --
pfdicom_tagSub \
-e dcm \
-I /var/www/html/normsmall \
-O /var/www/html/anon \
--splitToken "," \
--splitKeyValue "=" \
--tagInfo '
PatientName = %_name|patientID_PatientName,
PatientID = %_md5|7_PatientID,
AccessionNumber = %_md5|8_AccessionNumber,
PatientBirthDate = %_strmsk|******01_PatientBirthDate,
re:.*hysician = %_md5|4_#tag,
re:.*stitution = #tag,
re:.*ddress = #tag
' --threads 0 --printElapsedTime
will replace the explicitly named tags as shown:
- the
PatientName
value will be replaced with a Fake Name, seeded on thePatientID
; - the
PatientID
value will be replaced with the first 7 characters of an md5 hash of thePatientID
; - the
AccessionNumber
value will be replaced with the first 8 characters of an md5 hash of the AccessionNumber; - the
PatientBirthDate
value will set the final two characters, i.e. the day of birth, to01
and preserve the other birthdate values; - any tags with the substring
hysician
will have their values replaced with the first 4 characters of the corresponding tag value md5 hash; - any tags with
stitution
andddress
substrings in the tag contents will have the corresponding value simply set to the tag name.
NOTE:
Spelling matters! Especially with the substring bulk replace, please make sure that the substring has no typos, otherwise the target tags will most probably not be processed.
_-30-_