Python code to parse and denormalize COBOL Copybooks

cobol, mainframe, copybook
pip install python-cobol==0.1.4


COBOL Copybook parser in Python

PyPI version Build Status

This is a COBOL Copybook parser in Python featuring the following options:

  • Parse the Copybook into a usable format to use in Python
  • Clean up the Copybook by processing REDEFINES statements and remove unused definitions
  • Denormalize the Copybook
  • Write the cleaned Copybook in COBOL
  • Strip prefixes of field names and ensure that the Copybook only contains unique names
  • Can be used from the command-line or included in own Python projects

Because I couldn't find a COBOL Copybook parser that fitted all my needs I wrote my own. It doesn't support all functions found in the Copybook, just the ones that I met on my path: REDEFINES, INDEXED BY, OCCURS.

On a day to day basis I use it so that Informatica PowerCenter creates only 1 table of my COBOL data instead multiple.

The code uses the pic parser code from pyCOBOL.

This code is licensed under GPLv3.


The easiest way to install python-cobol is using pip:

$ pip install python-cobol

Example output

Below is an example Copybook file before and after being processed.


	00000 * Example COBOL Copybook file                                     AAAAAAAA
	00000  01  PAULUS-EXAMPLE-GROUP.                                        AAAAAAAA
	00000       05  PAULUS-ANOTHER-GROUP OCCURS 0003 TIMES.                 AAAAAAAA
	00000           10  PAULUS-FIELD-1 PIC X(3).                            AAAAAAAA
	00000           10  PAULUS-FIELD-3 OCCURS 0002 TIMES                    AAAAAAAA
	00000                           PIC S9(3)V99.                           AAAAAAAA
	00000       05  PAULUS-THIS-IS-ANOTHER-GROUP.                           AAAAAAAA
	00000           10  PAULUS-YES PIC X(5).                                AAAAAAAA


	         01  EXAMPLE-GROUP.                                                     
	           05  FIELD-2-1 PIC 9(3).                                              
	           05  FIELD-3-1-1 PIC S9(3)V99.                                        
	           05  FIELD-3-1-2 PIC S9(3)V99.                                        
	           05  FIELD-2-2 PIC 9(3).                                              
	           05  FIELD-3-2-1 PIC S9(3)V99.                                        
	           05  FIELD-3-2-2 PIC S9(3)V99.                                        
	           05  FIELD-2-3 PIC 9(3).                                              
	           05  FIELD-3-3-1 PIC S9(3)V99.                                        
	           05  FIELD-3-3-2 PIC S9(3)V99.                                        
	           05  THIS-IS-ANOTHER-GROUP.                                           
	             10  YES PIC X(5).                                                 

How to use

You can use it in two ways: inside your own python code or as a stand-alone command-line utility.


Do a git clone from the repository and inside your brand new python-cobol folder run:

$ python example.cbl

This will process the redefines, denormalize the file, strip the prefixes and ensure all names are unique.

The utility allows for some command-line switches to disable some processing steps.

	$ python --help
	usage: [-h] [--skip-all-processing] [--skip-unique-names]
	                      [--skip-denormalize] [--skip-strip-prefix]

	Parse COBOL Copybooks

	positional arguments:
	  filename              The filename of the copybook.

	optional arguments:
	  -h, --help            show this help message and exit
	                        Skips unique names, denormalization and .
	  --skip-unique-names   Skips making all names unique.
	  --skip-denormalize    Skips denormalizing the COBOL.
	  --skip-strip-prefix   Skips stripping the prefix from the names.	

From within your Python code

The parser can also be called from your Python code. All you need is a list of lines in COBOL Copybook format. See how one would do it:

from python_cobol import python_cobol as cobol

with open("example.cbl",'r') as f:
    for row in cobol.process_cobol(f.readlines()):
    	print row['name']

It is also possible to call one of the more specialized functions within

  • clean_cobol(lines)

    Cleans the COBOL by converting the cobol informaton to single lines

  • parse_cobol(lines)

    Parses a list of COBOL field definitions into a list of dictionaries containing the parsed information.

  • denormalize_cobol(lines)

    Denormalizes parsed COBOL lines

  • clean_names(lines, ensure_unique_names=False, strip_prefix=False, make_database_safe=False)

    Cleans names of the fields in a list of parsed COBOL lines. Options to strip prefixes, enforce unique names and make the names database safe by converting dashes (-) to underscores (_)

  • print_cobol(lines)

    Prints parsed COBOL lines in the Copybook format