screenplay-pdf-to-json

Parse PDF screenplays into rich JSON format


License
MIT
Install
pip install screenplay-pdf-to-json==0.3.4

Documentation

Screenplay Parser

Parse PDF screenplays into a rich JSON format

Install

pip install screenplay-pdf-to-json

Package Dependencies

Contributing

Clone this repository and run the following:

pipenv install

# or

pip3 install -r requirements.txt

Usage

As a CLI:

python $PATH_OF_PACKAGE/src/convert.py -s path_of_screenplay.pdf --start page_number_to_start_analyzing

As a library:

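from screenplay_pdf_to_json import convert

# Open the PDF in binary mode; 0 is the page to start analyzing from (cf. --start in the CLI)
with open('screenplay.pdf', 'rb') as fp:
    script_json = convert(fp, 0)
print(script_json)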
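
The library call above only prints the result. A minimal sketch of saving it to disk instead, assuming the value returned by convert is made of plain lists, dictionaries, strings, and numbers as the JSON structure section suggests. The helper name save_script_json is hypothetical and not part of the package.

```python
import json


def save_script_json(script_data, out_path):
    """Write already-parsed screenplay data to a UTF-8 JSON file.

    Hypothetical helper: script_data is assumed to be the value
    returned by convert(fp, 0).
    """
    with open(out_path, "w", encoding="utf-8") as out:
        # ensure_ascii=False keeps characters such as curly apostrophes readable
        json.dump(script_data, out, ensure_ascii=False, indent=2)
```

With the package installed, this could be called as save_script_json(convert(fp, 0), 'screenplay.json') inside the block that opens the PDF.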

Notes

  • Works well for "clean" PDF screenplays, not OCR PDFs.

  • Production screenplays work fairly well.

JSON structure

[{
  // page number
  "page": 1,

  // scene info
  "scene_info": {
    "region": "EXT.", // region of scene [EXT., INT., EXT./INT, INT./EXT]
    "location": "VILLA",
    "time": ["DAY"] // time of scene [DAY, NIGHT, DAWN, DUSK, ...]
  },
  "scene": [{
    "type": "ACTION", // type of snippet [ACTION, CHARACTER, TRANSITION, DUAL_DIALOGUE]
    "content": {...} // content differs based on type
  }, {...}]
}, {...}]
  • The initial pages of a screenplay (title page, TOC, cast list, ...) are included as type FIRST_PAGES.

  • The top level is really an array of dictionaries rather than a single JSON object.
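
Since the top level is a list of dictionaries, the output can be walked with ordinary loops. The following is a sketch under the assumption that every entry follows the structure documented above; the sample data is hand-written for illustration, and the helper names scene_headings and count_snippet_types are hypothetical.

```python
from collections import Counter

# Hand-written sample that follows the documented structure (not real parser output)
sample_script = [
    {
        "page": 1,
        "scene_info": {"region": "EXT.", "location": "VILLA", "time": ["DAY"]},
        "scene": [
            {"type": "ACTION",
             "content": [{"text": "an action paragraph", "x": 108, "y": 120}]},
            {"type": "TRANSITION",
             "content": {"text": "SMASH TO:", "metadata": {"x": 448, "y": 720}}},
        ],
    },
]


def scene_headings(script):
    """Return (page, heading) pairs rebuilt from each entry's scene_info."""
    headings = []
    for entry in script:
        info = entry.get("scene_info")
        if not info:
            # Assumption: entries without scene info are simply skipped
            continue
        heading = f"{info['region']} {info['location']}"
        time = " ".join(info.get("time") or [])
        if time:
            heading += f" - {time}"
        headings.append((entry["page"], heading))
    return headings


def count_snippet_types(script):
    """Count how often each snippet type occurs across all entries."""
    return Counter(
        snippet["type"]
        for entry in script
        for snippet in entry.get("scene", [])
    )


print(scene_headings(sample_script))
print(count_snippet_types(sample_script))
```

How FIRST_PAGES entries are laid out is not spelled out here, so the sketch tolerates entries that lack scene_info or scene rather than assuming a shape for them.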

Type Content Structure

  • ACTION
    "content": [{
      "text": "an action paragraph",
      "x": 108,
      "y": 120 // y-coordinate of the last line in the paragraph
    }, {...}]
  • CHARACTER
    "content": {
      "character": "MILES",
      "modifier": null, // V.O., O.S., and more. null if no modifier
      "dialogue": [
        "Hey good morning. How you doing?... Weekend was short, huh? ",
        "(he turns to another kid)", // parentheticals are separated
        " Oh my gosh this is embarrassing, we wore the same jacket--"
      ]
    }
  • DUAL_DIALOGUE
    "content": {
      "character1": {
        "character": {
          "character": "PETER",
          "modifier": null
        },
        "dialogue": [
          "(groggy)",
          " Why are you trying to kill me?--"
        ]
      },
      "character2": {
        "character": {
          "character": "MILES",
          "modifier": "CONT'D"
        },
        "dialogue": [
          "--I’m not! I’m trying to save you!"
        ]
      }
    }
  • TRANSITION
    "content": {
      "text": "SMASH TO:",
      "metadata": {
        "x": 448,
        "y": 720
      }
    }
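
The per-type content shapes above differ enough that consumers usually need to branch on type. As an illustration, here is a sketch that collects spoken lines per character from CHARACTER and DUAL_DIALOGUE snippets and drops parentheticals. It assumes the shapes shown above; the sample snippets are copied from those examples, and the helper names are hypothetical.

```python
from collections import defaultdict


def is_parenthetical(line):
    """True for dialogue entries such as "(groggy)"."""
    text = line.strip()
    return text.startswith("(") and text.endswith(")")


def iter_speeches(snippet):
    """Yield (character_name, dialogue_list) pairs from one snippet."""
    content = snippet["content"]
    if snippet["type"] == "CHARACTER":
        yield content["character"], content["dialogue"]
    elif snippet["type"] == "DUAL_DIALOGUE":
        # In dual dialogue the name is nested one level deeper
        for key in ("character1", "character2"):
            side = content[key]
            yield side["character"]["character"], side["dialogue"]
    # ACTION and TRANSITION snippets carry no dialogue and yield nothing


def spoken_lines_by_character(snippets):
    """Map each character name to spoken lines, without parentheticals."""
    lines = defaultdict(list)
    for snippet in snippets:
        for name, dialogue in iter_speeches(snippet):
            lines[name].extend(
                line.strip() for line in dialogue if not is_parenthetical(line)
            )
    return dict(lines)


# Sample snippets copied from the structures documented above
sample_snippets = [
    {"type": "CHARACTER", "content": {
        "character": "MILES", "modifier": None,
        "dialogue": [
            "Hey good morning. How you doing?... Weekend was short, huh? ",
            "(he turns to another kid)",
            " Oh my gosh this is embarrassing, we wore the same jacket--",
        ]}},
    {"type": "DUAL_DIALOGUE", "content": {
        "character1": {
            "character": {"character": "PETER", "modifier": None},
            "dialogue": ["(groggy)", " Why are you trying to kill me?--"]},
        "character2": {
            "character": {"character": "MILES", "modifier": "CONT'D"},
            "dialogue": ["--I’m not! I’m trying to save you!"]}}},
    {"type": "TRANSITION", "content": {
        "text": "SMASH TO:", "metadata": {"x": 448, "y": 720}}},
]

for name, spoken in spoken_lines_by_character(sample_snippets).items():
    print(name, len(spoken))
```

Keeping the type dispatch in one generator means a new snippet type only needs one extra branch there.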

Run tests

python -m pytest tests/

Notes

  • Run poetry install outside of the poetry shell, then enter the shell and run the script.

Todos

  • Add unit tests

  • Skip to start of screenplay

  • More documentation

  • Add option to use as a library

  • Detect end of screenplay

Author

👤 Egan Bisma

Show your support

Give a ⭐️ if this project helped you!