blister

Import hook that minimizes file stats


Keywords
import, hook, file, stat, disk, python import hook scandir stat
License
MIT
Install
pip install blister==1.1.3

Documentation

Blister

An import hook for Python 2.7 that minimizes the amount of disk access. There will be no change in functionality or feature when the import hook is installed, applications run over the network have reduced startup by 45 seconds!

Applications like Autodesk Maya and Sidefx Houdini have Python deeply embedded. Studios will deploy many custom and third party modules in these environments. This can lead to over 60 entries in the sys.path, which is not a situation the native Python importer is equipped to deal with.

The software is distributed under the MIT open source license.

How To

Get the blister.py module onto your $PYTHONPATH, or use a package manager like pip install blister. You'll also want to edit or create a sitecustomize.py module with code that looks like this:

import blister
blister.install()

Results

A test script was created that imports several large dependencies. This operation creates 479 new modules into sys.modules. On Linux, using strace -e trace=file this generates 13,619 file operations. By inventing a new unit, disk operations per module, the vanilla Python interpreter runs at an abusive 28.4 DOPM.

By installing the import hook, this is reduced to 4,139 operations, giving 8.6 DOPM.

Performance

So what does this mean for actual performance? These results from several environments should give you an idea of what to expect. The environments tested below were done with a cold filesystem cache, meaning the system had a fresh reboot and had never run Python or imported any of these modules. The "hot cache" represents the time to run the script repeatedly.

Linux on network nfs drive

The performance gained for the cold cache on the network drive is a big success. In this case the cold cache time are actually faster coming over the network than on a local spinning disk.

Hot Cache Cold Cache
Vanilla Python 2.7 0.70 sec 7.89 sec
Blister import hook 0.65 sec 2.71 sec

Linux on local hard disk

It appears cutting thousands of file operations has no affect on performance. The cold cache timing likely has too much fluxuation to gather any results. This is on a traditional spinning platters drive, not SSD.

Hot Cache Cold Cache
Vanilla Python 2.7 0.43 sec 9.21 sec
Blister import hook 0.43 sec 9.97 sec

Windows on local drive

Windows does a good job with the cold cache. It seemed odd that the less file operations resulted in slightly slower times. I expect windows cold cache timings are subject to fluxuations. Blister gives a moderate performance bump for the hot cache imports.

Hot Cache Cold Cache
Vanilla Python 2.7 1.37 sec 2.15 sec
Blister import hook 0.99 sec 2.37 sec

Windows on network drive

This network drive was both twice as fast for the cold and warm cache tests on Windows.

Hot Cache Cold Cache
Vanilla Python 2.7 6.63 sec 8.92 sec
Blister import hook 3.95 sec 5.12 sec