Python getdents
Iterate large directories efficiently with Python.
About
python-getdents is a simple wrapper around the Linux system call getdents64 (see man getdents for details).
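getdents64 fills a caller-supplied buffer with packed linux_dirent64 records. Per man getdents, each record holds d_ino, d_off, d_reclen, d_type and a NUL-terminated d_name. The sketch below decodes such a buffer in pure Python to illustrate that layout. It is not the package's C implementation. It assumes native byte order and the 19-byte header documented for Linux. parse_dirents and pack_dirent are hypothetical helpers.

```python
import struct
from typing import Iterator, Tuple

# Fixed-size header of struct linux_dirent64 (see man getdents):
# d_ino (u64), d_off (s64), d_reclen (u16), d_type (u8), then d_name.
# "=" selects native byte order with no alignment padding, so the
# header is exactly 19 bytes and d_name starts right after it.
_HEADER = struct.Struct("=QqHB")


def parse_dirents(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (inode, d_type, name) for each record in a getdents64-style buffer."""
    pos = 0
    while pos < len(buf):
        d_ino, _d_off, d_reclen, d_type = _HEADER.unpack_from(buf, pos)
        if d_reclen == 0:
            raise ValueError("corrupt record: d_reclen is 0")
        # d_name is NUL-terminated and followed by padding up to d_reclen.
        name_start = pos + _HEADER.size
        name_end = buf.index(b"\0", name_start, pos + d_reclen)
        yield d_ino, d_type, buf[name_start:name_end]
        pos += d_reclen  # jump to the next record


def pack_dirent(d_ino: int, d_off: int, d_type: int, name: bytes) -> bytes:
    """Build one synthetic record (hypothetical helper for experimentation)."""
    raw = _HEADER.size + len(name) + 1  # header + name + terminating NUL
    reclen = (raw + 7) & ~7  # round up to an 8-byte boundary
    header = _HEADER.pack(d_ino, d_off, reclen, d_type)
    return header + name + b"\0" * (reclen - raw + 1)
```

pack_dirent lets the parser be tried without a real system call. Each record is padded to an 8-byte boundary, matching how the kernel aligns linux_dirent64 entries.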
TODO
- Verify that the implementation works on platforms other than x86_64.
Install
pip install getdents
For development
python3 -m venv env
. env/bin/activate
pip install -e .[test]
Building Wheels
pip install cibuildwheel
cibuildwheel --platform linux --output-dir wheelhouse
Run tests
ulimit -v 33554432 && py.test tests/
Or
ulimit -v 33554432 && ./setup.py test
Usage
from getdents import getdents

for inode, type, name in getdents('/tmp', 32768):
    print(name)
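Some scripts must also run where the extension is not installed. One option is to fall back to the standard library's os.scandir. The sketch below shows this, with two assumptions. First, names may come back as either str or bytes, and os.fsdecode accepts both. Second, "." and ".." are skipped in case the underlying call reports them, since os.scandir never does. iter_names is a hypothetical helper, not part of the package.

```python
import os
from typing import Iterator


def iter_names(path: str, buffer_size: int = 32768) -> Iterator[str]:
    """Yield entry names in `path`, preferring getdents when it is importable."""
    try:
        from getdents import getdents
    except ImportError:
        getdents = None

    if getdents is None:
        # Portable fallback (assumption: acceptable where getdents is missing).
        with os.scandir(path) as entries:
            for entry in entries:
                yield entry.name
        return

    for _inode, _type, name in getdents(path, buffer_size):
        # Normalize to str whichever type the extension returns.
        name = os.fsdecode(name)
        # The raw syscall can report "." and ".."; skip them like os.scandir.
        if name not in (".", ".."):
            yield name
```

Because it is a generator, names are produced one at a time, so even a very large directory is never materialized as a full list unless the caller asks for one.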
Advanced
import os
from getdents import *

fd = os.open('/tmp', O_GETDENTS)

try:
    for inode, type, name in getdents_raw(fd, 2**20):
        print({
                DT_BLK:     'blockdev',
                DT_CHR:     'chardev ',
                DT_DIR:     'dir     ',
                DT_FIFO:    'pipe    ',
                DT_LNK:     'symlink ',
                DT_REG:     'file    ',
                DT_SOCK:    'socket  ',
                DT_UNKNOWN: 'unknown ',
            }[type], {
                True:  'd',  # entry reported with inode 0
                False: ' ',
            }[inode == 0],
            name,
        )
finally:
    # Always release the directory file descriptor, even if iteration fails.
    os.close(fd)
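Because entries arrive as plain (inode, type, name) tuples, they can also be aggregated rather than printed. For example, you can get a per-type breakdown of a huge directory in constant memory. The sketch below hard-codes the numeric d_type values from Linux <dirent.h>. If the package's DT_* constants use the same numbers (an assumption), it could be fed getdents_raw(fd, 2**20) directly. summarize and DT_NAMES are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Tuple

# d_type values from <dirent.h> on Linux. Assumption: the DT_* constants
# exported by getdents use these same numbers.
DT_NAMES = {
    0: "unknown",
    1: "pipe",
    2: "chardev",
    4: "dir",
    6: "blockdev",
    8: "file",
    10: "symlink",
    12: "socket",
}


def summarize(entries: Iterable[Tuple[int, int, str]]) -> Counter:
    """Count entries per type; entries with inode 0 are counted separately."""
    counts = Counter()
    for inode, d_type, _name in entries:
        if inode == 0:
            # These are the entries flagged 'd' in the example above.
            counts["inode-0"] += 1
            continue
        counts[DT_NAMES.get(d_type, "other")] += 1
    return counts
```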
CLI
Usage
python-getdents [-h] [-b N] [-o NAME] PATH
Options
Option | Description |
---|---|
-b N, --buffer-size N | Buffer size (in bytes) to allocate when iterating over the directory. The default is 32768, the same value glibc uses; you probably want to increase it. Try starting with 16777216 (16 MiB). Best performance is achieved when the buffer size is a multiple of the file system block size. |
-o NAME, --output-format NAME | Output format (for example, csv). |
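The advice about matching the file system block size can be applied programmatically. The sketch below makes two assumptions. It reads "rounds to the block size" as a whole multiple of it. It also assumes os.statvfs(path).f_bsize reports the relevant block size. round_up and aligned_buffer_size are hypothetical helpers.

```python
import os


def round_up(size: int, block: int) -> int:
    """Smallest multiple of `block` that is >= `size` (ceiling division)."""
    return -(-size // block) * block


def aligned_buffer_size(path: str, requested: int = 16 * 2**20) -> int:
    """Round `requested` (default 16 MiB) up to the block size of `path`'s file system."""
    block = os.statvfs(path).f_bsize
    return round_up(requested, block)
```

The result can be passed as -b on the command line or as the second argument of getdents(path, buffer_size).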
Exit codes
- 3 - Requested buffer is too large.
- 4 - PATH not found.
- 5 - PATH is not a directory.
- 6 - Not enough permissions to read the contents of PATH.
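Scripts that call the CLI can turn these exit codes into readable errors. The sketch below assumes that exit status 0 means success. It invokes the tool as python -m getdents, the form shown under Examples, through the current interpreter. EXIT_MESSAGES, describe_exit and list_dir are hypothetical names.

```python
import subprocess
import sys

# Exit codes listed above for the python-getdents CLI.
EXIT_MESSAGES = {
    3: "requested buffer is too large",
    4: "PATH not found",
    5: "PATH is not a directory",
    6: "not enough permissions to read the contents of PATH",
}


def describe_exit(code: int) -> str:
    """Map an exit status to a message (0 assumed to mean success)."""
    if code == 0:
        return "ok"
    return EXIT_MESSAGES.get(code, f"unexpected exit code {code}")


def list_dir(path: str, buffer_size: int = 16777216) -> str:
    """Run the CLI on `path` and return its stdout, raising on failure."""
    result = subprocess.run(
        [sys.executable, "-m", "getdents", path, "-b", str(buffer_size)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"{path}: {describe_exit(result.returncode)}")
    return result.stdout
```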
Examples
python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv