pyhdfs-client

A py4j-based HDFS client for Python that delivers native HDFS CLI performance.


Keywords: pyhdfs_client
License: MIT

Documentation

pyhdfs-client: Powerful HDFS Client for Python

https://pypi.python.org/pypi/pyhdfs_client

Why is it fast and powerful?

The native HDFS client offers much better performance than WebHDFS clients. However, calling the native client for each Hadoop operation carries the additional overhead of starting a JVM. pyhdfs-client delivers native HDFS client performance without starting a new JVM on every command execution.
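To see the JVM startup cost for yourself, you could time repeated calls both ways. The sketch below is a hypothetical micro-benchmark: `average_seconds`, `cli_ls`, and `client_ls` are illustrative helpers (not part of pyhdfs-client), `cli_ls` assumes the Hadoop `hdfs` launcher is on your `PATH`, and `client_ls` uses only the `run(['-ls', ...])` call shown in Sample Usage.

```python
import subprocess
import time
from typing import Callable


def average_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the mean wall-clock time of calling fn() `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def cli_ls(path: str = "/") -> int:
    # Each call spawns a fresh `hdfs` process, so it pays JVM startup every time.
    # Assumption: the `hdfs` command is available on PATH.
    return subprocess.run(
        ["hdfs", "dfs", "-ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode


def client_ls(client, path: str = "/"):
    # Reuses an already-created client, so no new JVM is started per call.
    # Returns the first element of run()'s (ret, out, err) tuple.
    ret, _out, _err = client.run(["-ls", path])
    return ret
```

With Hadoop available, create `hdfs_client = HDFSClient()` once and compare `average_seconds(cli_ls)` with `average_seconds(lambda: client_ls(hdfs_client))`. Timings depend heavily on the machine and cluster, so treat the result as a rough comparison only.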

Features

  • HDFS client for Python
  • Easy to integrate with Python applications
  • Better performance than WebHDFS clients
  • Native Hadoop client performance without per-command JVM startup overhead
  • Supports both UNIX and Windows

What's new in 0.1.3?

  • Multiple HDFS client instances are now supported.
  • [fix] Temporary folder deletion
  • [fix] Java process shutdown issues on UNIX
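Because a client is terminated by calling `stop()`, and 0.1.3 allows several client instances, it can help to guarantee that `stop()` runs even when an operation raises. The following is a minimal sketch using `contextlib`; `hdfs_session` and `_default_factory` are hypothetical helpers, not part of pyhdfs-client, and the `factory` parameter exists only so the helper can be exercised without Hadoop.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


def _default_factory():
    # Imported lazily so this module loads even where pyhdfs-client is absent.
    from pyhdfs_client.pyhdfs_client import HDFSClient
    return HDFSClient()


@contextmanager
def hdfs_session(factory: Optional[Callable[[], object]] = None) -> Iterator[object]:
    """Yield an HDFS client and always call its stop() method on exit."""
    client = (factory or _default_factory)()
    try:
        yield client
    finally:
        # Runs on normal exit and when the with-block raises.
        client.stop()
```

For example, `with hdfs_session() as a, hdfs_session() as b:` would open two independent clients and stop both (in reverse order) when the block exits.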

Installation

pip install pyhdfs-client

Requirements: Hadoop binaries and py4j must be installed.
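A quick environment check can catch a missing requirement before a client is created. This is a hypothetical sketch: `check_requirements` is not part of pyhdfs-client, it assumes the Hadoop launchers (`hadoop`, `hdfs`) are expected on `PATH`, and it assumes py4j is importable as `py4j`. How pyhdfs-client itself locates Hadoop is not described here, so a passing check is only a hint.

```python
import importlib.util
import shutil
from typing import Dict, List, Optional


def check_requirements(binaries: Optional[List[str]] = None) -> Dict[str, bool]:
    """Report whether each requirement appears to be available."""
    binaries = binaries or ["hadoop", "hdfs"]
    # shutil.which returns None when the command is not found on PATH.
    status = {name: shutil.which(name) is not None for name in binaries}
    # find_spec returns None when the top-level package cannot be found.
    status["py4j"] = importlib.util.find_spec("py4j") is not None
    return status
```

For instance, `[name for name, ok in check_requirements().items() if not ok]` lists whatever still needs installing.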

Sample Usage

>>> from pyhdfs_client.pyhdfs_client import HDFSClient
>>> hdfs_client = HDFSClient()
>>> ret, out, err = hdfs_client.run(['-ls', '/'])
>>> print(out)
Found 1 items
drwxr-xr-x   - gp supergroup          0 2021-03-21 01:10 /f1
>>> hdfs_client.stop()  # terminate the HDFS client
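The `out` string from `run(['-ls', ...])` is plain listing text, so scripts usually need to parse it. Below is a minimal sketch that turns listing lines like the one in the sample above into dictionaries. `parse_ls` and `ls_entries` are hypothetical helpers; they assume the eight-column layout shown in the sample output and assume that a `ret` of 0 means success, mirroring a CLI exit code.

```python
from typing import Dict, List

Entry = Dict[str, object]


def parse_ls(out: str) -> List[Entry]:
    """Parse `-ls` listing text into one dict per file or directory."""
    entries: List[Entry] = []
    for line in out.splitlines():
        # Skip the "Found N items" header and blank lines.
        if not line.strip() or line.startswith("Found "):
            continue
        # maxsplit=7 keeps any spaces inside the final path column intact.
        parts = line.split(maxsplit=7)
        if len(parts) != 8:
            continue  # not a listing line; ignore rather than guess
        perms, repl, owner, group, size, date, time_, path = parts
        entries.append({
            "permissions": perms,
            "replication": repl,  # "-" for directories in the sample
            "owner": owner,
            "group": group,
            "size": int(size),
            "modified": f"{date} {time_}",
            "path": path,
            "is_dir": perms.startswith("d"),
        })
    return entries


def ls_entries(client, path: str = "/") -> List[Entry]:
    # Assumption: ret == 0 signals success; anything else is treated as failure.
    ret, out, err = client.run(["-ls", path])
    if ret != 0:
        raise RuntimeError(f"-ls {path} failed ({ret}): {err}")
    return parse_ls(out)
```

With a live client, `ls_entries(hdfs_client, '/')` would return a single dictionary for `/f1` given the sample listing.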

Contribution

  • Contributions of enhancements and bug fixes are welcome.

Credits