AzFS provides convenient Python read/write functions for Azure Storage Account.


Keywords
Azure, StorageAccount, Blob, DataLake, Queue
License
MIT
Install
pip install azfs==0.2.14

AzFS

AzFS can

  • list files in blob (also with wildcard *),
  • check if file exists,
  • read csv as pd.DataFrame, and json as dict from blob,
  • write pd.DataFrame as csv, and dict as json to blob.
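
To make the wildcard behavior concrete, the sketch below filters a list of blob names with Python's standard `fnmatch` module. It only illustrates the matching idea on hypothetical names; it is not AzFS's implementation, and AzFS's own matching rules may differ (for example in how `*` treats `/`).

```python
from fnmatch import fnmatchcase


def match_blobs(blob_names, pattern):
    """Return the blob names that match a shell-style wildcard pattern."""
    return [name for name in blob_names if fnmatchcase(name, pattern)]


# hypothetical blob names inside one container
blob_names = ["test_file.csv", "other.csv", "notes.json", "archive/old.csv"]

# note: with fnmatch, `*` also matches "/" so nested names are included
csv_files = match_blobs(blob_names, "*.csv")
print(csv_files)
```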

install

$ pip install azfs

usage

For Blob Storage

import azfs
from azure.identity import DefaultAzureCredential
import pandas as pd

# no credential is required if your environment is on AAD (Azure Active Directory)
azc = azfs.AzFileClient()

# a credential is required if your environment is not on AAD
credential = "[your storage account credential]"
# or
credential = DefaultAzureCredential()
azc = azfs.AzFileClient(credential=credential)

# connection_string is also supported
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
azc = azfs.AzFileClient(connection_string=connection_string)

# data paths
csv_path = "https://testazfs.blob.core.windows.net/test_container/test_file.csv"

# read csv as pd.DataFrame
df = azc.read_csv(csv_path, index_col=0)
# or
with azc:
    df = pd.read_csv_az(csv_path, header=None)

# write csv
azc.write_csv(path=csv_path, df=df)
# or
with azc:
    df.to_csv_az(path=csv_path, index=False)

# you can read multiple files
csv_pattern_path = "https://testazfs.blob.core.windows.net/test_container/*.csv"
df = azc.read().csv(csv_pattern_path)

# to apply additional filter or another process
df = azc.read().apply(function=lambda x: x[x['id'] == 'AAA']).csv(csv_pattern_path)

# in addition, you can use multiprocessing
df = azc.read(use_mp=True).apply(function=lambda x: x[x['id'] == 'AAA']).csv(csv_pattern_path)
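
The chained `read().apply(...).csv(pattern)` call can be thought of as: read every matching file, apply the function to each, and combine the results. The stdlib-only sketch below shows that flow on in-memory CSV text, with rows as dicts instead of `pd.DataFrame`. The helper `read_many` and the sample data are hypothetical, and applying the function per file before combining is an assumption, not a statement about AzFS internals.

```python
import csv
import io


def read_many(files, function=None):
    """Read several CSV texts, optionally transform each, and combine the rows."""
    combined = []
    for text in files.values():
        rows = list(csv.DictReader(io.StringIO(text)))
        if function is not None:
            rows = function(rows)  # applied per file, before combining (assumption)
        combined.extend(rows)
    return combined


# hypothetical file contents keyed by blob name
files = {
    "a.csv": "id,value\nAAA,1\nBBB,2\n",
    "b.csv": "id,value\nAAA,3\nCCC,4\n",
}

# keep only the rows whose id is 'AAA', mirroring the filter used above
only_aaa = read_many(files, function=lambda rows: [r for r in rows if r["id"] == "AAA"])
print(only_aaa)
```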

For Queue Storage

import azfs
queue_url = "https://{storage_account}.queue.core.windows.net/{queue_name}"

azc = azfs.AzFileClient()
queue_message = azc.get(queue_url)
# message will not be deleted if `delete=False`
# queue_message = azc.get(queue_url, delete=False)

# get message content
queue_content = queue_message.get('content')
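
The blob and queue paths used in these examples share the shape `https://{storage_account}.{storage_type}.core.windows.net/{container_or_queue}/...`. As a rough illustration of that structure, the sketch below splits such a URL with `urllib.parse`. The function `split_storage_url` and the container and queue names are hypothetical and not part of AzFS.

```python
from urllib.parse import urlparse


def split_storage_url(url):
    """Split a storage URL into account, storage type, container/queue and path."""
    parsed = urlparse(url)
    # host looks like "{account}.{storage_type}.core.windows.net"
    account, storage_type = parsed.netloc.split(".")[:2]
    # first path segment is the container (or queue); the rest is the file path
    container, _, file_path = parsed.path.lstrip("/").partition("/")
    return account, storage_type, container, file_path


blob_parts = split_storage_url("https://testazfs.blob.core.windows.net/my_container/dir/data.csv")
queue_parts = split_storage_url("https://testazfs.queue.core.windows.net/my_queue")
print(blob_parts)
print(queue_parts)
```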

For Table Storage

import azfs
cons = {
    "account_name": "{storage_account_name}",
    "account_key": "{credential}",
    "database_name": "{database_name}"
}

table_client = azfs.TableStorageWrapper(**cons)

# put data, using the keyword arguments you pass
table_client.put(id_="1", message="hello_world")

# get data
table_client.get(id_="1")

See the documentation for more details.

types of authorization

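Supported authentication types, as shown in the usage examples, are

  • no explicit credential, when your environment is on AAD (Azure Active Directory),
  • a credential, such as a storage account credential or DefaultAzureCredential,
  • a connection string.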
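
A connection string, like the one in the usage section, is a `;`-separated list of `Key=Value` pairs. The sketch below reads its fields into a dict, purely to show the format. `parse_connection_string` is a hypothetical helper, not an AzFS or Azure SDK function.

```python
def parse_connection_string(connection_string):
    """Parse 'Key=Value;Key=Value' pairs into a dict."""
    fields = {}
    for part in connection_string.split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        # split on the first '=' only, since values may themselves contain '='
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


fields = parse_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
)
print(fields["AccountName"], fields["EndpointSuffix"])
```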

types of storage account kind

The table below shows whether AzFS provides read/write functions for each storage type.

account kind   Blob   Data Lake   Queue   File   Table
StorageV2      O      O           O       X      O
StorageV1      O      O           O       X      O
BlobStorage    O      -           -       -      -

  • O: basic functions provided
  • X: not provided
  • -: storage type unavailable

dependencies

pandas
azure-identity >= "1.3.1"
azure-storage-blob >= "12.3.0"
azure-storage-file-datalake >= "12.0.0"
azure-storage-queue >= "12.1.1"
azure-cosmosdb-table
