advarchs

Data retrieval from remote archives


Keywords
archive, transfer, zip, rar, advarchs
License
Apache-2.0
Install
pip install advarchs==0.1.7

Documentation

Advarchs: Data retrieval from remote archives

Overview

Advarchs is a simple tool for retrieving data from web archives. It is especially useful if you are working with remote data stored in compressed spreadsheets or similar formats.

Getting Started

Say you need to perform some data analytics on an Excel spreadsheet that gets refreshed every month and is stored in RAR format. You can target that file and convert it to a pandas DataFrame with the following procedure:

import os
import tempfile

import pandas as pd
from advarchs import webfilename, extract_web_archive

TEMP_DIR = tempfile.gettempdir()

url = "http://www.site.com/archive.rar"
# Work out the archive's file name and place it in the temp directory
arch_file_name = webfilename(url)
arch_path = os.path.join(TEMP_DIR, arch_file_name)
# Retrieve the archive and extract only the .xlsx files
xlsx_files = extract_web_archive(url, arch_path, ffilter=['xlsx'])
for xlsx_f in xlsx_files:
    xlsx = pd.ExcelFile(xlsx_f)

...
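
The loop above stops at pd.ExcelFile. As a minimal sketch of one way to get from the extracted paths to DataFrames, the helper below maps each workbook to its sheets. The names load_workbooks and read are hypothetical and not part of advarchs. The default reader assumes pandas.read_excel with sheet_name=None (which returns one DataFrame per sheet) and an installed Excel engine such as openpyxl.

```python
import os


def load_workbooks(paths, read=None):
    """Map each workbook's base name to whatever `read` returns for it.

    Hypothetical helper for illustration; not part of advarchs.
    Workbooks that share a base name overwrite one another.
    """
    if read is None:
        # Imported lazily so a custom reader can be used without pandas.
        import pandas as pd

        def read(path):
            # sheet_name=None loads every sheet: {sheet name: DataFrame}
            return pd.read_excel(path, sheet_name=None)

    return {os.path.basename(path): read(path) for path in paths}
```

With the xlsx_files list from the example above, load_workbooks(xlsx_files) would give a dictionary keyed by file name, each value being a dictionary of sheet name to DataFrame. Passing a different read callable swaps in another loader.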

Requirements

  • Python 3.5+
  • p7zip

Special note

On CentOS and Ubuntu <= 16.04, the following package is also needed:

  • unrar
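
Before running an extraction, it can help to confirm that the external tools listed above are on the PATH. The following is a rough sketch using only the standard library. The function missing_tools and the REQUIRED mapping are hypothetical, and the executable names (7z, 7za, 7zr, unrar) are assumptions about what the p7zip and unrar packages install on a given system.

```python
import shutil


def missing_tools(groups):
    """Return labels of tool groups with no executable found on PATH.

    `groups` maps a package label to alternative executable names;
    one hit per group is enough.
    """
    missing = []
    for label, candidates in groups.items():
        if not any(shutil.which(name) for name in candidates):
            missing.append(label)
    return missing


# Assumed executable names; adjust them for your distribution.
REQUIRED = {
    "p7zip": ("7z", "7za", "7zr"),
    "unrar": ("unrar",),
}

if __name__ == "__main__":
    for label in missing_tools(REQUIRED):
        print("missing external tool:", label)
```

On systems where unrar is not required, drop that entry from the mapping.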

Installation

pip install advarchs

Contributing

See CONTRIBUTING

Code of Conduct

This project adheres to the Contributor Covenant 1.2. By participating, you are advised to adhere to this Code of Conduct in all your interactions with this project.

License

Apache-2.0