avalares

Automatic text data extraction tool


License
MIT
Install
pip install avalares==0.1.4

Documentation

Avalares

A tool that automatically extracts a pandas DataFrame or NumPy array from a string of text.

Often we want to quickly extract data from a dataset that isn't in a standard format. It is usually clear to a human which parts of the text are the data, and we could copy and paste that part of the text and write a simple parser that splits on a delimiter, but this project aims to automate that process.

Usage

Via command line:

url=https://www1.udel.edu/htr/Statistics/Data/smokingcancer.txt
curl $url | avalares
curl $url | avalares -t json | jq '.[] | .LEUK'  # Gets the LEUK column from the data
curl $url | avalares -t json | jq '[.[] | .LUNG] | add / length'  # Gets the average of the LUNG column
curl $url > data.txt
avalares data.txt -o data.csv  # .csv, .json, .pkl, or .npy
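The same column average can be computed in Python instead of jq. This is a minimal sketch that assumes the JSON output is an array of objects keyed by column label, as the jq filters above imply; `column_mean` is a hypothetical helper, and the sample records are placeholders rather than values from the real dataset.

```python
import json
from statistics import mean


def column_mean(json_text, column):
    """Average one column of a JSON array of records (hypothetical helper)."""
    records = json.loads(json_text)
    return mean(record[column] for record in records)


# Placeholder records for illustration only; not values from the real dataset.
sample = '[{"LUNG": 1.0, "LEUK": 4.0}, {"LUNG": 2.0, "LEUK": 5.0}, {"LUNG": 3.0, "LEUK": 6.0}]'
print(column_mean(sample, "LUNG"))  # 2.0
```

In a pipeline, `json_text` would be whatever `avalares -t json` writes to standard output, for example read from `sys.stdin`.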

Via python API:

from avalares import to_pandas, to_numpy, parse
df = to_pandas('https://www1.udel.edu/htr/Statistics/Data/smokingcancer.txt')  # from a URL
df = to_pandas('data.txt')  # from a local file
data = to_numpy('1 2 3;4 5 6')  # from a string

result = parse('Letter Number;a 1;b 2;c 3;d 4')
print(result.labels)  # ['Letter', 'Number']
print(result.types)  # ['string', 'int']
print(result.rows)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
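The labels and rows can then be combined with plain Python. The sketch below uses stand-in values copied from the printed output above so that it runs without avalares installed; with the real library, `labels` and `rows` would presumably come from `result.labels` and `result.rows`.

```python
# Stand-in values copied from the printed output above, so the snippet runs without avalares.
labels = ['Letter', 'Number']
rows = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

# Pair each row with the labels to get one dict per row.
records = [dict(zip(labels, row)) for row in rows]
print(records[0])  # {'Letter': 'a', 'Number': 1}

# Or pull out a single column by its label.
number_index = labels.index('Number')
numbers = [row[number_index] for row in rows]
print(numbers)  # [1, 2, 3, 4]
```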

Installation

Install via pip3:

pip3 install --user avalares