table-extractor

Extract normalized tables from CSVs, Excel Spreadsheets, PDFs, Word Docs, and Web Pages


Keywords: table, python, csv, doc, docx, excel, tsv, word-documents
License: Apache-2.0
Install: pip install table-extractor==1.1

A table is a list of rows, and each row is a list of values.
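As a rough illustration of that shape, the sketch below builds a table by hand as nested Python lists. The sample values are made up for illustration and are not produced by the library, and treating the first row as a header is an assumption.

```python
# A hand-built table: the outer list holds rows, each row holds values.
# Illustrative data only, not output from table-extractor.
table = [
    ["Name", "Rating"],        # first row, assumed here to be a header
    ["The Godfather", 9.2],
    ["12 Angry Men", 8.9],
]

header = table[0]              # a row is a plain list of values
first_data_row = table[1]
rating = table[1][1]           # index by row, then by column

num_rows = len(table)          # counts the header row too
num_cols = len(header)
```

Because a table is an ordinary list of lists, the usual indexing, slicing, and iteration all apply without any special wrapper type.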

Installation

pip install table-extractor

Use

from table_extractor import extract_tables
tables = extract_tables("/tmp/top_5_movies.docx")
# [[["Name", "Rating"], ["The Shawshank Redemption", 9.2], ["The Godfather", 9.2], ["The Godfather: Part II", 9.2], ["The Dark Knight", 8.9], ["12 Angry Men", 8.9]]]
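
Once the tables are extracted, a common next step is to key each row by its column name. The following is a minimal sketch, assuming the first row of a table is its header. `rows_to_dicts` is a hypothetical helper and not part of table-extractor, and the `tables` value is written out by hand to match the example output above rather than read from a file.

```python
def rows_to_dicts(table):
    """Turn a table (a list of rows) into a list of dicts keyed by the first row.

    Hypothetical helper, not part of table-extractor.
    Assumes the first row is a header.
    """
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


# Same shape as the example output above, written out by hand.
tables = [[
    ["Name", "Rating"],
    ["The Shawshank Redemption", 9.2],
    ["The Godfather", 9.2],
    ["The Godfather: Part II", 9.2],
    ["The Dark Knight", 8.9],
    ["12 Angry Men", 8.9],
]]

movies = rows_to_dicts(tables[0])
top_rated = [m["Name"] for m in movies if m["Rating"] >= 9.2]
```

Note that `zip` stops at the shorter of its inputs, so a row with fewer values than the header would yield a dict with missing keys. Real documents with ragged rows may need extra handling.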

Testing

To test the package, run:

python3 -m unittest table_extractor.tests.test