Data mining tool to extract text from files


Keywords
text, extraction, mining, tool, data, pdf2txt, csv2txt, txt, doc, docx, pdf, csv, png, jpg, doc2txt, docx2txt, convert
License
BSD-3-Clause
Install
pip install aTXT==1.0.4

Documentation

aTXT

Extract the text from files. Text mining tool.


Usage

You can run aTXT from the console either by its package name, aTXT, or by the shorter alias 2txt. For example, to show the help message:

aTXT -h
2txt -h

You can also launch the graphical interface (built with PySide):

aTXT -i

You should see something like this:

(Screenshot: the aTXT GUI)

Note: aTXT always generates one output file for each input file path.

Examples:

$ 2txt prueba.html
$ 2txt prueba.html -o
$ 2txt --file ~/Documents/prueba.html
$ 2txt --file ~/Documents/prueba.html --to ~/htmls
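If you need to convert many files from a script, one option is to call 2txt through Python's subprocess module. This is a minimal sketch, not part of aTXT itself: the helper names build_2txt_command and convert_all are hypothetical, the sketch only uses the --file and --to flags shown in the examples above, and it assumes the 2txt command is on your PATH.

```python
import shutil
import subprocess


def build_2txt_command(src, out_dir=None):
    """Return the argument list for converting one file with 2txt.

    Hypothetical helper: it only uses the --file and --to flags
    from the console examples.
    """
    cmd = ["2txt", "--file", str(src)]
    if out_dir is not None:
        cmd += ["--to", str(out_dir)]
    return cmd


def convert_all(paths, out_dir=None):
    """Run 2txt once per input file, stopping at the first failure."""
    if shutil.which("2txt") is None:
        raise FileNotFoundError("2txt is not on PATH; install aTXT first")
    for path in paths:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(build_2txt_command(path, out_dir), check=True)
```

For example, with from pathlib import Path, convert_all([Path.home() / "Documents" / "prueba.html"], out_dir=Path.home() / "htmls") runs the same conversion as the last console example. Arguments passed as a list do not go through a shell, so ~ is not expanded; that is why the usage spells out Path.home().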

Search for all files that aTXT can extract text from, down to two levels deep under your home directory (~):

$ 2txt ~ -d 2
$ 2txt --path ~ -d 2 --format 'txt,html'
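A depth-limited search like the one above can be sketched with os.scandir, the standard-library successor of the scandir package listed under Requirements. This illustrates the idea only and is not aTXT's implementation. In particular, it assumes that depth 1 means files directly inside the starting folder, which may differ from how -d counts levels; the function name find_files is hypothetical.

```python
import os


def find_files(root, max_depth, extensions):
    """Yield paths of files under root whose extension is in extensions.

    Assumed depth convention: files directly in root are at depth 1,
    files one folder down are at depth 2, and so on. Folders that would
    exceed max_depth are not entered.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    stack = [(os.path.expanduser(root), 1)]
    while stack:
        folder, depth = stack.pop()
        try:
            with os.scandir(folder) as it:
                entries = list(it)
        except OSError:
            continue  # skip folders we cannot read (permissions, races)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if depth < max_depth:
                    stack.append((entry.path, depth + 1))
            elif entry.is_file():
                ext = os.path.splitext(entry.name)[1].lower().lstrip(".")
                if ext in wanted:
                    yield entry.path
```

For example, find_files("~", 2, "txt,html".split(",")) roughly mirrors 2txt --path ~ -d 2 --format 'txt,html'. An explicit stack is used instead of os.walk so that whole subtrees below the depth limit are never scanned.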

Installation

pip install atxt

Check the dependencies to avoid surprises:

aTXT --check

Requirements

This software builds on other open-source projects. The following list shows some of the harder ones to install:

  • PySide (GUI lib)
  • Tesseract OCR
  • Xpdf
  • scandir (fast directory traversal)
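aTXT --check is the supported way to verify these. To illustrate what such a check involves (not how aTXT implements it), here is a minimal sketch that looks for Python packages by import name and for external tools by executable name. The names it probes are assumptions: "tesseract" is the Tesseract command-line program and "pdftotext" is one of the Xpdf tools, but aTXT may look for different ones. On Python 3.5+ the standard library's os.scandir can stand in for the scandir package.

```python
import importlib.util
import shutil

# Assumed names: Python packages are probed by import name,
# external tools by the executable they put on PATH.
PYTHON_MODULES = {"PySide": "PySide", "scandir": "scandir"}
EXECUTABLES = {"Tesseract OCR": "tesseract", "Xpdf": "pdftotext"}


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name, module in PYTHON_MODULES.items():
        # find_spec returns None for a top-level module that is not installed
        status[name] = importlib.util.find_spec(module) is not None
    for name, exe in EXECUTABLES.items():
        # shutil.which returns None if the executable is not on PATH
        status[name] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:15} {'OK' if found else 'MISSING'}")
```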

Meta

  • Author: Jonathan S. Prieto C.
  • Email: prieto.jona@gmail.com
  • Notes: Have feedback? Please send me an email.
  • Free software: BSD license

Issues

Feel free to report any issue or installation problem at http://github.com/d555/python-atxt/issues