CSV-anamoly-detector

A Python tool to detect anomalies


Keywords
detect, Anamolies, in, CSV, files
License
MIT
Install
pip install CSV-anamoly-detector==1.2.6

Documentation

CSV-anomaly-detector

A tool to detect anomalies in CSV files (especially large files)

Description

This tool is useful when you are working with a large CSV file in which checking each line for anomalies by hand is impractical. Even if the file comes from a reliable source, it is safer to validate its contents before proceeding further.

Each column has a title, and all titles are listed in the very first line of the CSV file. This page refers to such a column title as a "HEADER".

The tool scans the file header by header. After scanning each header, it identifies the dominant datatype and assumes any other datatype to be defective (we say "assumes" rather than "concludes" because the final decision rests with the user).
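The header-wise approach described above can be illustrated with a minimal sketch. This is not the tool's actual code: the function names (classify, find_suspects) and the set of type labels ("empty", "int", "float", "str") are assumptions chosen for illustration only.

```python
from collections import Counter


def classify(value):
    """Label a raw CSV cell as 'empty', 'int', 'float', or 'str' (hypothetical labels)."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def find_suspects(values):
    """Return the dominant label and the (row_number, value) pairs that differ from it."""
    if not values:
        return None, []
    labels = [classify(v) for v in values]
    # The most frequent label in the column is treated as the dominant datatype.
    dominant = Counter(labels).most_common(1)[0][0]
    suspects = [
        (row, value)
        for row, (value, label) in enumerate(zip(values, labels), start=1)
        if label != dominant
    ]
    return dominant, suspects


# One column of a hypothetical file, header excluded.
ages = ["34", "27", "abc", "51", ""]
dominant, suspects = find_suspects(ages)
print(dominant)
print(suspects)
```

The suspects are only flagged, not rejected, which mirrors the point above that the final decision rests with the user.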

Command line execution

The following commands are available in the tool:

columns --> prints the headers of the csv file.
count --> gives the total number of rows in the csv file.
executeColumns --> scans the specified column for bugs.
execute --> scans the whole file (i.e. all columns) for bugs.
sample --> prints the first 10 rows of the csv file.
sampleHeader --> prints the first 10 rows, but only for the specified header.

Sample command prompt execution for each of the above commands

python AnomalyDetector.py columns --filename=mock.csv
python AnomalyDetector.py count --filename=mock.csv
python AnomalyDetector.py executeColumns --filename=mock.csv --columns=email
python AnomalyDetector.py execute --filename=mock.csv
python AnomalyDetector.py sample --filename=mock.csv
python AnomalyDetector.py sampleHeader --column=email --filename=mock.csv
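For readers who want to see what the read-only commands (columns, count, sample, sampleHeader) amount to, here is a rough sketch using Python's standard csv module. It is not taken from AnomalyDetector.py: the helper names and the in-memory MOCK_CSV data are made up for illustration, and whether the real count command includes the header line is not stated above (this sketch excludes it).

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for mock.csv, kept in memory so the sketch is self-contained.
MOCK_CSV = (
    "id,first name,email\n"
    "1,Ada,ada@example.com\n"
    "2,Alan,alan@example.com\n"
    "3,Grace,not-an-email\n"
)


def load(text):
    """Split CSV text into its header list and its data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def sample(header, rows, limit=10):
    """Return up to `limit` rows as dicts keyed by header."""
    return [dict(zip(header, row)) for row in islice(rows, limit)]


def sample_header(header, rows, column, limit=10):
    """Return up to `limit` values of a single column."""
    index = header.index(column)  # raises ValueError if the header does not exist
    return [row[index] for row in islice(rows, limit)]


header, rows = load(MOCK_CSV)
print(header)                                # like "columns"
print(len(rows))                             # like "count" (header excluded here)
print(sample(header, rows))                  # like "sample"
print(sample_header(header, rows, "email"))  # like "sampleHeader"
```

The header lookup in this sketch is an exact string match, so a header typed with different capitalisation would not be found.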

Upon completion of the scanning process (either execute/executeColumns), you will see either of these two responses:

  • This Column appears bug free.
  • PLEASE OPEN improperData.txt**

To view the commands available:
    python AnomalyDetector.py --help

Please avoid spaces around the "=" sign:
    --filename = mock.csv (will throw an error)
    --filename= mock.csv (will throw an error)
    --filename =mock.csv (will throw an error)
    --filename=mock.csv (will give the result)

All entries are case sensitive.

** improperData.txt contains all the errors found. It is created automatically when the .py file is executed.

Please ensure that the source file (AnomalyDetector.py) and the .csv file are in the same directory.

Note:

  • If a column name consists of more than one word (e.g. first name), please make sure it is enclosed in quotes:
    python AnomalyDetector.py executeColumns --filename=mock.csv --columns=first name (WON'T WORK)
    python AnomalyDetector.py executeColumns --filename=mock.csv --columns="first name" (WORKS LIKE A CHARM)
  • Relative addressing from the terminal also works:
    python AnomalyDetector.py execute --filename="./Verticals/sample.csv"
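The spacing and quoting rules above come down to how the shell splits a command line into separate arguments before Python ever sees them. The sketch below uses the standard shlex module, which follows POSIX shell rules, to show the splitting; the tokens helper is a made-up name, and the Windows command prompt tokenises slightly differently, so treat this as an approximation.

```python
import shlex


def tokens(arguments):
    """Split an argument string the way a POSIX shell would."""
    return shlex.split(arguments)


# No spaces around "=": the option and its value arrive as one argument.
print(tokens("--filename=mock.csv"))

# Spaces around "=": the option is broken into three separate arguments.
print(tokens("--filename = mock.csv"))

# Unquoted two-word column name: "name" becomes a stray extra argument.
print(tokens("--columns=first name"))

# Quoted column name: the space survives inside a single argument.
print(tokens('--columns="first name"'))
```

In the quoted case the shell removes the quotes and passes first name through as part of one argument, which is why the quoted form works.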