anonymized-fraud-detection

A small package to parse anonymized credit card transactions and train an ML model on them. Refer to the GitHub wiki for more details. The package was built for PythonVirtualenvOperator() on GCP Airflow.


Keywords
airflow, airflow-docker, data-wrangling, google-cloud-functions, google-cloud-platform, google-cloud-storage, machine-learning, pandas, python, python-package, random-forest, sklearn
License
MIT
Install
pip install anonymized-fraud-detection==0.1.3

Documentation

Credit Card Fraud Detection

  1. The data is a mocked-up dataset of customer transactions from the [Capital One Recruitment Challenge](https://github.com/CapitalOneRecruiting/DS).
  2. The unbalanced dataset consists of artificial customer transactions, with a few outlier cases where fraud was detected. Only ~1.6% of the cases are fraudulent.
  3. Our primary goal is to successfully predict whether a transaction is fraudulent or not and, as in most sensitive classification problems, to avoid Type-II errors (fraudulent transactions that go undetected) as much as possible. We'll also try not to point accusatory fingers at genuine transactions 🙂.
  4. The secondary goal is to identify interesting anomalies in the transactions, such as multi-swipes and reversals of suspicious transactions, by performing exploratory data analysis.
  5. Most numerical fields seem to follow power-law distributions rather than Gaussian distributions.
  6. We'll engineer some time-dependent categorical features by parsing the datetime fields, exclude the fields that hold just one categorical value (makes no sense keeping these around 🙁), and create a new feature that indicates whether the credit card CVV was entered incorrectly.
  7. Baseline classifiers chosen are Logistic Regression, Decision Tree, and Isolation Forest.
  8. Performance is poor on these baseline classifiers: precision and recall vary greatly across the models. Logistic Regression just predicts the majority class (i.e., the transaction is OK).
  9. The final Random Forest achieves a recall of approximately 99.99%, indicating that false negatives are minimal.
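
The feature-engineering step described in item 6 could be sketched with pandas roughly as follows. This is a minimal illustration and not the package's actual API: the function name `engineer_features` and the derived column names (`txn_hour`, `txn_dayofweek`, `txn_month`, `cvv_mismatch`) are hypothetical, and the input column names (`transactionDateTime`, `cardCVV`, `enteredCVV`, `echoBuffer`, `isFraud`) are assumptions about how the dataset is laid out.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper; all column names here are assumptions."""
    out = df.copy()

    # Parse the timestamp and derive time-dependent categorical features.
    ts = pd.to_datetime(out["transactionDateTime"])
    out["txn_hour"] = ts.dt.hour
    out["txn_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["txn_month"] = ts.dt.month

    # Flag transactions where the entered CVV differs from the card's CVV.
    out["cvv_mismatch"] = (out["cardCVV"] != out["enteredCVV"]).astype(int)

    # Drop columns that hold a single distinct value (NaN counts as a value),
    # since they carry no information for a classifier.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    return out.drop(columns=constant_cols)


# Tiny made-up sample, only to show the shape of the input and output.
sample = pd.DataFrame(
    {
        "transactionDateTime": [
            "2016-08-13T14:27:32",
            "2016-10-11T05:05:54",
            "2016-11-08T09:18:39",
        ],
        "cardCVV": ["414", "486", "486"],
        "enteredCVV": ["414", "468", "486"],
        "echoBuffer": ["", "", ""],  # single-valued, so it gets dropped
        "isFraud": [False, True, False],
    }
)
features = engineer_features(sample)
print(features.columns.tolist())
```

The raw `transactionDateTime` column is left in place here; whether to drop it before training is a separate choice that this sketch does not make.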