django-super-deduper

Utilities for deduping Django model instances


Keywords
dedupe, django, python
License
MIT
Install
pip install django-super-deduper==0.1.4

Documentation

Django Super Deduper

A collection of classes and utilities to aid in de-duping Django model instances.

Requirements

  • Python 3.6
  • Django 1.11

Install

pip install django-super-deduper

Usage

Merging Duplicate Instances

By default, any empty value on the primary object is filled with the corresponding value from a duplicate. In addition, all one-to-one, one-to-many, and many-to-many related objects are updated to reference the primary object.

> from django_super_deduper.merge import MergedModelInstance
> # Model is any Django model class; X and Y are placeholders for non-empty values
> primary_object = Model.objects.create(attr_A=None, attr_B='')
> alias_object_1 = Model.objects.create(attr_A=X)
> alias_object_2 = Model.objects.create(attr_B=Y)
> # Empty fields on primary_object are filled from the aliases
> merged_object = MergedModelInstance.create(primary_object, [alias_object_1, alias_object_2])
> merged_object.attr_A
X
> merged_object.attr_B
Y
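
The fill-empty-values rule described above can be illustrated without Django. The following is only a sketch of the idea using plain dictionaries, not the package's actual implementation: the helper name fill_empty_fields and the EMPTY_VALUES tuple are hypothetical, and treating only None and '' as "empty" is an assumption taken from the example above.

```python
from typing import Any, Dict, Iterable

# Assumption: "empty" means None or '', matching the values used in the example above.
EMPTY_VALUES = (None, '')


def fill_empty_fields(primary: Dict[str, Any],
                      aliases: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of `primary` whose empty fields are filled from `aliases`.

    Hypothetical helper for illustration only; it is not part of the package.
    """
    merged = dict(primary)
    for alias in aliases:
        for field, value in alias.items():
            # A field that is missing from `primary` is treated as empty too.
            # Only overwrite when the current value is empty and the alias
            # actually has something to offer.
            if merged.get(field) in EMPTY_VALUES and value not in EMPTY_VALUES:
                merged[field] = value
    return merged


primary = {'attr_A': None, 'attr_B': ''}
alias_1 = {'attr_A': 'X', 'attr_B': None}
alias_2 = {'attr_A': None, 'attr_B': 'Y'}

merged = fill_empty_fields(primary, [alias_1, alias_2])
print(merged)  # {'attr_A': 'X', 'attr_B': 'Y'}
```

In this sketch, a non-empty value on the primary is never overwritten, and the first alias that has a non-empty value for a field wins. Whether the package resolves conflicts between aliases in the same order is not stated here.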

Improvements

  • Support multiple merging strategies
  • Recursive merging of related one-to-one objects

Logging

The package includes rudimentary logging for debugging purposes. To enable it, add this snippet to your Django logging settings:

LOGGING = {
    'loggers': {
        'django_super_deduper': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}
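
The snippet above is a fragment: it refers to a 'console' handler and omits the 'version' key that Python's logging.config.dictConfig requires. If your settings do not already define them, a complete configuration might look like the sketch below. The handler definition and the 'disable_existing_loggers' value are assumptions for illustration; adapt them to your project. Django applies the LOGGING setting through dictConfig, which is called directly here so the example runs on its own.

```python
import logging
import logging.config

LOGGING = {
    'version': 1,  # required by dictConfig; 1 is the only valid value
    'disable_existing_loggers': False,  # assumption: keep other loggers active
    'handlers': {
        # Assumed handler: writes log records to stderr
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django_super_deduper': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

# Django does this for you when LOGGING is defined in settings.py.
logging.config.dictConfig(LOGGING)

logger = logging.getLogger('django_super_deduper')
logger.debug('debug logging is enabled for django_super_deduper')
```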

Releasing

Pre-reqs:

pip install pypandoc twine
brew install pandoc

  1. Draft a new release and create a new tag on GitHub
  2. Run python3 setup.py sdist bdist_wheel on master
  3. Upload to PyPI: python -m twine upload dist/*