Deduplicate your uploaded file
If making sure your file uploads are never duplicated is more important than organising your files into neat folders, you might want to try this package.
Usage
You can use the storage backend on a global level by adding the following to your django settings:
DEFAULT_FILE_STORAGE = 'dedupebackend.storage.DedupedStorage'
If you want to use the other features offered by dedupebackend, you need to add
dedupebackend to INSTALLED_APPS
like this:
INSTALLED_APPS = [ 'dedupebackend', # does not matter what spot ... ]
Admin integration
Adding dedupebackend to INSTALLED_APPS
gives you an admin page where you
can check your uploaded files. I allready let you know dedupebackend just
throws verything in a large folder, but that does not mean you can not add
structure to the storage. Just not on a filesystem level. You should add
structure by adding relations to other models. It is easy enough to add
categories or something:
class FileCategory(models.Model): files = models.ManyToManyField('dedupebackend.UniqueFile') name = models.TextField()
If you want to add a filter to the dedupebackend admin, try something like this:
from dedupebackend.admin import UniqueFileAdmin from dedupebackend.models import UniqueFile admin.site.unregister(UniqueFile) class CategoryUniqueFileAdmin(UniqueFileAdmin): list_filter = UniqueFileAdmin.list_filter + ('filecategory__name',) admin.site.register(UniqueFile, CategoryUniqueFileAdmin)
that might need some work, I never tested it :p
fields
There are some fields in dedupebackend you can use instead of the django
FileField
and ImageField
. You get a picker added to that, you can use
to select a file from the existing uploaded files.
Use something like this:
from dedupebackend.fields import * class KoeHenkModel(model.Model): name = models.TextField() file = UniqueFileField("A normal file, nothing special") image = UniqueImageField("an image")
How does it work?
Well, for each uploaded file, dedupebackend creates a file on disk named after the hash of the file. Mostly the same as git does (I actually tried to use libgit2 for this, but git is bad with deletions). Next to that file, a table holds a record with some information about the file. The primary key of this table is the hash value of the file. So it is really impossible to add duplicates (but but, hash collisions).
The fields actually render a file form field on a foreign key model field. The storage backend returns the hash value as the file name. And it can return file objects when given such a hash value.