camSort 0.4.1!
Working on Python 3.7.3! (at least I think.)
camSort is a Python library that makes sorted() look slow while also using sorted(). It's built on Cython, and brings preallocation to string sorting by using a custom key calculation for each string. Normal key calculation and object creation are often computationally expensive, but here we optimize that with Cython >:)!
Overview
The camSort library sorts a list of strings by creating a custom object for each string. This custom object, StringWithKey, holds the string and a unique key calculated from it. The key is a long integer derived from the sum of the Unicode code points of the characters in the string and the string's length.
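As a rough illustration of the idea, here is a pure-Python sketch of such an object. The README does not say exactly how the code point sum and the length are combined, so simply adding them is an assumption, and the names make_keyed, value and key are hypothetical (the real StringWithKey is a Cython type and may look quite different).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringWithKey:
    """Pure-Python stand-in for the Cython object described above."""
    value: str  # the original string
    key: int    # integer key calculated once, up front


def make_keyed(s: str) -> StringWithKey:
    # Sum of the Unicode code points of every character in the string.
    code_point_sum = sum(ord(ch) for ch in s)
    # Assumption: the sum and the length are simply added together;
    # the actual formula used by camSort is not documented here.
    return StringWithKey(value=s, key=code_point_sum + len(s))


keyed = make_keyed("ab")
print(keyed.value, keyed.key)
```

Under this assumed formula, strings made of the same characters in a different order (for example "ab" and "ba") end up with the same key.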
The actual sorting operation uses Python's built-in sorted function (based on Timsort), but it now compares the precalculated keys rather than the strings themselves, which makes sorting considerably faster.
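The general pattern (calculate every key once, then let sorted() compare only the keys) can be sketched in plain Python as below. This is not camSort's Cython code; the function name sort_by_length_precomputed is hypothetical and string length is used as the key purely for illustration.

```python
from operator import itemgetter


def sort_by_length_precomputed(strings):
    # Step 1: calculate the key for every string exactly once.
    keyed = [(len(s), s) for s in strings]
    # Step 2: sorted() (Timsort) compares only the precalculated keys.
    keyed_sorted = sorted(keyed, key=itemgetter(0))
    # Step 3: drop the keys and return the strings in their new order.
    return [s for _, s in keyed_sorted]


result = sort_by_length_precomputed(['your', 'list', 'of', 'strings', ':)!'])
print(result)
```

Because Timsort is stable, strings with equal keys keep their original relative order.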
The primary benefit of using camSort is realized when working with very large lists of strings; see Performance.
Installation
pip install camSort, or build locally :)
Usage
from camSort import sortStrings
myList = ['your', 'list', 'of', 'strings', ':)!']
# By length
sortedStrings = sortStrings.byLength(myList)
# Alphabetically
sortedStringsCam = sortStrings.byAlphabet(myList)
# Alphabetically and by length
sortedStringsCam = sortStrings.byLengthAndAlphabet(myList)
# Reversing! (This is the exact same as Python's)
# Please read the demo.py file for an explanation.
sortedStringsCam = sortStrings.byLength(myList).reverse()
# Returns a list of strings that contain a given substring/key; this is a list comprehension.
# Speed increase will be looked at in the future.
filteredStrings = sortedStringsCam.filterWithSubstring('hello')
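For comparison, the plain-Python equivalents of reversing and substring filtering might look like the sketch below. It does not use camSort at all; filter_with_substring is a hypothetical stand-in written from the "this is a list comprehension" description above, not the library's implementation.

```python
def filter_with_substring(strings, substring):
    # Keep only the strings that contain the given substring.
    return [s for s in strings if substring in s]


my_list = ['your', 'list', 'of', 'strings', ':)!']

# Longest first: sorted() builds a new list and accepts reverse=True.
longest_first = sorted(my_list, key=len, reverse=True)

# Substring filter as an ordinary list comprehension.
with_st = filter_with_substring(my_list, 'st')

print(longest_first)
print(with_st)
```

In plain Python, list.reverse() reverses the list in place and returns None, which is why the sketch uses sorted(..., reverse=True) rather than chaining .reverse() onto the result.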
Performance
Try out the demo.py file!
Average output on my 2017 MacBook Pro (1 million strings, with a random length of 1 to 1000 characters):
Time taken by Python sorted(): 52.74097275733948 seconds
Using camSort!
Time taken sorting by length: 2.0430028438568115 seconds
By sorting alphabetically: 3.7439329624176025 seconds
By sorting alphabetically and length: 2.3696372509002686 seconds
Python reverse: 2.4335076808929443 seconds
Camsort reverse: 2.4054980278015137 seconds
Using python list comprehension: 0.40741705894470215 seconds
Using filterWithSubstring: 0.5404400825500488 seconds
Results pre 0.4.
Python reverse: 5.526482105255127 seconds
Camsort reverse: 5.083630084991455 seconds
Using Python's list comprehension: 0.4732799530029297 seconds
Using filterWithSubstring: 0.38806891441345215 seconds
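A scaled-down timing harness in the same spirit could look like the sketch below. It is not the contents of demo.py: the helpers random_strings and time_call are hypothetical, the input is much smaller than 1 million strings, and only Python's built-in sorted() is timed, so the numbers it prints are not comparable to the results above.

```python
import random
import string
import time


def random_strings(count, max_len, seed=0):
    # Build `count` random ASCII-letter strings of length 1..max_len.
    rng = random.Random(seed)
    alphabet = string.ascii_letters
    return [
        ''.join(rng.choices(alphabet, k=rng.randint(1, max_len)))
        for _ in range(count)
    ]


def time_call(fn, *args, **kwargs):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


data = random_strings(1000, 100)
by_length, elapsed = time_call(sorted, data, key=len)
print(f"sorted(key=len) on {len(data)} strings: {elapsed:.6f} seconds")
```

To compare against camSort, the same time_call helper could wrap whichever camSort function is being measured, using the same input list for both runs.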