SortPerf.jl: Module to test the performance of sorting algorithms
The purpose of this module is to test the performance of the different sort (and related) algorithms in Julia. See https://github.com/kmsquire/SortPerf.jl/raw/master/sortperf.pdf for an example output from Version 0.3.0-prerelease+125.
Run with:
std_sort_tests(;sort_algs=SortPerf.sort_algs, # [InsertionSort, HeapSort, MergeSort,
# QuickSort, RadixSort, TimSort]
types=SortPerf.std_types, # [Int32, Int64, Int128, Float32, Float64, String]
range=6:20, # Array size 2^6 through 2^20, by powers of 2
replicates=3, #
lt::Function=isless, # \
by::Function=identity, # | sort(...) options
rev::Bool=false, # |
order::Ordering=Forward, # /
save::Bool=false, # create and save timing tsv and pdf plot
prefix="sortperf") # prefix for saved files
You can also test individual algorithms with
sortperf(Algorithm(s), data, [size,] [replicates=xxx])
Some examples:
sortperf(QuickSort, Int, 10_000) # Test QuickSort on 10,000 random ints
sortperf(MergeSort, [Float32, String], 6:2:10) # Test MergeSort on 2^6, 2^8, and 2^10 float 32s and strings
sortperf([QuickSort, MergeSort, TimSort], # Test QuickSort, MergeSort, and TimSort on
[Int, Float32, Float64, String], # Arrays of Int, Float32, Float64, and String
6:20; # ranging from 2^6 elements to 2^20 elements, by
replicates=5) # powers of 2, and run each test 5 times
Ordering parameters accepted by sort!(...) will be passed through.
Sorting Tests
The actual tests run include sorting arrays with the following characteristics:
- random
- sorted
- reversed
- sorted, but with 3 random exchanges
- sorted, with 10 random values appended
- 4 unique values
- all equal
- quicksort median killer: first half descending, second half ascending
The tests were inspired by similar tests used by sortperf in Python. See http://svn.python.org/projects/python/trunk/Objects/listsort.txt for more details.
Suggestions based on basic tests
Here is a table and some notes on the Julia implementations of the
various algorithms. The table indicates the recommended sort
algorithm for the given size (small < ~2^12 (=8,192) items < large
)
and type (string, floating point, or integer) of data.
- Random means that the data is permuted randomly.
- Structured here means that the data contains partially sorted runs (such as when adding random data to an already sorted array).
- Few unique indicates that the data only contains a few unique values.
(Un)stable (small) | Stable (small) | (Un)stable (large) | Stable (large) | In-place (large) | |
---|---|---|---|---|---|
Strings | |||||
- Random | M | M | M | M | Q |
- Structured | M | M | T | T | Q |
- Few Unique | Q | M | Q | M | Q |
Float64 | |||||
- Random | Q | M | R | R | Q |
- Structured | M | M | T | T | Q |
- Few Unique | Q | M | Q | R | Q |
Int64 | |||||
- Random | Q | M | R | R | Q |
- Structured | Q | M | uT | R/T | Q |
- Few Unique | Q | M | R | R | Q |
Key:
Symbol | Algorithm |
---|---|
H | HeapSort |
I | InsertionSort |
M | MergeSort |
Q | QuickSort |
T | TimSort |
uT | TimSortUnstable |
R | RadixSort |
Current Recommendations
Except for pathological cases, small arrays are sorted best with
QuickSort
(unstable) orMergeSort
` (stable)When sorting large arrays with sections of already-sorted data, use
TimSort
. The only structured case it does not handle well is reverse-sorted data with large numbers of repeat elements. An unstable version ofTimSort
(to be contributed to Julia soon) will handle this caseFor numerical data (Ints or Floats) without structure,
RadixSort
is the best choice, except for 1) 128-bit values, or 2) 64-bit integers which span the full range of values.When memory is tight,
QuickSort
is the best in-place algorithm. If there is concern about pathological cases, useHeapSort
. All stable algorithms use additional memory, butTimSort
is (probably) the most frugal.Composite types may behave differently. If sorting is important to your application, you should test the different algorithms on your own data. This package facilitates that.