hollerith-codec

Binary encoding for Hollerith that provides a total ordering for primitive datatypes and lists of those.


Keywords
binary, sort, collation, serialization, leveldb, indexeddb
License
MIT
Install
npm install hollerith-codec@1.0.22

Documentation

Table of Contents generated with DocToc

Table of Contents generated with DocToc

hollerith-codec

Binary encoding for Hollerith that provides a total ordering for primitive datatypes and lists of those. Used by Hollerith and KWIC.

Breaking Change in v2.x: Prior to v2.x, when comparing two lists a, b where a is a prefix of b used to be sorted with a first, then b, i.o.w. a shorter word always precedes a longer word that starts out the same (as in an alphabetically sorted dictionary).

Starting with v2, however, this has been changed such that when comparing two lists a, b where a is a prefix of b, b will be sorted before a if the first 'extra' element of b (i.e. b[ a.length ]) is a negative number. By way of example, sorting now looks like this (with hexadecimal byte values):

[ -10,               ] ... 4b bfdbffffffffffff  4c
[  -1,               ] ... 4b c00fffffffffffff  4c
[                    ] ... 4c
[   0,               ] ... 4d 0000000000000000  4c
[   1,   -1,         ] ... 4d 3ff0000000000000  4b c00fffffffffffff  4c
[   1,               ] ... 4d 3ff0000000000000  4c
[   1,    0,         ] ... 4d 3ff0000000000000  4d 0000000000000000  4c
[   1,    1,         ] ... 4d 3ff0000000000000  4d 3ff0000000000000  4c
[  10,   -2,         ] ... 4d 4024000000000000  4b bfffffffffffffff  4c
[  10,   -2,   3,    ] ... 4d 4024000000000000  4b bfffffffffffffff  4d 4008000000000000 4c
[  10,   -1,         ] ... 4d 4024000000000000  4b c00fffffffffffff  4c
[  10,   -1,   3,    ] ... 4d 4024000000000000  4b c00fffffffffffff  4d 4008000000000000 4c
[  10,               ] ... 4d 4024000000000000  4c
[  10,    0,         ] ... 4d 4024000000000000  4d 0000000000000000  4c
[  10,    0,    3,   ] ... 4d 4024000000000000  4d 0000000000000000  4d 4008000000000000 4c
[  10,    1,         ] ... 4d 4024000000000000  4d 3ff0000000000000  4c
[  10,    1,    3,   ] ... 4d 4024000000000000  4d 3ff0000000000000  4d 4008000000000000 4c
[  10,    2,         ] ... 4d 4024000000000000  4d 4000000000000000  4c
[  10,    2,    3,   ] ... 4d 4024000000000000  4d 4000000000000000  4d 4008000000000000 4c
[  10,    4,   -2,   ] ... 4d 4024000000000000  4d 4010000000000000  4b bfffffffffffffff 4c
[  10,    4,   -1,   ] ... 4d 4024000000000000  4d 4010000000000000  4b c00fffffffffffff 4c
[  10,    4,         ] ... 4d 4024000000000000  4d 4010000000000000  4c
[  10,    4,    0,   ] ... 4d 4024000000000000  4d 4010000000000000  4d 0000000000000000 4c
[  10,    4,    1,   ] ... 4d 4024000000000000  4d 4010000000000000  4d 3ff0000000000000 4c
[  10,    4,    2,   ] ... 4d 4024000000000000  4d 4010000000000000  4d 4000000000000000 4c

Benchmarks

Sample run:
hollerith_tng      0.235 s  300,000 items       1,274,500⏶Hz             785⏷nspc
hollerith_bcd      2.543 s  300,000 items         117,963⏶Hz           8,477⏷nspc
hollerith_classic  7.113 s  300,000 items          42,176⏶Hz          23,710⏷nspc

Averages:
00:56 HENGIST/BENCHMARKS  ▶  hollerith_tng      1,248,369 Hz ≙ 1 ÷ 1.0       100.0 % │████████████▌│
00:56 HENGIST/BENCHMARKS  ▶  hollerith_bcd        118,161 Hz ≙ 1 ÷ 10.6        9.5 % │█▏           │
00:56 HENGIST/BENCHMARKS  ▶  hollerith_classic     42,056 Hz ≙ 1 ÷ 29.7        3.4 % │▍            │