TreeLibLong

TreeLib: Balanced Binary Trees, Rank Augmented provides binary tree collections (AVL, red-black and splay) with optional rank information for order-statistic and sparse indexing applications. This package contains implementations with Int64 rank indexing.


Keywords
splay, sparse, red-black, binary, avl, order-statistic, search, tree, rank, augmented, bst
License
GPL-3.0
Install
Install-Package TreeLibLong -Version 1.1.1

Documentation

Welcome to the TreeLib repository

TreeLib provides balanced binary tree implementations with optional rank augmentation for supporting statistical queries (e.g. median or percentile) or sparse mappings. TreeLib is written in C# and targets the .NET Framework.

Using

The documentation for TreeLib is located at http://programmatom.github.io/TreeLib

Contributing

Contributions are welcome. Please contact me if you find a bug, have a feature request, or want to create a pull request.

Please do keep in mind that I want to keep the basic tree implementations as streamlined as possible. On the other hand, I am interested in exposing operations that can be done more efficiently at the level of tree nodes, rather than the collection-oriented interfaces that are exposed to consumers of the library.

There are no formal coding guidelines at this time. Just make new code look like what's already there.

Building

The TreeLib solution builds with Visual Studio 2015 Community Edition. Open the solution file located at TreeLib\Treelib.sln. There are several projects contained therein:

Project Description
BuildTool A set of Roslyn transformations that are used to generate specialized implementations of various trees (see below)
Template A non-building project containing the master (base) implementations of each category of collection
TreeLibInterface Builds the assembly containing the interfaces implemented by various tree specializations
TreeLib Builds the assembly containing "standard" specializations of trees
TreeLibArray Builds the assembly containing array-based specialization of trees
TreeLibLong Builds the assembly containing specialization of trees using long rank values instead of int
TreeLibUtil Builds the assembly containing utility classes that use the trees
TreeLibTest Builds the unit and stochastic test program

BuildTool

A core concept of the building of TreeLib is that there is only one master implementation of each category of class (AVL, Red-Black, Splay, and HugeList) which is processed to derive specializations which have been stripped of all unneeded internals. This is analogous to what one might do in C++ using selector-parameterized templates, relying on the optimizing compiler to eliminate unneeded functionality through constant propagation and dead code elimination. However, the .NET Framework's implementation of generics is much more restrictive and does not provide a workable means of achieving such specialization. Instead, a set of Roslyn transforms are used to specialize the code and eliminate unneeded portions of the implementation.

Currently, BuildTool is still in an experimental state. It gets the job done, but it is quite slow, fairly specific to the task at hand (with a number of hard-coded transformations), and not well integrated into the build. If a change is made to a master implementation file, one must manually run "Rebuild All" since MSBuild does not know about the dependencies around BuildTool.

Making Changes

The first step in adding functionality or making a change is to edit the Interfaces.cs file in the TreeLibInterface project to add relevant public interfaces that define the functionality to be added (or editing existing interfaces - but beware of breaking compatibility).

Then, modify or add to the master/base implementations in the Template project. Adding a new master/base implementation is fairly involved, since you'll need to wrestle with BuildTool and possibly extend it. Making minor changes to existing code is less hairy, but may still involve a confrontation with BuildTool.

If new specializations have been added, be sure to add the products of BuildTool in each target project's Generated subdirectory to the VS project in order to get them built.

Most important, add testing for new or changed functionality.

Testing

Testing is the most important (and most tedious) aspect of maintaining a library like TreeLib. The TreeLibTest.exe provides comprehensive unit, stochastic, and performance testing. There are three important categories of testing. The test tool has several arguments which can all be combined and are interpreted left-to-right, although not all make sense together.

Argument Description
-unit Disable the unit test phase
+unit Enable the unit test phase. Individual tests specifically disabled will remain disabled
-unit:<name> Disable a particular unit test. all may be specified for <name> to disable all tests. Use all followed by +unit:<name> to enable just one test
+unit:<name> Enable a particular unit test
--- ---
-memory Disable the memory allocation regression tests
+memory Enable the memory allocation regression tests
--- ---
-random Disable the stochastic test phase
+random Enable the stochastic test phase. Individual tests specifically disabled will remain disabled
-random:<name> Disable a particular stochastic test. all may be specified for <name> to disable all tests. Use all followed by +random:<name> to enable just one test
+random:<name> Enable a particular stochastic test
--- ---
-perf Disable performance testing (perf testing is enabled by default in the Release build and disabled in the Debug build)
+perf Enable performance testing
baseline Cause a new baseline measurement to be made. If omitted, the current run will be compared to any existing baseline and deviations (faster or slower) will be written to the output
--- ---
break:<iteration> Generate an assert and Debugger.Break() at the specified iteration. This is useful for rerunning and breaking immediately before a test failure.
seed:<number> Specify the seed to use for the random number generator in stochastic tests. This allows failures to be reproduced upon rerun, and debugged if used in conjunction with break.

Unit Tests

Each new category of specialization should be covered by a new unit test module (represented by one source file in TreeLibTest\UnitTests). Modifications to existing implementations should have tests added in existing modules.

Some testing is done by explicitly checking for expected results when invoking exposed methods. A lot of testing is done by comparison of outputs with very simple "reference" implementations that can be seen to be correct to a high degree of confidence.

Stochastic Tests

There are a set of randomized stress tests that can pound on the generated implementations indefinitely (overnight is good). The randomized tests should make an effort to test boundary cases of all exposed methods, including invalid inputs. All stochastic tests include the appropriate "reference" implementation in the battery and ensure that all implementations produce the same results.

Memory Allocation Regression Tests

The test suite includes memory allocation regression tests to detect if code changes have increased (or decreased) the number or size of memory allocations. The tests require CLR Profiler (download: CLRProfiler45Binaries) to be installed as C:\Program Files\CLRProfiler45Binaries\{bits}\CLRProfiler.exe where {bits} is 32 or 64 based on the target settings for TreeLibTest.exe. (This can be overridden with the CLR_PROFILER_PATH_32 and CLR_PROFILER_PATH_64 environment variables, which specify an alternate full path to the appropriate CLRProfiler.exe.)

Performance Tests

If major changes are being made to something, a performance test should be run to ensure regressions do not happen. Performance tests are run only on the Release build.

The first step of running a performance test is to create a baseline on the unchanged project:

TreeLibTest.exe -unit -random +perf baseline

After changes are made, rerun the performance test:

TreeLibTest.exe -unit -random +perf

The tolerances are fairly strict at this time, so there may be spurious failures reported. However, deviations should be investigated before being dismissed.

Also, I have run into problems on CPUs with thermal management, especially AMD's Turbo Core but also Intel's Turbo-Boost. I'd recommend disabling this feature during performance test runs. There can also be an issue with ambient temperature, where the warmer the weather, the slower the processor runs under load. This is particularly an issue if testing on a laptop. Try to run the baseline and the actual test at similar ambient temperatures.

Code Coverage

The project is currently around 97% covered by the unit and stochastic tests. The goal is 100% coverage.

License

TreeLib is licensed under the GNU Lesser General Public License.

Thank You

Thanks for your interest in TreeLib!