This code was derived from T-SNE-Java
See LICENSE.md for further information!

    <dependency>
        <groupId>de.javagl</groupId>
        <artifactId>bh-tsne</artifactId>
        <version>0.0.1</version>
    </dependency>

The project consists of two modules:
`bh-tsne`: The actual library offering the t-SNE functionality. Its only dependency is EJML (Efficient Java Matrix Library).
`bh-tsne-demo`: A small application that allows running t-SNE on different test data sets and visualizing the results.
Changes done to the original code:
Replaced the original sources of randomness with a `Random` instance that is initialized with a fixed (but modifiable) random seed. Since t-SNE is often used for scientific publications, being able to create the same result repeatedly is crucial for reproducibility. I also used this to verify (empirically) that all subsequent changes did not affect the validity of the results.
Collected all classes that are required for the `BHTSne` into a single package - and only these classes!
Made `private` what could be `private`. Made default-visible what could be default-visible. Made `static` what could be `static`. Removed all code that was then no longer used (which was most of it).
Removed unnecessary dependencies (for example, JAMA was used for a single line of code that was not called at all). Specifically: Removed all dependencies except for the one to EJML, which has been replaced with the proper one in its latest available version.
Replaced the `System.out.print...` calls by some (pragmatic) logging and progress reporting
Added a trivial parallelization that brought a roughly 40% speedup by changing a few lines of code (and seems to be faster than the oddly complicated `ParallelBHTsne` of the original implementation...)
Allowed the computation to be interrupted by calling `interrupt()` on the executing thread
Significantly reduced the number of memory allocations, mainly by introducing the
Offered the whole functionality in a single public class, with some basic JavaDoc
Offered the library in Maven Central
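
The fixed-seed change listed above can be illustrated with a minimal sketch. It is not taken from the library: the class `SeededRandomSketch` and the helper `randomInit` are hypothetical names, and the small Gaussian initialization is only an assumption about how a random starting embedding might be filled. The point is that two `Random` instances created with the same seed produce identical values, which is what makes a run repeatable.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomSketch {

    // Hypothetical helper: fills an n x d matrix with small Gaussian noise
    // drawn from the given Random instance.
    static double[][] randomInit(int n, int d, Random random) {
        double[][] y = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                y[i][j] = random.nextGaussian() * 1e-4;
            }
        }
        return y;
    }

    public static void main(String[] args) {
        long seed = 0L; // fixed, but modifiable
        double[][] a = randomInit(5, 2, new Random(seed));
        double[][] b = randomInit(5, 2, new Random(seed));
        // Same seed, same sequence of values, hence identical matrices
        System.out.println("Same seed gives same result: " + Arrays.deepEquals(a, b));
    }
}
```

Passing the `Random` instance in as a parameter, rather than calling a global random source, keeps the seed under the caller's control.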
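
The source does not say which lines were changed for the parallelization mentioned above. As a generic illustration of how a row-wise loop can be parallelized with very few changes, the following sketch uses a parallel `IntStream`. The class `ParallelRowsSketch` and the distance computation are assumptions chosen for the example, not the library's code.

```java
import java.util.stream.IntStream;

public class ParallelRowsSketch {

    // Computes the squared Euclidean distances between all rows of x.
    // Each parallel task writes only to its own row of the result,
    // so no synchronization is needed.
    static double[][] squaredDistances(double[][] x) {
        int n = x.length;
        double[][] result = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < x[i].length; k++) {
                    double diff = x[i][k] - x[j][k];
                    sum += diff * diff;
                }
                result[i][j] = sum;
            }
        });
        return result;
    }

    public static void main(String[] args) {
        double[][] x = { { 0, 0 }, { 3, 4 }, { 6, 8 } };
        double[][] d = squaredDistances(x);
        System.out.println(d[0][1]); // prints 25.0
        System.out.println(d[0][2]); // prints 100.0
    }
}
```

Compared with the sequential version, the only difference is that the outer `for` loop over `i` became `IntStream.range(0, n).parallel().forEach(...)`. This works without locks because the rows are independent of each other.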
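
Interruption via `interrupt()`, as listed above, is commonly supported by checking the interrupt flag of the current thread once per iteration. The sketch below shows that pattern under the assumption of a simple iterative loop. `InterruptibleLoopSketch` and `runIterations` are hypothetical names, and the loop body is a placeholder rather than an actual t-SNE step.

```java
public class InterruptibleLoopSketch {

    // Hypothetical stand-in for an iterative optimization. Returns the number
    // of iterations completed before finishing or being interrupted.
    static int runIterations(int maxIterations) {
        int done = 0;
        for (int i = 0; i < maxIterations; i++) {
            if (Thread.currentThread().isInterrupted()) {
                // Stop early. isInterrupted() does not clear the flag,
                // so the caller can still detect the interruption.
                break;
            }
            // ... one (placeholder) optimization step would go here ...
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            int done = runIterations(Integer.MAX_VALUE);
            System.out.println("Stopped after " + done + " iterations");
        });
        worker.start();
        Thread.sleep(50);
        worker.interrupt(); // request cancellation from another thread
        worker.join();
    }
}
```

Checking the flag once per iteration keeps the overhead negligible while still letting a long-running computation stop promptly.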