com.cldellow:gzip

Uncompress GZIP streams and expose member offsets.


Keywords
compression, gzip, warc
Licenses
GPL-3.0/libpng-2.0

Documentation

DEPRECATED

Don't use this library!

It relied on accessing internals of java.util.zip.Inflater, which have changed in JDK 11.

Instead, use GzipCompressorInputStream from Apache Commons Compress. An example of how to use it is at https://github.com/cldellow/warc-service/blob/e95f8f5906c39efeb781a47b343a7cec179af7e3/src/main/scala/com/cldellow/warc/framework/WarcHandler.scala#L62

gzip


Emit the offsets of concatenated GZIP member streams.

GZIP has the interesting property that a sequence of concatenated GZIP streams can be read as though it were a single GZIP stream.
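This property is easy to observe with nothing but the JDK. The sketch below compresses two strings as separate, complete GZIP streams, joins the raw bytes with nothing in between, and decodes the result with the stock java.util.zip.GZIPInputStream. The class and helper names (ConcatenatedGzipDemo, gzip, concat, gunzip) are illustrative only and are not part of this library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {
    // Compress one string into a complete, standalone GZIP stream (one member).
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Join independent GZIP streams byte for byte, with no separator.
    static byte[] concat(byte[]... members) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] member : members) {
            joined.write(member, 0, member.length);
        }
        return joined.toByteArray();
    }

    // Decompress with the stock JDK class, which carries on into later members.
    static String gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = concat(gzip("hello, "), gzip("world"));
        System.out.println(gunzip(file)); // prints "hello, world"
    }
}
```

The decoded text is the two inputs run together; nothing in the output marks where the first member ended, which is exactly the information the rest of this README is about.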

The Web Archive (WARC) format takes advantage of this to store tens of thousands of GZIP streams in a single file. When processing such a file, it can be useful to know the byte offset at which each underlying stream starts. The stock java.util.zip.GZIPInputStream class does not expose this information.

This library patches that class to expose a callback that is invoked with the offset of each member stream.
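The library's patched class is not reproduced here. As a rough, JDK-only sketch of the same idea, assuming for brevity that the whole input fits in a byte array, the code below walks the members directly: it parses each RFC 1952 member header, inflates the raw DEFLATE data with java.util.zip.Inflater, uses Inflater.getRemaining() to find where that data ended, and skips the 8-byte CRC32/ISIZE trailer to land on the next member. MemberOffsets, MemberListener, scan, and offsets are hypothetical names; MemberListener only mimics the (member, offset) shape of the callback shown under Usage. Unlike this library, the sketch touches only public Inflater methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

public class MemberOffsets {
    // Hypothetical callback, shaped like the (member, offset) lambda in Usage.
    interface MemberListener {
        void onMember(int member, long offset);
    }

    // FLG bits from RFC 1952, section 2.3.1.
    private static final int FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    // Returns the length in bytes of the member header that starts at pos.
    static int headerLength(byte[] data, int pos) throws ZipException {
        if (data.length - pos < 10 || (data[pos] & 0xff) != 0x1f
                || (data[pos + 1] & 0xff) != 0x8b || data[pos + 2] != 8) { // CM 8 = DEFLATE
            throw new ZipException("No GZIP member header at offset " + pos);
        }
        int flg = data[pos + 3] & 0xff;
        int p = pos + 10; // fixed fields: ID1 ID2 CM FLG MTIME(4) XFL OS
        if ((flg & FEXTRA) != 0) {
            p += 2 + ((data[p] & 0xff) | ((data[p + 1] & 0xff) << 8)); // XLEN, little-endian
        }
        if ((flg & FNAME) != 0) {
            while (data[p++] != 0) { } // zero-terminated file name
        }
        if ((flg & FCOMMENT) != 0) {
            while (data[p++] != 0) { } // zero-terminated comment
        }
        if ((flg & FHCRC) != 0) p += 2; // CRC16 of the header
        return p - pos;
    }

    // Invokes listener with the index and starting byte offset of every member.
    static void scan(byte[] data, MemberListener listener) throws ZipException, DataFormatException {
        Inflater inflater = new Inflater(true); // raw DEFLATE; GZIP framing is parsed by hand
        byte[] sink = new byte[8192];           // decompressed bytes are discarded
        int pos = 0, member = 0;
        try {
            while (pos < data.length) {
                listener.onMember(member++, pos);
                int start = pos + headerLength(data, pos);
                inflater.reset();
                inflater.setInput(data, start, data.length - start);
                while (!inflater.finished()) {
                    if (inflater.inflate(sink) == 0 && inflater.needsInput()) {
                        throw new ZipException("Truncated DEFLATE data at offset " + start);
                    }
                }
                // Unconsumed input starts right after this member's DEFLATE data.
                pos = data.length - inflater.getRemaining() + 8; // skip CRC32 + ISIZE trailer
            }
        } finally {
            inflater.end();
        }
    }

    // Convenience wrapper: collects the offsets reported by scan into a list.
    static List<Long> offsets(byte[] data) {
        List<Long> result = new ArrayList<>();
        try {
            scan(data, (member, offset) -> result.add(offset));
        } catch (ZipException | DataFormatException e) {
            throw new IllegalArgumentException(e);
        }
        return result;
    }

    // Helpers: one standalone GZIP member per string, and byte-level concatenation.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] part : parts) joined.write(part, 0, part.length);
        return joined.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(offsets(concat(gzip("record one"), gzip("record two"))));
    }
}
```

Holding everything in memory keeps the offset arithmetic trivial; a streaming variant would also have to track how many bytes of each read buffer the inflater left unconsumed.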

Usage

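// One slot per member the caller expects to see (100 here).
int[] offsets = new int[100];

// `is` is the compressed input; the callback receives each member stream's index and offset.
GZIPInputStream gzis = new GZIPInputStream(is, (member, offset) -> { offsets[member] = offset; });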

License

This library is a fork of java.util.zip.GZIPInputStream as implemented by Oracle.

This library's contents are subject to the GPL "Classpath" exception. You may link it into an executable without that executable itself having to be licensed under the GPL.