github.com/richardlehane/siegfried/pkg/core/riffmatcher

signature-based file format identification


Keywords
code4lib, digital-preservation, format-identification, pronom
License
Apache-2.0
Install
go get github.com/richardlehane/siegfried/pkg/core/riffmatcher

Documentation

Siegfried

Siegfried is a signature-based file format identification tool, implementing:

  • the National Archives UK's PRONOM file format signatures
  • freedesktop.org's MIME-info file format signatures.

Version

1.5.0

Usage

Command line

sf file.ext
sf DIR

Options

sf -csv file.ext | DIR                     // Output CSV rather than YAML
sf -json file.ext | DIR                    // Output JSON rather than YAML
sf -droid file.ext | DIR                   // Output DROID CSV rather than YAML
sf -                                       // Read list of files piped to stdin
sf -nr DIR                                 // Don't scan subdirectories
sf -z file.zip | DIR                       // Decompress and scan zip, tar, gzip, warc, arc
sf -hash md5 file.ext | DIR                // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig file.ext                // Use a custom signature file
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port                    // Server mode
sf -version                                // Display version information
sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
sf -log [comma-sep opts] file.ext | DIR    // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | DIR                 // Log errors and warnings to stderr
sf -log u,o file.ext | DIR                 // Log unknowns to stdout
sf -log d,s file.ext | DIR                 // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results

Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.

Install

With Go installed:

go get github.com/richardlehane/siegfried/cmd/sf

sf -update

Or, without Go installed:

Windows:

Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:

sf -update

Mac Homebrew (or Linuxbrew):

brew install mistydemeo/digipres/siegfried

Ubuntu/Debian (64-bit):

wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update && sudo apt-get install siegfried

Recent Changes

Version 1.5.0 (14/3/2016)

  • feature: implement freedesktop.org MIME-info signatures (and the Apache Tika variant)
  • feature: implement XML matcher
  • feature: file name matcher now supports glob patterns as well as file extensions
  • default signature file now "default.sig" (was "pronom.sig")
  • changes to YAML and JSON output: "ns" (for namespace) replaces "id", and "id" replaces "puid"
  • changes to CSV output: multi-identifiers now displayed in extra columns, not extra rows
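
The YAML and JSON key renames above matter for any consumer that has to read output from both older and newer releases. As a rough sketch, assuming each match is a flat JSON object and that older output carried the namespace in "id" and the identifier in "puid" (which is how the renames read), a decoder could normalise both shapes. The Match type, the normalise function and the sample records are hypothetical and are not siegfried's actual output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Match is a hypothetical, version-independent view of one identification.
type Match struct {
	Namespace string
	ID        string
}

// rawMatch captures the keys named in the change notes.
// NS is a pointer so that its absence can be detected.
type rawMatch struct {
	NS   *string `json:"ns"`
	ID   string  `json:"id"`
	PUID string  `json:"puid"`
}

// normalise maps both key layouts onto Match:
// newer: "ns" = namespace, "id" = identifier
// older: "id" = namespace, "puid" = identifier (an assumption, see above)
func normalise(data []byte) (Match, error) {
	var r rawMatch
	if err := json.Unmarshal(data, &r); err != nil {
		return Match{}, err
	}
	if r.NS != nil {
		return Match{Namespace: *r.NS, ID: r.ID}, nil
	}
	return Match{Namespace: r.ID, ID: r.PUID}, nil
}

func main() {
	// Illustrative records only; real output contains more fields.
	samples := []string{
		`{"ns": "pronom", "id": "fmt/43"}`,
		`{"id": "pronom", "puid": "fmt/43"}`,
	}
	for _, s := range samples {
		m, err := normalise([]byte(s))
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s %s\n", m.Namespace, m.ID)
	}
}
```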

Version 1.4.5 (6/2/2016)

Version 1.4.4 (9/1/2016)

  • fix: speed regression introduced by the TIFF mis-identification patch in the last release
  • code quality: refactor textmatcher package
  • code quality: refactor siegreader package
  • code quality: documentation

Version 1.4.3 (19/12/2015)

Version 1.4.2 (27/11/2015)

Version 1.4.1 (6/11/2015)

  • -log replaces -debug, -slow, -unknown and -known flags (see usage above)
  • highlight empty file/stream with error and warning
  • negative text match overrides extension-only plain text match

Version 1.4.0 (31/10/2015)

  • new MIME matcher; requested by Dragan Espenschied
  • support warc continuations
  • add all.json and tiff.json sets
  • minor speed-up
  • report less redundant basis information
  • report error on empty file/stream

Full change history

Rights

Copyright 2016 Richard Lehane

Licensed under the Apache License, Version 2.0

Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.

Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm; both are very handy!

Thanks Misty for the brew and Ubuntu packaging