filezcache

FilezCache


License
Apache-2.0

Documentation

Filezcache - cache s3 and other files on disk

This is a file caching system optimized for use with cowmachine but then also be used with other services.

It has some unique properties:

  • Lookups return file references (if possible) which can be used with sendfile
  • Cached files can be streamed to requestors while they are being filled
  • On startup the cache is repopulated with existing files
  • Lookups return data formats directly useable for cowmachine serving

The system uses the file system to store files and a disk log. The files are stored in priv/data/ and the disk log in priv/journal.

The disk log is used to rebuild the cache after a start. All files are checked against the checksum from the disk log, non-matching files are deleted from the cache.

Example

$ erl -pa ebin
Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:4:4] [async-threads:0] [kernel-poll:false]
Eshell V5.9.3.1  (abort with ^G)
1> application:start(crypto).
ok
2> application:start(filezcache).
ok
3> filezcache:lookup(mykey).
{error, enoent}
4> filezcache:insert(mykey, <<"foobar">>).
{ok,<0.173.0>}.
5> filezcache:lookup(mykey).
{ok,{file,6,"priv/data/4J0I2F06043V5P0V603D4O4I6L1J5M1B4I5Y2B2C606W28131H164Z421M4X6221"}}
6> filezcache:insert(mykey, <<>>).
{error,{already_started,<0.173.0>}}
7> filezcache:delete(mykey).
ok

Configuration keys

  • max_bytes Maximum size of the cache in bytes, defaults to 10GiB (10737418240 bytes)
  • journal_dir Directory for the disk log, defaults to priv/journal
  • data_dir Directory for the cached files, defaults to priv/data

TODO

There are some known issues that need to be resolved:

  • On startup delete files that are unknown to the disk log
  • Add timeouts to filezcache_entry states wait_for_data and streaming
  • Extra intelligence in filezcache_entry to prevent evicting active entries during garbage collection