Science's Artifact Antiformat


Licenses
MIT/Apache-2.0
Install
pip install reliquery==0.3.2


Reliquery is an anti-format storage tool for scientists. It lets them store data how they want and where they want, and it simplifies the storage of research materials so that they are easy to find and easy to share.

Table of Contents

  1. Production
  2. Development
    1. Local install
  3. Example
  4. HTML
  5. Images
  6. JSON
  7. Pandas DataFrame
  8. Files
  9. Jupyter Notebooks
  10. Query Relics
  11. Config
  12. File Storage
  13. S3 Storage
  14. License

For production

latest version 0.2.6

pip install reliquery

For development

Local Install

cd reliquery
pip install -e .

Quick Example Usage

from reliquery import Relic
import numpy as np
from IPython.display import HTML, Image
 
r = Relic(name="quick", relic_type="tutorial")
ones_array = np.ones((10, 10))
r.add_array("ones", ones_array)
np.testing.assert_array_equal(r.get_array("ones"), ones_array)
r.add_text("long_form", "Some long form text. This is something we can do NLP on later")
r.add_tag({"pass": "yes"})
r.add_json("json", {"One":1, "Two": 2, "Three": 3})
print(r.describe())

HTML supported

Add HTML as a string:

# Example
r.add_html_string("welcome", "<div><p>Hello, World</p></div>")

Add HTML from a file path:

# Example
r.add_html_from_path("figures", html_path)  # html_path: path to your HTML file

Get and display HTML using Reliquery:

# Read only S3 demo
r_demo = Relic(name="intro", relic_type="tutorial", storage_name="demo")
print(r_demo.list_html())
display(HTML(r_demo.get_html('nnmf2 resnet34.html')))

Images supported

Add images by passing images as bytes:

# Example
with open("image.png", "rb") as f:
    r.add_image("image-0.png", f.read())

Get and display images:

print(r_demo.list_images())
display(Image(r_demo.get_image("reliquery").read()))

JSON supported

Add JSON by passing it in as a dictionary:

# Example
r.add_json("json", {"First": 1, "Second": 2, "Third": 3})

List the JSON entries in a relic:

r.list_json()

Get JSON by name; the stored dictionary is returned:

r.get_json("json")

Pandas DataFrame

Note that DataFrames are serialized as JSON, which comes with caveats described here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.to_json.html

# Example
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
r.add_pandasdf("pandasdf", df)

List the DataFrames in a relic:

r.list_pandasdf()

Get a pandas DataFrame by name:

r.get_pandasdf("pandasdf")
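
One caveat of JSON serialization can be seen with the standard library alone, without pandas or Reliquery. The sketch below is only an illustration of the JSON format itself: object keys are always strings and there is no tuple type, so labels and values may come back with different types than they went in. Whether a particular reader restores the original types on load is not covered here.

```python
import json

# A mapping with integer keys and tuple values, loosely similar to a table
# whose column labels are integers.
original = {1: (1.0, 2.0), 2: (3.0, 4.0)}

# Serialize to JSON text and parse it back.
restored = json.loads(json.dumps(original))

# Keys are now strings and tuples are now lists.
print(restored)
print(restored == original)
```

If label or value types matter for later analysis, it may be worth checking them after loading a stored DataFrame.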

Files

# Example
r.add_files_from_path("TestFileName", test_file_path)

List files:

r.list_files()

Get a file by name:

r.get_file("TestFileName")

Save a file to a path:

r.save_files_to_path("TestFileName", path_to_save)

Jupyter Notebooks

# Example
import os

test_notebook = os.path.join(os.path.dirname(__file__), "notebook_test.ipynb")
r.add_notebook_from_path("TestNotebook", test_notebook)

List notebooks:

notebook_list = r.list_notebooks()

Get a notebook by name:

r.get_notebook("TestNotebook")

Save a notebook to a path:

# tmp_path: an existing directory to write into
path_to_save = os.path.join(tmp_path, "testnotebook.ipynb")
r.save_notebook_to_path("TestNotebook", path_to_save)

View a notebook as HTML:

r.get_notebook_html("TestNotebook")

Query Relics

from reliquery import Reliquery

rel = Reliquery()

relics = rel.get_relics_by_tag("pass", "yes")

relics[0].describe()
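
To make the meaning of a tag query concrete, here is a minimal sketch of key/value tag matching over plain dictionaries. It is not Reliquery's implementation; the `filter_by_tag` helper and the record layout are hypothetical stand-ins used only to show which items a query such as ("pass", "yes") would be expected to select.

```python
from typing import Dict, List


def filter_by_tag(records: List[Dict], key: str, value: str) -> List[Dict]:
    """Return the records whose tags contain key == value."""
    return [rec for rec in records if rec.get("tags", {}).get(key) == value]


# Hypothetical in-memory stand-ins for stored relics.
records = [
    {"name": "quick", "relic_type": "tutorial", "tags": {"pass": "yes"}},
    {"name": "draft", "relic_type": "tutorial", "tags": {"pass": "no"}},
    {"name": "untagged", "relic_type": "tutorial"},
]

matches = filter_by_tag(records, "pass", "yes")
print([rec["name"] for rec in matches])
```

Only the record tagged with "pass": "yes" is selected; records with a different value or with no tags at all are skipped.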

Config

Reliquery is configured through a JSON text file named config, located in ~/reliquery.
The default looks like:

{
  "default": {
    "storage": {
      "type": "File",
      "args": {
        "root": "/home/user/reliquery"
      }
    }
  },
  "demo": {
    "storage": {
      "type": "S3",
      "args": {
        "s3_signed": false,
        "s3_bucket": "reliquery",
        "prefix": "relics"
      }
    }
  }
}
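
The structure above can be read with the standard json module. The sketch below assumes that the top-level keys ("default", "demo") are the names passed as storage_name when creating a Relic, as the read-only demo earlier suggests. The `load_storage_settings` helper is hypothetical, and the config is written to a temporary directory so that a real ~/reliquery/config is not touched.

```python
import json
import tempfile
from pathlib import Path


def load_storage_settings(config_path: Path, storage_name: str = "default") -> dict:
    """Read the JSON config and return the 'storage' block of one entry."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    if storage_name not in config:
        raise KeyError(f"no storage named {storage_name!r} in {config_path}")
    return config[storage_name]["storage"]


# A copy of the default config shown above.
example_config = {
    "default": {
        "storage": {"type": "File", "args": {"root": "/home/user/reliquery"}}
    },
    "demo": {
        "storage": {
            "type": "S3",
            "args": {"s3_signed": False, "s3_bucket": "reliquery", "prefix": "relics"},
        }
    },
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config"
    config_path.write_text(json.dumps(example_config, indent=2), encoding="utf-8")
    default_storage = load_storage_settings(config_path)
    demo_storage = load_storage_settings(config_path, "demo")

print(default_storage["type"], demo_storage["type"])
```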

File Storage

With this configuration, each item in a relic is persisted to:
/home/user/reliquery/relic_type/relic_name/data_type/data_name
For the array stored in the quick example (name "quick", relic_type "tutorial"), that is:
/home/user/reliquery/tutorial/quick/arrays/ones
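
The root/relic_type/relic_name/data_type/data_name layout can be sketched with pathlib. The `relic_data_path` helper is hypothetical and only mirrors the pattern stated above. It is applied here to the quick example's values (relic_type "tutorial", name "quick", an array named "ones"), and the "arrays" data type folder is taken from the example path.

```python
from pathlib import PurePosixPath


def relic_data_path(root: str, relic_type: str, relic_name: str,
                    data_type: str, data_name: str) -> PurePosixPath:
    """Join the five components in the order given by the layout pattern."""
    return PurePosixPath(root) / relic_type / relic_name / data_type / data_name


path = relic_data_path("/home/user/reliquery", "tutorial", "quick", "arrays", "ones")
print(path)
```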

S3 Storage

s3_signed

  • true = uses current aws_cli configuration
  • false = uses the anonymous IAM role
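
As an illustration of the s3_signed switch, the sketch below builds a config entry for a bucket that requires credentials. It assumes new storage entries follow the same shape as the "demo" entry in the default config. The `s3_storage_entry` helper, the "private" storage name, and the "my-lab-bucket" bucket name are all hypothetical.

```python
import json


def s3_storage_entry(bucket: str, prefix: str, signed: bool) -> dict:
    """Build one storage entry in the same shape as the 'demo' entry."""
    return {
        "storage": {
            "type": "S3",
            "args": {"s3_signed": signed, "s3_bucket": bucket, "prefix": prefix},
        }
    }


# signed=True corresponds to using the current AWS CLI configuration.
config = {"private": s3_storage_entry("my-lab-bucket", "relics", signed=True)}
config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed JSON could then be merged by hand into the config file next to the existing "default" and "demo" entries.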

License

Reliquery is free and open source! All code in this repository is dual-licensed under either:

  • MIT License
  • Apache License, Version 2.0

at your option. This means you can select the license you prefer.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.