pyastroschema

Standardized specifications for astronomy and astrophysics data.


Keywords
astronomy, astrophysics, avro, avro-schema, data-science, data-structures, json, json-schema
License
GPL-3.0
Install
pip install pyastroschema==0.5.4

Documentation

astroschema

This package defines a set of JSON schema relevant to astronomy and astrophysics research. The schema are meant to specify the structure of JSON files used to contain astronomical (and associated) data. The package also contains modules for the use of those schema in python, and in the future additional languages.

master: Build Status (master) Coverage Status (master)

dev: Build Status (dev) Coverage Status (dev)

Structure

  • schema/: the schema specifications themselves
    • metaschema/: the metaschema specifying the structure of each astro-schema
  • pyastroschema/: the python module for interacting with astroschema
  • tests/: directory containing sample JSON files for testing schema validation
  • astroschema.json: description of each schema included in this package.

Definitions and Terminology

  • A struct is an astroschema data structure that has a schema specification. For example source is a particular astroschema struct, that has a particular schema specifying its structure.

  • An entry is data in the form of a struct, i.e. an instance of a struct filled with data.

  • unique vs. distinguishing

    • A unique attribute is one that uniquely identifies what it is referencing, one-to-one. If two things have different unique attributes they are different. If they have the same unique attributes, they are the same.
      • e.g. bibcode is unique, these Sources are the same:
        • {"name": "Open Supernova Catalog", "bibcode": "2017ApJ...835...64G", "alias": 0}
        • {"name": "Guillochon+2017", "bibcode": "2017ApJ...835...64G", "alias": 1}
    • A distinguishing attribute is one that characterizes what it is referencing, not one-to-one. If two things have different distinguishing attributes, they are not necessarily different. If they have the same distinguishing attributes, they are not necessarily the same.
      • e.g. bibcode is unique, these Sources are the same:
        • {"name": "Open Supernova Catalog", "bibcode": "2017ApJ...835...64G", "alias": 0}
        • {"name": "Guillochon+2017", "bibcode": "2017ApJ...835...64G", "alias": 1}

Change Log

Current

  • Modified numerous schema to remove astrocats specific properties: [photometry, quantity, source, spectrum].

  • pyastroschema/

    • [1] Using the defs.json file now, and relative paths in schema references, requires validators to use jsonschema.RefResolver objects with the base path. To do this, when creating struct.SchemaDict instances, the schema specification should be the absolute file-path. The method utils.load_schema_dict now returns the path to the schema also. The methods utils.get_schema_odict and utils.get_list_of_schema have been deprecated (commented out for now), to simplify what types of arguments are acceptable.

    • [2] Code modified to be python2 and python3 compatible.

    • __init__.py

      • copy_schema_files() [NEW-FUNCTION]
        • Copy all, or a single, schema file to the given target directory.
    • schema.py

      • JSONOrderedDict
        • Add hooks to sort before dump and dumps commands by passing sorting function.
      • SchemaDict
        • No longer accepts a list of schema as argument. Schema must be combined using either extend or update methods.
        • Simplified initialization to limit acceptable arguments (see [1]).
        • Store the schema path and a constructed jsonschema.RefResolver when possible (see [1]).
        • extend()
          • Set the check_conflict parameter to True by default.
        • update() [NEW-FUNCTION]
          • Added wrapper around JSONOrderedDict.update() to first convert argument to SchemaDict.
    • utils.py

      • warn_with_traceback() [NEW-FUNCTION]
        • Modify the warnings module to provide tracebacks
      • get_schema_odict() [REMOVED]
        • See [1]
      • get_list_of_schema() [REMOVED]
        • See [1]
      • index_entry_for_schema() [NEW-FUNCTION]
        • Retrieve the index entry (dict) for the target schema.
      • path_for_schema_file() [NEW-FUNCTION]
        • Retrieve the full-path for the target schema.
    • validation.py

      • PAS_Validator()
        • Pass kwargs along so that a resolver can be added to the validator.
  • schema/

    • Restructure schema to reference new defs.json file. Added 'id' attributes with each files name so that both relative and internal references will work; this is likely a bug in the python jsonschema package.
    • entry.json
      • Removed astrocats specific fields.
    • defs.json [NEW-FILE]
      • New file specifically for schema definitions, references from other schema files.
  • astroschema_index.json

    • Updated to include new defs.json.
  • LICENSE

    • Changed from MIT to GNU
  • MANIFEST.in, requirements.in, requirements.txt, setup.py, tox.ini

    • Added package material for distribution.

v0.5.0 - 2018-08-02

  • Add new 'format' schema specifications including 'numeric' and 'astrotime'.

  • New SchemaDict class that stores schema specifications in Struct classes. Provides internal validation method.

    • NOTE: SchemaDict has not been integrated into the Key class yet, but it is stored to each Keychain.
  • Struct subclasses have been upgraded to use protected class-attributes (i.e. shared) to store schema information. A wrapper (struct.set_struct_schema()) and class factor method (struct.Struct.construct()) have been added to provide a customization API for derived classes.

  • pyastroschema/

    • tests/

      • test_schemadict.py [NEW-FILE]
        • Basic construction unittests for the new SchemaDict class.
      • test_struct.py [NEW-FILE]
        • Basic tests for Struct class, specifically making sure that subclass works as expected, and with new SchemaDict class.
      • test_validation.py [NEW-FILE]
        • Tests for new PAS_Validator() method (and customized class).
    • keys.py

      • Key
        • Changed Key instances to be immutable. Once they are created their attributes cannot be changed.
        • __repr__()
          • Cache the result of repr on initialization to save time. Depends on Key being immutable.
        • equals()
          • BUG: in comparison, built-in methods could be compared which would fail, e.g. format method of str.
    • schema.py

      • JSONOrderedDict [NEW-CLASS]
        • This wrapper around an OrderedDict class to add some json methods (e.g. loading/saving to/from strings)
        • extend() [NEW-FUNCTION]
          • Function that will add the elements from a second dict into the first, without overwriting existing parameters (like update() does).
      • SchemaDict [NEW-CLASS]
        • Subclass of JSONOrderedDict designed to contain schema. Adds validation methods. Can be initialized from numerous schema, in which case extend() is used to combine them.
    • struct.py

      • All of the derived structures (subclasses of Struct) now use the decorator instead of subclassing with Meta_Struct.
      • Struct
        • Added keychain, schema and extendable as protected property values.
        • Changed to inherits from schema.JSONOrderedDict to get the nice json-based import/export methods.
        • construct() [NEW-METHOD]
          • Factory method which uses struct.set_struct_schema to create a custom sub-class of Struct for later instantiation.
        • get_keychain() [REMOVED]
          • Deprecated in favor of keychain property.
        • to_json() [REMOVED]
          • Deprecated in favor of inherited JSONOrderedDict methods.
        • validate()
          • BUG: custom validator wasnt being used. Now calls internal SchemaDict for validation.
      • Meta_Struct [REMOVED]
        • Deprecated in favor of new subclassing procedures.
    • validation.py

      • PAS_Validator() <== Default_Validator()
        • New customized validator that not only sets defaults (as before) but also checks the "numeric" 'format' specifier.
        • Tests added for behavior.
  • schema/

    • quantity.json
      • BUG, FIX: Changed value from being numeric to being any-type. This is to accommodate 'alias' values in astrocats... not sure if this should remain or be changed.
      • BUG, FIX: Changed source from being numeric to being any-type. This is to accommodate strings like "1,3,4" currently used in astrocats. This should be fixed in the future.

v0.4.0 - 2018-07-30

  • FIX: Numerous aspects of the structure schema changed (e.g. variable names, new parameters) for consistency with astrocats. This is temporary. These should all be restored back / removed later.

  • pyastroschema

    • tests/

      • test_photometry.py [NEW-FILE]
        • Unittests for the 'photometry' schema and class.
        • Include tests for some of the complex 'dependencies' and requirements in the schema.
      • test_spectrum.py [NEW-FILE]
        • Unittests for the 'spectrum' schema and class.
        • Include tests for some of the complex dependencies and requirements in the schema.
    • __init__.py

      • PATHS
        • test_dir() [NEW-METHOD]
          • Return the directory of test json files for specific schema.
    • keys.py

      • Keychain
        • get_key_by_name() [NEW-METHOD]
          • Based on related method in astrocats.
          • Get the key in this keychain based no its name.
    • struct.py

      • Struct
        • get_keychain()
          • Allow mutable and extendable arguments to be passed through this method.
      • Photometry [NEW-CLASS]
        • New subclass of Struct with associated photometry.json schema.
      • Spectrum [NEW-CLASS]
        • New subclass of Struct with associated spectrum.json schema.
      • Entry [NEW-CLASS]
        • New subclass of Struct with associated entry.json schema.
    • utils.py

      • get_schema_odict() [NEW-FUNCTION
        • Function that will return an OrderedDict given a filename, indexed schema-name, or odict.
      • get_list_of_schema() [NEW-FUNCTION]
        • Returns a list of odict schema given one or more specified by filename, str, or odict.
  • schema/

    • photometry.json
      • Added dependencies which were coded manually into astrocats Photometry class, for example requiring frequency, band or energy when flux is included.
    • entry.json
      • FIX: temporary addition of '...PREF_KINDS' parameters for astrocats consistency.
    • key.json
      • FIX: temporary changes for astrocats compatibility.
    • spectrum.json
      • BUG: fixed some incorrect requirements logic.
      • Added more complex requirements/dependencies logic that was hardcoded into astrocats Spectrum class.

v0.3.0 - 2018-07-28

  • CONVERSION.md [NEW-FILE]

    • File for documenting conversion procedures from astrocats to astroschema.
  • README.md

    • Lots of new 'to-do' items and issues that need to be addressed.
  • schema/

    • entry.json [NEW-FILE]
      • Specifications for a catalog entry (names, sources, quantities, etc). Based on the astrocats Entry class.
    • key.json [NEW-FILE]
      • Schema specification for individual 'keys' of general astroschema schema. Used with meta-schema.json.
    • meta-schema.json <== meta-schema/astro-schema_draft-0.json
      • Schema specification that the properties of all other schema match the key.json schema.
    • photometry.json [NEW-FILE]
      • Schema specifying photometric quantities. Based on the astrocats Photometry class.
    • quantity.json [NEW-FILE]
      • Schema specifying core 'quantities' which are the data points for entries and composite data values (e.g. photometry).
    • spectrum.json [NEW-FILE]
      • Schema specifying spectrum quantities. Based on the astrocats Spectrum class.
    • source.json
      • Use both a 'unique' and 'distinguishing' attributes. A 'unique' attribute is one that uniquely defines what it is referring to (i.e. if two 'unique' attributes match, then these are referring to the same object). A 'distinguishing' attribute is one that can be used to compare two instances (based on the astrocats concept of 'comparable' values). If two 'distinguishing' values are different, then the objects are different; if they are the same, the objects may be the same.
  • pyastroschema/

    • tests/

      • test_entry.py [NEW-FILE]
        • Simplest tests on the new 'entry' schema.
      • test_key.py
        • Minor updates for changes to the Key class.
      • test_keychain.py
        • Minor updates for changes to the KeyChain class.
      • test_quantity.py [NEW-FILE]
        • Basic testing of new 'quantity' schema.
      • test_source.py
        • Minor updates for changes to from Source standalone class to Source(Struct) subclass.
    • keys.py

      • Key
        • Use json validation instead of manual checking (e.g. of requirements).
        • validate() [NEW-METHOD]
          • Load a custom validator that not only validates but sets default values. See validation.py.
        • equals() [NEW-METHOD]
          • Compare two keys each-other (analogous to astrocats is_duplicate_of methods). Has optional identical argument to determine precision of comparison.
    • schema.py [NEW-FILE]

      • Beginning of class to represent schema themselves.
      • NOTE: non-operational.
    • struct.py <== source.py

      • What was previously the Source class has been generalized into the Struct class which can then be used for any data-structured.
      • Struct [NEW-CLASS]
        • Generalized class structure to apply to any catalog-object that is schema-specified. This is analogous (and largely based on) the astrocats.catdict.CatDict class.
        • On initialization, the class uses its corresponding schema to generate a Keychain populated with Key instances that describe each property of this class. Validation is performed using jsonschema.
      • Meta_Struct [NEW-CLASS]
        • Subclass of Struct which is designed to be used for further subclassing to construct particular types of object, e.g. Source, Quantity, etc. Meta_Struct is used as the method to specify the schema describing/constraining the particular structure.
    • validation.py [NEW-FILE]

      • Create a jsonschema validator instance with extended functionality to set default values of parameters. Currently used to set default Key attributes.

v0.2.0 - 2018-07-04

  • schema/

    • meta-schema/
      • astro-schema_draft-0.json [NEW-FILE]
        • First version of a astro-schema specific meta-schema for validating all astro-schema schema. Currently this takes the standard json-schema and extends it slightly: required the 'type' and 'unique' attributes for each 'property'.
    • source.json
      • Schema specification for Source objects.
      • Currently: v0.4
  • pyastroschema/

    • tests/

      • test_keychain.py
        • Unittests for the Keychain class.
      • test_source.py
        • Basic tests for basic functionality of Source class.
        • Tests for both copy and deepcopy behavior.
    • __main__.py

      • main()
        • This is the primary interface routine.
        • Loads the astro-schema metaschema and validates it against the standard json-schema.
        • Loads all astro-schema and validates them against both the meta-schema and the standard json-schema.
        • Produces an 'index' output file listing the current included schema, and their version and modification information.
    • keys.py

      • Moved Keychain from source.py to here.
      • Added new Key class to hold each property key.
    • source.py

      • Removed Keychain class (see keys.py).
      • Source
        • Added overriding of __copy__ and __deepcopy__ methods.
        • is_duplicate_of() [new-function]
          • Duplicated behavior of related method in astrocats class.
    • utils.py

      • json_load_str() [new-function]
        • Load dictionary from json-formatted string.
      • get_relative_path() [new-function]
        • Convert from a full path to a path relative to a given reference path.

v0.1.0 - 2018-06-28

  • Simple schema for 'source' structures created.

  • A few test JSON files added in tests/source for checking validations.

  • pyastroschema/

    • Keychain class to store parameter names ('keys') specified in schema files.
    • Source class to store data associated with the source.json schema. Currently specific to the 'source' structure, and will be generalized in the future to arbitrary schema.
    • Validation works for 'source' entries and Source instances using the jsonschema python package. This uses the example JSON files in tests/source.