Tools for creating and reusing high-quality spreadsheets


Keywords
object, model, schema, workbook, XLSX, Excel, validation, complex-datasets, csv, data-tables, object-mapping, python, relational-data, tsv
License
MIT
Install
pip install obj-tables==1.0.14

Documentation

PyPI package Documentation Test results Test coverage Code analysis License Analytics

ObjTables: Toolkit for modeling complex datasets with collections of user-friendly tables

ObjTables is a toolkit for using schemas to model collections of tables that represent complex datasets, combining the ease of use of Excel with the rigor and power of schemas.

ObjTables makes it easy to:

  • Use collections of tables (e.g., an Excel workbook) as an interface for viewing and editing complex datasets that consist of multiple related objects that have multiple attributes,
  • Use complex data types (e.g., numbers, strings, numerical arrays, symbolic mathematical expressions, chemical structures, biological sequences, etc.) within tables,
  • Use embedded tables and grammars to encode relational information into columns and groups of columns of tables,
  • Define schemas for collections of tables,
  • Use schemas to parse collections of tables into Python data structures for further analysis,
  • Use schemas to validate the syntax and semantics of collections of tables,
  • Conduct operations on complex datasets, such as comparing and merging objects, and
  • Edit schemas and migrate a dataset to a new version of a schema.

The ObjTables toolkit includes five components:

  • Tabular format for collections of tables. This includes syntax for declaring which cells represent each table, instance, and attribute; declaring which entries represent metadata such as the date that a table was updated; and declaring which entries represent comments.

  • Tabular format for schemas for collections of tables. ObjTables schemas capture the format of each table, including the name and data type of each column, which cells represent relationships among the entries in the tables, and constraints on the value of each cell. ObjTables supports three modes of encoding relationships into cells in tables.

    • Columns for relationships among objects represented by entries in tables: Relationships from one (primary) object to other (related) objects can be captured by (a) incorporating a column that represents a unique key for each related object into the table that represents the related objects and (b) encoding the keys for the related objects as a comma-separated list into a column in the table that represents the primary objects.
    • Embedded tables for *-to-one relationships: To help users encode complex datasets into a minimal number of tables, ObjTables can also encode instances of related classes into groups of columns. ObjTables uses merged headings to distinguish these columns.
    • Embedded grammars for relationships: To help users encode complex datasets into a minimal number of tables, grammars can be used to encode instances of related classes into a single column. These grammars can be defined declaratively in EBNF format using Lark.
  • Python API for defining schemas: For more flexibility, the Python API can be used to incorporate custom data types into schemas, utilize multiple inheritance, and define custom validation procedures.

  • Numerous data types including types for mathematics, science, chemoinformatics, and genomics.

  • Software tools for parsing, validating, and manipulating datasets according to schemas. This includes tools for

    • Pretty printing datasets as Excel workbooks. This enables users to use Excel as a graphical interface for quickly browsing and editing datasets as described below.
    • Creating templates for datasets.
    • Analyzing, comparing, merging, and revisioning datasets.
    • Migrating datasets between versions of their schemas.

ObjTables enables users to leverage Excel as a graphical interface for viewing and editing complex datasets. Excel-encoded datasets have the following features:

  • Table of contents: Optionally, each dataset can include a table that describes the classes represented by the data tables, displays the number of instances of each class, and provides hyperlinks to the data tables.
  • Formatted class titles: Each table includes a title bar that describes the class. The title bars are formatted, frozen, and protected from editing.
  • Formatted attribute headings: Each table includes headings for each column and group of columns. The headings are formatted, auto-filtered, frozen, and protected from editing.
  • Inline help for attributes: ObjTables uses Excel comments to embed help information about each attribute into its heading.
  • Select menus for enumerations and relationships: ObjTables provides select menus for each attribute that encodes an enumeration, a one-to-one relationship, or a many-to-one relationship.
  • Instant validation: ObjTables uses Excel to validate several properties of attributes. Note, due to the limitations of Excel, this provides limited validation. The ObjTables software provides far more extensive validation. Furthermore, ObjTables makes it easy to implement domain-specific validation at multiple levels.
  • Hidden extra rows and columns:> To help users focus on the attributes of their classes, ObjTables hides all empty rows and columns.
  • Protection for unintentional editing: To help users avoid mistakes, ObjTables protects worksheets.

ObjTables supports multiple levels of validation of datasets:

  • Attribute validation: Validations of individual attributes can be defined declaratively (e.g. string(min_length=8)). More complex validations can be defined using a Python schema or by implementing custom types of attributes.
  • Instance validation: Users can implement custom instance-level validations by creating a Python module that implements a schema and implementing the validate method of each class.
  • Class-level validation: Most attributes can be constrained to have unique values across all instances (e.g., string(unique=True)). Python modules that implement schemas can also capture tuples of attributes that must be unique across all instances of a class. See the documentation for more information.

ObjTables provides four user interfaces to the software tools:

  • Web app: The web app enables users to use ObjTables without having to install any software.
  • REST API: The REST API enables users to use ObjTables programmatically without having to install any software.
  • Command-line interface: The command-line interface enables users to use ObjTables without having to upload data to this website.
  • Python library: The Python library enables users to extend ObjTables with custom attributes and validation and use ObjTables to analyze complex datasets.

Installing the command-line program and Python API

Please see the documentation.

Examples, tutorials, and documentation

Please see the documentation and tutorials.

License

ObjTables is released under the MIT license.

Development team

ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the Institut National de la Recherche Agronomique in Jouy en Josas, FR.

Questions and comments

Please contact the Karr Lab with any questions or comments.