A comprehensive Python package for intelligent serialization that handles complex data types with ease
**Perfect Drop-in Replacement for Python's JSON Module** with enhanced features for complex data types and ML workflows. Zero migration effort: your existing JSON code works immediately, with smart datetime parsing, type preservation, and advanced serialization capabilities.
Works exactly like the `json` module: use `import datason.json as json` for perfect compatibility, or `import datason` for enhanced features like automatic datetime parsing and ML type support.
- **Perfect Compatibility**: Works exactly like Python's `json` module - zero code changes needed
- **Enhanced by Default**: The main API provides smart datetime parsing and type detection automatically
- **Dual API Strategy**: Choose stdlib compatibility (`datason.json`) or enhanced features (`datason`)
- **Zero Migration**: Existing `json.loads`/`json.dumps` code works immediately, with optional enhancements
- **Smart Type Detection**: Automatically handles pandas DataFrames, NumPy arrays, datetime objects, and more
- **Bidirectional**: Serialize to JSON and deserialize back to the original objects with perfect fidelity (see the round-trip sketch after this list)
- **Datetime Intelligence**: Automatic ISO 8601 string parsing across Python 3.8-3.11+
- **Type Safety**: Preserves data types and structure integrity with guaranteed round-trip serialization
- **ML Framework Support**: Production-ready support for 10+ ML frameworks with a unified architecture
- **High Performance**: Sub-millisecond serialization optimized for ML workloads
- **Simple & Direct API**: Intention-revealing functions (`dump_api`, `dump_ml`, `dump_secure`, `dump_fast`) with automatic optimization
- **Progressive Loading**: Choose your success rate - `load_basic` (60-70%), `load_smart` (80-90%), `load_perfect` (100%)
- **Production Ready**: Enterprise-grade ML serving with monitoring, A/B testing, and security
- **Extensible**: Easy to add custom serializers for your own types
- **Zero Dependencies**: Core functionality works without additional packages
- **Integrity Verification**: Hash, sign, and verify objects for compliance workflows
- **File Operations**: Save and load JSON/JSONL files with compression support
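A minimal round-trip sketch using the `serialize`/`deserialize` pair shown later in this README; the values are illustrative:

```python
import datason as ds
from datetime import datetime, timezone

record = {
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "scores": [0.91, 0.87],
}

# serialize() produces a JSON-compatible structure...
payload = ds.serialize(record)

# ...and deserialize() restores the original types,
# including the timezone-aware datetime.
restored = ds.deserialize(payload)
assert restored["created"] == record["created"]
```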
datason provides production-ready integration for major ML frameworks with consistent serialization:
- **Pandas** - DataFrames with schema preservation
- **NumPy** - Arrays with dtype and shape preservation
- **PyTorch** - Tensors with exact dtype/shape reconstruction
- **TensorFlow/Keras** - Models with architecture and weights
- **Scikit-learn** - Fitted models with parameters
- **CatBoost** - Models with fitted state and parameter extraction
- **Optuna** - Studies with trial history and hyperparameter tracking
- **Plotly** - Interactive figures with data, layout, and configuration
- **Polars** - High-performance DataFrames with schema preservation
- **XGBoost** - Gradient boosting models (via the scikit-learn interface)
- **BentoML** - Production services with A/B testing and monitoring
- **Ray Serve** - Scalable deployment with autoscaling
- **MLflow** - Model registry integration with experiment tracking
- **Streamlit** - Interactive dashboards with real-time data
- **Gradio** - ML demos with consistent data handling
- **FastAPI** - Custom APIs with validation and rate limiting
- **Seldon Core/KServe** - Kubernetes-native model serving
**Universal Pattern**: All frameworks use the same `get_api_config()` for consistent UUID and datetime handling across your entire ML pipeline.
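A sketch of that pattern, assuming `get_api_config()` is exported at the package top level and that `serialize` accepts the returned configuration via a `config` keyword:

```python
import uuid
from datetime import datetime, timezone

import datason as ds

# One config object shared by every service in the pipeline.
api_config = ds.get_api_config()

payload = {
    "request_id": uuid.uuid4(),
    "timestamp": datetime.now(timezone.utc),
}

# UUIDs and datetimes are rendered identically wherever this
# config is used (BentoML, Ray Serve, FastAPI handlers, ...).
body = ds.serialize(payload, config=api_config)
```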
datason officially supports Python 3.8+ and is actively tested on:
- **Python 3.8** - Minimum supported version (core functionality)
- **Python 3.9** - Full compatibility
- **Python 3.10** - Full compatibility
- **Python 3.11** - Full compatibility (primary development version)
- **Python 3.12** - Full compatibility
- **Python 3.13** - Latest stable version (core features only; many ML libraries are still releasing wheels)
We maintain compatibility through:
- Automated CI testing on all supported Python versions, with strategic coverage:
  - Python 3.8: Core functionality validation (minimal dependencies)
  - Python 3.9: Data science focus (pandas integration)
  - Python 3.10: ML focus (scikit-learn, scipy)
  - Python 3.11: Full test suite (primary development version)
  - Python 3.12: Full test suite
  - Python 3.13: Core serialization tests only (latest stable)
- Core functionality tests ensuring basic serialization works on Python 3.8+
- Dependency compatibility checks for optional ML/data science libraries
- Runtime version validation with helpful error messages
**Note**: While core functionality works on Python 3.8, some optional dependencies (such as the latest ML frameworks) may require newer Python versions. The package will still work - you'll just have fewer optional features available.

**Python 3.13 Caution**: Many machine learning libraries have not yet released official 3.13 builds. datason runs on Python 3.13, but only with core serialization features until those libraries catch up.
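A minimal sketch of that graceful degradation; the try/except pattern here is illustrative, not a datason API:

```python
import datason as ds

# Core serialization needs no optional packages.
payload = ds.serialize({"status": "ok", "count": 3})

# Optional integrations activate only when the library is installed.
try:
    import pandas as pd
    payload = ds.serialize(pd.DataFrame({"a": [1, 2]}))
except ImportError:
    pass  # datason still works without pandas
```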
Python 3.8 users should be aware:
- **Core serialization** - Full support
- **Basic types** - datetime, UUID, decimal, etc.
- **Pandas/NumPy** - Basic DataFrame and array serialization
- **Advanced ML libraries** - Some may require Python 3.9+
- **Latest features** - Some newer configuration options may have limited support
We recommend Python 3.9+ for the best experience with all features.
Replace Python's `json` module with zero code changes and get enhanced features automatically!
```python
# Your existing code works unchanged
import datason.json as json

# Exact same API as Python's json module
data = json.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': '2024-01-01T00:00:00Z', 'value': 42}

json_string = json.dumps({"key": "value"}, indent=2)
# Works exactly like json.dumps() with all parameters
```

```python
# Just use the main datason module for enhanced features
import datason
from datetime import datetime

# Smart datetime parsing automatically enabled
data = datason.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': datetime.datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 'value': 42}

# Enhanced serialization with dict output for chaining
result = datason.dumps({"timestamp": datetime.now(), "data": [1, 2, 3]})
# Returns: dict (not string) with smart type handling
```
```python
# Phase 1: Drop-in replacement (zero risk)
import datason.json as json  # Perfect compatibility

# Phase 2: Enhanced features when ready
import datason  # Smart datetime parsing, ML support, etc.

# Phase 3: Advanced features as needed
datason.dump_ml(ml_model)             # ML-optimized serialization
datason.dump_secure(data)             # Automatic PII redaction
datason.load_perfect(data, template)  # 100% accurate reconstruction
```
**Zero-Risk Migration**: Start with `datason.json` for perfect compatibility, then gradually adopt enhanced features when you need them.
```bash
pip install datason
```
```python
import datason as ds
import uuid
from datetime import datetime

# ML prediction data with UUIDs and complex types
prediction_data = {
    "request_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "features": {"feature1": 1.0, "feature2": 2.0},
    "model_version": "1.0.0",
}

# Simple, direct API with automatic optimizations
api_response = ds.dump_api(prediction_data)  # Perfect for web APIs
# UUIDs become strings automatically - no more Pydantic errors!

# ML-optimized serialization
import torch
model_data = {"model": torch.nn.Linear(10, 1), "weights": torch.randn(10, 1)}
ml_serialized = ds.dump_ml(model_data)  # Automatic ML optimization

# Security-focused with automatic PII redaction
user_data = {"name": "Alice", "email": "alice@example.com", "ssn": "123-45-6789"}
secure_data = ds.dump_secure(user_data)  # Automatic PII redaction

# Works across ALL ML frameworks with the same simple pattern
import bentoml
from bentoml.io import JSON

@svc.api(input=JSON(), output=JSON())  # svc: your BentoML Service instance
def predict(input_data: dict) -> dict:
    features = ds.load_smart(input_data)   # 80-90% success rate
    prediction = model.predict(features)   # model: your loaded estimator
    return ds.dump_api({"prediction": prediction})  # Clean API response
```
```python
import datason as ds
from datetime import datetime
import pandas as pd
import numpy as np

# Complex nested data structure
data = {
    "timestamp": datetime.now(),
    "dataframe": pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}),
    "array": np.array([1, 2, 3, 4, 5]),
    "nested": {
        "values": [1, 2, {"inner": datetime.now()}]
    },
}

# Simple API with automatic optimization
api_data = ds.dump_api(data)        # Web APIs (UUIDs as strings, clean JSON)
ml_data = ds.dump_ml(data)          # ML optimized (framework detection)
secure_data = ds.dump_secure(data)  # Security focused (PII redaction)
fast_data = ds.dump_fast(data)      # Performance optimized

# Progressive loading - choose your success rate
basic_result = ds.load_basic(api_data)                     # 60-70% success, fastest
smart_result = ds.load_smart(api_data)                     # 80-90% success, balanced
perfect_result = ds.load_perfect(api_data, template=data)  # 100% with template

# Traditional API still available
serialized = ds.serialize(data)
restored = ds.deserialize(serialized)
```
```python
import datason as ds

# Use the main dump() function with options for complex scenarios
large_sensitive_ml_data = {
    "model": trained_model,  # e.g. a fitted estimator
    "user_data": {"email": "user@example.com", "preferences": {...}},
    "large_dataset": huge_numpy_array,
}

# Combine multiple optimizations
result = ds.dump(
    large_sensitive_ml_data,
    secure=True,    # Enable PII redaction
    ml_mode=True,   # Optimize for ML objects
    chunked=True,   # Memory-efficient processing
)

# Or use specialized functions for simple cases
api_data = ds.dump_api(response_data)         # Web API optimized
ml_data = ds.dump_ml(model_data)              # ML optimized
secure_data = ds.dump_secure(sensitive_data)  # Security focused
fast_data = ds.dump_fast(performance_data)    # Speed optimized

# Progressive loading with clear success rates
basic_result = ds.load_basic(json_data)                # 60-70% success, fastest
smart_result = ds.load_smart(json_data)                # 80-90% success, balanced
perfect_result = ds.load_perfect(json_data, template)  # 100% with template

# API discovery and help
help_info = ds.help_api()  # Get guidance on function selection
```
datason provides a complete ML serving architecture with visual documentation:
- **Universal Integration Pattern**: A single configuration works across all frameworks (sketched below)
- **Comprehensive Monitoring**: Prometheus metrics, health checks, and observability
- **Enterprise Security**: Input validation, rate limiting, and PII redaction
- **Performance Optimized**: Sub-millisecond serialization with caching support
- **A/B Testing**: A framework for testing multiple model versions
- **Production Examples**: Ready-to-deploy BentoML, Ray Serve, and FastAPI services
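As a concrete illustration of the universal integration pattern, here is a hedged FastAPI endpoint sketch; `run_model` is a hypothetical stand-in for your inference call, not part of datason:

```python
import datason as ds
from fastapi import FastAPI

app = FastAPI()

def run_model(features: dict) -> list:
    # Hypothetical inference call - replace with your model.
    return [0.5]

@app.post("/predict")
def predict(payload: dict) -> dict:
    features = ds.load_smart(payload)               # Tolerant deserialization
    prediction = run_model(features)
    return ds.dump_api({"prediction": prediction})  # JSON-safe response
```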
```mermaid
graph LR
    A[Client Apps] --> B[API Gateway]
    B --> C[ML Services<br/>BentoML/Ray/FastAPI]
    C --> D[Models<br/>CatBoost/Keras/etc]
    C --> E[Cache<br/>Redis]
    C --> F[DB<br/>PostgreSQL]
    style C fill:#e1f5fe,stroke:#01579b,stroke-width:3px
    style D fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
```
**See the Full Documentation**: Complete architecture diagrams and production patterns in `docs/features/model-serving/`
For full documentation, examples, and API reference, visit: https://datason.readthedocs.io
- **Architecture Overview** - Complete system architecture with Mermaid diagrams
- **Model Serving Integration** - Production-ready examples for all major frameworks
- **Production Patterns** - Advanced deployment strategies and best practices
- **Advanced BentoML Integration** - Enterprise service with A/B testing and monitoring
- **Production ML Serving Guide** - Complete implementation with security and observability
**Quick Start**: Run `python examples/production_ml_serving_guide.py` to see all features in action!
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.