ddex-builder

High-performance DDEX XML builder with deterministic output and smart normalization


Keywords
ddex, xml, builder, music, deterministic, canonicalization
License
MIT
Install
pip install ddex-builder==0.3.0

DDEX Suite

High-performance DDEX XML builder and parser with native bindings for TypeScript/JavaScript and Python. Built on a single Rust core for consistent behavior across all platforms.

DDEX Suite brings together powerful tools for music industry data exchange, combining the robust ddex-parser library for reading and transforming DDEX messages with the ddex-builder library for deterministic XML generation, creating a complete round-trip solution for DDEX processing.

🎯 Why DDEX Suite?

Working with DDEX XML shouldn't feel like archaeology. The suite transforms complex DDEX messages into clean, strongly-typed data structures that are as easy to work with as JSON.

Core Value Proposition

  • Single Rust Core: One implementation to rule them all - consistent behavior across JavaScript, Python, and Rust
  • Dual Model Architecture: Choose between faithful graph representation or developer-friendly flattened view
  • Production Ready: Built-in XXE protection, memory-bounded streaming, and comprehensive security hardening
  • Deterministic Output: Consistent, reproducible XML generation with smart normalization

👨🏻‍💻 Developer Statement

I'm building DDEX Suite as a rigorous, end-to-end learning project to deepen my Rust skills while unifying my JavaScript and Python experience into a production-grade toolkit for music metadata. The intent is to ship a single Rust core that serves both a high-performance, security-hardened DDEX XML parser library (ddex-parser) and a consistent, deterministic builder library (ddex-builder). This core is exposed through napi-rs for Node/TypeScript and PyO3 for Python, showcasing not just cross-language API design but also deep ecosystem integration, including a declarative DataFrame mapping DSL for Python users. The project is deliberately "industry-shaped," tackling the complementary challenges of transforming complex DDEX XML into clean models (parsing) and generating canonical, reproducible XML from those models. This is achieved through a dual graph+flattened data model for developer UX and an uncompromising approach to determinism, centered on a custom canonicalization specification, DB-C14N/1.0, and a stable, content-addressable ID generation engine.

Beyond the core implementation, this is a showcase of software craftsmanship and platform thinking. The suite provides consistent APIs, painless installation via prebuilt binaries, a hardened CI/CD pipeline, and robust supply-chain safety (SBOM, cargo-deny, and Sigstore artifact signing). Every feature reflects production wisdom, from the parser's XXE protection to the builder's versioned partner presets system with safety locks. Paired with my validator work (DDEX Workbench), DDEX Suite delivers a credible, end-to-end Parse → Modify → Build processing pipeline, complete with enterprise-grade features like preflight validation, a semantic diff engine, and a comprehensive CLI. It illustrates how to design interoperable components that are fast, safe, and easy to adopt in real-world systems.

🚧 Development Status

Latest Release: Suite v0.4.1 🎉
Current Development Phase: 4.4 - Streaming Parser
Target Release: Suite v1.0.0 in Q1 2026

📦 Available Packages

All packages published across npm, PyPI, and crates.io! ✅

| Package | npm | PyPI | crates.io | Version |
|---|---|---|---|---|
| ddex-core | - | - | ✅ Published | v0.4.1 |
| ddex-parser | ✅ Published | ✅ Published | ✅ Published | v0.4.1 |
| ddex-builder | ✅ Published | ✅ Published | ✅ Published | v0.4.1 |

Progress Overview

✅ Phase 1-3: Complete - Core foundation, parser, and builder are fully implemented
✅ Phase 4.1: Integration Testing - Round-trip functionality validated with 94 tests passing
✅ crates.io Publishing - NEW! All Rust crates published to the official registry
✅ Phase 4.2: Documentation - Docusaurus site in React
✅ Phase 4.3: Smart Normalization Engine - Round-trip, deterministic output
✅ Phase 4.3.5: Core Stabilization - Stability and performance upgrades
✅ Phase 4.4: Streaming Parser - High-performance XML parser

For detailed development progress and technical implementation details, see blueprint.md.

🎭 Dual Model Architecture

The suite provides two complementary views of the same data with full round-trip data integrity:

Graph Model (Faithful)

Preserves the exact DDEX structure with references and extensions - perfect for compliance and round-trip operations:

interface ERNMessage {
  messageHeader: MessageHeader;
  parties: Party[];               // All parties with IDs
  resources: Resource[];          // Audio, video, image resources
  releases: Release[];            // Release metadata with references
  deals: Deal[];                  // Commercial terms
  extensions?: Map<string, XmlFragment>;  // Preserved for round-trip
  toBuildRequest(): BuildRequest; // Convert to builder input
}

Flattened Model (Developer-Friendly)

Denormalized and resolved for easy consumption - ideal for applications while maintaining round-trip capability:

interface ParsedRelease {
  releaseId: string;
  title: string;
  displayArtist: string;
  tracks: ParsedTrack[];          // Fully resolved with resources merged
  coverArt?: ParsedImage;
  _graph?: Release;               // Reference to original for full data integrity
  extensions?: Map<string, XmlFragment>; // Extensions preserved
}
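
To make the relationship between the two views concrete, here is a minimal Python sketch of how a graph-style release that points at resources by reference could be resolved into a flattened release. The class names, fields, and the `flatten` helper are hypothetical stand-ins for illustration, not the suite's actual types or API.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for the graph model (not the suite's real types).
@dataclass
class Resource:
    reference: str       # e.g. "A1", the ID that releases point at
    isrc: str
    title: str

@dataclass
class Release:
    release_id: str
    title: str
    display_artist: str
    resource_refs: list[str] = field(default_factory=list)

# Hypothetical flattened counterparts.
@dataclass
class FlatTrack:
    position: int
    isrc: str
    title: str

@dataclass
class FlatRelease:
    release_id: str
    title: str
    display_artist: str
    tracks: list[FlatTrack]
    graph: Release       # back-pointer to the faithful representation

def flatten(release: Release, resources: list[Resource]) -> FlatRelease:
    """Resolve each resource reference into an inline track entry."""
    by_ref = {r.reference: r for r in resources}
    tracks = [
        FlatTrack(position=i, isrc=by_ref[ref].isrc, title=by_ref[ref].title)
        for i, ref in enumerate(release.resource_refs, start=1)
    ]
    return FlatRelease(release.release_id, release.title,
                       release.display_artist, tracks, graph=release)

resources = [Resource("A1", "USXYZ2600001", "Track 1"),
             Resource("A2", "USXYZ2600002", "Track 2")]
release = Release("1234567890123", "Amazing Album", "Great Artist", ["A1", "A2"])
flat = flatten(release, resources)
print([t.title for t in flat.tracks])  # ['Track 1', 'Track 2']
```

Keeping a back-pointer to the graph object mirrors the `_graph` field above: the flattened view stays convenient to read while the original structure remains reachable.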

🧹 Smart Normalization & Clean Output

The DDEX Suite provides powerful normalization capabilities that transform inconsistent, messy DDEX files into clean, compliant output.

Why Normalization Matters

Real-world DDEX files come from many sources with varying quality:

  • Different namespace conventions (ern:Title vs Title vs ns2:Title)
  • Inconsistent element ordering
  • Mixed DDEX versions and dialects
  • Redundant whitespace and formatting issues
  • Non-standard extensions and attributes

DDEX Builder solves this by:

  • Normalizing all input to clean DDEX 4.3 structure
  • Standardizing element and attribute ordering
  • Optimizing output for compliance and size
  • Preserving all semantic data and business information
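
As a rough illustration of why prefix and whitespace differences are not semantic differences, the following Python sketch compares two fragments using only the standard library. It is not the builder's normalization engine, and the `Release`/`Title` fragments are simplified examples rather than schema-valid DDEX.

```python
import xml.etree.ElementTree as ET

def signature(elem):
    """Prefix- and whitespace-independent view of an element tree.

    ElementTree expands prefixes into '{namespace-uri}local' tags, so
    ern:Title and ns2:Title compare equal when they share a namespace URI.
    """
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(signature(child) for child in elem)
    return (elem.tag, attrs, text, children)

# Same content, different prefix and whitespace conventions.
vendor_a = """<ern:Release xmlns:ern="http://ddex.net/xml/ern/43">
    <ern:Title>  Amazing Album </ern:Title>
</ern:Release>"""

vendor_b = ('<ns2:Release xmlns:ns2="http://ddex.net/xml/ern/43">'
            '<ns2:Title>Amazing Album</ns2:Title></ns2:Release>')

same = signature(ET.fromstring(vendor_a)) == signature(ET.fromstring(vendor_b))
print(same)  # True
```

The sketch deliberately leaves child order significant; reordering elements into specification order requires schema knowledge that a generic comparison like this does not have.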

Build & Normalization Features

// Build DDEX
const { DdexBuilder } = require('ddex-builder');
const builder = new DdexBuilder();
builder.applyPreset('audio_album'); // Apply baseline preset

// Configuration options
builder.setConfig({
  canonical: true,           // Consistent, deterministic output
  validate: true,            // Ensure DDEX compliance
  version: '4.3',           // Target DDEX version
  optimize_size: true       // Remove redundant whitespace
});

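// Parse messy vendor DDEX → Output clean DDEX
// (await needs an async function when modules are loaded with require)
const { DdexParser } = require('ddex-parser');
const parser = new DdexParser();

async function normalize(messyVendorFile) {
  const parsed = await parser.parse(messyVendorFile);
  const cleanDdex = await builder.build(parsed);
  return cleanDdex; // Result: Beautiful, compliant DDEX 4.3
}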

What Gets Normalized

| Input Chaos | Output Order |
|---|---|
| Mixed namespace prefixes | Consistent DDEX namespaces |
| Random element ordering | Specification-compliant order |
| Whitespace soup | Clean, minimal formatting |
| Legacy DDEX versions | Modern DDEX 4.3 |
| Vendor-specific quirks | Standard-compliant structure |
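
Upgrading legacy versions presupposes knowing which version a message uses. One plausible way to detect it, sketched below in Python, is to read the namespace of the root element. The mapping table and the `detect_ern_version` helper are illustrative assumptions and do not describe how the suite performs detection internally.

```python
import io
import xml.etree.ElementTree as ET

# ERN namespace URIs mapped to version labels; the table is illustrative.
ERN_NAMESPACES = {
    "http://ddex.net/xml/ern/382": "3.8.2",
    "http://ddex.net/xml/ern/42": "4.2",
    "http://ddex.net/xml/ern/43": "4.3",
}

def detect_ern_version(xml_bytes: bytes) -> str:
    """Look at the root start tag only and map its namespace to a version."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start",))
    _event, root = next(events)  # the first start event is the root element
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        if namespace in ERN_NAMESPACES:
            return ERN_NAMESPACES[namespace]
    raise ValueError(f"unrecognized root element: {root.tag}")

legacy = b'<ern:NewReleaseMessage xmlns:ern="http://ddex.net/xml/ern/382"/>'
print(detect_ern_version(legacy))  # 3.8.2
```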

🔒 Data Integrity Guarantees

The DDEX Suite ensures your business data is always preserved:

Guarantee 1: Semantic Preservation

All business-critical data (ISRCs, titles, artists, deals) is preserved with 100% accuracy.

Guarantee 2: Deterministic Output

Building the same data always produces identical output - perfect for testing and validation.

Guarantee 3: Extension Support

Partner extensions (Spotify, Apple, YouTube) are preserved and properly namespaced.

Guarantee 4: Round-Trip Data Integrity

Parse → Modify → Build workflows maintain all your data, with beneficial normalization applied.
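
The deterministic-output guarantee (Guarantee 2) is straightforward to test: build the same data twice and compare digests. The Python sketch below uses a toy serializer, `build_release_xml`, as a hypothetical stand-in for a real builder; the element names and fixed order are assumptions chosen for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# Illustrative fixed element order; a real builder would follow the DDEX schema.
FIELD_ORDER = ["ReleaseId", "Title", "DisplayArtist"]

def build_release_xml(fields: dict) -> bytes:
    """Serialize fields in a fixed order, independent of dict insertion order."""
    root = ET.Element("Release")
    for name in FIELD_ORDER:
        if name in fields:
            ET.SubElement(root, name).text = fields[name]
    return ET.tostring(root, encoding="utf-8")

# Same data, supplied in two different key orders.
a = build_release_xml({"Title": "Amazing Album", "ReleaseId": "1234567890123",
                       "DisplayArtist": "Great Artist"})
b = build_release_xml({"DisplayArtist": "Great Artist", "ReleaseId": "1234567890123",
                       "Title": "Amazing Album"})

digest_a = hashlib.sha256(a).hexdigest()
digest_b = hashlib.sha256(b).hexdigest()
print(digest_a == digest_b)  # True
```

A byte-level digest comparison like this is what makes deterministic output useful in CI: any unintended change in serialization shows up as a hash mismatch.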

🚀 Features

✅ Streaming Parser with SIMD Optimization (v0.4.0)

  • ⚡ SIMD-Accelerated: FastStreamingParser using memchr for 25-30 MB/s production throughput
  • 🎯 Peak Performance: 500-700 MB/s for uniform XML, up to 1,265 MB/s in optimal conditions
  • 💾 Memory Efficient: 90% reduction - 100MB files with <50MB peak memory (O(1) streaming)
  • 🔍 Selective Parsing: 11-12x faster with XPath-like selectors for targeted extraction
  • ⚙️ Parallel Processing: 6.25x speedup on 8 cores with 78% efficiency
  • 🌐 Cross-Language: Native streaming in Rust, Python (16M+ elements/sec), Node.js (100K elements/sec)
  • 📊 Production Ready: 96.3% score across all validation metrics
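
For readers unfamiliar with streaming and selective parsing, the idea can be sketched with Python's standard library alone. This is a generic illustration of the technique, not the suite's FastStreamingParser; the `Catalog`/`Release`/`Title` element names are simplified assumptions.

```python
import io
import xml.etree.ElementTree as ET

def iter_release_titles(stream):
    """Yield the Title text of each Release without keeping full subtrees."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Release":
            yield elem.findtext("Title")
            # Clear the processed element's children and text; the root
            # keeps only an empty stub for it.
            elem.clear()

catalog = io.BytesIO(
    b"<Catalog>"
    b"<Release><Title>Album One</Title></Release>"
    b"<Release><Title>Album Two</Title></Release>"
    b"</Catalog>"
)
titles = list(iter_release_titles(catalog))
print(titles)  # ['Album One', 'Album Two']
```

Because the generator yields one release at a time and clears it afterwards, memory use depends on the size of a single release rather than the size of the whole file.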

✅ Smart Normalization (v0.4.0)

  • 🧹 Clean Output: Transform messy vendor DDEX into compliant DDEX 4.3
  • 📐 Consistent Structure: Standardized element ordering and namespaces
  • ✨ Optimized Size: Remove redundant whitespace and formatting
  • 🔄 Data Preservation: 100% semantic accuracy maintained
  • 🎯 Deterministic: Same input always produces same output

✅ Native Python Bindings (v0.3.0)

  • 🐍 Production-Ready Python: Native PyO3 bindings with full DataFrame integration
  • 📊 DataFrame Support: Three schema options (flat, releases, tracks) for pandas integration
  • ⚡ Native Performance: <50ms parsing for 10MB files with Python
  • 🔄 Round-Trip Python: Complete Parse → DataFrame → Build workflow support
  • 🔗 PyPI Available: Install with pip install ddex-parser ddex-builder

✅ Core Features

  • 🔄 Round-Trip Workflow: Parse → Modify → Build with 100% data preservation
  • 🎭 Dual Model Architecture: Graph (faithful) and flattened (developer-friendly) views
  • 🛡️ Enterprise Security: XXE protection, entity expansion limits, memory bounds
  • ⚡ High Performance: Sub-millisecond processing for typical files
  • 🌐 Multi-Platform: Native bindings for Node.js, Python, WASM, and Rust
  • 🔗 Reference Linking: Automatic relationship resolution between entities
  • 🆔 Stable Hash IDs: Content-based deterministic ID generation
  • ✨ Multi-Version Support: ERN 3.8.2, 4.2, and 4.3 with automatic detection
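
Content-based ID generation, as mentioned in the list above, generally means hashing a canonical form of the content so that equal content always yields the same ID. The Python sketch below shows one common way to do that; the `stable_id` helper, the choice of SHA-256, and the ID format are assumptions, not the suite's actual scheme.

```python
import hashlib
import json

def stable_id(kind: str, content: dict, length: int = 16) -> str:
    """Derive an ID from content so equal content always gets the same ID."""
    # sort_keys plus fixed separators give one canonical string per content.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{kind}-{digest[:length]}"

# Key order does not matter; only the content does.
id_1 = stable_id("track", {"isrc": "USXYZ2600001", "title": "Track 1"})
id_2 = stable_id("track", {"title": "Track 1", "isrc": "USXYZ2600001"})
print(id_1 == id_2)  # True
```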

🔄 In Development

  • Streaming: Handle massive catalogs with backpressure and progress callbacks
  • Semantic Diff: Track changes between DDEX message versions
  • Additional Bindings: C#/.NET and Go language bindings

📦 Installation

# JavaScript/TypeScript
npm install ddex-parser  # ✅ Latest: v0.4.1
npm install ddex-builder # ✅ Latest: v0.4.1

# Python
pip install ddex-parser  # ✅ Latest: v0.4.1
pip install ddex-builder # ✅ Latest: v0.4.1

# Rust
cargo add ddex-core      # ✅ Latest: v0.4.1
cargo add ddex-parser    # ✅ Latest: v0.4.1
cargo add ddex-builder   # ✅ Latest: v0.4.1

Browser/WASM

<script type="module">
import init, { DdexParser, DdexBuilder } from '@ddex/wasm';
await init();
const parser = new DdexParser();
const builder = new DdexBuilder();
</script>

Bundle sizes (v0.4.0):

  • Parser: 37KB (gzipped: ~12KB)
  • Builder: 420KB (gzipped: ~140KB)

💻 Usage Examples

JavaScript/TypeScript

const assert = require('node:assert');
const { DdexParser } = require('ddex-parser');
const { DdexBuilder } = require('ddex-builder');

// await needs an async function when modules are loaded with require
async function roundTrip(xmlContent) {
  // Parse DDEX
  const parser = new DdexParser();
  const result = await parser.parse(xmlContent);

  // Modify the parsed data
  result.flat.releases[0].title = "Updated Title";

  // Build DDEX
  const builder = new DdexBuilder();
  builder.applyPreset('audio_album'); // optional
  const xml = await builder.build(result.toBuildRequest());

  // Round-trip with beneficial normalization
  const reparsed = await parser.parse(xml);
  assert.deepEqual(reparsed.graph, result.graph); // ✅ Identical
  return xml;
}

Python (v0.4.0 - Native Implementation)

from ddex_parser import DdexParser
from ddex_builder import DdexBuilder
import pandas as pd

# Parse DDEX message with native performance
parser = DdexParser()
message = parser.parse(xml_content)

# Export to DataFrame for analysis (NEW!)
df = message.to_dataframe(schema='releases')  # 'flat', 'releases', or 'tracks'
print(f"Found {len(df)} releases")

# Modify DataFrame data
df.loc[0, 'title'] = 'Updated Album Title'

# Build from DataFrame (Round-trip support)
builder = DdexBuilder()
xml = builder.from_dataframe(df, version='4.3')

# Traditional object-based building also supported
xml = builder.build({
    'header': {
        'message_sender': {'party_name': [{'text': 'My Label'}]},
        'message_recipient': {'party_name': [{'text': 'YouTube'}]}
    },
    'version': '4.3',
    'releases': [{
        'release_id': '1234567890123',
        'title': [{'text': 'Amazing Album'}],
        'display_artist': 'Great Artist',
        'tracks': [
            {'position': 1, 'isrc': 'USXYZ2600001', 'title': 'Track 1', 'duration': 180}
        ]
    }]
})

Rust

use ddex_parser::DdexParser;
use ddex_builder::DdexBuilder;

// Parse DDEX message
let parser = DdexParser::new();
let result = parser.parse(&xml_content)?;

// Modify the parsed data
let mut build_request = result.to_build_request();
build_request.releases[0].title = "Updated Title".to_string();

// Build deterministic XML
let builder = DdexBuilder::new();
let xml = builder.build(&build_request)?;

// Round-trip with beneficial normalization and type safety
let reparsed = parser.parse(&xml)?;
assert_eq!(reparsed.graph, result.graph); // ✅ Identical

πŸ—οΈ Architecture

Built as a monorepo with shared core components:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            DDEX Suite                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   DDEX Parser   β”‚   DDEX Builder      β”‚
β”‚  Read & Parse   β”‚  Generate & Build   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚           Shared Core                 β”‚
β”‚    Models β”‚ Errors β”‚ FFI β”‚ Utils      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         Language Bindings             β”‚
β”‚   napi-rs β”‚ PyO3 β”‚ WASM β”‚ CLI         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🔒 Security

  • XXE (XML External Entity) protection
  • Entity expansion limits (billion laughs protection)
  • Deep nesting protection
  • Size and timeout limits
  • Memory-bounded streaming
  • Supply chain security with cargo-deny and SBOM
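
To give a feel for what several of these protections check, here is a small Python pre-flight sketch covering a size limit, DTD rejection, and a nesting-depth limit. It is a simplified illustration under stated assumptions (UTF-8 input, made-up limits, a hypothetical `find_violation` helper) and is not the suite's hardened implementation.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional

MAX_BYTES = 100 * 1024 * 1024   # illustrative limits, not the suite's defaults
MAX_DEPTH = 100

def find_violation(data: bytes) -> Optional[str]:
    """Return a reason string if the input breaks a limit, else None."""
    if len(data) > MAX_BYTES:
        return "input too large"
    # Conservative check, assuming UTF-8 input: refuse any DTD. This rules
    # out external entities (XXE) and entity-expansion bombs, at the cost
    # of rejecting some benign files.
    if b"<!DOCTYPE" in data or b"<!ENTITY" in data:
        return "DTDs and entity declarations are not allowed"
    depth = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            depth += 1
            if depth > MAX_DEPTH:
                return "nesting too deep"
        else:
            depth -= 1
            elem.clear()  # keep memory low while scanning
    return None

print(find_violation(b"<a><b/></a>"))  # None
print(find_violation(b'<!DOCTYPE x [<!ENTITY e "v">]><x>&e;</x>'))
```

Rejecting every DTD outright is stricter than necessary, but DDEX messages are schema-based and do not normally need one, so the trade-off favors safety.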

📊 Performance Metrics

Streaming Parser Performance

| Operation | Performance | Memory | Notes |
|---|---|---|---|
| Production Files | 25-30 MB/s | <50MB | Complex DDEX files with varied content |
| Uniform XML | 500-700 MB/s | <50MB | Repetitive patterns, SIMD sweet spot |
| Peak Throughput | 1,265 MB/s | <50MB | Optimal conditions, cached data |
| 100MB File | ~3.6s | <10MB | 90% memory reduction achieved |
| 1GB File | ~36s | <50MB | Maintains constant memory |
| Selective Parsing | 11-12x faster | <5MB | Extract specific elements only |
| Parallel (8 cores) | 6.25x speedup | ~6MB/thread | 78% efficiency |
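
The figures in this table are mutually consistent, which a few lines of Python arithmetic can confirm. The only assumption is that 1GB is treated as 1000MB.

```python
def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average throughput in MB/s."""
    return size_mb / seconds

def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / cores

print(round(throughput_mb_s(100, 3.6), 1))            # 27.8
print(round(throughput_mb_s(1000, 36), 1))            # 27.8
print(round(parallel_efficiency(6.25, 8) * 100, 1))   # 78.1
```

Both file-size rows work out to roughly 27.8 MB/s, inside the 25-30 MB/s production range, and a 6.25x speedup on 8 cores corresponds to the stated 78% efficiency.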

Language Binding Performance

| Language | Throughput | Memory | Async Support | Notes |
|---|---|---|---|---|
| Rust | 50K elem/ms | Native | Yes (tokio) | Baseline |
| Python | 16M elem/s | <100MB | Yes (asyncio) | PyO3 native |
| Node.js | 100K elem/s | <100MB | Yes (streams) | Native streams + backpressure |
| WASM | 10K elem/s | Browser | Yes (Promise) | 37KB bundle size |

Parser Performance by File Size

| File Size | Parse Time | Memory Usage | Mode | Notes |
|---|---|---|---|---|
| 10KB | <5ms | <2MB | DOM | Single release |
| 100KB | <10ms | <5MB | DOM | Small catalog |
| 1MB | <50ms | <20MB | DOM | Medium catalog |
| 10MB | <400ms | <100MB | Auto | Threshold for streaming |
| 100MB | <3.6s | <10MB | Stream | 90% memory reduction |
| 1GB | <36s | <50MB | Stream | Constant memory usage |

Package Sizes

| Component | Size | Target | Status |
|---|---|---|---|
| Rust Core | 9.4MB | - | ✅ Development artifact |
| Node.js (npm) | 347KB | <1MB | ✅ Excellent |
| Python wheel | 235KB | <1MB | ✅ Compact |
| WASM bundle | 114KB | <500KB | ✅ 77% under target! |
| crates.io | | | ✅ NEW! |
| ddex-core | 57.2KiB (34 files) | <10MB | ✅ Compact |
| ddex-parser | 197.9KiB (43 files) | <10MB | ✅ Efficient |
| ddex-builder | 1.1MiB (81 files) | <10MB | ✅ Under limit |

📚 Documentation

📖 Core Documentation

🦀 Rust Documentation ✅ NEW!

🔧 Developer Resources

🤝 Contributing

This project is in active development. While external contributions aren't yet accepted, we welcome feedback and issue reports. Follow the project for updates!

📜 License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

DDEX Suite is designed to complement DDEX Workbench by providing structural parsing and deterministic generation while Workbench handles XSD validation and business rules.


Repository: https://github.com/daddykev/ddex-suite
Status: Phase 4.4 - Additional Bindings
Parser: v0.4.1 on npm and PyPI
Builder: v0.4.1 on npm and PyPI
Suite Target: v1.0.0 in Q1 2026
Last Updated: September 15, 2025