@dimitri-yatsenko
Member

@dimitri-yatsenko dimitri-yatsenko commented Jan 20, 2026

Summary

This PR adds documentation for DataJoint 2.1 features, primarily multi-backend support (PostgreSQL alongside MySQL), and updates the tutorials so that they work identically on both backends.

Changes

New Documentation Files

  • src/reference/specs/database-backends.md — Specification for supported database backends
    • Supported backends table (MySQL 8.0+, PostgreSQL 15+)
    • Adapter architecture diagram
    • Backend compatibility matrix
    • Type mapping reference (core types → native types)
    • Connection management patterns
    • Migration guidance

Updated Tutorials

All tutorials have been verified to work on both MySQL and PostgreSQL:

  • src/tutorials/advanced/json-type.ipynb — Simplified to focus on storage patterns
    • Insert and fetch entire JSON objects (backend-agnostic)
    • Filtering in Python after fetching (no backend-specific SQL)
    • Clear guidelines for JSON vs normalized tables
    • Works identically on MySQL and PostgreSQL
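
The fetch-then-filter pattern listed above can be sketched in plain Python. This is a minimal illustration, not the tutorial's code: the rows stand in for whatever a fetch would return, and the field names (`equipment_id`, `specs`) and the helper `filter_by_spec` are hypothetical.

```python
# Hypothetical rows, standing in for records fetched from a table
# that stores a JSON object in a `specs` column.
rows = [
    {"equipment_id": 1, "specs": {"type": "microscope", "magnification": 40}},
    {"equipment_id": 2, "specs": {"type": "laser", "wavelength_nm": 488}},
    {"equipment_id": 3, "specs": {"type": "microscope", "magnification": 100}},
]


def filter_by_spec(records, key, predicate):
    """Keep records whose JSON `specs` object has `key` and satisfies `predicate`."""
    return [
        r for r in records
        if key in r["specs"] and predicate(r["specs"][key])
    ]


# Filtering happens in Python, so no backend-specific JSON SQL is involved.
high_mag = filter_by_spec(rows, "magnification", lambda m: m >= 50)
high_mag_ids = [r["equipment_id"] for r in high_mag]
print(high_mag_ids)
```

Because records without the key are skipped rather than raising, the same helper tolerates the heterogeneous objects that JSON columns tend to hold.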

Modified Reference Files

  • src/about/versioning.md

    • Added v2.1 to version history table with "PostgreSQL multi-backend support"
  • src/how-to/configure-database.md

    • Added database.backend setting to configuration table
    • Added new "PostgreSQL Backend" section with:
      • Configuration file examples
      • Environment variable setup (DJ_BACKEND)
      • Programmatic configuration
      • Docker Compose example for local development
      • Backend compatibility note
  • src/how-to/migrate-to-v20.md

    • NEW: Added "Column Comment Format (Critical for Blob Migration)" section explaining:
      • The :<type>: comment format used by DataJoint 2.0
      • Why blob columns need :<blob>: markers (without them, blobs return raw bytes)
      • Usage examples for check_migration_status(), migrate_columns(), migrate_blob_columns()
    • UPDATED: Expanded "Option B: In-Place Migration" with step-by-step guide using actual datajoint.migrate functions:
      • backup_schema(), migrate_columns(), rebuild_lineage(), migrate_external(), verify_schema_v20()
  • src/reference/specs/type-system.md

    • Added version indicator for PostgreSQL type mappings
  • src/explanation/type-system.md

    • Added version indicator noting PostgreSQL support in 2.1
  • src/reference/specs/index.md

    • Added Database Backends to Foundation section in reading order
  • mkdocs.yaml

    • Added navigation entry for database-backends.md under Specifications > Foundation
    • Updated datajoint_version to "2.1" (indicates latest documented version)
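
As a rough illustration of how a `DJ_BACKEND` environment variable and an explicit setting might interact, here is a minimal sketch. It is not DataJoint's configuration loader: the helper `resolve_backend`, the accepted names (`"mysql"`, `"postgresql"`), the precedence order, and the MySQL default are all assumptions made for the example.

```python
import os

# Assumed backend names; the PR only confirms that MySQL and PostgreSQL are supported.
SUPPORTED_BACKENDS = {"mysql", "postgresql"}


def resolve_backend(explicit=None, environ=None):
    """Pick a backend: explicit argument first, then DJ_BACKEND, then a default.

    The precedence and the "mysql" default are assumptions for this sketch.
    """
    environ = os.environ if environ is None else environ
    backend = explicit or environ.get("DJ_BACKEND") or "mysql"
    backend = backend.strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend: {backend!r}")
    return backend


# Passing a dict instead of os.environ keeps the example independent of the shell.
print(resolve_backend(environ={"DJ_BACKEND": "PostgreSQL"}))
```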

Testing Infrastructure

  • docker-compose.yaml — MySQL 8.0, PostgreSQL 15, MinIO containers for testing
  • scripts/execute_notebooks.py — Notebook execution script with backend selection

Version Indicators

All PostgreSQL-related content includes proper version admonitions per about/versioning.md:

!!! version-added "New in 2.1"
    PostgreSQL is now supported as an alternative database backend.

Backend-Agnostic Design

The tutorials follow these principles for backend compatibility:

  1. JSON fields: Insert and fetch entire objects; filter in Python rather than with backend-specific SQL restrictions on JSON content
  2. Standard SQL: Use only SQL features supported by both MySQL and PostgreSQL
  3. Type definitions: Use DataJoint core types that map correctly to both backends

Related

  • datajoint/datajoint-python PR #1339 (PostgreSQL implementation)

Test Plan

  • Build docs locally and verify new pages render correctly
  • Verify version admonitions display properly
  • Check navigation links work
  • Verify Docker Compose example syntax is correct
  • Run all tutorials on MySQL backend
  • Run all tutorials on PostgreSQL backend

🤖 Generated with Claude Code

dimitri-yatsenko and others added 14 commits January 19, 2026 21:51
- Add v2.1 to version history with PostgreSQL support
- Create database-backends.md specification for supported backends
- Add PostgreSQL configuration section to configure-database.md
- Add database.backend setting to configuration table
- Add version indicators for v2.1 features
- Update type system docs with PostgreSQL type mappings note
- Update mkdocs.yaml navigation with database backends spec
- Update specs index with database backends in foundation section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PostgreSQL service to docker-compose.yaml
- Add EXECUTE and EXECUTE_PG modes for notebook execution
- Create scripts/execute_notebooks.py for automated notebook execution
- Install datajoint[postgres] for PostgreSQL support in EXECUTE_PG mode

Usage:
  MODE="EXECUTE" docker compose up --build     # Execute against MySQL
  MODE="EXECUTE_PG" docker compose up --build  # Execute against PostgreSQL

Note: PostgreSQL adapter implementation has outstanding bugs that
prevent notebooks from executing successfully. Issues identified:
- ENUM type generation not working (outputs literal {enum_type_name})
- ON DUPLICATE KEY UPDATE syntax (MySQL) vs ON CONFLICT (PostgreSQL)
- String quoting differences in LIKE patterns
- Index syntax differences
- Blob serialization with psycopg2

These need to be fixed in datajoint-python before notebooks can pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
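
One of the differences listed in the commit above, `ON DUPLICATE KEY UPDATE` (MySQL) versus `ON CONFLICT` (PostgreSQL), can be illustrated with a small SQL builder. This is a sketch of the two dialects only, not the adapter code in datajoint-python; `quote_ident` and `upsert_sql` are hypothetical helpers, and the `%s` placeholders assume a DB-API driver with the `format` paramstyle.

```python
def quote_ident(name, backend):
    """Quote an identifier: backticks for MySQL, double quotes for PostgreSQL."""
    return f"`{name}`" if backend == "mysql" else f'"{name}"'


def upsert_sql(table, columns, key_columns, backend):
    """Build an INSERT that updates non-key columns when the key already exists."""
    if backend not in ("mysql", "postgresql"):
        raise ValueError(f"Unsupported backend: {backend!r}")
    updates = [c for c in columns if c not in key_columns]
    if not updates:
        raise ValueError("Need at least one non-key column to update")

    def q(name):
        return quote_ident(name, backend)

    cols = ", ".join(q(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    head = f"INSERT INTO {q(table)} ({cols}) VALUES ({placeholders})"
    if backend == "mysql":
        # VALUES(col) refers to the value that would have been inserted.
        assignments = ", ".join(f"{q(c)} = VALUES({q(c)})" for c in updates)
        return f"{head} ON DUPLICATE KEY UPDATE {assignments}"
    # PostgreSQL names the conflicting key explicitly and uses EXCLUDED.col.
    keys = ", ".join(q(k) for k in key_columns)
    assignments = ", ".join(f"{q(c)} = EXCLUDED.{q(c)}" for c in updates)
    return f"{head} ON CONFLICT ({keys}) DO UPDATE SET {assignments}"


for name in ("mysql", "postgresql"):
    print(upsert_sql("session", ["session_id", "notes"], ["session_id"], name))
```

The PostgreSQL form needs the key columns spelled out, which is why the helper takes `key_columns` even though MySQL ignores it.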
The year type is MySQL-specific and not portable to PostgreSQL.
Using int16 ensures the tutorial works across backends.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix arXiv citation in semantic-matching.md: correct paper title to
  "DataJoint: A Simpler Relational Data Model" (not Elements)
- Remove confusing unsigned type rows from type-system.md table;
  add note pointing to reference spec for unsigned type guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in type-system.md: kept simpler unsigned type note.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updates for PostgreSQL backend compatibility:

1. Update notebooks for cross-backend SQL syntax
   - Replace double-quoted string literals with dict syntax
   - Affects: 01-first-pipeline, 03-data-entry, allen-ccf,
     university, languages notebooks

2. Add SQL function translation section to database-backends.md
   - Documents GROUP_CONCAT ↔ STRING_AGG translation
   - Bidirectional translation for portable code

3. Update query-algebra.md spec with subquery wrapping rules
   - Document when operators modify-in-place vs wrap as subquery
   - Document backend-specific transpilation (HAVING clause)

4. Add drop_tutorial_schemas.py script
   - Utility to clean up tutorial schemas before notebook runs

Notebooks tested against PostgreSQL: 20/21 pass
(json-type.ipynb is MySQL-specific, expected failure)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
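
The `GROUP_CONCAT` ↔ `STRING_AGG` correspondence mentioned in the commit above can be sketched as a function that emits the right aggregate for each backend. This is an illustration of the two SQL spellings, not the translation layer documented in database-backends.md; `concat_aggregate` is a hypothetical helper, and the `::text` cast is an added safeguard because `STRING_AGG` expects text input.

```python
def concat_aggregate(expr, separator, backend):
    """Return a string-concatenating aggregate in the given backend's dialect."""
    # Double any single quote so the separator is a valid SQL string literal.
    sep = separator.replace("'", "''")
    if backend == "mysql":
        return f"GROUP_CONCAT({expr} SEPARATOR '{sep}')"
    if backend == "postgresql":
        return f"STRING_AGG({expr}::text, '{sep}')"
    raise ValueError(f"Unsupported backend: {backend!r}")


print(concat_aggregate("language", ", ", "mysql"))
print(concat_aggregate("language", ", ", "postgresql"))
```

The separator is written with single quotes in both dialects; PostgreSQL reads double quotes as identifier quoting, which is also why the notebooks moved away from double-quoted string literals.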
Refactor the JSON tutorial to:
- Focus on insert/fetch of entire JSON objects
- Remove backend-specific SQL restriction examples
- Use simpler Equipment/specs example instead of RC car race story
- Show filtering in Python after fetching
- Add clear guidelines for when to use JSON vs normalized tables

Works identically on both MySQL and PostgreSQL.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Save notebook outputs from PostgreSQL execution for 2.1 release.
All 21 notebooks pass on PostgreSQL backend.

This establishes PostgreSQL as the reference backend for CI.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Re-execute all notebooks with DJ_USE_TLS=false to eliminate SSL warnings
- Add backend compatibility admonition to first tutorial noting
  MySQL and PostgreSQL support (PostgreSQL added in 2.1)
- Update execute_notebooks.py to disable TLS for tutorial containers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tebooks)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 21 notebooks now pass on PostgreSQL with:
- Working schema diagrams
- TIMESTAMPDIFF/CURDATE date function translations
- No SSL warnings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 21 notebooks pass with:
- Fixed TIMESTAMPDIFF/CURDATE translations
- YEAR()/MONTH()/DAY() function support
- SUM(boolean) handling for PostgreSQL

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
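
To make the function translations named in the commits above concrete, here is a minimal regex-based sketch that rewrites a few MySQL-style calls into PostgreSQL equivalents. It is not the implementation in datajoint-python: `to_postgres` and `sum_boolean` are hypothetical helpers, only simple non-nested arguments are handled, and `TIMESTAMPDIFF` is left out because its translation depends on the unit.

```python
import re

# Matches YEAR(x), MONTH(x), DAY(x) where x contains no parentheses.
_PART_CALL = re.compile(r"\b(YEAR|MONTH|DAY)\(\s*([^()]+?)\s*\)", re.IGNORECASE)
_CURDATE = re.compile(r"\bCURDATE\(\s*\)", re.IGNORECASE)


def to_postgres(expr):
    """Rewrite CURDATE() and YEAR/MONTH/DAY(x) into PostgreSQL syntax."""
    # CURDATE() first, so that YEAR(CURDATE()) becomes a simple call.
    expr = _CURDATE.sub("CURRENT_DATE", expr)
    return _PART_CALL.sub(
        lambda m: f"EXTRACT({m.group(1).upper()} FROM {m.group(2)})", expr
    )


def sum_boolean(column, backend):
    """PostgreSQL cannot SUM a boolean directly, so cast it to an integer first."""
    return f"SUM({column}::int)" if backend == "postgresql" else f"SUM({column})"


print(to_postgres("YEAR(birth_date) = 2000"))
print(to_postgres("year(CURDATE()) - YEAR(hire_date)"))
print(sum_boolean("passed", "postgresql"))
```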
Diagrams now render with correct table colors and styling for
all table tiers (Manual, Lookup, Imported, Computed, Part).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dimitri-yatsenko dimitri-yatsenko marked this pull request as ready for review January 20, 2026 22:58
@dimitri-yatsenko dimitri-yatsenko changed the title from "WIP: DataJoint 2.1 Documentation" to "docs: DataJoint 2.1 Documentation - PostgreSQL Multi-Backend Support" Jan 20, 2026
@dimitri-yatsenko dimitri-yatsenko changed the base branch from pre/v2.0 to main January 20, 2026 23:45
- semantic-matching.md: clarify applies to all binary operators, not just joins
- semantic-matching.md: change "replacing current rules" to "replaces pre-2.0 rules"
- job-metadata.md: rename "Current/Proposed implementation" to "Pre-2.0/DataJoint 2.0 implementation"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collaborator

@MilagrosMarin MilagrosMarin left a comment

mkdocs.yaml:217 - Should datajoint_version be updated to "2.1" to reflect the PostgreSQL content in this PR?

extra:
  datajoint_version: "2.1"  # Updated from "2.0"

Or is this intentionally kept at "2.0" since that's the documentation baseline (per about/versioning.md)?

Just want to clarify the intent here.

Collaborator

@MilagrosMarin MilagrosMarin left a comment

query-algebra.md Section 13.6 - The backend-specific SQL transpilation details (identifier quoting, HAVING alias handling, function translation) are thorough and valuable.

Question: Is this the intended location for these details, or would they fit better in database-backends.md alongside the other backend-specific behavior documentation?

Either location works—just checking if this was a deliberate choice to keep all SQL generation details together in the query-algebra spec.

@dimitri-yatsenko
Member Author

Re: mkdocs.yaml:217 (datajoint_version)

The datajoint_version: "2.0" setting indicates the documentation baseline, not the newest feature version. Per our versioning strategy, 2.0 is the baseline version where features are documented without markers. Features added in 2.1 (like PostgreSQL) are explicitly marked with !!! version-added "New in 2.1" admonitions throughout.

This approach lets users on 2.0 use the documentation while clearly seeing which features require upgrading to 2.1.

@dimitri-yatsenko
Member Author

Re: query-algebra.md Section 13.6 location

This was a deliberate choice following our Diataxis-based organization. Both files are specification pages (Reference category), but they serve different purposes:

query-algebra.md specifies the query algebra—how operators work and how they translate to SQL. Section 13.6 is integral to this spec because it documents how query expressions generate backend-specific SQL. This is relevant to developers understanding DataJoint's internals or debugging complex queries.

database-backends.md specifies backend configuration and user-facing compatibility patterns—what backends are supported, how to configure them, and how to write portable code.

The overlap in function translation is intentional: query-algebra.md documents the internal translation mechanism, while database-backends.md shows users how to leverage it. Users looking for "how do I write portable aggregations" go to database-backends.md; developers asking "how does DataJoint generate SQL for different backends" find it in the query algebra spec.

@dimitri-yatsenko
Member Author

Update: Changed datajoint_version to "2.1"

On reflection, the displayed version should indicate what the documentation covers, not the baseline. Since this PR documents 2.1 features (PostgreSQL), displaying "2.1" is more accurate for users.

The baseline concept (2.0 features unmarked, 2.1+ with admonitions) remains—it's about how features are marked, not what version to display.

I've updated mkdocs.yaml to set datajoint_version: "2.1".
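
The marking rule (baseline features unmarked, later features flagged) can be expressed as a small helper. This is only a sketch of the convention described in this thread, not tooling that exists in the docs repository; `parse_version` and `version_admonition` are hypothetical names.

```python
def parse_version(text):
    """Turn "2.1" into (2, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def version_admonition(feature_version, baseline="2.0"):
    """Return the admonition header for features newer than the baseline, else None."""
    if parse_version(feature_version) > parse_version(baseline):
        return f'!!! version-added "New in {feature_version}"'
    return None


print(version_admonition("2.1"))  # a 2.1 feature gets a marker
print(version_admonition("2.0"))  # a baseline feature stays unmarked
```

Comparing tuples rather than strings keeps a future "2.10" ordered after "2.9".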

The displayed version should indicate what the documentation covers,
not just the baseline. Since this PR documents 2.1 features (PostgreSQL),
displaying "2.1" is more accurate for users.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Explains how DataJoint 2.0 identifies blob columns via `:<blob>:` comment
markers. Without these markers, legacy blobs are treated as raw binary
and not deserialized.

Added:
- "Column Comment Format" section explaining the `:type:` format
- Usage examples for check_migration_status(), migrate_columns(),
  and migrate_blob_columns() from datajoint.migrate
- Expanded "Option B: In-Place Migration" with step-by-step using
  actual migration functions (backup_schema, migrate_columns,
  rebuild_lineage, migrate_external, verify_schema_v20)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
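
The marker check described in the commit above can be sketched as follows. The grammar used here, a leading `:type:` prefix followed by the free-text comment, is inferred from the commit message and should be treated as an assumption; `split_comment` and `needs_blob_marker` are hypothetical helpers and are not part of `datajoint.migrate`.

```python
import re

# Assumed grammar: ":type:" at the start of the column comment, then the comment text.
_MARKER = re.compile(r"^:(?P<type>[^:]+):(?P<comment>.*)$", re.DOTALL)


def split_comment(raw):
    """Split a raw column comment into (declared_type, user_comment)."""
    match = _MARKER.match(raw)
    if match is None:
        return None, raw  # no marker: a legacy-style comment
    return match.group("type"), match.group("comment")


def needs_blob_marker(native_type, raw_comment):
    """Flag blob-typed columns with no marker; these would come back as raw bytes."""
    declared, _ = split_comment(raw_comment)
    return native_type.lower().endswith("blob") and declared is None


print(split_comment(":<blob>:spike waveform"))
print(needs_blob_marker("longblob", "spike waveform"))
```

For real schemas, the functions named in the commit (`check_migration_status()`, `migrate_blob_columns()`) are the ones to use; the sketch only shows why a missing marker is detectable from the column comment alone.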
@MilagrosMarin MilagrosMarin merged commit 3b5ee02 into main Jan 21, 2026
1 check passed

3 participants