docs: DataJoint 2.1 Documentation - PostgreSQL Multi-Backend Support #130
- Add v2.1 to version history with PostgreSQL support
- Create database-backends.md specification for supported backends
- Add PostgreSQL configuration section to configure-database.md
- Add database.backend setting to configuration table
- Add version indicators for v2.1 features
- Update type system docs with PostgreSQL type mappings note
- Update mkdocs.yaml navigation with database backends spec
- Update specs index with database backends in foundation section
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PostgreSQL service to docker-compose.yaml
- Add EXECUTE and EXECUTE_PG modes for notebook execution
- Create scripts/execute_notebooks.py for automated notebook execution
- Install datajoint[postgres] for PostgreSQL support in EXECUTE_PG mode
Usage:
MODE="EXECUTE" docker compose up --build # Execute against MySQL
MODE="EXECUTE_PG" docker compose up --build # Execute against PostgreSQL
Note: PostgreSQL adapter implementation has outstanding bugs that
prevent notebooks from executing successfully. Issues identified:
- ENUM type generation not working (outputs literal {enum_type_name})
- ON DUPLICATE KEY UPDATE syntax (MySQL) vs ON CONFLICT (PostgreSQL)
- String quoting differences in LIKE patterns
- Index syntax differences
- Blob serialization with psycopg2
These need to be fixed in datajoint-python before notebooks can pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
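The upsert mismatch listed above can be sketched as follows. This is a minimal illustration, not the datajoint-python adapter: `build_upsert` is a hypothetical helper, the backend names "mysql" and "postgresql" are assumptions, and identifiers are assumed to be trusted (no quoting is applied).

```python
def build_upsert(backend, table, columns, key_columns):
    """Build a parameterized upsert statement for one backend.

    Hypothetical helper for illustration only; `backend` values are assumed.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    non_key = [c for c in columns if c not in key_columns]

    if backend == "mysql":
        # MySQL: VALUES(col) refers to the value proposed for insertion
        # (still accepted in MySQL 8.0, though deprecated in recent releases).
        targets = non_key or key_columns[:1]  # no-op update when only keys exist
        updates = ", ".join(f"{c} = VALUES({c})" for c in targets)
        return f"{insert} ON DUPLICATE KEY UPDATE {updates}"

    if backend == "postgresql":
        # PostgreSQL: the conflict target is explicit and EXCLUDED holds the new row.
        conflict = ", ".join(key_columns)
        if not non_key:
            return f"{insert} ON CONFLICT ({conflict}) DO NOTHING"
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in non_key)
        return f"{insert} ON CONFLICT ({conflict}) DO UPDATE SET {updates}"

    raise ValueError(f"unsupported backend: {backend!r}")


print(build_upsert("mysql", "session", ["session_id", "note"], ["session_id"]))
print(build_upsert("postgresql", "session", ["session_id", "note"], ["session_id"]))
```

The key design difference is that PostgreSQL requires naming the conflicting key columns, while MySQL infers them from any unique index.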
The year type is MySQL-specific and not portable to PostgreSQL. Using int16 ensures the tutorial works across backends. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix arXiv citation in semantic-matching.md: correct paper title to "DataJoint: A Simpler Relational Data Model" (not Elements)
- Remove confusing unsigned type rows from type-system.md table; add note pointing to reference spec for unsigned type guidance
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in type-system.md: kept simpler unsigned type note. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updates for PostgreSQL backend compatibility:
1. Update notebooks for cross-backend SQL syntax
- Replace double-quoted string literals with dict syntax
- Affects: 01-first-pipeline, 03-data-entry, allen-ccf,
university, languages notebooks
2. Add SQL function translation section to database-backends.md
- Documents GROUP_CONCAT ↔ STRING_AGG translation
- Bidirectional translation for portable code
3. Update query-algebra.md spec with subquery wrapping rules
- Document when operators modify-in-place vs wrap as subquery
- Document backend-specific transpilation (HAVING clause)
4. Add drop_tutorial_schemas.py script
- Utility to clean up tutorial schemas before notebook runs
Notebooks tested against PostgreSQL: 20/21 pass
(json-type.ipynb is MySQL-specific, expected failure)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
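The GROUP_CONCAT ↔ STRING_AGG translation mentioned in item 2 can be illustrated with a small regex-based sketch. The function names are hypothetical and this is not the translation code used by DataJoint; it only covers the simple form without nested parentheses, DISTINCT, or ORDER BY, and it does not add the text cast that PostgreSQL's STRING_AGG needs for non-text expressions.

```python
import re

# GROUP_CONCAT(expr [SEPARATOR 'sep'])  (MySQL; the default separator is ',')
_GROUP_CONCAT = re.compile(
    r"GROUP_CONCAT\(\s*(?P<expr>[^()]+?)(?:\s+SEPARATOR\s+(?P<sep>'[^']*'))?\s*\)",
    re.IGNORECASE,
)
# STRING_AGG(expr, 'sep')  (PostgreSQL; the separator is mandatory)
_STRING_AGG = re.compile(
    r"STRING_AGG\(\s*(?P<expr>[^(),]+?)\s*,\s*(?P<sep>'[^']*')\s*\)",
    re.IGNORECASE,
)


def group_concat_to_string_agg(sql):
    """Rewrite simple MySQL GROUP_CONCAT calls as PostgreSQL STRING_AGG calls."""
    def repl(m):
        sep = m.group("sep") or "','"  # make MySQL's implicit default explicit
        return f"STRING_AGG({m.group('expr')}, {sep})"
    return _GROUP_CONCAT.sub(repl, sql)


def string_agg_to_group_concat(sql):
    """Rewrite simple PostgreSQL STRING_AGG calls as MySQL GROUP_CONCAT calls."""
    def repl(m):
        return f"GROUP_CONCAT({m.group('expr')} SEPARATOR {m.group('sep')})"
    return _STRING_AGG.sub(repl, sql)


print(group_concat_to_string_agg("SELECT GROUP_CONCAT(name SEPARATOR ', ') FROM t"))
print(string_agg_to_group_concat("SELECT STRING_AGG(name, ', ') FROM t"))
```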
Refactor the JSON tutorial to:
- Focus on insert/fetch of entire JSON objects
- Remove backend-specific SQL restriction examples
- Use simpler Equipment/specs example instead of RC car race story
- Show filtering in Python after fetching
- Add clear guidelines for when to use JSON vs normalized tables
Works identically on both MySQL and PostgreSQL.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
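The "filter in Python after fetching" pattern from this commit might look roughly like the sketch below. The rows are hypothetical stand-ins for fetched Equipment entries with a JSON `specs` attribute; the field names and the `filter_by_spec` helper are assumptions made for illustration and are not taken from the tutorial.

```python
# Hypothetical rows, shaped like the result of fetching an Equipment table
# whose `specs` attribute holds a JSON object (all field names are made up).
rows = [
    {"equipment_id": 1, "specs": {"kind": "microscope", "magnification": 40}},
    {"equipment_id": 2, "specs": {"kind": "camera", "resolution": [1920, 1080]}},
    {"equipment_id": 3, "specs": {"kind": "microscope", "magnification": 100}},
]


def filter_by_spec(rows, field, predicate):
    """Keep rows whose specs contain `field` and whose value satisfies `predicate`."""
    return [r for r in rows if field in r["specs"] and predicate(r["specs"][field])]


# Filtering happens client-side, so no backend-specific JSON SQL is involved.
high_mag = filter_by_spec(rows, "magnification", lambda m: m >= 50)
print([r["equipment_id"] for r in high_mag])  # prints [3]
```

Because the whole JSON object is fetched and inspected in Python, the same code runs unchanged on either backend, at the cost of transferring rows that are later discarded.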
Save notebook outputs from PostgreSQL execution for 2.1 release. All 21 notebooks pass on PostgreSQL backend. This establishes PostgreSQL as the reference backend for CI. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Re-execute all notebooks with DJ_USE_TLS=false to eliminate SSL warnings
- Add backend compatibility admonition to first tutorial noting MySQL and PostgreSQL support (PostgreSQL added in 2.1)
- Update execute_notebooks.py to disable TLS for tutorial containers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tebooks) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 21 notebooks now pass on PostgreSQL with:
- Working schema diagrams
- TIMESTAMPDIFF/CURDATE date function translations
- No SSL warnings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 21 notebooks pass with:
- Fixed TIMESTAMPDIFF/CURDATE translations
- YEAR()/MONTH()/DAY() function support
- SUM(boolean) handling for PostgreSQL
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
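To make the date-function items above concrete, here is a minimal sketch of how MySQL-style calls could be rewritten for PostgreSQL. It is an assumption-laden illustration, not DataJoint's transpiler: `mysql_dates_to_postgres` is a hypothetical name, only CURDATE() and YEAR()/MONTH()/DAY() with non-nested arguments are handled, and TIMESTAMPDIFF and SUM(boolean) are deliberately left out.

```python
import re

# CURDATE() in MySQL corresponds to the CURRENT_DATE keyword in PostgreSQL.
_CURDATE = re.compile(r"\bCURDATE\(\s*\)", re.IGNORECASE)
# YEAR(x), MONTH(x), DAY(x) correspond to EXTRACT(<part> FROM x).
_DATE_PART = re.compile(r"\b(YEAR|MONTH|DAY)\(\s*([^()]+?)\s*\)", re.IGNORECASE)


def mysql_dates_to_postgres(sql):
    """Rewrite a few MySQL date functions into PostgreSQL equivalents."""
    # Replace CURDATE() first so that YEAR(CURDATE()) has a paren-free argument.
    sql = _CURDATE.sub("CURRENT_DATE", sql)
    return _DATE_PART.sub(
        lambda m: f"EXTRACT({m.group(1).upper()} FROM {m.group(2)})", sql
    )


print(mysql_dates_to_postgres("SELECT YEAR(dob) FROM person WHERE dob < CURDATE()"))
```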
Diagrams now render with correct table colors and styling for all table tiers (Manual, Lookup, Imported, Computed, Part). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- semantic-matching.md: clarify applies to all binary operators, not just joins
- semantic-matching.md: change "replacing current rules" to "replaces pre-2.0 rules"
- job-metadata.md: rename "Current/Proposed implementation" to "Pre-2.0/DataJoint 2.0 implementation"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MilagrosMarin
left a comment
mkdocs.yaml:217 - Should datajoint_version be updated to "2.1" to reflect the PostgreSQL content in this PR?
    extra:
      datajoint_version: "2.1"  # Updated from "2.0"
Or is this intentionally kept at "2.0" since that's the documentation baseline (per about/versioning.md)?
Just want to clarify the intent here.
MilagrosMarin
left a comment
query-algebra.md Section 13.6 - The backend-specific SQL transpilation details (identifier quoting, HAVING alias handling, function translation) are thorough and valuable.
Question: Is this the intended location for these details, or would they fit better in database-backends.md alongside the other backend-specific behavior documentation?
Either location works—just checking if this was a deliberate choice to keep all SQL generation details together in the query-algebra spec.
Re: mkdocs.yaml:217 ( The
This approach lets users on 2.0 use the documentation while clearly seeing which features require upgrading to 2.1.

Re: query-algebra.md Section 13.6 location
This was a deliberate choice following our Diataxis-based organization. Both files are specification pages (Reference category), but they serve different purposes:
The overlap in function translation is intentional:

Update: Changed
On reflection, the displayed version should indicate what the documentation covers, not the baseline. Since this PR documents 2.1 features (PostgreSQL), displaying "2.1" is more accurate for users. The baseline concept (2.0 features unmarked, 2.1+ with admonitions) remains; it's about how features are marked, not what version to display. I've updated
The displayed version should indicate what the documentation covers, not just the baseline. Since this PR documents 2.1 features (PostgreSQL), displaying "2.1" is more accurate for users. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Explains how DataJoint 2.0 identifies blob columns via `:<blob>:` comment markers. Without these markers, legacy blobs are treated as raw binary and not deserialized.
Added:
- "Column Comment Format" section explaining the `:type:` format
- Usage examples for check_migration_status(), migrate_columns(), and migrate_blob_columns() from datajoint.migrate
- Expanded "Option B: In-Place Migration" with step-by-step using actual migration functions (backup_schema, migrate_columns, rebuild_lineage, migrate_external, verify_schema_v20)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
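The `:type:` comment-marker idea described in this commit can be sketched in plain Python. This is only an illustration of the format as stated above, assuming the marker sits at the very start of the column comment; `split_column_comment` and `needs_blob_marker` are hypothetical helpers and do not reproduce the datajoint.migrate functions.

```python
import re

# Assumed layout: ":<type>:free-text comment", with the marker at the start.
_TYPE_MARKER = re.compile(r"^:(?P<type>[^:]+):(?P<comment>.*)$", re.DOTALL)


def split_column_comment(comment):
    """Return (type_marker, remaining_comment); the marker is None if absent."""
    m = _TYPE_MARKER.match(comment)
    if m is None:
        return None, comment
    return m.group("type"), m.group("comment")


def needs_blob_marker(sql_type, comment):
    """Flag blob-typed columns whose comment carries no type marker at all.

    Per the commit message, such columns would be read back as raw bytes.
    """
    marker, _ = split_column_comment(comment)
    return sql_type.lower().endswith("blob") and marker is None


print(split_column_comment(":<blob>:neural trace"))
print(needs_blob_marker("longblob", "neural trace"))
```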
b43e655 to
b0b3ed7
Summary
This PR adds documentation for DataJoint 2.1 features, primarily PostgreSQL multi-backend support and updated tutorials that work identically on both MySQL and PostgreSQL.
Changes
New Documentation Files
- `src/reference/specs/database-backends.md`: Specification for supported database backends
Updated Tutorials
All tutorials have been verified to work on both MySQL and PostgreSQL:
- `src/tutorials/advanced/json-type.ipynb`: Simplified to focus on storage patterns
Modified Reference Files
- `src/about/versioning.md`
- `src/how-to/configure-database.md`
  - … `database.backend` setting to configuration table
  - … `DJ_BACKEND`)
- `src/how-to/migrate-to-v20.md`
  - … `:<type>:` comment format used by DataJoint 2.0
  - … `:<blob>:` markers (without them, blobs return raw bytes)
  - … `check_migration_status()`, `migrate_columns()`, `migrate_blob_columns()`
  - … `datajoint.migrate` functions: `backup_schema()`, `migrate_columns()`, `rebuild_lineage()`, `migrate_external()`, `verify_schema_v20()`
- `src/reference/specs/type-system.md`
- `src/explanation/type-system.md`
- `src/reference/specs/index.md`
- `mkdocs.yaml`
  - … `datajoint_version` to "2.1" (indicates latest documented version)
Testing Infrastructure
- `docker-compose.yaml`: MySQL 8.0, PostgreSQL 15, MinIO containers for testing
- `scripts/execute_notebooks.py`: Notebook execution script with backend selection
Version Indicators
All PostgreSQL-related content includes proper version admonitions per `about/versioning.md`:
    !!! version-added "New in 2.1"
        PostgreSQL is now supported as an alternative database backend.
Backend-Agnostic Design
The tutorials follow these principles for backend compatibility:
Related
Test Plan
🤖 Generated with Claude Code