

@dimitri-yatsenko (Member) commented Jan 20, 2026

Summary

This PR contains comprehensive documentation updates for DataJoint 2.0/2.1:

  • Standardize storage terminology: "in-table" and "in-store" instead of "internal/external"
  • Add comprehensive documentation for foreign key modifiers ([nullable], [unique], [nullable, unique])
  • Refactor semantic matching documentation into explanation and reference pages
  • Add PostgreSQL multi-backend documentation and infrastructure for DataJoint 2.1
  • Add notebook execution infrastructure for testing against MySQL and PostgreSQL
  • Document string quoting differences between MySQL and PostgreSQL

Changes

PostgreSQL Multi-Backend Support (New in 2.1)

  • Add reference/specs/database-backends.md specification covering:

    • Supported backends (MySQL 8.0+, PostgreSQL 15+)
    • Configuration via database.backend setting or DJ_BACKEND env var
    • Adapter architecture with Mermaid diagram
    • Type mapping between core types and native backend types
    • String quoting differences (critical for PostgreSQL compatibility)
    • Connection management patterns
    • Testing against multiple backends
  • Update how-to/configure-database.md with backend selection

  • Update how-to/migrate-to-v20.md with PostgreSQL string quoting guidance:

    • Replace double quotes with single quotes in SQL restriction strings
    • MySQL allows both; PostgreSQL interprets double quotes as identifiers
    • Add the same rule to the AI agent prompts used for automated migration
  • Reorganize specs nav: remove single-item "Foundation" folder, promote "Database Backends" to top level
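
The quoting rule above can be illustrated with a small, naive rewrite helper. This is a hypothetical sketch: `to_single_quotes` is not part of DataJoint or the migration guide. It assumes double-quoted literals contain no embedded quote characters, and it cannot tell a string literal apart from an intentionally quoted PostgreSQL identifier, so every rewrite still needs review.

```python
import re

# A double-quoted literal with no quote characters inside. Escaped quotes and
# literals containing apostrophes are deliberately left alone.
_DOUBLE_QUOTED = re.compile(r'"([^"\']*)"')


def to_single_quotes(restriction: str) -> str:
    """Rewrite "value" literals in an SQL restriction string as 'value'."""
    return _DOUBLE_QUOTED.sub(lambda m: "'" + m.group(1) + "'", restriction)


# MySQL accepts either form; PostgreSQL would read "Alice" as a column name.
print(to_single_quotes('subject_name = "Alice" AND session_idx > 3'))
# prints: subject_name = 'Alice' AND session_idx > 3
```

Restrictions that already use single quotes pass through unchanged, so the helper is safe to run repeatedly over the same notebook.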

Notebook Execution Infrastructure

  • Add scripts/execute_notebooks.py for automated notebook execution
  • Update docker-compose.yaml with execution modes:
    • MODE="EXECUTE" - Execute notebooks against MySQL
    • MODE="EXECUTE_PG" - Execute notebooks against PostgreSQL
  • Add PostgreSQL 15 service to docker-compose
  • Install datajoint[postgres] extra in EXECUTE_PG mode
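
The two execution modes amount to choosing a backend and an install target per run. The sketch below is hypothetical: the `MODES` table, the `plan_for_mode` helper, and the `DJ_BACKEND` values `mysql` and `postgresql` are assumptions for illustration, not the contents of scripts/execute_notebooks.py or docker-compose.yaml. Only the MODE names, the `DJ_BACKEND` variable, and the `datajoint[postgres]` extra come from this PR.

```python
import os

# Hypothetical MODE -> settings table; backend names are assumed values.
MODES = {
    "EXECUTE": {"backend": "mysql", "extra": None},
    "EXECUTE_PG": {"backend": "postgresql", "extra": "postgres"},
}


def plan_for_mode(mode: str) -> dict:
    """Return the backend, pip requirement, and environment for a MODE value."""
    try:
        cfg = MODES[mode]
    except KeyError:
        raise ValueError(f"unknown MODE: {mode!r}") from None
    extra = cfg["extra"]
    requirement = "datajoint" if extra is None else f"datajoint[{extra}]"
    # Copy the current environment and select the backend via DJ_BACKEND.
    env = dict(os.environ, DJ_BACKEND=cfg["backend"])
    return {"backend": cfg["backend"], "requirement": requirement, "env": env}


plan = plan_for_mode("EXECUTE_PG")
print(plan["requirement"], plan["env"]["DJ_BACKEND"])  # datajoint[postgres] postgresql
```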

Terminology Updates

  • Replace "external storage" → "in-store" throughout docs
  • Replace "internal storage" → "in-table" for consistency
  • Standardize on "functional dependency" instead of "determines" in the primary-keys spec
  • Rename "Clean Up External Storage" → "Clean Up Object Storage"

Foreign Key Modifiers Documentation

  • Add FK modifier section to how-to/define-tables.md
  • Add FK modifier section to tutorials/basics/02-schema-design.ipynb
  • Expand how-to/model-relationships.ipynb with nullable unique explanation
  • Add modifier syntax to grammar in reference/definition-syntax.md
  • Add section 6.7 (Nullable Unique FK) to reference/specs/table-declaration.md
  • Document that multiple NULLs are allowed in a [nullable, unique] FK
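
The multiple-NULLs behavior follows from standard SQL UNIQUE semantics, and it can be checked without a database server. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL and PostgreSQL and approximates a `-> [nullable, unique] Parent` attribute as a nullable foreign-key column with a UNIQUE constraint. The table layout is an assumption, not the DDL DataJoint actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY)")
# Approximation of a nullable, unique foreign key as a secondary attribute.
conn.execute(
    """CREATE TABLE child (
        child_id INTEGER PRIMARY KEY,
        parent_id INTEGER UNIQUE REFERENCES parent(parent_id)
    )"""
)
conn.execute("INSERT INTO parent VALUES (1)")

conn.execute("INSERT INTO child VALUES (1, NULL)")
conn.execute("INSERT INTO child VALUES (2, NULL)")  # a second NULL is accepted
conn.execute("INSERT INTO child VALUES (3, 1)")

try:
    conn.execute("INSERT INTO child VALUES (4, 1)")  # duplicate non-NULL value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM child WHERE parent_id IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_count)  # True 2
```

MySQL (InnoDB) and PostgreSQL behave the same way by default. PostgreSQL 15+ can opt out with `UNIQUE NULLS NOT DISTINCT`, in which case the second NULL would be rejected.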

Semantic Matching Refactor

  • Create new explanation/semantic-matching.md with conceptual content
  • Simplify reference/specs/semantic-matching.md to specification only
  • Add a reference to arXiv:1807.11104, where these concepts were introduced
  • Document that all binary operators use semantic matching
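
Here is a toy model of semantic matching. A binary operator matches on attributes that share both a name and a lineage, meaning the table where the attribute originated. The headings, the lineage strings, and the choice to raise an error on namesakes with different lineage are assumptions of this sketch. The normative rules are in reference/specs/semantic-matching.md.

```python
from collections.abc import Mapping


def semantic_match(left: Mapping[str, str], right: Mapping[str, str]) -> set[str]:
    """Return the attribute names a binary operator would match on.

    Each mapping is a hypothetical {attribute_name: lineage} heading, where
    lineage names the table that originally defined the attribute.
    """
    shared = left.keys() & right.keys()
    # Same name but different origin: treated as a conflict in this sketch.
    conflicts = sorted(a for a in shared if left[a] != right[a])
    if conflicts:
        raise ValueError(f"namesake attributes with different lineage: {conflicts}")
    return set(shared)


session = {"subject_id": "Subject", "session_idx": "Session"}
scan = {"subject_id": "Subject", "session_idx": "Session", "scan_idx": "Scan"}
print(sorted(semantic_match(session, scan)))  # ['session_idx', 'subject_id']
```

Two tables that each define their own `id` attribute would raise here instead of silently joining on an unrelated column. Plain natural joins would match such attributes on name alone.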

Other Fixes

  • Fix broken bullet lists across multiple pages
  • Change university.ipynb course type from int64 to int16 (portable across backends)
  • Update distributed computing threshold from 1 minute to 1 second
  • Execute all notebooks against MySQL (21/21 passing)

Files Added

  • src/reference/specs/database-backends.md - Database backends specification
  • src/explanation/semantic-matching.md - Semantic matching conceptual explanation
  • scripts/execute_notebooks.py - Notebook execution script

Test Plan

  • Verify docs build without errors
  • Check that all internal links work
  • Review terminology consistency across pages
  • Execute all 21 notebooks against MySQL (all passing)
  • Execute notebooks against PostgreSQL (blocked by datajoint-python adapter bugs)

Related

  • PostgreSQL adapter bugs identified in datajoint-python:
    • ENUM type generation
    • ON DUPLICATE KEY UPDATE vs ON CONFLICT syntax
    • String quoting in LIKE patterns
    • Index syntax differences
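
The ON DUPLICATE KEY UPDATE vs ON CONFLICT item is a syntax difference an adapter has to hide. It also interacts with identifier quoting: MySQL uses backticks and PostgreSQL uses double quotes. The helper names below are hypothetical, and this is not the datajoint-python adapter code. It only shows the shape of the SQL each backend expects for an upsert keyed on the primary key, using the `%s` placeholder style shared by the common MySQL and PostgreSQL Python drivers.

```python
def quote_ident(backend: str, name: str) -> str:
    """Quote an identifier: backticks on MySQL, double quotes on PostgreSQL."""
    if backend == "mysql":
        return "`" + name.replace("`", "``") + "`"
    if backend == "postgresql":
        return '"' + name.replace('"', '""') + '"'
    raise ValueError(f"unsupported backend: {backend!r}")


def upsert_sql(backend: str, table: str, columns: list[str], key: list[str]) -> str:
    """Build a parameterized insert-or-update statement keyed on `key`."""

    def q(name: str) -> str:
        return quote_ident(backend, name)

    updates = [c for c in columns if c not in key]
    if not updates:
        raise ValueError("no non-key columns to update")
    insert = (
        f"INSERT INTO {q(table)} ({', '.join(q(c) for c in columns)}) "
        f"VALUES ({', '.join(['%s'] * len(columns))})"
    )
    if backend == "mysql":
        # Row-alias form (MySQL 8.0.19+); VALUES(col) is the older, deprecated form.
        sets = ", ".join(f"{q(c)} = new.{q(c)}" for c in updates)
        return f"{insert} AS new ON DUPLICATE KEY UPDATE {sets}"
    # PostgreSQL: name the conflict target and read proposed values from EXCLUDED.
    target = ", ".join(q(c) for c in key)
    sets = ", ".join(f"{q(c)} = EXCLUDED.{q(c)}" for c in updates)
    return f"{insert} ON CONFLICT ({target}) DO UPDATE SET {sets}"


print(upsert_sql("postgresql", "session", ["subject_id", "note"], ["subject_id"]))
```

PostgreSQL requires an explicit conflict target, while MySQL infers it from any violated unique key. This is one reason a direct string substitution between the two forms is not enough.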

🤖 Generated with Claude Code

Terminology changes:
- Replace "external storage" with "in-store" throughout documentation
- Replace "internal storage" with "in-table" for consistency
- Standardize on "functional dependency" instead of "determines"
- Update garbage collection page title to "Clean Up Object Storage"

Foreign key modifiers:
- Add comprehensive documentation for [nullable], [unique], and
  [nullable, unique] modifiers across tutorials, how-tos, and reference
- Document that nullable unique FKs allow multiple NULL values
  (SQL UNIQUE doesn't consider NULLs equal)
- Add FK modifier syntax to grammar in definition-syntax.md

Semantic matching:
- Create new explanation/semantic-matching.md with conceptual content
- Refactor reference/specs/semantic-matching.md to focus on specification
- Add reference to arXiv:1807.11104 where concepts were introduced
- Document that all binary operators use semantic matching

Other fixes:
- Fix broken bullet lists across multiple pages
- Change university.ipynb course type from int64 to int16
- Update distributed computing threshold from 1 minute to 1 second

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@dimitri-yatsenko changed the base branch from pre/v2.0 to main on January 20, 2026 16:40

@MilagrosMarin (Collaborator) left a comment

src/explanation/semantic-matching.md:32

The arXiv citation appears to have an incorrect paper title. arXiv:1807.11104 is titled "DataJoint: A Simpler Relational Data Model" (the 2018 paper about the core framework), not "DataJoint Elements: Data Workflows for Neurophysiology" (Elements is a separate, later project).

Suggested fix:

[^1]: Yatsenko D, Walker EY, Tolias AS (2018). DataJoint: A Simpler Relational Data Model. [arXiv:1807.11104](https://doi.org/10.48550/arXiv.1807.11104)

@MilagrosMarin (Collaborator) left a comment

src/explanation/type-system.md:65-68

The unsigned rows in this table are confusing:

| `int16` | 8-bit unsigned | 0 to 255 |

This reads as if int16 is an 8-bit unsigned type. The intent is "use int16 when you need 8-bit unsigned range," but the format doesn't convey that.

Consider reformatting or removing these rows (the reference spec at reference/specs/type-system.md:111-114 already covers this clearly).
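
The intended reading of those rows is: pick the smallest signed type whose range covers the unsigned range you need. Below is a minimal sketch of that rule. It assumes the signed core types are int8, int16, int32, and int64. The name `int8` is an assumption here; the other three appear in this thread.

```python
# Signed core types and their bit widths (int8 assumed to exist).
SIGNED_TYPES = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def signed_max(bits: int) -> int:
    """Largest value representable in a two's-complement signed integer."""
    return 2 ** (bits - 1) - 1


def signed_type_for_unsigned(bits: int) -> str:
    """Smallest signed type covering the unsigned range 0 .. 2**bits - 1."""
    needed = 2 ** bits - 1
    for name, width in SIGNED_TYPES.items():
        if signed_max(width) >= needed:
            return name
    raise ValueError(f"no signed core type holds a {bits}-bit unsigned range")


for bits in (8, 16, 32):
    print(f"uint{bits} range -> {signed_type_for_unsigned(bits)}")
# uint8 range -> int16
# uint16 range -> int32
# uint32 range -> int64
```

A full 64-bit unsigned range has no signed equivalent, so the function raises for `bits=64`.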

.gitignore (outdated)
# Generated documentation files
src/llms-full.txt
site/llms-full.txt
site/llms-full.txtdj_local_conf.json

Collaborator commented:

Missing newline — two entries are concatenated:

site/llms-full.txtdj_local_conf.json

Should be:

site/llms-full.txt
dj_local_conf.json
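
The concatenated line is the usual result of appending to a file whose last line lacks a trailing newline. This thread does not say whether the entry was added by a script. If entries are added that way, a guard like the hypothetical `append_ignore_entry` below avoids the problem.

```python
from pathlib import Path


def append_ignore_entry(path: Path, entry: str) -> None:
    """Append one pattern to a .gitignore-style file, on its own line."""
    text = path.read_text() if path.exists() else ""
    if entry in text.splitlines():
        return  # already present; keep the file unchanged
    if text and not text.endswith("\n"):
        # Without this, the new entry would be glued onto the last line,
        # producing e.g. "site/llms-full.txtdj_local_conf.json".
        text += "\n"
    path.write_text(text + entry + "\n")
```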

Collaborator commented:

@dimitri-yatsenko you need to fix this before I can approve all comments. Thanks!

dimitri-yatsenko added a commit that referenced this pull request Jan 20, 2026
- Fix arXiv citation in semantic-matching.md: correct paper title to
  "DataJoint: A Simpler Relational Data Model" (not Elements)
- Remove confusing unsigned type rows from type-system.md table;
  add note pointing to reference spec for unsigned type guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@MilagrosMarin (Collaborator) left a comment

src/how-to/create-custom-codec.md:69

Minor terminology inconsistency — this line uses "internal storage":

- `"<blob>"` — Chain to blob codec (internal storage)

The source code (builtin_codecs.py) uses "in-table storage" for <blob>, and the PR's goal is to standardize on "in-table" / "in-store" terminology. Consider:

- `"<blob>"` — Chain to blob codec (in-table storage)

@MilagrosMarin (Collaborator) left a comment

src/how-to/define-tables.md:140

Two issues here:

  1. int64 is duplicated (typo)
  2. Same terminology issue as in type-system.md — signed types labeled as "Unsigned integers"
| `int16`, `int32`, `int64`, `int64` | Unsigned integers |

@MilagrosMarin (Collaborator) left a comment

Thanks for the comprehensive docs update @dimitri-yatsenko! I've finished my review and left a few suggestions. Let me know if you have any questions!

- Fix arXiv citation in semantic-matching.md: correct paper title to
  "DataJoint: A Simpler Relational Data Model"
- Remove confusing unsigned type rows from type-system.md; add note that
  unsigned types are not provided
- Fix terminology in create-custom-codec.md: "internal storage" → "in-table storage"
- Remove erroneous unsigned integers row from define-tables.md (had duplicate
  int64 and incorrect "Unsigned integers" label)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Separate site/llms-full.txt and dj_local_conf.json entries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@dimitri-yatsenko (Member, Author) commented:

Changes from this PR have been merged into PR #130 (pre/v2.1 branch) via commit e9d31d7. Closing as superseded.


3 participants