Skip to content

Conversation

@zstack-robot-2
Copy link
Collaborator

support thirdparty_ceph to bm root volume

Resolves: ZSTAC-73396

Change-Id: I6f6f616777677a6f74696c6c736c737a626f7863
(cherry picked from commit 827e880)

sync from gitlab !9033

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Walkthrough

新增数据库升级存储过程以调整 BareMetal2 实例/机箱网卡表结构;在 Ceph 主存储引入扩展点以支持基于实例类型的第三方克隆路径并填充块卷元数据;新增两个 BareMetal2 相关云操作错误码常量。

Changes

Cohort / File(s) 变更摘要
数据库模式升级
conf/db/upgrade/V5.5.6__schema.sql
添加存储过程 UpdateBareMetal2InstanceProvisionNicVO():包含事务与 SQLEXCEPTION 处理,按条件新增列 instanceUuidisPrimaryProvisionNic、复制并迁移数据、删除/重建外键与索引、重生 uuid,执行后提交并删除该存储过程。
Ceph 存储扩展点与克隆逻辑
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
新增接口 CephPrimaryStorageCheckInstanceTypeExtensionPoint(4 个方法);在 CephPrimaryStorageBase.CloneRsp 中新增 volumeIdvolumeStatus 字段及访问器;在根卷克隆与从快照创建卷流程中集成扩展点以支持基于实例类型的第三方克隆、块卷转换与块卷元数据填充。
错误码常量
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
新增常量 ORG_ZSTACK_BAREMETAL2_INSTANCE_10088ORG_ZSTACK_BAREMETAL2_INSTANCE_10089

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CephStorage as Ceph Primary Storage
    participant ExtPoint as Extension Point
    participant ThirdParty as Third-party Service
    participant BlockVolume as Block Volume Handler

    Client->>CephStorage: 发起克隆请求(根卷)
    CephStorage->>ExtPoint: isSupportCloneByThirdParty(vmUuid)?
    alt 支持第三方克隆
        ExtPoint-->>CephStorage: true
        CephStorage->>ThirdParty: 获取 Mon IP 与 Token / 发起第三方克隆
        ThirdParty-->>CephStorage: 返回 Token / 克隆结果 (volumeId, volumeStatus)
        CephStorage->>ExtPoint: convertToBlockVolume(volumeVO)
        ExtPoint-->>BlockVolume: 执行块卷转换
        BlockVolume-->>ExtPoint: 转换完成
        CephStorage->>ExtPoint: populateBlockVolumeDetails(uuid, volumeId, volumeStatus)
        ExtPoint-->>CephStorage: 元数据填充完成
    else 不支持第三方
        ExtPoint-->>CephStorage: false
        CephStorage->>CephStorage: 使用默认克隆路径
    end
    CephStorage-->>Client: 返回 CloneRsp(含 volumeId & volumeStatus)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 表格翻新夜未央,约束迁移步轻狂。
第三方桥接云上忙,卷成块状细填章。
小兔一跃合并旁,代码花开又一方。

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题完全符合指定格式 [scope]: ,长度54个字符,不超过72个字符限制。标题清晰地总结了主要变更:为裸金属(bm)根卷添加第三方Ceph支持。
Description check ✅ Passed 拉取请求描述与变更集高度相关,提供了问题追踪ID (ZSTAC-73396) 和变更来源 (GitLab !9033),清晰表明意图是为裸金属根卷添加第三方Ceph支持。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)

2661-2712: 扩展点遍历的 break + Boolean 直接取反会导致扩展失效或 NPE。
当前逻辑在任意扩展返回 false 时直接 break,会跳过后续扩展;同时 !Boolean 对 null 会抛 NPE。建议改为“null-safe + continue”。

🔧 建议修改
-                        for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
-                            if (!result) {
-                                break;
-                            }
+                        for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
+                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
+                            if (!Boolean.TRUE.equals(result)) {
+                                continue;
+                            }
                             boolean isRootVolume = Q.New(VolumeVO.class)
                                     .eq(VolumeVO_.uuid, msg.getVolume().getUuid())
                                     .eq(VolumeVO_.type, VolumeType.Root)
                                     .isExists();
@@
-                                for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                                    if (!ext.isBlockVolume(msg.getVolume().getUuid())) {
-                                        break;
-                                    }
+                                for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
+                                    if (!Boolean.TRUE.equals(ext.isBlockVolume(msg.getVolume().getUuid()))) {
+                                        continue;
+                                    }
                                     volumePath = ret.installPath == null ? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName);
                                     ext.populateBlockVolumeDetails(msg.getVolume().getUuid(), ret.volumeId, ret.volumeStatus);
                                 }
🤖 Fix all issues with AI agents
In `@conf/db/upgrade/V5.5.6__schema.sql`:
- Around line 36-40: Wrap the DROP FOREIGN KEY in a presence check against
INFORMATION_SCHEMA.TABLE_CONSTRAINTS for constraint name
fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO, and only execute the DROP if the constraint
exists; leave the subsequent CALL
ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO',
'fkBareMetal2InstanceProvisionNicVOInstanceVO', 'instanceUuid',
'BareMetal2InstanceVO', 'uuid', 'CASCADE') unchanged. Use a conditional SELECT
(e.g., COUNT(*) from INFORMATION_SCHEMA.TABLE_CONSTRAINTS filtering by
CONSTRAINT_NAME and TABLE_NAME/schema) and execute the ALTER TABLE ... DROP
FOREIGN KEY only when the count > 0 to ensure idempotency.
🧹 Nitpick comments (2)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java (1)

5-12: 补充接口方法的 Javadoc 注释。
编码规范要求接口方法必须有有效的 Javadoc,目前缺失。根据编码规范。

plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)

2671-2674: 避免硬编码 baremetal2,优先使用常量。
如果已有 VM 类型常量(例如 VmInstanceConstant 或其他统一定义),建议替换以降低维护成本与拼写风险。根据编码规范。

📜 Review details

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d590879 and 89ee173.

⛔ Files ignored due to path filters (3)
  • sdk/src/main/java/org/zstack/sdk/BareMetal2ChassisNicInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceProvisionNicInventory.java is excluded by !sdk/**
📒 Files selected for processing (3)
  • conf/db/upgrade/V5.5.6__schema.sql
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.*

⚙️ CodeRabbit configuration file

**/*.*: - 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写

Files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • conf/db/upgrade/V5.5.6__schema.sql
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
**/*.java

⚙️ CodeRabbit configuration file

**/*.java: ## 1. API 设计要求

  • API 命名:
    • API 名称必须唯一,不能重复。
    • API 消息类需要继承 APIMessage;其返回类必须继承 APIReplyAPIEvent,并在注释中用 @RestResponse 进行标注。
    • API 消息上必须添加注解 @RestRequest,并满足如下规范:
      • path:
        • 针对资源使用复数形式。
        • 当 path 中引用消息类变量时,使用 {variableName} 格式。
      • HTTP 方法对应:
        • 查询操作 → HttpMethod.GET
        • 更新操作 → HttpMethod.PUT
        • 创建操作 → HttpMethod.POST
        • 删除操作 → HttpMethod.DELETE
    • API 类需要实现 __example__ 方法以便生成 API 文档,并确保生成对应的 Groovy API Template 与 API Markdown 文件。

2. 命名与格式规范

  • 类名:

    • 使用 UpperCamelCase 风格。
    • 特殊情况:
      • VO/AO/EO 类型类除外。
      • 抽象类采用 AbstractBase 前缀/后缀。
      • 异常类应以 Exception 结尾。
      • 测试类需要以 TestCase 结尾。
  • 方法名、参数名、成员变量和局部变量:

    • 使用 lowerCamelCase 风格。
  • 常量命名:

    • 全部大写,使用下划线分隔单词。
    • 要求表达清楚,避免使用含糊或不准确的名称。
  • 包名:

    • 统一使用小写,使用点分隔符,每个部分应是一个具有自然语义的英文单词(参考 Spring 框架的结构)。
  • 命名细节:

    • 避免在父子类或同一代码块中出现相同名字的成员或局部变量,防止混淆。
    • 命名缩写:
      • 不允许使用不必要的缩写,如:AbsSchedulerJobcondiFu 等。应使用完整单词提升可读性。

3. 编写自解释代码

  • 意图表达:

    • 避免使用布尔型参数造成含义不明确。例如:
      • 对于 stopAgent(boolean ignoreError),建议拆分为不同函数(如 stopAgentIgnoreError()),或使用枚举表达操作类型。
    • 命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途(例如在常量与变量名称中,将类型词放在末尾)。
    • 避免使用魔法值(Magic Value):
      • 直接使用未经定义的数值或字符串(如 if (status == 5))应替换为枚举或常量。

      • 示例:

      • // 错误示例:魔法值

      • if (user.getStatus() == 5) { ... }

      • // 正确示例:常量或枚举

      • public static final int STATUS_ACTIVE = 5;

      • if (user.getStatus() == STATUS_ACTIVE) { ... }

      • // 或使用枚举

      • enum UserStatus { ACTIVE, INACTIVE }

  • 注释:

    • 代码应尽量做到自解释,对少于两行的说明可以直接写在代码中。
    • 对于较长的注释,需要仔细校对并随代码更新,确保内容正确。
    • 接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。

4. 流程控制和结构优化

  • if...else 的使用:

    • 应尽量减少 if...else 结构的使用,建议:
      • 限制嵌套层级最多为两层,且内层不应再出现 else 分支。
      • 尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。
      • 使用 Java Stream 或 Lambda 表达式代替冗长的循环与条件判断。
  • 条件判断:

    • if 条件表达不宜过长或过于复杂,必要时可以将条件抽成 boolean 变量描述。
  • 代码块长度:
    ...

Files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
**/*.sql

⚙️ CodeRabbit configuration file

**/*.sql: - Review the SQL code, make sure has no errors and confirm that:

  • Upgrading scene has been carefully handled
  • Do not use DEFAULT 0000-00-00 00:00:00 , use DEFAULT CURRENT_TIMESTAMP instead
  • When NOT NULL exists, must use stored procedure or other functions to process historical data, this is very very important
  • 数据库记录中,如果字符串长度不可控,不要用 vchar,用 text 类型

Files:

  • conf/db/upgrade/V5.5.6__schema.sql
🧠 Learnings (8)
📓 Common learnings
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2360
File: network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java:449-490
Timestamp: 2025-08-04T04:48:19.103Z
Learning: ZStack项目在cherry-pick操作中,即使发现了性能优化机会(如IP地址批量保存的内存优化),也严格遵循不做额外修改的政策,优先保证cherry-pick的完整性和一致性。
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-14T06:56:19.585Z
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2435
File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47
Timestamp: 2025-08-14T06:56:19.585Z
Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: 在ZStack项目中,同一包内的类(如VolumeSnapshotGroupVO)不需要显式导入。同时,uuid()辅助方法在API消息类的__example__()方法中可以直接使用,无需静态导入,这是通过项目的构建系统或框架配置实现的。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: In the ZStack project, API message classes may not require explicit imports for VolumeSnapshotGroupVO references in APIParam annotations or for the uuid() helper method used in __example__() methods, even when these classes/methods are defined in other packages. The build system or framework appears to handle these references automatically.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-25T03:52:37.301Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2504
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819
Timestamp: 2025-08-25T03:52:37.301Z
Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-07-22T02:30:46.123Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-12T05:52:18.323Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885
Timestamp: 2025-08-12T05:52:18.323Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
🧬 Code graph analysis (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (2)
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java (2)
  • VmInstanceSpec (25-911)
  • ImageSpec (123-164)
core/src/main/java/org/zstack/core/db/Q.java (1)
  • Q (16-193)
🔍 Remote MCP AtlassianMCP

Now I have gathered sufficient context. Let me compile the findings into a comprehensive summary.

Additional Context for PR #3214 Review

Jira Issue Background

The PR addresses ZSTAC-73396 (with related issue ZSTAC-81282 for version 5.5.6), a P0 priority feature request titled: "【中信保】弹性裸金属支持使用云盘作为系统盘且支持HA" (Support for elastic bare-metal instances to use high-speed cloud disks as root volumes with HA).

Business Context:

  • Critical Urgency: A customer requires elastic bare-metal instance support with cloud disk root volumes to deploy business services, with project completion timeline impacting revenue recovery of ~1 million+ total and ~400K+ for ZStack
  • Blocking Issues:
    • I/O errors occurring with cloud disk as bare-metal root volumes (TIC-3691)
    • Single point of failure risk in bare-metal gateway nodes (no HA groups)
    • Single network path risk (no aggregation possible)
    • Customer views current solution (legacy cloud disk as system disk) as insufficient for production use

Solution Approach:

  • Develop "high-speed cloud disk support for elastic bare-metal root volumes" based on version 4.8.0
  • Avoid gateway node bottlenecks by implementing direct third-party Ceph integration
  • Resolve single link and single point of failure issues
  • Target date: March 12 (original plan, now implemented in 5.5.6)

Issue Status: Closed (Fixed) as of May 6, 2025; Related merge requests merged on November 11, 2025 for version 4.8.30

PR Scope and Key Components

The PR implements support for third-party Ceph direct integration with bare-metal root volumes through three main technical components:

  1. Database Schema Migration (V5.5.6__schema.sql):

    • Creates stored procedure UpdateBareMetal2InstanceProvisionNicVO for transactional schema updates
    • Adds instanceUuid and isPrimaryProvisionNic columns to BareMetal2InstanceProvisionNicVO and BareMetal2ChassisNicVO
    • Establishes foreign key constraint linking instanceUuid to BareMetal2InstanceVO with CASCADE delete
    • Ensures data consistency through transactions with rollback on errors
  2. Extension Point Architecture (New interface):

    • Introduces CephPrimaryStorageCheckInstanceTypeExtensionPoint interface for extensible Ceph storage handling
    • Enables third-party implementations to hook into clone operations and block volume management
    • Four key methods: isSupportCloneByThirdParty(), convertToBlockVolume(), isBlockVolume(), populateBlockVolumeDetails()
  3. Ceph Primary Storage Enhancements (CephPrimaryStorageBase.java):

    • Adds volumeId and volumeStatus fields to inner class CloneRsp for block volume responses
    • Integrates extension point checks into root volume cloning flow
    • Routes compatible volumes to third-party cloning mechanisms when conditions are met (baremetal2 instances with valid authentication tokens)
    • Populates block volume metadata on successful clone operations

Related Technical Documentation

  • Bare-metal instance provisioning relies on BareMetal2InstanceAllocatePrimaryStorageFlow for primary storage allocation strategy
  • Block volume storage domains now include dependencies on baremetal2 instances and Ceph primary storage pools
  • The implementation supports the new bare-metal elastic compute model requiring direct storage I/O paths
🔇 Additional comments (3)
conf/db/upgrade/V5.5.6__schema.sql (3)

1-5: LGTM!

使用 MEDIUMTEXT 类型来存储可能较大的 stdout/stderr 输出是合适的选择,字符集配置也正确。


15-19: LGTM!

事务处理和异常处理逻辑正确,使用 EXIT HANDLER 配合 ROLLBACK 确保了数据一致性。


46-73: LGTM!

isPrimaryProvisionNic 列的添加逻辑正确:

  • 通过 INFORMATION_SCHEMA 检查列是否存在,保证幂等性
  • 使用 UPDATE 语句正确处理历史数据
  • BareMetal2ChassisNicVO 的条件更新 (WHERE isProvisionNic = TRUE) 逻辑合理

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +36 to +43
ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;

CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO',
'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

幂等性问题:DROP FOREIGN KEY 前应检查约束是否存在。

当前实现直接执行 DROP FOREIGN KEY,但未检查该外键约束是否存在。如果升级脚本部分执行后失败重试,或者数据库状态与预期不同,此操作会失败并导致整个迁移回滚。

根据 ZStack 项目的最佳实践,应通过 INFORMATION_SCHEMA.TABLE_CONSTRAINTS 检查约束存在性后再删除。

🐛 建议的修复方案
+       -- Check if FK exists before dropping
+       SET `@fk_exists` = (SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS 
+                         WHERE CONSTRAINT_SCHEMA = 'zstack' 
+                         AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO' 
+                         AND CONSTRAINT_NAME = 'fkBareMetal2InstanceProvisionNicVOInstanceVO'
+                         AND CONSTRAINT_TYPE = 'FOREIGN KEY');
+       
+       IF `@fk_exists` > 0 THEN
            ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
            DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;
+       END IF;

        CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO',
                                    'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE');
🤖 Prompt for AI Agents
In `@conf/db/upgrade/V5.5.6__schema.sql` around lines 36 - 40, Wrap the DROP
FOREIGN KEY in a presence check against INFORMATION_SCHEMA.TABLE_CONSTRAINTS for
constraint name fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO, and only execute the DROP if the constraint
exists; leave the subsequent CALL
ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO',
'fkBareMetal2InstanceProvisionNicVOInstanceVO', 'instanceUuid',
'BareMetal2InstanceVO', 'uuid', 'CASCADE') unchanged. Use a conditional SELECT
(e.g., COUNT(*) from INFORMATION_SCHEMA.TABLE_CONSTRAINTS filtering by
CONSTRAINT_NAME and TABLE_NAME/schema) and execute the ALTER TABLE ... DROP
FOREIGN KEY only when the count > 0 to ensure idempotency.

@MatheMatrix MatheMatrix force-pushed the sync/jin.shen/fix-81282@@3 branch 2 times, most recently from 58541c8 to 4c7969d Compare January 16, 2026 09:10
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)

2661-2712: 修复 Boolean 判空和改正扩展点迭代逻辑。

当前代码存在两个问题:

  1. NPE 风险确认:方法返回 Boolean 类型(可为 null),直接使用 !result!ext.isBlockVolume(...) 会在返回 null 时触发 NPE,应使用 Boolean.TRUE.equals(...) 安全判空。

  2. Break 中断扩展链:使用 break 会在第一个扩展返回 false 时停止所有后续扩展的处理。根据扩展点设计原则,应允许每个扩展独立判断是否处理(改为 continue),而非在第一个不支持时中止整条链。

🛠️ 建议修复
                         for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
-                            if (!result) {
-                                break;
+                            if (!Boolean.TRUE.equals(ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid()))) {
+                                continue;
                             }
                             boolean isRootVolume = Q.New(VolumeVO.class)
                                     .eq(VolumeVO_.uuid, msg.getVolume().getUuid())
                                     .eq(VolumeVO_.type, VolumeType.Root)
                                     .isExists();
@@
                                 for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                                    if (!ext.isBlockVolume(msg.getVolume().getUuid())) {
-                                        break;
+                                    if (!Boolean.TRUE.equals(ext.isBlockVolume(msg.getVolume().getUuid()))) {
+                                        continue;
                                     }
                                     volumePath = ret.installPath == null ? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName);
                                     ext.populateBlockVolumeDetails(msg.getVolume().getUuid(), ret.volumeId, ret.volumeStatus);
                                 }
🤖 Fix all issues with AI agents
In `@conf/db/upgrade/V5.5.6__schema.sql`:
- Around line 33-43: The migration updates BareMetal2InstanceProvisionNicVO.uuid
directly but ignores referencing rows in
BareMetal2BondingNicRefVO.provisionNicUuid (FK from V4.7.13) which lacks ON
UPDATE CASCADE; before running the UPDATE that sets `uuid = REPLACE(UUID(), '-',
'')` in BareMetal2InstanceProvisionNicVO, either (A) compute and apply the same
uuid-mapping to update BareMetal2BondingNicRefVO.provisionNicUuid to the new
UUIDs so references remain valid, or (B) DROP the foreign key
fkBareMetal2BondingNicRefVOProvisionNic (remove the FK constraint), perform the
uuid update on BareMetal2InstanceProvisionNicVO, then re-CALL ADD_CONSTRAINT to
recreate the FK (or create it with ON UPDATE CASCADE) afterwards; reference the
tables BareMetal2InstanceProvisionNicVO, BareMetal2BondingNicRefVO and the
ALTER/DROP FOREIGN KEY + CALL ADD_CONSTRAINT statements in the diff when making
the change.
♻️ Duplicate comments (1)
conf/db/upgrade/V5.5.6__schema.sql (1)

36-38: 删除外键前需做存在性检查以保证幂等。

若升级失败后重跑,这里直接 DROP FOREIGN KEY 可能报错并中断迁移。建议先在 INFORMATION_SCHEMA 判断外键是否存在再删除。基于已记录的 learnings。

📜 Review details

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89ee173 and 58541c8.

⛔ Files ignored due to path filters (3)
  • sdk/src/main/java/org/zstack/sdk/BareMetal2ChassisNicInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceProvisionNicInventory.java is excluded by !sdk/**
📒 Files selected for processing (4)
  • conf/db/upgrade/V5.5.6__schema.sql
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.*

⚙️ CodeRabbit configuration file

**/*.*: - 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写

Files:

  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
  • conf/db/upgrade/V5.5.6__schema.sql
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
**/*.java

⚙️ CodeRabbit configuration file

**/*.java: ## 1. API 设计要求

  • API 命名:
    • API 名称必须唯一,不能重复。
    • API 消息类需要继承 APIMessage;其返回类必须继承 APIReplyAPIEvent,并在注释中用 @RestResponse 进行标注。
    • API 消息上必须添加注解 @RestRequest,并满足如下规范:
      • path:
        • 针对资源使用复数形式。
        • 当 path 中引用消息类变量时,使用 {variableName} 格式。
      • HTTP 方法对应:
        • 查询操作 → HttpMethod.GET
        • 更新操作 → HttpMethod.PUT
        • 创建操作 → HttpMethod.POST
        • 删除操作 → HttpMethod.DELETE
    • API 类需要实现 __example__ 方法以便生成 API 文档,并确保生成对应的 Groovy API Template 与 API Markdown 文件。

2. 命名与格式规范

  • 类名:

    • 使用 UpperCamelCase 风格。
    • 特殊情况:
      • VO/AO/EO 类型类除外。
      • 抽象类采用 AbstractBase 前缀/后缀。
      • 异常类应以 Exception 结尾。
      • 测试类需要以 TestCase 结尾。
  • 方法名、参数名、成员变量和局部变量:

    • 使用 lowerCamelCase 风格。
  • 常量命名:

    • 全部大写,使用下划线分隔单词。
    • 要求表达清楚,避免使用含糊或不准确的名称。
  • 包名:

    • 统一使用小写,使用点分隔符,每个部分应是一个具有自然语义的英文单词(参考 Spring 框架的结构)。
  • 命名细节:

    • 避免在父子类或同一代码块中出现相同名字的成员或局部变量,防止混淆。
    • 命名缩写:
      • 不允许使用不必要的缩写,如:AbsSchedulerJobcondiFu 等。应使用完整单词提升可读性。

3. 编写自解释代码

  • 意图表达:

    • 避免使用布尔型参数造成含义不明确。例如:
      • 对于 stopAgent(boolean ignoreError),建议拆分为不同函数(如 stopAgentIgnoreError()),或使用枚举表达操作类型。
    • 命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途(例如在常量与变量名称中,将类型词放在末尾)。
    • 避免使用魔法值(Magic Value):
      • 直接使用未经定义的数值或字符串(如 if (status == 5))应替换为枚举或常量。

      • 示例:

      • // 错误示例:魔法值

      • if (user.getStatus() == 5) { ... }

      • // 正确示例:常量或枚举

      • public static final int STATUS_ACTIVE = 5;

      • if (user.getStatus() == STATUS_ACTIVE) { ... }

      • // 或使用枚举

      • enum UserStatus { ACTIVE, INACTIVE }

  • 注释:

    • 代码应尽量做到自解释,对少于两行的说明可以直接写在代码中。
    • 对于较长的注释,需要仔细校对并随代码更新,确保内容正确。
    • 接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。

4. 流程控制和结构优化

  • if...else 的使用:

    • 应尽量减少 if...else 结构的使用,建议:
      • 限制嵌套层级最多为两层,且内层不应再出现 else 分支。
      • 尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。
      • 使用 Java Stream 或 Lambda 表达式代替冗长的循环与条件判断。
  • 条件判断:

    • if 条件表达不宜过长或过于复杂,必要时可以将条件抽成 boolean 变量描述。
  • 代码块长度:
    ...

Files:

  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
**/*.sql

⚙️ CodeRabbit configuration file

**/*.sql: - Review the SQL code, make sure has no errors and confirm that:

  • Upgrading scene has been carefully handled
  • Do not use DEFAULT 0000-00-00 00:00:00 , use DEFAULT CURRENT_TIMESTAMP instead
  • When NOT NULL exists, must use stored procedure or other functions to process historical data, this is very very important
  • 数据库记录中,如果字符串长度不可控,不要用 vchar,用 text 类型

Files:

  • conf/db/upgrade/V5.5.6__schema.sql
🧠 Learnings (10)
📓 Common learnings
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2360
File: network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java:449-490
Timestamp: 2025-08-04T04:48:19.103Z
Learning: ZStack项目在cherry-pick操作中,即使发现了性能优化机会(如IP地址批量保存的内存优化),也严格遵循不做额外修改的政策,优先保证cherry-pick的完整性和一致性。
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: 在 ZStack 数据库升级脚本中,直接使用 RENAME TABLE 不能保证幂等性。应该通过 information_schema.tables 检查表的存在性,只在源表存在且目标表不存在时才执行重命名操作,以确保升级脚本可以安全地重复执行。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T03:42:34.349Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2293
File: conf/db/upgrade/V4.10.16__schema.sql:69-82
Timestamp: 2025-08-03T03:42:34.349Z
Learning: 在 ZStack 项目中进行数据库 DDL 升级时,由于 MySQL/MariaDB 中的 RENAME TABLE、ALTER TABLE 等 DDL 语句会隐式 COMMIT,因此应使用临时表 + 数据复制的方式来确保升级可逆,而不是依赖事务的 ROLLBACK 机制。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-14T06:56:19.585Z
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2435
File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47
Timestamp: 2025-08-14T06:56:19.585Z
Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: 在ZStack项目中,同一包内的类(如VolumeSnapshotGroupVO)不需要显式导入。同时,uuid()辅助方法在API消息类的__example__()方法中可以直接使用,无需静态导入,这是通过项目的构建系统或框架配置实现的。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: In the ZStack project, API message classes may not require explicit imports for VolumeSnapshotGroupVO references in APIParam annotations or for the uuid() helper method used in __example__() methods, even when these classes/methods are defined in other packages. The build system or framework appears to handle these references automatically.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-25T03:52:37.301Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2504
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819
Timestamp: 2025-08-25T03:52:37.301Z
Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-07-22T02:30:46.123Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-12T05:52:18.323Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885
Timestamp: 2025-08-12T05:52:18.323Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
🧬 Code graph analysis (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (2)
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java (2)
  • VmInstanceSpec (25-911)
  • ImageSpec (123-164)
core/src/main/java/org/zstack/core/db/Q.java (1)
  • Q (16-193)
🔍 Remote MCP AtlassianMCP

Additional Jira context retrieved

  • ZSTAC-73396 — Summary: "【中信保】弹性裸金属支持使用云盘作为系统盘且支持HA". Status: Closed. Priority: P0. Assignee: Jin Shen. Description: business-critical customer request to support high-speed cloud disks as bare-metal root volumes; references solution notes on Confluence. Comments reference related merge requests and note PRs merged for this work.

  • ZSTAC-81282 — Summary: "CLONE - 【中信保】弹性裸金属支持使用云盘作为系统盘且支持HA-5.5.6". Status: Open. Priority: P0. Assignee: Jin Shen. Description mirrors ZSTAC-73396 but scoped for 5.5.6.

Sources: AtlassianMCP jira_get_issue for ZSTAC-73396 and ZSTAC-81282.

🔇 Additional comments (5)
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java (1)

7681-7682: LGTM!

新增的错误码常量 ORG_ZSTACK_BAREMETAL2_INSTANCE_10088 遵循现有命名规范(全大写、下划线分隔),序号递增正确(10087 -> 10088),与 PR 中支持第三方 Ceph 作为裸金属根卷的功能相匹配。

conf/db/upgrade/V5.5.6__schema.sql (1)

4-5: stderr 字段字符集调整一致性 OK。

变更仅统一字符集与排序规则,无明显风险。

plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java (1)

5-12: 扩展点接口定义清晰。

方法职责划分明确,便于后续扩展实现。

plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (2)

57-63: 仅新增导入,无需额外评注。


488-517: CloneRsp 新字段与访问器补充到位。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +33 to +46
UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `instanceUuid` = `uuid`;

ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;

CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO',
'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE');

UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `uuid` = REPLACE(UUID(), '-', '');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 检索数据库升级脚本中对 BareMetal2InstanceProvisionNicVO 的引用
echo "=== SQL 脚本中的引用 ==="
rg -n --type=sql 'BareMetal2InstanceProvisionNicVO' -C2

echo ""
echo "=== Java 代码中的引用 ==="
rg -n --type=java 'BareMetal2InstanceProvisionNicVO' -C2

echo ""
echo "=== 相关的字段引用 ==="
rg -n --type=java 'bareMetal2InstanceProvisionNic|provisionNicUuid' -C1

Repository: MatheMatrix/zstack

Length of output: 6855


🏁 Script executed:

cat -n conf/db/upgrade/V5.5.6__schema.sql

Repository: MatheMatrix/zstack

Length of output: 3556


🏁 Script executed:

#!/bin/bash
# 查看 BareMetal2BondingNicRefVO 的表定义
echo "=== BareMetal2BondingNicRefVO 表定义 ==="
rg -n --type=sql 'CREATE TABLE.*BareMetal2BondingNicRefVO|CREATE TABLE.*IF NOT EXISTS.*BareMetal2BondingNicRefVO' -A 20

echo ""
echo "=== 查看所有升级脚本中对 BareMetal2BondingNicRefVO 的修改 ==="
rg -n --type=sql 'BareMetal2BondingNicRefVO' -B 2 -A 2

Repository: MatheMatrix/zstack

Length of output: 3202


🏁 Script executed:

#!/bin/bash
# 查看 V5.5.6 之后的升级脚本是否有处理 provisionNicUuid
echo "=== V5.5.6 之后的升级脚本 ==="
fd --type f 'V[0-9]+\.[0-9]+\.[0-9]+__schema\.sql' conf/db/upgrade | sort | tail -10

echo ""
echo "=== 搜索升级脚本中对 provisionNicUuid 的更新 ==="
rg -n --type=sql 'provisionNicUuid' -B 2 -A 2

echo ""
echo "=== 检查是否在升级脚本中有其他对 BareMetal2BondingNicRefVO 的更新 ==="
rg -n --type=sql 'UPDATE.*BareMetal2BondingNicRefVO|UPDATE.*provisionNicUuid' -B 2 -A 2

Repository: MatheMatrix/zstack

Length of output: 1045


🏁 Script executed:

#!/bin/bash
# 查看外键约束的完整定义
echo "=== 外键约束定义 ==="
rg -n --type=sql "FOREIGN KEY.*BareMetal2InstanceProvisionNicVO.*uuid" -B 2 -A 2

echo ""
echo "=== 确认约束配置中是否有 ON UPDATE CASCADE ==="
rg -n --type=sql "fkBareMetal2BondingNicRefVOProvisionNicVO" -B 2 -A 2

# 查看 ADD_CONSTRAINT 函数的定义
echo ""
echo "=== 查找 ADD_CONSTRAINT 存储过程的定义 ==="
rg -n --type=sql "CREATE PROCEDURE.*ADD_CONSTRAINT|PROCEDURE.*ADD_CONSTRAINT" -A 30 | head -60

Repository: MatheMatrix/zstack

Length of output: 3097


🏁 Script executed:

#!/bin/bash
# 详细查看 ADD_CONSTRAINT 的完整实现
sed -n '187,212p' conf/db/upgrade/beforeMigrate.sql

echo ""
echo "=== 确认当前是否有 ON UPDATE 处理 ==="
grep -n "ON UPDATE" conf/db/upgrade/beforeMigrate.sql

Repository: MatheMatrix/zstack

Length of output: 1118


在重置 uuid 前需同步更新引用该字段的表。

BareMetal2BondingNicRefVO.provisionNicUuid 有外键约束指向 BareMetal2InstanceProvisionNicVO.uuid(见 V4.7.13__schema.sql),但该约束只配置了 ON DELETE CASCADE,没有 ON UPDATE CASCADE(ADD_CONSTRAINT 存储过程仅生成 ON DELETE 子句)。

升级脚本在第 42-43 行直接更新 uuid 为新值,会导致 BareMetal2BondingNicRefVO.provisionNicUuid 指向不存在的 uuid,造成数据完整性破坏。

需在更新 uuid 之前,先同步更新所有引用该字段的表中的值,或在升级脚本中先删除外键再重建。

🤖 Prompt for AI Agents
In `@conf/db/upgrade/V5.5.6__schema.sql` around lines 33 - 43, The migration
updates BareMetal2InstanceProvisionNicVO.uuid directly but ignores referencing
rows in BareMetal2BondingNicRefVO.provisionNicUuid (FK from V4.7.13) which lacks
ON UPDATE CASCADE; before running the UPDATE that sets `uuid = REPLACE(UUID(),
'-', '')` in BareMetal2InstanceProvisionNicVO, either (A) compute and apply the
same uuid-mapping to update BareMetal2BondingNicRefVO.provisionNicUuid to the
new UUIDs so references remain valid, or (B) DROP the foreign key
fkBareMetal2BondingNicRefVOProvisionNic (remove the FK constraint), perform the
uuid update on BareMetal2InstanceProvisionNicVO, then re-CALL ADD_CONSTRAINT to
recreate the FK (or create it with ON UPDATE CASCADE) afterwards; reference the
tables BareMetal2InstanceProvisionNicVO, BareMetal2BondingNicRefVO and the
ALTER/DROP FOREIGN KEY + CALL ADD_CONSTRAINT statements in the diff when making
the change.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)

2662-2712: 扩展点遍历不应被单个返回 false 的实现短路,同时避免空指针。

当前 break 会直接终止遍历,导致后续扩展点无法执行;Boolean 取反也可能因返回 null 触发 NPE。建议改为 continue 并使用空安全判断。

🔧 建议修改
-                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
-                            if (!result) {
-                                break;
-                            }
+                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
+                            if (!Boolean.TRUE.equals(result)) {
+                                continue;
+                            }
@@
-                                    if (!ext.isBlockVolume(msg.getVolume().getUuid())) {
-                                        break;
-                                    }
+                                    if (!Boolean.TRUE.equals(ext.isBlockVolume(msg.getVolume().getUuid()))) {
+                                        continue;
+                                    }
🤖 Fix all issues with AI agents
In `@conf/db/upgrade/V5.5.6__schema.sql`:
- Around line 18-78: The migration wraps DDL (ADD_COLUMN, ALTER TABLE) inside a
START TRANSACTION/COMMIT with a DECLARE EXIT HANDLER FOR SQLEXCEPTION that
attempts ROLLBACK, but MySQL/MariaDB issues implicit commits for DDL so the
ROLLBACK is ineffective; remove the transactional illusion and rewrite the
script into safe, idempotent, stepwise operations: drop START TRANSACTION/COMMIT
and the EXIT HANDLER, ensure each step (checks using INFORMATION_SCHEMA, CALL
ADD_COLUMN for BareMetal2InstanceProvisionNicVO and BareMetal2ChassisNicVO,
ALTER TABLE DROP FOREIGN KEY, CALL ADD_CONSTRAINT, and UPDATE statements) is
guarded by existence checks and can be re-run without side effects (or split
into separate reversible stages), and keep references to the identifiers in this
file (ADD_COLUMN, ADD_CONSTRAINT, ALTER TABLE
`BareMetal2InstanceProvisionNicVO`, UPDATE statements, DECLARE EXIT HANDLER) so
the migration no longer relies on transaction rollback for DDL failures.
♻️ Duplicate comments (2)
conf/db/upgrade/V5.5.6__schema.sql (2)

39-41: 删除外键前需先判断是否存在,保证幂等性。

重复执行升级脚本或异常重试时,直接 DROP FOREIGN KEY 可能失败并中断升级流程。Based on learnings, ...


45-46: 更新 uuid 前需同步更新引用表,避免外键指向失效。

BareMetal2BondingNicRefVO.provisionNicUuid 有外键引用该 uuid,直接重置 uuid 会破坏引用一致性,应先同步更新引用表或重建外键。

📜 Review details

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58541c8 and 4c7969d.

⛔ Files ignored due to path filters (3)
  • sdk/src/main/java/org/zstack/sdk/BareMetal2ChassisNicInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceInventory.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceProvisionNicInventory.java is excluded by !sdk/**
📒 Files selected for processing (4)
  • conf/db/upgrade/V5.5.6__schema.sql
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.*

⚙️ CodeRabbit configuration file

**/*.*: - 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写

Files:

  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • conf/db/upgrade/V5.5.6__schema.sql
**/*.java

⚙️ CodeRabbit configuration file

**/*.java: ## 1. API 设计要求

  • API 命名:
    • API 名称必须唯一,不能重复。
    • API 消息类需要继承 APIMessage;其返回类必须继承 APIReplyAPIEvent,并在注释中用 @RestResponse 进行标注。
    • API 消息上必须添加注解 @RestRequest,并满足如下规范:
      • path:
        • 针对资源使用复数形式。
        • 当 path 中引用消息类变量时,使用 {variableName} 格式。
      • HTTP 方法对应:
        • 查询操作 → HttpMethod.GET
        • 更新操作 → HttpMethod.PUT
        • 创建操作 → HttpMethod.POST
        • 删除操作 → HttpMethod.DELETE
    • API 类需要实现 __example__ 方法以便生成 API 文档,并确保生成对应的 Groovy API Template 与 API Markdown 文件。

2. 命名与格式规范

  • 类名:

    • 使用 UpperCamelCase 风格。
    • 特殊情况:
      • VO/AO/EO 类型类除外。
      • 抽象类采用 AbstractBase 前缀/后缀。
      • 异常类应以 Exception 结尾。
      • 测试类需要以 TestCase 结尾。
  • 方法名、参数名、成员变量和局部变量:

    • 使用 lowerCamelCase 风格。
  • 常量命名:

    • 全部大写,使用下划线分隔单词。
    • 要求表达清楚,避免使用含糊或不准确的名称。
  • 包名:

    • 统一使用小写,使用点分隔符,每个部分应是一个具有自然语义的英文单词(参考 Spring 框架的结构)。
  • 命名细节:

    • 避免在父子类或同一代码块中出现相同名字的成员或局部变量,防止混淆。
    • 命名缩写:
      • 不允许使用不必要的缩写,如:AbsSchedulerJobcondiFu 等。应使用完整单词提升可读性。

3. 编写自解释代码

  • 意图表达:

    • 避免使用布尔型参数造成含义不明确。例如:
      • 对于 stopAgent(boolean ignoreError),建议拆分为不同函数(如 stopAgentIgnoreError()),或使用枚举表达操作类型。
    • 命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途(例如在常量与变量名称中,将类型词放在末尾)。
    • 避免使用魔法值(Magic Value):
      • 直接使用未经定义的数值或字符串(如 if (status == 5))应替换为枚举或常量。

      • 示例:

      • // 错误示例:魔法值

      • if (user.getStatus() == 5) { ... }

      • // 正确示例:常量或枚举

      • public static final int STATUS_ACTIVE = 5;

      • if (user.getStatus() == STATUS_ACTIVE) { ... }

      • // 或使用枚举

      • enum UserStatus { ACTIVE, INACTIVE }

  • 注释:

    • 代码应尽量做到自解释,对少于两行的说明可以直接写在代码中。
    • 对于较长的注释,需要仔细校对并随代码更新,确保内容正确。
    • 接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。

4. 流程控制和结构优化

  • if...else 的使用:

    • 应尽量减少 if...else 结构的使用,建议:
      • 限制嵌套层级最多为两层,且内层不应再出现 else 分支。
      • 尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。
      • 使用 Java Stream 或 Lambda 表达式代替冗长的循环与条件判断。
  • 条件判断:

    • if 条件表达不宜过长或过于复杂,必要时可以将条件抽成 boolean 变量描述。
  • 代码块长度:
    ...

Files:

  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
**/*.sql

⚙️ CodeRabbit configuration file

**/*.sql: - Review the SQL code, make sure has no errors and confirm that:

  • Upgrading scene has been carefully handled
  • Do not use DEFAULT 0000-00-00 00:00:00 , use DEFAULT CURRENT_TIMESTAMP instead
  • When NOT NULL exists, must use stored procedure or other functions to process historical data, this is very very important
  • 数据库记录中,如果字符串长度不可控,不要用 vchar,用 text 类型

Files:

  • conf/db/upgrade/V5.5.6__schema.sql
🧠 Learnings (14)
📓 Common learnings
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2360
File: network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java:449-490
Timestamp: 2025-08-04T04:48:19.103Z
Learning: ZStack项目在cherry-pick操作中,即使发现了性能优化机会(如IP地址批量保存的内存优化),也严格遵循不做额外修改的政策,优先保证cherry-pick的完整性和一致性。
📚 Learning: 2025-08-14T06:56:19.585Z
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2435
File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47
Timestamp: 2025-08-14T06:56:19.585Z
Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: 在ZStack项目中,同一包内的类(如VolumeSnapshotGroupVO)不需要显式导入。同时,uuid()辅助方法在API消息类的__example__()方法中可以直接使用,无需静态导入,这是通过项目的构建系统或框架配置实现的。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: In the ZStack project, API message classes may not require explicit imports for VolumeSnapshotGroupVO references in APIParam annotations or for the uuid() helper method used in __example__() methods, even when these classes/methods are defined in other packages. The build system or framework appears to handle these references automatically.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-25T03:52:37.301Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2504
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819
Timestamp: 2025-08-25T03:52:37.301Z
Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-07-22T02:30:46.123Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-12T05:52:18.323Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885
Timestamp: 2025-08-12T05:52:18.323Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。

Applied to files:

  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: 在 ZStack 数据库升级脚本中,直接使用 RENAME TABLE 不能保证幂等性。应该通过 information_schema.tables 检查表的存在性,只在源表存在且目标表不存在时才执行重命名操作,以确保升级脚本可以安全地重复执行。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T03:42:34.349Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2293
File: conf/db/upgrade/V4.10.16__schema.sql:69-82
Timestamp: 2025-08-03T03:42:34.349Z
Learning: 在 ZStack 项目中进行数据库 DDL 升级时,由于 MySQL/MariaDB 中的 RENAME TABLE、ALTER TABLE 等 DDL 语句会隐式 COMMIT,因此应使用临时表 + 数据复制的方式来确保升级可逆,而不是依赖事务的 ROLLBACK 机制。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-10-09T02:23:30.305Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2709
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:5142-5142
Timestamp: 2025-10-09T02:23:30.305Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 getAttachableL3Network() 逻辑中,项目约束是:同一 VmNic 可能同时存在两条 UsedIpVO,但它们只代表同一 L3 的 IPv4/IPv6 双栈,UsedIpVO.l3NetworkUuid 与 VmNicVO.l3NetworkUuid 保持一致;因此使用 VmNicVO_.l3NetworkUuid 列表判断 IPAM 关闭网络是否已附加是正确的,无需通过 UsedIpVO 或 VmNicHelper.getL3Uuids() 额外聚合。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-10T13:42:01.027Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2410
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceHelper.java:334-356
Timestamp: 2025-08-10T13:42:01.027Z
Learning: 在 ZStack 的 `VmInstanceHelper.validateVmNicParams` 方法中,当 vmNicParams 非空但 l3NetworkUuids 为空时的报错文案 "l3NetworkUuids and vmNicInventories mustn't both be empty or both be set" 需要保持不变,以维持与现有 i18n 的一致性,即使文案中提到的是 vmNicInventories 而非 vmNicParams。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-09-05T10:14:54.816Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2566
File: conf/db/upgrade/V5.4.0__schema.sql:33-34
Timestamp: 2025-09-05T10:14:54.816Z
Learning: 在ZStack的HostNetworkInterfaceVO表中,pciDeviceAddress字段出现结尾换行符(\n)的脏数据仅在嵌套环境中出现,删除这些记录是安全的,不会产生影响。这种情况通常发生在主机重新连接时。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-12T05:39:14.846Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3856-3865
Timestamp: 2025-08-12T05:39:14.846Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp 流程中,从数据库加载的 VmNicVO 实例其 getUsedIps() 为空时也不会为 null(返回空集合)。因此无需对 getUsedIps() 再做 Optional/空列表归一化的空指针保护;若找不到对应 NIC,使用 orElse(new VmNicVO()) 的约定允许后续逻辑通过 NPE 暴露问题,与项目既有约定一致。

Applied to files:

  • conf/db/upgrade/V5.5.6__schema.sql
🔍 Remote MCP AtlassianMCP

Relevant Jira context fetched:

  • ZSTAC-73396 — Summary: "【中信保】弹性裸金属支持使用云盘作为系统盘且支持HA". Status: Closed. Priority: P0. Assignee: Jin Shen. Created 2025-02-26; last updated 2026-01-12. Description: business-critical customer request to allow high-performance cloud disks as bare-metal root disks; references Confluence solution notes.

  • ZSTAC-81282 — Summary: "CLONE - 【中信保】弹性裸金属支持使用云盘作为系统盘且支持HA-5.5.6". Status: Open. Priority: P0. Assignee: Jin Shen. Created/updated 2026-01-12. Description: same customer requirement scoped to 5.5.6 (CLONE work). This PR references/resolves that work.

Implications for reviewing this PR:

  • The change implements third-party Ceph support for bare-metal root volumes to satisfy a P0 customer requirement tracked by the above Jira issues; verify behavioral changes (new extension point, CloneRsp fields, DB migration) against the described customer needs and HA/IO concerns in the Jira descriptions.
🔇 Additional comments (4)
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java (1)

7679-7683: 常量新增符合既有命名与排布,没问题。
新增错误码与现有序列一致、命名和值一致,可接受。

plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java (1)

5-13: 接口定义清晰,便于扩展第三方克隆流程。

plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (2)

57-63: 仅新增引用,无需额外关注。


488-517: 新增 CloneRsp 字段与访问器一致,便于回填块卷信息。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +18 to +78
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
ROLLBACK;
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'An error occurred during the update process.';
END;

START TRANSACTION;

SELECT COUNT(*)
INTO instanceUuid_exists
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'zstack'
AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO'
AND COLUMN_NAME = 'instanceUuid';

IF instanceUuid_exists = 0 THEN
CALL ADD_COLUMN('BareMetal2InstanceProvisionNicVO', 'instanceUuid', 'VARCHAR(32)', FALSE, '');

UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `instanceUuid` = `uuid`;

ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;

CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO',
'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE');

UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `uuid` = REPLACE(UUID(), '-', '');
END IF;

SELECT COUNT(*)
INTO isPrimaryProvisionNic_exists_in_ProvisionNicVO
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'zstack'
AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO'
AND COLUMN_NAME = 'isPrimaryProvisionNic';

IF isPrimaryProvisionNic_exists_in_ProvisionNicVO = 0 THEN
CALL ADD_COLUMN('BareMetal2InstanceProvisionNicVO', 'isPrimaryProvisionNic', 'BOOLEAN', FALSE, FALSE);

UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `isPrimaryProvisionNic` = TRUE;
END IF;

SELECT COUNT(*)
INTO isPrimaryProvisionNic_exists_in_ChassisNicVO
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'zstack'
AND TABLE_NAME = 'BareMetal2ChassisNicVO'
AND COLUMN_NAME = 'isPrimaryProvisionNic';

IF isPrimaryProvisionNic_exists_in_ChassisNicVO = 0 THEN
CALL ADD_COLUMN('BareMetal2ChassisNicVO', 'isPrimaryProvisionNic', 'BOOLEAN', FALSE, FALSE);

UPDATE `zstack`.`BareMetal2ChassisNicVO`
SET `isPrimaryProvisionNic` = TRUE
WHERE `isProvisionNic` = TRUE;
END IF;

COMMIT;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

DDL 隐式提交会让事务回滚失效。

此处在事务内执行 ALTER TABLE / ADD_COLUMN 等 DDL,MySQL/MariaDB 会隐式 COMMIT,EXIT HANDLER 的 ROLLBACK 无法还原已执行的 DDL,升级中断可能留下半完成状态。建议移除事务假象或改为可逆、分步且幂等的升级方案。Based on learnings, ...

🤖 Prompt for AI Agents
In `@conf/db/upgrade/V5.5.6__schema.sql` around lines 18 - 78, The migration wraps
DDL (ADD_COLUMN, ALTER TABLE) inside a START TRANSACTION/COMMIT with a DECLARE
EXIT HANDLER FOR SQLEXCEPTION that attempts ROLLBACK, but MySQL/MariaDB issues
implicit commits for DDL so the ROLLBACK is ineffective; remove the
transactional illusion and rewrite the script into safe, idempotent, stepwise
operations: drop START TRANSACTION/COMMIT and the EXIT HANDLER, ensure each step
(checks using INFORMATION_SCHEMA, CALL ADD_COLUMN for
BareMetal2InstanceProvisionNicVO and BareMetal2ChassisNicVO, ALTER TABLE DROP
FOREIGN KEY, CALL ADD_CONSTRAINT, and UPDATE statements) is guarded by existence
checks and can be re-run without side effects (or split into separate reversible
stages), and keep references to the identifiers in this file (ADD_COLUMN,
ADD_CONSTRAINT, ALTER TABLE `BareMetal2InstanceProvisionNicVO`, UPDATE
statements, DECLARE EXIT HANDLER) so the migration no longer relies on
transaction rollback for DDL failures.

support thirdparty_ceph to bm root volume

Resolves: ZSTAC-73396

Change-Id: I6f6f616777677a6f74696c6c736c737a626f7863
(cherry picked from commit 827e880)
@MatheMatrix MatheMatrix force-pushed the sync/jin.shen/fix-81282@@3 branch from 4c7969d to f18ff25 Compare January 19, 2026 03:37
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 2661-2694: The loop over extensions (variable exts of type
CephPrimaryStorageCheckInstanceTypeExtensionPoint) must not use break on a false
result and must be null-safe: replace the current "if (!result) { break; }"
logic with a null-safe check such as "if (Boolean.FALSE.equals(result)) {
continue; }" so that a null result does not throw an NPE and false only skips
the current extension instead of stopping the whole loop; apply the same change
to the other extension loop that calls ext.isSupportCloneByThirdParty(...) and
ext.convertToBlockVolume(...).

In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java`:
- Around line 5-12: The interface
CephPrimaryStorageCheckInstanceTypeExtensionPoint uses nullable Boolean for
isSupportCloneByThirdParty and isBlockVolume which allows implementations to
return null and causes NPEs when callers in CephPrimaryStorageBase unwrap the
value; change both method signatures to return primitive boolean, then update
all implementing classes of CephPrimaryStorageCheckInstanceTypeExtensionPoint to
return true/false (not null), and adjust any unit tests or mocks accordingly so
callers (e.g., the checks in CephPrimaryStorageBase) can safely use !result or
!ext.isBlockVolume(...) without null checks.
♻️ Duplicate comments (3)
conf/db/upgrade/V5.5.6__schema.sql (3)

18-78: 事务包裹 DDL 会导致回滚失效。

MySQL/MariaDB 的 DDL 会触发隐式提交,EXIT HANDLER 里的 ROLLBACK 无法还原已执行的 DDL,升级中断易留下半完成状态。建议去掉“事务假象”,并将每一步改为可重入的幂等检查与执行。(Based on learnings, ...)


39-41: DROP FOREIGN KEY 前应检查约束是否存在。

直接 DROP 在约束不存在或重试升级时会失败并中断迁移;建议先查 INFORMATION_SCHEMA.TABLE_CONSTRAINTS 再执行删除,保证幂等性。(Based on learnings, ...)


45-46: 更新 uuid 会破坏引用表外键完整性。

此处重置 BareMetal2InstanceProvisionNicVO.uuid 会导致引用表中的 provisionNicUuid 变成悬挂引用(无 ON UPDATE CASCADE)。需先同步更新引用表,或先删 FK 再重建(并考虑加入 ON UPDATE CASCADE)。

Comment on lines +2661 to +2694
List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts = pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class);
for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
if (!result) {
break;
}
boolean isRootVolume = Q.New(VolumeVO.class)
.eq(VolumeVO_.uuid, msg.getVolume().getUuid())
.eq(VolumeVO_.type, VolumeType.Root)
.isExists();
boolean isBareMetal2Instance = Q.New(VmInstanceVO.class)
.eq(VmInstanceVO_.uuid, msg.getVolume().getVmInstanceUuid())
.eq(VmInstanceVO_.type, "baremetal2")
.isExists();
boolean hasToken = CephSystemTags.THIRDPARTY_PLATFORM.hasTag(msg.getPrimaryStorageUuid());
if (isRootVolume && isBareMetal2Instance && hasToken) {
CephPrimaryStorageVO cephPrimaryStorageVO = dbf.findByUuid(msg.getPrimaryStorageUuid(), CephPrimaryStorageVO.class);
String monIp = cephPrimaryStorageVO.getMons()
.stream()
.filter(v -> v.getStatus() == MonStatus.Connected)
.map(CephPrimaryStorageMonVO::getHostname)
.findAny()
.orElseThrow(() -> new OperationFailureException(
operr("all ceph mons of primary storage[uuid:%s] are not in Connected state", cephPrimaryStorageVO.getUuid())
));
cmd.token = CephSystemTags.THIRDPARTY_PLATFORM.getTokenByResourceUuid(msg.getPrimaryStorageUuid(),
CephSystemTags.THIRDPARTY_PLATFORM_TOKEN);
cmd.monIp = monIp;
cmd.tpTimeout = CephGlobalConfig.THIRD_PARTY_SDK_TIMEOUT.value(String.class);
VolumeVO vo = Q.New(VolumeVO.class)
.eq(VolumeVO_.uuid, msg.getVolume().getUuid()).find();
ext.convertToBlockVolume(vo);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

扩展点遍历不应使用 break,否则会跳过后续扩展

当前在遇到 false 时直接 break,会导致后续扩展点被跳过;同时若返回 null 也可能触发 NPE。建议使用 continue 并做 null-safe 判断。

🐛 建议修改
-                        for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                            Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
-                            if (!result) {
-                                break;
-                            }
+                        for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
+                            if (!Boolean.TRUE.equals(ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid()))) {
+                                continue;
+                            }
                             boolean isRootVolume = Q.New(VolumeVO.class)
                                     .eq(VolumeVO_.uuid, msg.getVolume().getUuid())
                                     .eq(VolumeVO_.type, VolumeType.Root)
                                     .isExists();
                             boolean isBareMetal2Instance = Q.New(VmInstanceVO.class)
                                     .eq(VmInstanceVO_.uuid, msg.getVolume().getVmInstanceUuid())
                                     .eq(VmInstanceVO_.type, "baremetal2")
                                     .isExists();
                             boolean hasToken = CephSystemTags.THIRDPARTY_PLATFORM.hasTag(msg.getPrimaryStorageUuid());
                             if (isRootVolume && isBareMetal2Instance && hasToken) {
                                 ...
                             }
                         }
-                                for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
-                                    if (!ext.isBlockVolume(msg.getVolume().getUuid())) {
-                                        break;
-                                    }
+                                for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
+                                    if (!Boolean.TRUE.equals(ext.isBlockVolume(msg.getVolume().getUuid()))) {
+                                        continue;
+                                    }
                                     volumePath = ret.installPath == null ? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName);
                                     ext.populateBlockVolumeDetails(msg.getVolume().getUuid(), ret.volumeId, ret.volumeStatus);
                                 }

Also applies to: 2705-2712

🤖 Prompt for AI Agents
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`
around lines 2661 - 2694, The loop over extensions (variable exts of type
CephPrimaryStorageCheckInstanceTypeExtensionPoint) must not use break on a false
result and must be null-safe: replace the current "if (!result) { break; }"
logic with a null-safe check such as "if (Boolean.FALSE.equals(result)) {
continue; }" so that a null result does not throw an NPE and false only skips
the current extension instead of stopping the whole loop; apply the same change
to the other extension loop that calls ext.isSupportCloneByThirdParty(...) and
ext.convertToBlockVolume(...).

Comment on lines +5 to +12
public interface CephPrimaryStorageCheckInstanceTypeExtensionPoint {
Boolean isSupportCloneByThirdParty(String uuid);

void convertToBlockVolume(VolumeVO vo);

Boolean isBlockVolume(String uuid);

void populateBlockVolumeDetails(String uuid, String volumeId, String volumeStatus);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# 查找接口实现类
rg -n --type=java 'implements\s+CephPrimaryStorageCheckInstanceTypeExtensionPoint' -A 3

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 查找接口方法的调用
rg -n --type=java '\.isSupportCloneByThirdParty\(|\.isBlockVolume\(' -B 2 -A 2

Repository: MatheMatrix/zstack

Length of output: 1892


🏁 Script executed:

# 查找接口的注入和使用
rg -n --type=java 'CephPrimaryStorageCheckInstanceTypeExtensionPoint'

Repository: MatheMatrix/zstack

Length of output: 1157


🏁 Script executed:

# 检查当前文件的完整内容和是否有其他定义
cat -n plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java

Repository: MatheMatrix/zstack

Length of output: 536


🏁 Script executed:

# 查找接口实现类
find . -name "*.java" -type f -exec grep -l "implements.*CephPrimaryStorageCheckInstanceTypeExtensionPoint" {} \;

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 检查具体的调用代码上下文
sed -n '2660,2710p' plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

Repository: MatheMatrix/zstack

Length of output: 3914


🏁 Script executed:

# 搜索所有实现该接口的类
rg -n 'class\s+\w+.*implements.*CephPrimaryStorageCheckInstanceTypeExtensionPoint' -A 10

Repository: MatheMatrix/zstack

Length of output: 44


Boolean 改为 boolean 避免调用侧 NPE

调用代码在 CephPrimaryStorageBase.java 的 2664 行和 2707 行直接对返回值进行拆箱操作(!result!ext.isBlockVolume(...)),未做 null 检查。若任何实现返回 null,将触发 NullPointerException

建议修改
-    Boolean isSupportCloneByThirdParty(String uuid);
+    boolean isSupportCloneByThirdParty(String uuid);

-    Boolean isBlockVolume(String uuid);
+    boolean isBlockVolume(String uuid);
🤖 Prompt for AI Agents
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java`
around lines 5 - 12, The interface
CephPrimaryStorageCheckInstanceTypeExtensionPoint uses nullable Boolean for
isSupportCloneByThirdParty and isBlockVolume which allows implementations to
return null and causes NPEs when callers in CephPrimaryStorageBase unwrap the
value; change both method signatures to return primitive boolean, then update
all implementing classes of CephPrimaryStorageCheckInstanceTypeExtensionPoint to
return true/false (not null), and adjust any unit tests or mocks accordingly so
callers (e.g., the checks in CephPrimaryStorageBase) can safely use !result or
!ext.isBlockVolume(...) without null checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants