[Enhancement] Reduce lock contention in DatabaseTransactionMgr #59990
base: master
Add support for `unique_key_update_mode` property in routine load to enable flexible partial columns update. This allows different rows in the same batch to update different columns, unlike fixed partial update where all rows must update the same columns.

Changes:
- Add `unique_key_update_mode` property to CreateRoutineLoadInfo with values: UPSERT (default), UPDATE_FIXED_COLUMNS, UPDATE_FLEXIBLE_COLUMNS
- Add validation for flexible partial update constraints (JSON format only, no jsonpaths, no fuzzy_parse, no COLUMNS clause, no WHERE clause, table must have skip_bitmap column enabled)
- Update RoutineLoadJob to persist and restore the update mode
- Update KafkaRoutineLoadJob to pass update mode to task info
- Support ALTER ROUTINE LOAD to change unique_key_update_mode
- Add regression tests covering basic usage and error cases
- Fix HashMap ordering issue in gsonPostProcess for backward compatibility
- Add validation when ALTER changes mode to UPDATE_FLEXIBLE_COLUMNS
- Add comprehensive ALTER test cases for flexible partial update validation
1. Fix checkstyle: line length exceeds 120 characters
- Split long exception message string to comply with 120-character limit
2. Add shared parseUniqueKeyUpdateMode() helper methods in CreateRoutineLoadInfo
- parseUniqueKeyUpdateMode(String): returns TUniqueKeyUpdateMode or null
- parseAndValidateUniqueKeyUpdateMode(String): validates and throws on error
- Replaces duplicated switch/if-else logic across 4 files
3. Add OlapTable.validateForFlexiblePartialUpdate() method
- Centralizes table-level validation (MoW, skip_bitmap, light_schema_change, variant)
- Used by CreateRoutineLoadInfo, RoutineLoadJob, and NereidsStreamLoadPlanner
4. Update all callers to use shared validation methods
- Reduces code duplication and ensures consistent error messages
5. Allow jsonpaths, WHERE clause, and MERGE/DELETE with flexible partial update
- Removed restrictions that blocked these features
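
The two shared helpers described in item 2 could look roughly like the sketch below. `UpdateMode` is a hypothetical stand-in for the Thrift enum `TUniqueKeyUpdateMode`, and the class name, the case-insensitive matching, the exception type, and the message text are all assumptions for illustration, not the PR's actual code.

```java
import java.util.Locale;

// Hypothetical stand-in for the Thrift-generated TUniqueKeyUpdateMode enum.
enum UpdateMode { UPSERT, UPDATE_FIXED_COLUMNS, UPDATE_FLEXIBLE_COLUMNS }

final class UpdateModeParser {
    private UpdateModeParser() {}

    // Returns the parsed mode, or null when the value is missing or unrecognized.
    // Case-insensitive matching is an assumption of this sketch.
    static UpdateMode parseUniqueKeyUpdateMode(String value) {
        if (value == null) {
            return null;
        }
        switch (value.trim().toUpperCase(Locale.ROOT)) {
            case "UPSERT":
                return UpdateMode.UPSERT;
            case "UPDATE_FIXED_COLUMNS":
                return UpdateMode.UPDATE_FIXED_COLUMNS;
            case "UPDATE_FLEXIBLE_COLUMNS":
                return UpdateMode.UPDATE_FLEXIBLE_COLUMNS;
            default:
                return null;
        }
    }

    // Same parsing, but rejects invalid input instead of returning null.
    // The real helper throws a Doris exception; IllegalArgumentException is a placeholder.
    static UpdateMode parseAndValidateUniqueKeyUpdateMode(String value) {
        UpdateMode mode = parseUniqueKeyUpdateMode(value);
        if (mode == null) {
            throw new IllegalArgumentException("invalid unique_key_update_mode: " + value);
        }
        return mode;
    }
}
```

Keeping the null-returning parser separate from the throwing variant lets callers that need their own error message reuse the parsing without catching an exception.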
1. Fix exception type mismatch in KafkaRoutineLoadJob.replayModifyProperties
- Changed catch block from DdlException to UserException since
modifyPropertiesInternal now throws UserException
2. Fix setSchemaForPartialUpdate not called for flexible partial update
- Changed condition from isPartialUpdate to check both
UPDATE_FIXED_COLUMNS and UPDATE_FLEXIBLE_COLUMNS modes
- Aligns with StreamLoadHandler behavior
3. Update tests to allow WHERE clause and jsonpaths with flexible partial update
- Tests 4, 7, 16, 18 now verify these features work correctly
- Added expected output for new success test cases
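
The condition change in item 2 amounts to treating both partial-update modes alike when deciding whether the partial-update schema must be set. A minimal sketch, assuming a hypothetical `LoadUpdateMode` enum in place of `TUniqueKeyUpdateMode` and a hypothetical helper name:

```java
// Hypothetical stand-in for TUniqueKeyUpdateMode.
enum LoadUpdateMode { UPSERT, UPDATE_FIXED_COLUMNS, UPDATE_FLEXIBLE_COLUMNS }

final class PartialUpdateCheck {
    private PartialUpdateCheck() {}

    // True for both fixed and flexible partial update, false for plain upsert.
    // A boolean isPartialUpdate flag alone could only express the fixed case.
    static boolean needsPartialUpdateSchema(LoadUpdateMode mode) {
        return mode == LoadUpdateMode.UPDATE_FIXED_COLUMNS
                || mode == LoadUpdateMode.UPDATE_FLEXIBLE_COLUMNS;
    }
}
```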
- Test parseUniqueKeyUpdateMode() with valid/invalid mode strings
- Test parseAndValidateUniqueKeyUpdateMode() with exception handling
- Test backward compatibility: partial_columns=true maps to UPDATE_FIXED_COLUMNS
- Test unique_key_update_mode takes precedence over partial_columns
Test the backward compatibility and precedence logic directly, without calling gsonPostProcess(), which requires origStmt
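
The precedence and backward-compatibility rules exercised by these tests can be sketched as follows. The property names `unique_key_update_mode` and `partial_columns` come from the commits above; the `KeyUpdateMode` enum, the `UpdateModeResolver` class, and the use of a plain string map are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for TUniqueKeyUpdateMode.
enum KeyUpdateMode { UPSERT, UPDATE_FIXED_COLUMNS, UPDATE_FLEXIBLE_COLUMNS }

final class UpdateModeResolver {
    private UpdateModeResolver() {}

    // Resolves the effective mode from job properties:
    // 1. an explicit unique_key_update_mode wins;
    // 2. otherwise legacy partial_columns=true maps to UPDATE_FIXED_COLUMNS;
    // 3. otherwise the default is UPSERT.
    static KeyUpdateMode resolve(Map<String, String> props) {
        String explicit = props.get("unique_key_update_mode");
        if (explicit != null) {
            // valueOf throws IllegalArgumentException for unknown names.
            return KeyUpdateMode.valueOf(explicit.trim().toUpperCase(Locale.ROOT));
        }
        // Boolean.parseBoolean(null) is false, so a missing key falls through.
        if (Boolean.parseBoolean(props.get("partial_columns"))) {
            return KeyUpdateMode.UPDATE_FIXED_COLUMNS;
        }
        return KeyUpdateMode.UPSERT;
    }
}
```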
Use 'can only support' format instead of 'requires' to match existing test expectations in test_flexible_partial_update_restricts.groovy
Move edit log operations outside the write lock to improve transaction throughput for concurrent operations on different tables within the same database.

Changes:
- Add unprotectUpdateInMemoryState() for in-memory updates inside lock
- Add persistTransactionState() for edit log writes outside lock
- Refactor beginTransaction, commitTransaction, finishTransaction, abortTransaction, and removeUselessTxns to use the new pattern
- Refactor unprotectUpsertTransactionState to delegate to new methods

This reduces lock hold time from milliseconds (I/O bound) to microseconds (memory only), enabling higher concurrency for multi-table workloads.

Safety is maintained because:
- In-memory state is updated atomically within the write lock
- Edit log failures call System.exit(-1), preventing inconsistent state
- Replay path remains unchanged (uses isReplay=true)

Fixes: apache#53642
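
The pattern described in this commit can be sketched in isolation as below. Only the two method names `unprotectUpdateInMemoryState` and `persistTransactionState` come from the commit message; the class, the string status values, and the in-memory queue standing in for the edit log are hypothetical, and the real methods take a transaction state object rather than an id and a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only; not the DatabaseTransactionMgr implementation.
final class TxnManagerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> statusById = new HashMap<>();
    // Hypothetical stand-in for the edit log. It is thread-safe on its own
    // because it is written without holding the manager's lock.
    private final Queue<String> editLog = new ConcurrentLinkedQueue<>();
    // Set if a persist call ever runs while the caller holds the write lock.
    private volatile boolean persistedUnderLock = false;

    void commitTransaction(long txnId) {
        lock.writeLock().lock();
        try {
            // Fast, memory-only work stays inside the critical section.
            unprotectUpdateInMemoryState(txnId, "COMMITTED");
        } finally {
            lock.writeLock().unlock();
        }
        // Slow, I/O-bound work runs after the lock is released.
        persistTransactionState(txnId, "COMMITTED");
    }

    // "unprotect" prefix: the caller must already hold the write lock.
    private void unprotectUpdateInMemoryState(long txnId, String status) {
        statusById.put(txnId, status);
    }

    private void persistTransactionState(long txnId, String status) {
        if (lock.isWriteLockedByCurrentThread()) {
            persistedUnderLock = true;
        }
        editLog.add(txnId + ":" + status);
    }

    String statusOf(long txnId) {
        lock.readLock().lock();
        try {
            return statusById.get(txnId);
        } finally {
            lock.readLock().unlock();
        }
    }

    int editLogSize() {
        return editLog.size();
    }

    boolean persistedUnderLock() {
        return persistedUnderLock;
    }
}
```

Calling `commitTransaction(42L)` on a fresh instance leaves `statusOf(42L)` as `"COMMITTED"`, one entry in the log, and `persistedUnderLock()` false, which is the property the refactoring is after.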
run buildall
Pull request overview
This pull request introduces two major changes: (1) reduces lock contention in DatabaseTransactionMgr by moving edit log operations outside write locks, and (2) adds support for flexible partial update mode in routine load operations.
Changes:
- Refactored `DatabaseTransactionMgr` to split in-memory state updates from edit log persistence for reduced lock contention
- Added comprehensive support for `unique_key_update_mode` property with new `UPDATE_FLEXIBLE_COLUMNS` mode
- Introduced validation framework for flexible partial update constraints in `OlapTable`
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java | Core refactoring: splits unprotectUpsertTransactionState into unprotectUpdateInMemoryState (inside lock) and persistTransactionState (outside lock); applies pattern to transaction lifecycle methods |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateRoutineLoadInfo.java | Adds parsing, validation, and property management for unique_key_update_mode with backward compatibility for partial_columns |
| fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java | Updates job properties handling to support new update mode; adds validation logic for ALTER operations |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Adds validateForFlexiblePartialUpdate() method to centralize table-level constraint validation |
| fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsStreamLoadPlanner.java | Refactors flexible partial update validation to use centralized OlapTable method |
| fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsRoutineLoadTaskInfo.java | Updates constructor to accept TUniqueKeyUpdateMode instead of boolean isPartialUpdate flag |
| fe/fe-core/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java | Updates method signature and calls to support UserException for ALTER operations |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterRoutineLoadCommand.java | Adds validation for unique_key_update_mode property during ALTER ROUTINE LOAD |
| regression-test/suites/load_p0/routine_load/test_routine_load_flexible_partial_update.groovy | Comprehensive 1414-line test suite with 21 test cases covering flexible partial update feature |
| regression-test/data/load_p0/routine_load/test_routine_load_flexible_partial_update.out | Expected test outputs for flexible partial update test cases |
| fe/fe-core/src/test/java/org/apache/doris/load/routineload/RoutineLoadJobTest.java | Unit tests for unique_key_update_mode parsing, validation, and backward compatibility |
                writeUnlock();
            }
            // Persist edit log outside lock to reduce lock contention
            persistTransactionState(transactionState);
Copilot
AI
Jan 17, 2026
The persistTransactionState method is called outside the write lock with a transactionState variable that could potentially be null if an exception is thrown inside the try block before line 372. If any of the checks before line 372 throw an exception (such as checkRunningTxnExceedLimit at line 369), transactionState will remain null and calling persistTransactionState(transactionState) at line 384 would result in a NullPointerException. Consider adding a null check before calling persistTransactionState, or wrapping the call in a condition that ensures transactionState is not null.
Suggested change:

    - persistTransactionState(transactionState);
    + if (transactionState != null) {
    +     persistTransactionState(transactionState);
    + }
    (
        "max_batch_interval" = "10",
        "format" = "json",
        "jsonpaths" = '["\\$.id", "\\$.name", "\\$.score"]',
Copilot
AI
Jan 17, 2026
Inconsistent string escaping for jsonpaths property. Line 276 uses single quotes with backslash escaping '["\$.id", ...]', while line 982 uses double quotes with triple backslash escaping "[\"\$.id\", ...]". Both tests should use the same escaping pattern for consistency and readability. Consider standardizing on one approach throughout the test file.

Suggested change:

    - "jsonpaths" = '["\\$.id", "\\$.name", "\\$.score"]',
    + "jsonpaths" = "[\\\"\\$.id\\", \\\"\\$.name\\", \\\"\\$.score\\"]",
TPC-H: Total hot run time: 31451 ms
TPC-DS: Total hot run time: 174418 ms
ClickBench: Total hot run time: 27.05 s
FE UT Coverage Report: Increment line coverage
Summary
Changes
- Add `unprotectUpdateInMemoryState()` for in-memory updates inside lock
- Add `persistTransactionState()` for edit log writes outside lock
- Refactor the following methods to use the new pattern:
  - `beginTransaction`
  - `preCommitTransaction2PC`
  - `commitTransaction` (both overloads)
  - `finishTransaction`
  - `abortTransaction` / `abortTransaction2PC`
  - `removeUselessTxns`
Safety Guarantees
- Edit log failures call `System.exit(-1)`, preventing inconsistent state
- Replay path remains unchanged (uses `isReplay=true`, skips edit log write)
Test plan
Fixes: #53642