Databricks: Add support for OPTIMIZE, PARTITIONED BY, and STRUCT syntax #2170

funcpp · 2026-01-22T06:01:31Z

Summary

This PR adds several Databricks Delta Lake SQL syntax features:

1. OPTIMIZE statement support

Adds support for the Databricks OPTIMIZE statement syntax:

OPTIMIZE table_name [WHERE predicate] [ZORDER BY (col1, col2, ...)]

Reference: https://docs.databricks.com/en/sql/language-manual/delta-optimize.html

Key difference from ClickHouse: Databricks omits the TABLE keyword after OPTIMIZE.
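
To illustrate how a single AST node can cover both dialects, here is a minimal sketch of the round-trip formatting. `OptimizeSketch` is a hypothetical, simplified stand-in and not the actual `OptimizeTable` definition in this PR: only the field names `has_table_keyword`, `predicate`, and `zorder` come from the change list, and their types (plain strings) are assumptions made to keep the example self-contained.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for the extended OPTIMIZE node.
/// The predicate and Z-order columns are plain strings here (an assumption);
/// a real AST would hold expression nodes instead.
#[derive(Debug, Clone, PartialEq)]
pub struct OptimizeSketch {
    pub name: String,
    /// true for ClickHouse `OPTIMIZE TABLE t`, false for Databricks `OPTIMIZE t`.
    pub has_table_keyword: bool,
    /// Optional `WHERE` predicate.
    pub predicate: Option<String>,
    /// Optional `ZORDER BY (...)` column list.
    pub zorder: Option<Vec<String>>,
}

impl fmt::Display for OptimizeSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("OPTIMIZE ")?;
        if self.has_table_keyword {
            f.write_str("TABLE ")?;
        }
        f.write_str(&self.name)?;
        if let Some(predicate) = &self.predicate {
            write!(f, " WHERE {}", predicate)?;
        }
        if let Some(columns) = &self.zorder {
            write!(f, " ZORDER BY ({})", columns.join(", "))?;
        }
        Ok(())
    }
}
```

Keeping `has_table_keyword` on the node means the statement can be printed back in the same form it was written in, rather than normalizing one dialect's spelling into the other's.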

2. PARTITIONED BY with optional column types

Databricks allows a partition column to reference a column already defined in the table, in which case no type is given. Partition columns that are not in the column list still carry a type:

CREATE TABLE t (col1 STRING, col2 INT) PARTITIONED BY (col1)
CREATE TABLE t (name STRING) PARTITIONED BY (year INT, month INT)

Reference: https://docs.databricks.com/en/sql/language-manual/sql-ref-partition.html
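
The optional-type rule can be sketched as a one-token lookahead: after the column name, a comma or closing paren means no type follows. The code below is an illustration only, not the `parse_column_def_for_partition()` implementation from this PR. The `Tok` enum, `PartitionColumn`, and `parse_partition_columns` are hypothetical names, and types are assumed to be a single word (e.g. `INT`), which the real parser does not require.

```rust
/// Hypothetical minimal token type; the real parser has its own tokenizer.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String),
    Comma,
    LParen,
    RParen,
}

/// A partition column whose type may be omitted.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionColumn {
    pub name: String,
    pub data_type: Option<String>,
}

/// Parses `( name [type] [, name [type]]... )` from a token slice.
pub fn parse_partition_columns(tokens: &[Tok]) -> Result<Vec<PartitionColumn>, String> {
    let mut iter = tokens.iter().peekable();
    if iter.next() != Some(&Tok::LParen) {
        return Err("expected '('".to_string());
    }
    let mut columns = Vec::new();
    loop {
        let name = match iter.next() {
            Some(Tok::Word(word)) => word.clone(),
            other => return Err(format!("expected column name, found {:?}", other)),
        };
        // Lookahead: ',' or ')' right after the name means the type was omitted.
        let data_type = match iter.peek().copied() {
            Some(Tok::Comma) | Some(Tok::RParen) => None,
            Some(Tok::Word(ty)) => {
                iter.next(); // consume the type token
                Some(ty.clone())
            }
            other => return Err(format!("expected type, ',' or ')', found {:?}", other)),
        };
        columns.push(PartitionColumn { name, data_type });
        match iter.next() {
            Some(Tok::Comma) => continue,
            Some(Tok::RParen) => return Ok(columns),
            other => return Err(format!("expected ',' or ')', found {:?}", other)),
        }
    }
}
```

With this shape, `(col1)` yields a column with `data_type: None`, while `(year INT, month INT)` yields two typed columns.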

3. STRUCT type with colon syntax

Databricks uses Hive-style colon separator for struct field definitions:

STRUCT<field_name: field_type, ...>
ARRAY<STRUCT<finish_flag: STRING, survive_flag: STRING, score: INT>>

Reference: https://docs.databricks.com/en/sql/language-manual/data-types/struct-type.html

The colon is optional per the spec, so both field: type and field type syntaxes are now accepted.
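
As a rough illustration of the optional separator, the sketch below splits a single field definition into name and type and accepts either form. It works on raw strings and is not how `parse_struct_field_def()` is implemented (that operates on tokens); `parse_struct_field` is a hypothetical helper, and it assumes the field has already been isolated from its enclosing `STRUCT<...>`.

```rust
/// Splits one struct field definition into (name, type).
/// Accepts both `name: type` and `name type`; returns None if either part is missing.
pub fn parse_struct_field(field: &str) -> Option<(String, String)> {
    let field = field.trim();
    // The field name ends at the first colon or whitespace character.
    let name_end = field.find(|c: char| c == ':' || c.is_whitespace())?;
    let (name, rest) = field.split_at(name_end);
    // Skip whitespace, then at most one colon, then whitespace again.
    let rest = rest.trim_start();
    let rest = rest.strip_prefix(':').unwrap_or(rest).trim_start();
    if name.is_empty() || rest.is_empty() {
        return None;
    }
    Some((name.to_string(), rest.to_string()))
}
```

Both `parse_struct_field("score: INT")` and `parse_struct_field("score INT")` produce the same `("score", "INT")` pair, which is the behavior the optional colon is meant to provide.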

Changes

- Extended OptimizeTable AST to support Databricks-specific fields (predicate, zorder, has_table_keyword)
- Added parse_column_def_for_partition() to handle optional column types in PARTITIONED BY
- Added DatabricksDialect to STRUCT type parsing
- Modified parse_struct_field_def() to accept optional colon separator

Test plan

- Added tests for OPTIMIZE statement variations
- Added tests for PARTITIONED BY with/without column types
- Added tests for STRUCT type with colon syntax
- Verified existing ClickHouse and BigQuery tests still pass
- All tests pass (cargo test)

Add support for Databricks Delta Lake OPTIMIZE statement syntax:
- OPTIMIZE table_name [WHERE predicate] [ZORDER BY (col1, ...)]

This extends the existing OptimizeTable AST to support both ClickHouse and Databricks syntax by adding:
- has_table_keyword: distinguishes OPTIMIZE TABLE (ClickHouse) from OPTIMIZE (Databricks)
- predicate: optional WHERE clause for partition filtering
- zorder: optional ZORDER BY clause for data colocation

Databricks allows partition columns to be specified without types when referencing columns already defined in the table specification:

CREATE TABLE t (col1 STRING, col2 INT) PARTITIONED BY (col1)
CREATE TABLE t (name STRING) PARTITIONED BY (year INT, month INT)

This change introduces parse_column_def_for_partition(), which makes the data type optional by checking if the next token is a comma or closing paren (indicating no type follows the column name).

Add support for Databricks/Hive-style STRUCT field syntax using colons:

STRUCT<field_name: field_type, ...>

Changes:
- Add DatabricksDialect to STRUCT type parsing (alongside BigQuery/Generic)
- Modify parse_struct_field_def to handle optional colon separator between field name and type, supporting both:
  - BigQuery style: STRUCT<field_name field_type>
  - Databricks/Hive style: STRUCT<field_name: field_type>

This enables parsing complex nested types like:

ARRAY<STRUCT<finish_flag: STRING, survive_flag: STRING, score: INT>>

funcpp added 4 commits January 22, 2026 12:53

Apply cargo fmt formatting

77f4fae

funcpp force-pushed the feature/databricks-delta-support branch from 1dd3157 to 77f4fae Compare January 22, 2026 06:06
