databendlabs
diff --git a/docs/doc/14-sql-commands/00-ddl/20-table/60-optimize-table.md b/docs/doc/14-sql-commands/00-ddl/20-table/60-optimize-table.md
Lines changed: 3 additions & 78 deletions
diff --git a/docs/doc/14-sql-commands/00-ddl/20-table/80-analyze-table.md b/docs/doc/14-sql-commands/00-ddl/20-table/80-analyze-table.md
Lines changed: 95 additions & 0 deletions
diff --git a/docs/doc/15-sql-functions/111-system-functions/fuse_statistic.md b/docs/doc/15-sql-functions/111-system-functions/fuse_statistic.md
Lines changed: 1 addition & 1 deletion
diff --git a/src/query/ast/src/ast/format/ast_format.rs b/src/query/ast/src/ast/format/ast_format.rs
Lines changed: 11 additions & 0 deletions
diff --git a/src/query/ast/src/ast/statements/statement.rs b/src/query/ast/src/ast/statements/statement.rs
Lines changed: 2 additions & 0 deletions
diff --git a/src/query/ast/src/ast/statements/table.rs b/src/query/ast/src/ast/statements/table.rs
Lines changed: 22 additions & 2 deletions
diff --git a/src/query/ast/src/parser/statement.rs b/src/query/ast/src/parser/statement.rs
Lines changed: 13 additions & 1 deletion
diff --git a/src/query/ast/src/visitors/visitor.rs b/src/query/ast/src/visitors/visitor.rs
Lines changed: 2 additions & 0 deletions
diff --git a/src/query/ast/src/visitors/visitor_mut.rs b/src/query/ast/src/visitors/visitor_mut.rs
Lines changed: 2 additions & 0 deletions
diff --git a/src/query/ast/src/visitors/walk.rs b/src/query/ast/src/visitors/walk.rs
Lines changed: 1 addition & 0 deletions
@@ -8,7 +8,7 @@ The objective of optimizing a table in Databend is to compact or purge its histo
 Databend's Time Travel feature relies on historical data. If you purge historical data from a table with the command `OPTIMIZE TABLE <your_table> PURGE` or `OPTIMIZE TABLE <your_table> ALL`, the table will not be eligible for time travel. The command removes all snapshots (except the most recent one) and their associated segments, block files, and table statistic files.
 :::
 
-## What are Snapshot, Segment, Block and Table statistic file?
+## What are Snapshot, Segment, and Block?
 
 Snapshot, segment, and block are the concepts Databend uses for data storage. Databend uses them to construct a hierarchical structure for storing table data.
 
@@ -20,8 +20,6 @@ A snapshot is a JSON file that does not save the table's data but indicate the s
 
 A segment is a JSON file that organizes the storage blocks (at least 1, at most 1,000) where the data is stored. If you run [FUSE_SEGMENT](../../../15-sql-functions/111-system-functions/fuse_segment.md) against a snapshot with the snapshot ID, you can find which segments are referenced by the snapshot.
 
-A table statistic file is a JSON file that save table statistic data, such as distinct values of table column.
-
 Databend saves actual table data in Parquet files and treats each Parquet file as a block. If you run [FUSE_BLOCK](../../../15-sql-functions/111-system-functions/fuse_block.md) against a snapshot with the snapshot ID, you can find which blocks are referenced by the snapshot.
 
 Databend creates a unique ID for each database and table for storing the snapshot, segment, and block files and saves them to your object storage in the path `<bucket_name>/[root]/<db_id>/<table_id>/`. Each snapshot, segment, and block file is named with a UUID (32-character lowercase hexadecimal string).
@@ -31,7 +29,6 @@ Databend creates a unique ID for each database and table for storing the snapsho
 | Snapshot | JSON    | `<32bitUUID>_<version>.json`    | `<bucket_name>/[root]/<db_id>/<table_id>/_ss/`   |
 | Segment  | JSON    | `<32bitUUID>_<version>.json`    | `<bucket_name>/[root]/<db_id>/<table_id>/_sg/`   |
 | Block    | parquet | `<32bitUUID>_<version>.parquet` | `<bucket_name>/[root]/<db_id>/<table_id>/_b/` |
-| Table statistic | JSON    | `<32bitUUID>_<version>.json`    | `<bucket_name>/[root]/<db_id>/<table_id>/_ts/`   |
 
 ## Table Optimization Considerations
 
@@ -67,12 +64,13 @@ Optimizing a table could be time-consuming, especially for large ones. Databend
 ## Syntax
 
 ```sql
-OPTIMIZE TABLE [database.]table_name [ PURGE | COMPACT | ALL | STATISTIC ] [SEGMENT] [LIMIT <segment_count>]
+OPTIMIZE TABLE [database.]table_name [ PURGE | COMPACT | ALL ] [SEGMENT] [LIMIT <segment_count>]
 ```
 
 - `OPTIMIZE TABLE <table_name> PURGE`
 
     Purges the historical data of the table. Only the latest snapshot (including the segments, blocks, and table statistic file referenced by this snapshot) is kept.
+    (For more information about the table statistic file, see [ANALYZE TABLE](./80-analyze-table.md).)
 
 - `OPTIMIZE TABLE <table_name> COMPACT [LIMIT <segment_count>]`
 
@@ -97,13 +95,6 @@ OPTIMIZE TABLE [database.]table_name [ PURGE | COMPACT | ALL | STATISTIC ] [SEGM
 
     Works the same way as `OPTIMIZE TABLE <table_name> PURGE`.
 
-- `OPTIMIZE TABLE <table_name> STATISTIC`
-
-    Estimates the number of distinct values of each column in a table. 
-    
-    - It does not display the estimated results after execution. To show the estimated results, use the function [FUSE_STATISTIC](../../../15-sql-functions/111-system-functions/fuse_statistic.md).
-    - The command does not identify distinct values by comparing them but by counting the number of storage segments and blocks. This might lead to a significant difference between the estimated results and the actual value, for example, multiple blocks holding the same value. In this case, Databend recommends compacting the storage segments and blocks to merge them as much as possible before you run the estimation.
-
 ## Examples
 
 This example compacts and purges historical data from a table:
@@ -162,70 +153,4 @@ mysql> select snapshot_id, segment_count, block_count, row_count from fuse_snaps
 +----------------------------------+---------------+-------------+-----------+
 | 4f33a63031424ed095b8c2f9e8b15ecb |            16 |          16 |  10000005 |
 +----------------------------------+---------------+-------------+-----------+
-```
-
-This example estimates the number of distinct values for each column in a table and shows the results with the function FUSE_STATISTIC:
-
-```sql
-create table t(a uint64);
-
-insert into t values (5);
-insert into t values (6);
-insert into t values (7);
-
-select * from t order by a;
-
-----
-5
-6
-7
-
--- FUSE_STATISTIC will not return any results until you run an estimation with OPTIMIZE TABLE.
-select * from fuse_statistic('db_09_0020', 't');
-
-optimize table `t` statistic;
-
-select * from fuse_statistic('db_09_0020', 't');
-
-----
-(0,3);
-
-
-insert into t values (5);
-insert into t values (6);
-insert into t values (7);
-
-select * from t order by a;
-
-----
-5
-5
-6
-6
-7
-7
-
--- FUSE_STATISTIC returns results of your last estimation. To get the most recent estimated values, run the estimation again.
--- OPTIMIZE TABLE does not identify distinct values by comparing them but by counting the number of storage segments and blocks.
-select * from fuse_statistic('db_09_0020', 't');
-
-----
-(0,3);
-
-optimize table `t` statistic;
-
-select * from fuse_statistic('db_09_0020', 't');
-
-----
-(0,6);
-
--- Best practice: Compact the table before running the estimation.
-optimize table t compact;
-
-optimize table `t` statistic;
-
-select * from fuse_statistic('db_09_0020', 't');
-
-----
-(0,3);
 ```
@@ -0,0 +1,95 @@
+---
+title: ANALYZE TABLE
+---
+
+The objective of analyzing a table in Databend is to calculate table statistics, such as the number of distinct values in each column.
+
+## What is a Table Statistic File?
+
+A table statistic file is a JSON file that stores statistics about a table, such as the number of distinct values in each column.
+
+Databend creates a unique ID for each database and table and saves the table statistic files to your object storage under the path `<bucket_name>/[root]/<db_id>/<table_id>/`. Each table statistic file is named with a UUID (32-character lowercase hexadecimal string).
+
+| File            | Format | Filename                     | Storage Folder                                 |
+|-----------------|--------|------------------------------|------------------------------------------------|
+| Table statistic | JSON   | `<32bitUUID>_<version>.json` | `<bucket_name>/[root]/<db_id>/<table_id>/_ts/` |
+
+## Syntax
+```sql
+ANALYZE TABLE [database.]table_name
+```
+
+- `ANALYZE TABLE <table_name>`
+
+    Estimates the number of distinct values of each column in a table. 
+    
+    - It does not display the estimated results after execution. To show the estimated results, use the function [FUSE_STATISTIC](../../../15-sql-functions/111-system-functions/fuse_statistic.md).
+    - The command does not identify distinct values by comparing them but by counting the number of storage segments and blocks. The estimate can therefore differ significantly from the actual value, for example when multiple blocks hold the same value. In this case, Databend recommends compacting the storage segments and blocks to merge them as much as possible before you run the estimation.
+
+## Examples
+
+This example estimates the number of distinct values for each column in a table and shows the results with the function FUSE_STATISTIC:
+
+```sql
+create table t(a uint64);
+
+insert into t values (5);
+insert into t values (6);
+insert into t values (7);
+
+select * from t order by a;
+
+----
+5
+6
+7
+
+-- FUSE_STATISTIC will not return any results until you run an estimation with ANALYZE TABLE.
+select * from fuse_statistic('db_09_0020', 't');
+
+analyze table `t`;
+
+select * from fuse_statistic('db_09_0020', 't');
+
+----
+(0,3);
+
+
+insert into t values (5);
+insert into t values (6);
+insert into t values (7);
+
+select * from t order by a;
+
+----
+5
+5
+6
+6
+7
+7
+
+-- FUSE_STATISTIC returns results of your last estimation. To get the most recent estimated values, run the estimation again.
+-- ANALYZE TABLE does not identify distinct values by comparing them but by counting the number of storage segments and blocks.
+select * from fuse_statistic('db_09_0020', 't');
+
+----
+(0,3);
+
+analyze table `t`;
+
+select * from fuse_statistic('db_09_0020', 't');
+
+----
+(0,6);
+
+-- Best practice: Compact the table before running the estimation.
+optimize table t compact;
+
+analyze table `t`;
+
+select * from fuse_statistic('db_09_0020', 't');
+
+----
+(0,3);
+```
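
The example output above (3, then 6, then 3 again after compaction) is consistent with an estimator that adds up per-block distinct counts without deduplicating across blocks. The sketch below models that reading only. It is an assumption, not the estimator this PR uses, and the function names `estimate_distinct` and `exact_distinct` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical model of the estimate: count distinct values inside each
/// block, then add the per-block counts. Values repeated in different
/// blocks are counted once per block.
pub fn estimate_distinct(blocks: &[Vec<u64>]) -> usize {
    blocks
        .iter()
        .map(|block| block.iter().collect::<HashSet<_>>().len())
        .sum()
}

/// Exact distinct count over the whole table, for comparison.
pub fn exact_distinct(blocks: &[Vec<u64>]) -> usize {
    blocks.iter().flatten().collect::<HashSet<_>>().len()
}
```

With six single-row blocks holding 5, 6, 7, 5, 6, 7, this model reports 6 while the exact count is 3. Once the rows sit in a single block, as they would after `OPTIMIZE TABLE t COMPACT`, both functions report 3.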
@@ -16,4 +16,4 @@ FUSE_STATISTIC('<database_name>', '<table_name>')
 
 ## Examples
 
-You're most likely to use this function together with `OPTIMIZE TABLE <table_name> STATISTIC` to generate and check the statistical information of a table. For more explanations and examples, see [OPTIMIZE TABLE](../../14-sql-commands/00-ddl/20-table/60-optimize-table.md).
+You're most likely to use this function together with `ANALYZE TABLE <table_name>` to generate and check the statistical information of a table. For more explanations and examples, see [ANALYZE TABLE](../../14-sql-commands/00-ddl/20-table/80-analyze-table.md).
@@ -1371,6 +1371,17 @@ impl<'ast> Visitor<'ast> for AstFormatVisitor {
         self.children.push(node);
     }
 
+    fn visit_analyze_table(&mut self, stmt: &'ast AnalyzeTableStmt<'ast>) {
+        let mut children = Vec::new();
+        self.visit_table_ref(&stmt.catalog, &stmt.database, &stmt.table);
+        children.push(self.children.pop().unwrap());
+
+        let name = "AnalyzeTable".to_string();
+        let format_ctx = AstFormatContext::with_children(name, children.len());
+        let node = FormatTreeNode::with_children(format_ctx, children);
+        self.children.push(node);
+    }
+
     fn visit_exists_table(&mut self, stmt: &'ast ExistsTableStmt<'ast>) {
         self.visit_table_ref(&stmt.catalog, &stmt.database, &stmt.table);
         let child = self.children.pop().unwrap();
 
@@ -105,6 +105,7 @@ pub enum Statement<'a> {
     RenameTable(RenameTableStmt<'a>),
     TruncateTable(TruncateTableStmt<'a>),
     OptimizeTable(OptimizeTableStmt<'a>),
+    AnalyzeTable(AnalyzeTableStmt<'a>),
     ExistsTable(ExistsTableStmt<'a>),
 
     // Views
@@ -295,6 +296,7 @@ impl<'a> Display for Statement<'a> {
             Statement::RenameTable(stmt) => write!(f, "{stmt}")?,
             Statement::TruncateTable(stmt) => write!(f, "{stmt}")?,
             Statement::OptimizeTable(stmt) => write!(f, "{stmt}")?,
+            Statement::AnalyzeTable(stmt) => write!(f, "{stmt}")?,
             Statement::ExistsTable(stmt) => write!(f, "{stmt}")?,
             Statement::CreateView(stmt) => write!(f, "{stmt}")?,
             Statement::AlterView(stmt) => write!(f, "{stmt}")?,
 
@@ -411,6 +411,28 @@ impl Display for OptimizeTableStmt<'_> {
     }
 }
 
+#[derive(Debug, Clone, PartialEq)]
+pub struct AnalyzeTableStmt<'a> {
+    pub catalog: Option<Identifier<'a>>,
+    pub database: Option<Identifier<'a>>,
+    pub table: Identifier<'a>,
+}
+
+impl Display for AnalyzeTableStmt<'_> {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "ANALYZE TABLE ")?;
+        write_period_separated_list(
+            f,
+            self.catalog
+                .iter()
+                .chain(&self.database)
+                .chain(Some(&self.table)),
+        )?;
+
+        Ok(())
+    }
+}
+
 #[derive(Debug, Clone, PartialEq, Eq)]
 pub struct ExistsTableStmt<'a> {
     pub catalog: Option<Identifier<'a>>,
@@ -462,7 +484,6 @@ pub enum CompactTarget {
 pub enum OptimizeTableAction<'a> {
     All,
     Purge,
-    Statistic,
     Compact {
         target: CompactTarget,
         limit: Option<Expr<'a>>,
@@ -474,7 +495,6 @@ impl<'a> Display for OptimizeTableAction<'a> {
         match self {
             OptimizeTableAction::All => write!(f, "ALL"),
             OptimizeTableAction::Purge => write!(f, "PURGE"),
-            OptimizeTableAction::Statistic => write!(f, "STATISTIC"),
             OptimizeTableAction::Compact { target, limit } => {
                 match target {
                     CompactTarget::Block => {
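
For readers who want to check how the new `Display` impl renders a statement, here is a minimal standalone sketch. It swaps the crate's `Identifier<'a>` for plain `String`s and inlines the period-joining that `write_period_separated_list` performs in the diff. The type name `AnalyzeTableSketch` is hypothetical.

```rust
use std::fmt::{self, Display, Formatter};

// Simplified stand-in for `AnalyzeTableStmt`: `String` replaces the
// lifetime-carrying `Identifier<'a>` from the diff (an assumption made
// to keep the sketch self-contained).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AnalyzeTableSketch {
    pub catalog: Option<String>,
    pub database: Option<String>,
    pub table: String,
}

impl Display for AnalyzeTableSketch {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "ANALYZE TABLE ")?;
        // Chain the optional parts in front of the mandatory table name,
        // then join whatever is present with periods.
        let parts = self
            .catalog
            .iter()
            .chain(&self.database)
            .chain(Some(&self.table));
        for (i, part) in parts.enumerate() {
            if i > 0 {
                write!(f, ".")?;
            }
            write!(f, "{part}")?;
        }
        Ok(())
    }
}
```

A statement with `database: Some("db")` and `table: "t"` formats as `ANALYZE TABLE db.t`. Absent parts are skipped rather than printed as empty segments.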
 
@@ -515,6 +515,18 @@ pub fn statement(i: Input) -> IResult<StatementMsg> {
             })
         },
     );
+    let analyze_table = map(
+        rule! {
+            ANALYZE ~ TABLE ~ #peroid_separated_idents_1_to_3
+        },
+        |(_, _, (catalog, database, table))| {
+            Statement::AnalyzeTable(AnalyzeTableStmt {
+                catalog,
+                database,
+                table,
+            })
+        },
+    );
     let exists_table = map(
         rule! {
             EXISTS ~ TABLE ~ #peroid_separated_idents_1_to_3
@@ -991,6 +1003,7 @@ pub fn statement(i: Input) -> IResult<StatementMsg> {
             | #rename_table : "`RENAME TABLE [<database>.]<table> TO <new_table>`"
             | #truncate_table : "`TRUNCATE TABLE [<database>.]<table> [PURGE]`"
             | #optimize_table : "`OPTIMIZE TABLE [<database>.]<table> (ALL | PURGE | COMPACT [SEGMENT])`"
+            | #analyze_table : "`ANALYZE TABLE [<database>.]<table>`"
             | #exists_table : "`EXISTS TABLE [<database>.]<table>`"
         ),
         rule!(
@@ -1449,7 +1462,6 @@ pub fn optimize_table_action(i: Input) -> IResult<OptimizeTableAction> {
     alt((
         value(OptimizeTableAction::All, rule! { ALL }),
         value(OptimizeTableAction::Purge, rule! { PURGE }),
-        value(OptimizeTableAction::Statistic, rule! { STATISTIC }),
         map(
             rule! { COMPACT ~ (SEGMENT)? ~ ( LIMIT ~ ^#expr )?},
             |(_, opt_segment, opt_limit)| OptimizeTableAction::Compact {
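
The `analyze_table` rule maps one to three period-separated identifiers onto `(catalog, database, table)`. The sketch below shows that mapping with a hand-rolled parser instead of the crate's `rule!` combinators. `ParsedAnalyzeTable` and `parse_analyze_table` are hypothetical names, and quoted identifiers such as `` `t` `` are deliberately not handled.

```rust
// Hypothetical result type; the real parser produces `AnalyzeTableStmt`.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedAnalyzeTable {
    pub catalog: Option<String>,
    pub database: Option<String>,
    pub table: String,
}

/// Parses `ANALYZE TABLE [[catalog.]database.]table` from whitespace-separated
/// words. Returns `None` on any mismatch.
pub fn parse_analyze_table(sql: &str) -> Option<ParsedAnalyzeTable> {
    let mut words = sql.split_whitespace();
    // Keywords are matched case-insensitively.
    if !words.next()?.eq_ignore_ascii_case("ANALYZE") {
        return None;
    }
    if !words.next()?.eq_ignore_ascii_case("TABLE") {
        return None;
    }
    let name = words.next()?;
    // Anything after the table reference is rejected.
    if words.next().is_some() {
        return None;
    }
    let idents: Vec<&str> = name.split('.').collect();
    if idents.iter().any(|s| s.is_empty()) {
        return None;
    }
    // One to three identifiers: the right-most is always the table.
    let (catalog, database, table) = match idents.as_slice() {
        [t] => (None, None, *t),
        [d, t] => (None, Some(*d), *t),
        [c, d, t] => (Some(*c), Some(*d), *t),
        _ => return None,
    };
    Some(ParsedAnalyzeTable {
        catalog: catalog.map(str::to_string),
        database: database.map(str::to_string),
        table: table.to_string(),
    })
}
```

The slice pattern makes the one-to-three arity explicit: a fourth identifier falls through to `None` instead of being silently dropped.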
 
@@ -436,6 +436,8 @@ pub trait Visitor<'ast>: Sized {
 
     fn visit_optimize_table(&mut self, _stmt: &'ast OptimizeTableStmt<'ast>) {}
 
+    fn visit_analyze_table(&mut self, _stmt: &'ast AnalyzeTableStmt<'ast>) {}
+
     fn visit_exists_table(&mut self, _stmt: &'ast ExistsTableStmt<'ast>) {}
 
     fn visit_create_view(&mut self, _stmt: &'ast CreateViewStmt<'ast>) {}
 
@@ -439,6 +439,8 @@ pub trait VisitorMut: Sized {
 
     fn visit_optimize_table(&mut self, _stmt: &mut OptimizeTableStmt<'_>) {}
 
+    fn visit_analyze_table(&mut self, _stmt: &mut AnalyzeTableStmt<'_>) {}
+
     fn visit_exists_table(&mut self, _stmt: &mut ExistsTableStmt<'_>) {}
 
     fn visit_create_view(&mut self, _stmt: &mut CreateViewStmt<'_>) {}
 
@@ -346,6 +346,7 @@ pub fn walk_statement<'a, V: Visitor<'a>>(visitor: &mut V, statement: &'a Statem
         Statement::RenameTable(stmt) => visitor.visit_rename_table(stmt),
         Statement::TruncateTable(stmt) => visitor.visit_truncate_table(stmt),
         Statement::OptimizeTable(stmt) => visitor.visit_optimize_table(stmt),
+        Statement::AnalyzeTable(stmt) => visitor.visit_analyze_table(stmt),
         Statement::ExistsTable(stmt) => visitor.visit_exists_table(stmt),
         Statement::CreateView(stmt) => visitor.visit_create_view(stmt),
         Statement::AlterView(stmt) => visitor.visit_alter_view(stmt),
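
The three visitor hunks follow one pattern: a default no-op method on the trait plus one dispatch arm in the walker. A reduced sketch of that pattern, using hypothetical stand-in types (`Stmt`, `TableRef`, `StmtVisitor`) rather than the crate's AST:

```rust
// Simplified stand-ins for the AST types (assumptions, not the crate's API).
pub struct TableRef {
    pub table: String,
}

pub enum Stmt {
    OptimizeTable(TableRef),
    AnalyzeTable(TableRef),
}

pub trait StmtVisitor: Sized {
    // Default no-op bodies let existing visitors compile unchanged when a
    // new statement kind is added.
    fn visit_optimize_table(&mut self, _stmt: &TableRef) {}
    fn visit_analyze_table(&mut self, _stmt: &TableRef) {}
}

// The walker needs exactly one new arm per new statement kind.
pub fn walk_stmt<V: StmtVisitor>(visitor: &mut V, stmt: &Stmt) {
    match stmt {
        Stmt::OptimizeTable(s) => visitor.visit_optimize_table(s),
        Stmt::AnalyzeTable(s) => visitor.visit_analyze_table(s),
    }
}

// Example visitor that overrides only the new hook.
#[derive(Default)]
pub struct AnalyzedTables {
    pub names: Vec<String>,
}

impl StmtVisitor for AnalyzedTables {
    fn visit_analyze_table(&mut self, stmt: &TableRef) {
        self.names.push(stmt.table.clone());
    }
}
```

Because the `match` in the walker is exhaustive, forgetting the new arm is a compile error, while forgetting to override the hook in a visitor is silently a no-op.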