Commit 0c878d2

authored
Merge pull request #7541 from lichuang/table_stage_file_duplicate_rfc
rfc: Idempotent Copy
2 parents 9efea7f + e5ffc14 commit 0c878d2

3 files changed

+56
-0
lines changed

@@ -0,0 +1,56 @@
---
title: Idempotent Copy
description: Avoid duplicates when copying stage files into a table
---

- Tracking Issue: https://github.com/datafuselabs/databend/issues/6338

## Summary

When stage files are copied into a table in a streaming fashion, some of the files may already have been copied. The copy therefore needs a way to skip files that were copied before, which makes it an `idempotent` operation.

## Save the meta information of copied stage files in the meta service

Whenever stage files are copied into a table, save the stage file meta information in the meta service:

- key: a combination of `(tenant, database, table, file name)`.
- value: MUST include all the meta of a stage file, such as `content-length`, `etag`, and `last modified`.
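
The key and value above can be sketched as a pair of small record types. This is a minimal illustration in Python, not Databend's actual types: the names `StageFileKey` and `StageFileMeta`, the field types, the sample values, and the plain dict standing in for the meta service are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StageFileKey:
    """Hypothetical key: (tenant, database, table, file name)."""
    tenant: str
    database: str
    table: str
    file_name: str


@dataclass(frozen=True)
class StageFileMeta:
    """Hypothetical value: the meta of one stage file."""
    content_length: int
    etag: Optional[str]
    last_modified: str  # assumed here to be an ISO 8601 timestamp string


# A plain dict stands in for the meta service in this sketch.
meta_service: Dict[StageFileKey, StageFileMeta] = {}

# Sample values are made up for illustration.
key = StageFileKey("tenant1", "db1", "t1", "file1.csv")
meta_service[key] = StageFileMeta(1024, "abc123", "2022-09-09T00:00:00Z")
```

Because the key includes the table, the same stage file copied into two different tables is tracked as two separate records.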

![](/img/rfc/20220909-idempotent-copy/stage-file-meta.png)

The stage file meta information expires after 64 days by default.
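
As a rough illustration of the default expiration, the check below treats a record as expired once it is at least 64 days old. The helper `is_expired` and the timestamps are hypothetical, and how the meta service actually enforces expiration is not specified here.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(days=64)  # the default expiration stated above


def is_expired(saved_at: datetime, now: datetime, ttl: timedelta = DEFAULT_TTL) -> bool:
    """Hypothetical helper: True once the record is at least `ttl` old."""
    return now - saved_at >= ttl


saved_at = datetime(2022, 9, 9, tzinfo=timezone.utc)
print(is_expired(saved_at, saved_at + timedelta(days=63)))  # False
print(is_expired(saved_at, saved_at + timedelta(days=64)))  # True
```

Assuming an expired record is treated as missing, a file copied again after the expiration would no longer be recognized as a duplicate.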

## Avoiding duplicates when copying stage files into a table

Using the stage file meta information, every copy of stage files into a table follows these steps:

* First, get the table file meta information (if any) of the stage files that are to be copied into the table.
* Second, get all the stage file meta information.
* Third, compare the table file meta information with the stage file meta information:
  * If they match, the file is ignored and not copied.
  * Otherwise, copy the stage file and upsert its meta information into the table stage file meta.
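
The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: `FileMeta`, `plan_copy`, and `copy_files` are hypothetical names, records are keyed by file name only (a single tenant, database, and table), and plain dicts stand in for the meta service and the stage listing.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class FileMeta(NamedTuple):
    content_length: int
    etag: str
    last_modified: str


def plan_copy(
    saved: Dict[str, FileMeta],    # step 1: meta saved for this table, by file name
    current: Dict[str, FileMeta],  # step 2: meta of the stage files right now
) -> Tuple[List[str], List[str]]:
    """Step 3: compare both sides and return (files_to_copy, files_to_skip)."""
    to_copy: List[str] = []
    to_skip: List[str] = []
    for name, meta in current.items():
        if saved.get(name) == meta:
            to_skip.append(name)   # unchanged since the last copy: ignore
        else:
            to_copy.append(name)   # never copied, or changed since then
    return to_copy, to_skip


def copy_files(
    saved: Dict[str, FileMeta],
    current: Dict[str, FileMeta],
    do_copy: Callable[[str], None],
) -> List[str]:
    """Copy only the files that need it, then upsert their meta."""
    to_copy, _ = plan_copy(saved, current)
    for name in to_copy:
        do_copy(name)
        saved[name] = current[name]  # upsert after a successful copy
    return to_copy
```

Calling `copy_files` twice with the same inputs copies each file once: the second call finds matching meta for every file and returns an empty list.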

![](/img/rfc/20220909-idempotent-copy/example.png)

Take the image above as an example:

* The client makes a request to copy three files (file1, file2, file3) into the table.

* Get the table stage file meta of (file1, file2, file3).

* In the meta service, stage file information is found only for (file1, file3).

* Compare the table stage file information with the stage file information: file1 has not changed, so it is ignored in this copy operation. file2 has no saved information and the information of file3 does not match, so (file2, file3) will be copied.

* After the new files are copied, the stage file information of (file2, file3) is saved into the table file information.
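
The walkthrough above can be replayed with a few lines of Python. The etag strings are made-up values, and comparing only the etag is a simplification of comparing the full stage file meta.

```python
# Saved table stage file information: only file1 and file3 were found.
saved = {"file1": "etag-1", "file3": "etag-3-old"}
# Current stage file information for the three requested files.
stage = {"file1": "etag-1", "file2": "etag-2", "file3": "etag-3-new"}

# A file is copied when it has no saved information or the information differs.
to_copy = [name for name, etag in stage.items() if saved.get(name) != etag]
ignored = [name for name in stage if name not in to_copy]
print(to_copy)  # ['file2', 'file3']
print(ignored)  # ['file1']

# After copying, upsert the information of the copied files.
for name in to_copy:
    saved[name] = stage[name]

# Repeating the same request now copies nothing.
print([name for name, etag in stage.items() if saved.get(name) != etag])  # []
```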