The problem
All other things being equal, the number of rows returned by a given column shard may vary between different runs of the same query. The bug is rare and intermittent, so I had to write a dedicated tool to reproduce it.
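The check itself can be pictured as comparing per-shard row counts between two runs of the same query. The following is a minimal sketch of that idea, not the tool's actual code: the `mismatch` type, the `diffCounts` function, and the second tablet ID are hypothetical, and it assumes the row counts are already available as a map keyed by tablet ID.

```go
package main

import (
	"fmt"
	"sort"
)

// mismatch describes one tablet whose row count changed between two runs.
// Hypothetical type, introduced only for this illustration.
type mismatch struct {
	TabletID string
	Previous int64
	Current  int64
}

// diffCounts compares per-tablet row counts from two runs of the same query
// and returns every tablet whose count differs. A tablet that is missing
// from one of the runs is treated as having returned 0 rows in that run.
func diffCounts(previous, current map[string]int64) []mismatch {
	var out []mismatch
	for id, prev := range previous {
		if cur := current[id]; cur != prev {
			out = append(out, mismatch{TabletID: id, Previous: prev, Current: cur})
		}
	}
	for id, cur := range current {
		if _, seen := previous[id]; !seen && cur != 0 {
			out = append(out, mismatch{TabletID: id, Previous: 0, Current: cur})
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i].TabletID < out[j].TabletID })
	return out
}

func main() {
	// The second tablet ID is made up to show a shard whose count is stable.
	previous := map[string]int64{"72075186235526655": 120478, "tablet-b": 10}
	current := map[string]int64{"72075186235526655": 62662, "tablet-b": 10}
	for _, m := range diffCounts(previous, current) {
		fmt.Printf("tablet %s: previous=%d current=%d\n", m.TabletID, m.Previous, m.Current)
	}
}
```

With a stable data set (no TTL eviction inside the queried time range), any non-empty result from such a comparison indicates the inconsistency described here.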
Steps to reproduce
Prerequisites:
- Linux / macOS
- Go toolchain > 1.23.8
- Network access to Cloud Preprod SAS
Compile the tool:
git clone git@github.com:ydb-platform/fq-connector-go.git
cd fq-connector-go
go build ./tools/ydb/olap_inconsistency/
Run the tool (pay attention to the -start and -end values: the time range must not overlap with records that GC may remove between runs because their TTL has expired):
./olap_inconsistency \
  -endpoint="u-lb.cc8bajrmntk9q0d1lc0t.ydb.mdb.cloud-preprod.yandex.net:2135" \
  -database="/pre-prod_sas/yc.logs.cloud/cc8bajrmntk9q0d1lc0t" \
  -token="$(yc --profile preprod-fed-user iam create-token)" \
  -start="2025-06-25T01:01:01Z" \
  -end="2025-06-25T02:02:02Z" \
  -table="logs/origin/aoeoqusjtbo4m549jrom/aoe3cidh5dfee2s6cqu5/af3p40c4vf9jqpb81qvm" \
  -resource-pool="yandex_query_pool"
If you're lucky, after a while the logs will contain the ID of a tablet that returned a different number of rows than before. For example:
2025-06-25T18:10:36.905+0300 INFO olap_inconsistency/main.go:305 inconsistency detected {"tablet_id": "72075186235526655", "query_num": 2, "previous_count": 120478, "current_count": 62662}
2025-06-25T18:10:36.905+0300 INFO olap_inconsistency/main.go:225 inconsistency found in tablet ID {"tablet_id": "72075186235526655"}
This log record means that the column shard (CS) with ID 72075186235526655 returned 120478 rows during the previous query run, but only 62662 rows during the current one.
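To collect such records from a long log, something like the following sketch may help. It assumes every record of interest has the shape shown above: the text "inconsistency detected" followed by a JSON payload with the `tablet_id`, `query_num`, `previous_count` and `current_count` fields. The `inconsistency` type and the `parseInconsistency` function are hypothetical names and are not part of the tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// inconsistency mirrors the JSON fields seen in the "inconsistency detected"
// log record. Hypothetical type, introduced only for this illustration.
type inconsistency struct {
	TabletID      string `json:"tablet_id"`
	QueryNum      int    `json:"query_num"`
	PreviousCount int64  `json:"previous_count"`
	CurrentCount  int64  `json:"current_count"`
}

// parseInconsistency extracts the JSON payload from a single log line.
// The boolean result is false for lines that are not
// "inconsistency detected" records.
func parseInconsistency(line string) (inconsistency, bool, error) {
	var rec inconsistency
	if !strings.Contains(line, "inconsistency detected") {
		return rec, false, nil
	}
	// The payload is assumed to start at the first '{' and run to the end of the line.
	start := strings.Index(line, "{")
	if start < 0 {
		return rec, false, fmt.Errorf("no JSON payload in line: %q", line)
	}
	if err := json.Unmarshal([]byte(line[start:]), &rec); err != nil {
		return rec, false, fmt.Errorf("decode payload: %w", err)
	}
	return rec, true, nil
}

func main() {
	line := `2025-06-25T18:10:36.905+0300 INFO olap_inconsistency/main.go:305 inconsistency detected {"tablet_id": "72075186235526655", "query_num": 2, "previous_count": 120478, "current_count": 62662}`
	rec, ok, err := parseInconsistency(line)
	if err != nil {
		panic(err)
	}
	if ok {
		fmt.Printf("tablet %s: %d rows lost between runs\n",
			rec.TabletID, rec.PreviousCount-rec.CurrentCount)
	}
}
```

For the record above, the difference is 120478 - 62662 = 57816 rows.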