
Column shards can return different amounts of data for the same query #20199

Open
@vitalyisaev2


The problem

All other things being equal, the amount of data returned by a given column shard (CS) may vary between runs of the same query. The bug is rare and intermittent, so I had to write a dedicated tool to reproduce it.

Steps to reproduce

Prerequisites:

  • Linux / macOS
  • Go toolchain > 1.23.8
  • Network access to Cloud Preprod SAS

Compile the tool:

```shell
git clone git@github.com:ydb-platform/fq-connector-go.git
cd fq-connector-go
go build ./tools/ydb/olap_inconsistency/
```

Run the tool. Pay attention to the -start and -end values: the time range must not include records that GC may delete once their TTL expires, otherwise row counts can legitimately drop between runs:

```shell
./olap_inconsistency \
  -endpoint="u-lb.cc8bajrmntk9q0d1lc0t.ydb.mdb.cloud-preprod.yandex.net:2135" \
  -database="/pre-prod_sas/yc.logs.cloud/cc8bajrmntk9q0d1lc0t" \
  -token=$(yc --profile preprod-fed-user iam create-token) \
  -start="2025-06-25T01:01:01Z" \
  -end="2025-06-25T02:02:02Z" \
  -table="logs/origin/aoeoqusjtbo4m549jrom/aoe3cidh5dfee2s6cqu5/af3p40c4vf9jqpb81qvm" \
  -resource-pool="yandex_query_pool"
```

If you're lucky, after a while the logs will show the ID of a tablet that returned a different number of rows. For example:

```
2025-06-25T18:10:36.905+0300	INFO	olap_inconsistency/main.go:305	inconsistency detected	{"tablet_id": "72075186235526655", "query_num": 2, "previous_count": 120478, "current_count": 62662}
2025-06-25T18:10:36.905+0300	INFO	olap_inconsistency/main.go:225	inconsistency found in tablet ID	{"tablet_id": "72075186235526655"}
```

This record means that the column shard with ID 72075186235526655 returned 120478 rows during the previous query run, but only 62662 during the current one.
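The issue does not show how the tool decides that a tablet is inconsistent. Judging only by the fields of the log record (tablet_id, query_num, previous_count, current_count), the comparison between two runs can be sketched as below. This is a hypothetical illustration, not the code in tools/ydb/olap_inconsistency/main.go: the names tabletCounts, inconsistency and compareRuns are invented, and collecting per-tablet row counts is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// tabletCounts maps a column shard (tablet) ID to the number of rows it
// returned during one run of the query. Hypothetical type; how the counts
// are collected is not described in the issue.
type tabletCounts map[uint64]int

// inconsistency mirrors the fields of the "inconsistency detected" log record.
type inconsistency struct {
	TabletID      uint64
	QueryNum      int
	PreviousCount int
	CurrentCount  int
}

// compareRuns reports every tablet seen in both runs whose row count changed.
// Tablets present in only one run are ignored in this sketch.
func compareRuns(queryNum int, prev, curr tabletCounts) []inconsistency {
	var out []inconsistency
	for id, c := range curr {
		if p, ok := prev[id]; ok && p != c {
			out = append(out, inconsistency{
				TabletID:      id,
				QueryNum:      queryNum,
				PreviousCount: p,
				CurrentCount:  c,
			})
		}
	}
	// Map iteration order is random; sort by tablet ID for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].TabletID < out[j].TabletID })
	return out
}

func main() {
	// Counts for 72075186235526655 come from the log record above;
	// tablet 42 is a made-up shard whose count did not change.
	prev := tabletCounts{72075186235526655: 120478, 42: 1000}
	curr := tabletCounts{72075186235526655: 62662, 42: 1000}
	for _, inc := range compareRuns(2, prev, curr) {
		fmt.Printf("inconsistency detected tablet_id=%d query_num=%d previous_count=%d current_count=%d\n",
			inc.TabletID, inc.QueryNum, inc.PreviousCount, inc.CurrentCount)
	}
}
```

Comparing per-tablet counts rather than the query total is what makes it possible to name the offending column shard, as in the log above.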
