Description
Cluster version: 8.5.1
Client version:
name = "tikv-client"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "048968e4e3d04db472346770cc19914c6b5ae206fa44677f6a0874d54cd05940"
I have a workload where I am ingesting lots of data into TiKV, one transaction after another in quick succession. In one transaction I write/delete many KVs, and in a later transaction (in this specific case ~24h later) I try to read one of the keys I wrote, but no value is found. I did not perform any other operation on that key in between. Other keys written in the same transaction as the missing KV are found. Some of the TiKV nodes were crashing around that time due to the heavy workload. The missing keys are not deterministic, but when we repeat the workload we again see cases where some writes or deletes within a transaction do not seem to be applied. When no nodes crash during the workload we don't seem to hit the issue (could be coincidence).
- .put(key) within a transaction (default optimistic)
- Begin committing the transaction writing the KV
- 10-20 or so logs about connecting to specifically one of the nodes which is likely crashed/restarting, these 2 repeating:
tikv_client::pd::client: connect to tikv endpoint: <node 0>
tikv_client::common::security: connect to rpc server at endpoint: <node 0>
- Client reports that the commit was successful
Not related to this specific transaction, but to give some context on what else was happening: ~30 seconds later, a different transaction (submitted later, and not the one which wrote the missing KV) failed to commit secondary keys due to TxnLockNotFound. ~2 mins after that we saw a heartbeat error for TxnNotFound. Then a transaction errored with gRPC api error: status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} }.
Here are the logs from the cluster around the time (30s either side) I called .commit() on the transaction which was supposed to write this missing KV (I called commit at 2025-07-12T19:28:37.261):
The missing key was 03ee44790100f9465cf9b0426aa7020a0685c66bd10a859b78e2da1dea940ba5e113600083b37e051e0b67eb17348f48d3e6e6e8b009d8ea48854d721488667d6000000000029fd4fc727f8594309b02da47145f019993219e0db644bdfd81ea6890e0c717836bdc370000000000000001 and it was written in a transaction with start ts Timestamp { physical: 1752348517228, logical: 14, suffix_bits: 0 } and commit ts Timestamp { physical: 1752348517328, logical: 35, suffix_bits: 0 }.
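In case it helps when matching these against cluster-side logs, which usually show timestamps as a single 64-bit integer: below is a minimal sketch of converting the physical/logical pairs above into that form and into a time of day. It assumes the usual PD TSO layout (physical milliseconds shifted left by 18 bits, logical counter in the low 18 bits) and that the physical part is Unix time in UTC. The helper names are mine and are not part of tikv-client.

```rust
/// Number of low bits reserved for the logical counter
/// (assumption: the standard PD TSO layout).
const LOGICAL_BITS: u32 = 18;

/// Combine physical (ms since Unix epoch) and logical parts into one u64.
fn compose_ts(physical_ms: u64, logical: u64) -> u64 {
    (physical_ms << LOGICAL_BITS) | logical
}

/// Split a single u64 timestamp back into (physical_ms, logical).
fn decompose_ts(ts: u64) -> (u64, u64) {
    (ts >> LOGICAL_BITS, ts & ((1u64 << LOGICAL_BITS) - 1))
}

/// Time of day (hh:mm:ss.mmm, UTC) for a physical millisecond timestamp.
fn utc_time_of_day(physical_ms: u64) -> String {
    let ms_in_day = physical_ms % 86_400_000;
    let hours = ms_in_day / 3_600_000;
    let minutes = (ms_in_day / 60_000) % 60;
    let seconds = (ms_in_day / 1_000) % 60;
    let millis = ms_in_day % 1_000;
    format!("{:02}:{:02}:{:02}.{:03}", hours, minutes, seconds, millis)
}

fn main() {
    // Values copied from the transaction that wrote the missing key.
    let (start_physical, start_logical) = (1_752_348_517_228u64, 14u64);
    let (commit_physical, commit_logical) = (1_752_348_517_328u64, 35u64);

    let start_ts = compose_ts(start_physical, start_logical);
    let commit_ts = compose_ts(commit_physical, commit_logical);

    println!("start_ts  = {} at {} UTC", start_ts, utc_time_of_day(start_physical));
    println!("commit_ts = {} at {} UTC", commit_ts, utc_time_of_day(commit_physical));
}
```

Under those assumptions the start ts falls at 19:28:37.228 and the commit ts at 19:28:37.328, which brackets the 19:28:37.261 at which I called commit (if that log time is UTC).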
How do I go about debugging this, or what information can I provide to help? I'm not sure if this is a client issue or something else.