Description
Steps to reproduce:
- Create table:

  ```sql
  DROP TABLE `wikipedia`;
  CREATE TABLE wikipedia (
      id Uint64 NOT NULL,
      title Utf8,
      text Utf8,
      url Utf8,
      wiki_id Uint32,
      views float,
      paragraph_id Uint32,
      langs Uint32,
      emb Utf8,
      embedding String,
      PRIMARY KEY (id)
  );
  ```
- Get dataset
- Import dataset:

  ```
  ydb -v import file csv --path wikipedia --header wikipedia_embeddings_train.csv --timeout 30
  ```

  https://paste.yandex-team.ru/04758dbd-3d60-4eab-bf09-83b8c295111b/text
The upload stalls at ~73%, while a datashard is splitting.
There are some log messages about retries, but nothing happens for the next 10 minutes.
Also, there seems to be no synchronization among threads when writing errors, and the output is confusing: the line `Sending retry attempt 1 of 10000` is logged 23 times.
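For illustration only (this is not the ydb CLI's actual code; the thread count is taken from the log above and the rest is invented), a minimal Python sketch of why per-thread retry counters with no shared state produce 23 identical "attempt 1" lines, compared to a lock-protected shared counter:

```python
import threading

WORKERS = 23          # number of "attempt 1" lines observed in the log
MAX_ATTEMPTS = 10000  # taken from the "of 10000" part of the message

# Variant A: each thread keeps its own attempt counter, which is what the
# duplicated output suggests.
def worker_per_thread_counter(logs: list) -> None:
    attempt = 1  # local to this thread, so every thread reports "attempt 1"
    logs.append(f"Sending retry attempt {attempt} of {MAX_ATTEMPTS}")

# Variant B: a shared counter behind a lock gives one monotonic sequence.
shared_attempt = 0
shared_lock = threading.Lock()

def worker_shared_counter(logs: list) -> None:
    global shared_attempt
    with shared_lock:
        shared_attempt += 1
        logs.append(f"Sending retry attempt {shared_attempt} of {MAX_ATTEMPTS}")

def run(target) -> list:
    logs = []
    threads = [threading.Thread(target=target, args=(logs,)) for _ in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs

print(run(worker_per_thread_counter))  # 23 identical "attempt 1" lines
print(run(worker_shared_counter))      # attempts 1..23, one per thread
```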
Maybe the retry strategy uses exponential backoff and the delay grows too much (x2 on each error in each thread), or there is some race and the retries are simply deadlocked.
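As a rough back-of-the-envelope check (the base delay and the x2 multiplier below are assumptions, not the CLI's actual retry settings), plain exponential backoff already produces a 10-minute-plus silence after about ten consecutive failures per thread:

```python
# Assumed parameters; the ydb CLI's real retry policy may differ.
BASE_DELAY_SECONDS = 1.0
MULTIPLIER = 2.0

def backoff_delay(attempt: int) -> float:
    """Delay before retry number `attempt` under plain exponential backoff."""
    return BASE_DELAY_SECONDS * MULTIPLIER ** (attempt - 1)

total = 0.0
for attempt in range(1, 12):
    delay = backoff_delay(attempt)
    total += delay
    print(f"attempt {attempt:2d}: wait {delay:7.0f} s, waited so far {total:7.0f} s")

# With these numbers, by attempt 11 a thread sleeps ~17 minutes (1024 s) before
# retrying, which from the outside looks like the import doing nothing at all.
```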
I propose treating this issue as high priority: any new user who wants to try YDB may run into it and never come back.