YDB CLI import stuck #16329

Steps to reproduce:

1. Create table

    ```
    DROP TABLE `wikipedia`;
    
    CREATE TABLE wikipedia (
        id Uint64 NOT NULL,
        title Utf8,
        text Utf8,
        url Utf8,
        wiki_id Uint32,
        views float,
        paragraph_id Uint32,
        langs Uint32,
        emb Utf8,
        embedding String,
        PRIMARY KEY (id)
    );
    ```

2. Get dataset

    https://proxy.sandbox.yandex-team.ru/8368260097

3. Import dataset

    ```
    ydb -v import file csv --path wikipedia --header wikipedia_embeddings_train.csv --timeout 30
    ```

    Verbose output: https://paste.yandex-team.ru/04758dbd-3d60-4eab-bf09-83b8c295111b/text

The upload stops at ~73%, while a datashard is splitting.

The log shows several retry messages, but nothing happens during the next 10 minutes.

Also note that there is no synchronization among threads when writing errors, so the output is confusing (the `Sending retry attempt 1 of 10000` message is logged 23 times).
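For illustration only, a minimal sketch of what synchronized, attributable retry logging could look like. This is hypothetical Python, not the YDB CLI's actual code; `log_retry`, `worker`, and the `[worker N]` prefix are made-up names. Each message is built in full and emitted under a shared lock, and it carries the worker id, so 23 identical lines would at least say which thread or batch they belong to.

```python
import threading

# Hypothetical sketch: not the YDB CLI implementation.
_log_lock = threading.Lock()
log_lines = []  # collected messages, kept so the output can be inspected


def log_retry(worker_id: int, attempt: int, max_attempts: int) -> None:
    # Build the complete message first, then emit it while holding the lock,
    # so lines from different worker threads never interleave.
    message = f"[worker {worker_id}] Sending retry attempt {attempt} of {max_attempts}"
    with _log_lock:
        log_lines.append(message)
        print(message)


def worker(worker_id: int) -> None:
    # Stand-in for an upload thread whose first write attempt failed.
    log_retry(worker_id, 1, 10000)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```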

Maybe the retry strategy uses exponential backoff that grows too much (doubling the delay on every error in every thread), or there is a race and the retries are simply deadlocked.
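To show why uncapped doubling alone could look like a hang, here is a small Python sketch comparing uncapped exponential backoff with a capped, jittered variant. The base delay (0.05 s) and cap (5 s) are assumed values chosen for illustration; they are not taken from the YDB CLI, whose actual retry policy is not described in this issue. With these assumptions, uncapped doubling already waits 0.05 · 2^14 = 819.2 s (about 13.7 minutes) before attempt 15, which would match "nothing happens in the next 10 minutes".

```python
import random

# Assumed parameters for illustration; not the YDB CLI's real settings.
BASE_DELAY_S = 0.05
MAX_DELAY_S = 5.0


def uncapped_delay(attempt: int, base: float = BASE_DELAY_S) -> float:
    # Pure doubling: delay before retry `attempt` (1-based) is base * 2^(attempt - 1).
    return base * 2 ** (attempt - 1)


def capped_delay(attempt: int, base: float = BASE_DELAY_S, cap: float = MAX_DELAY_S) -> float:
    # Same doubling, but never waits longer than `cap` seconds.
    return min(cap, uncapped_delay(attempt, base))


def jittered_delay(attempt: int, base: float = BASE_DELAY_S, cap: float = MAX_DELAY_S,
                   rng=random.random) -> float:
    # "Full jitter": pick uniformly in [0, capped delay) so threads that failed
    # together do not all retry at the same moment.
    return rng() * capped_delay(attempt, base, cap)


for attempt in (1, 5, 10, 15, 20):
    print(f"attempt {attempt:2}: uncapped {uncapped_delay(attempt):10.2f} s, "
          f"capped {capped_delay(attempt):.2f} s")
```

The cap bounds how long any single thread can stay silent, and the jitter spreads retries from many threads that hit the same splitting shard at once.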

I propose giving this issue high priority, since any new user who tries YDB may run into it and never come back.
