YDB CLI import stuck #16329

Open
@kunga

Description

Steps to reproduce:

  1. Create table

    DROP TABLE `wikipedia`;
    
    CREATE TABLE wikipedia (
        id Uint64 NOT NULL,
        title Utf8,
        text Utf8,
        url Utf8,
        wiki_id Uint32,
        views Float,
        paragraph_id Uint32,
        langs Uint32,
        emb Utf8,
        embedding String,
        PRIMARY KEY (id)
    );
    
  2. Get dataset

    https://proxy.sandbox.yandex-team.ru/8368260097

  3. Import dataset

    ydb -v import file csv --path wikipedia --header wikipedia_embeddings_train.csv --timeout 30
    

https://paste.yandex-team.ru/04758dbd-3d60-4eab-bf09-83b8c295111b/text

The upload stops at ~73%, at the moment a datashard is splitting.

The logs show some retries, but nothing happens during the next 10 minutes.

Also, the threads do not synchronize when writing errors, so the output is confusing (the message `Sending retry attempt 1 of 10000` is logged 23 times).
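To illustrate what synchronized error reporting could look like, here is a minimal Python sketch. It is not the YDB CLI code (the CLI is not written in Python); `RetryLog`, `record`, `summary` and `worker` are hypothetical names made up for this example. The idea is that worker threads record messages under a shared lock, and identical messages are collapsed into one line with a count instead of being printed once per thread.

```python
import threading
from collections import Counter


class RetryLog:
    """Hypothetical collector: serializes writes and counts duplicate messages."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, message):
        # Only one thread at a time may update the shared counter.
        with self._lock:
            self._counts[message] += 1

    def summary(self):
        # One line per distinct message, with the number of occurrences.
        with self._lock:
            return [f"{msg} (x{n})" for msg, n in self._counts.items()]


def worker(log):
    # Stand-in for an upload thread that hits a retryable error.
    log.record("Sending retry attempt 1 of 10000")


log = RetryLog()
threads = [threading.Thread(target=worker, args=(log,)) for _ in range(23)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("\n".join(log.summary()))
```

With 23 workers reporting the same retry, the summary contains a single line with a count of 23 rather than 23 interleaved lines.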

Maybe the retry strategy uses exponential backoff and the delay grows too large (x2 on each error in each thread), or there is a race and the retries are simply deadlocked.
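To show how quickly the first hypothesis could produce a long silence, here is a small Python sketch of doubling delays. The 0.1 s base delay and the optional cap are assumptions for illustration only; I have not checked which values (if any) the CLI uses, and `backoff_delays` and `attempts_until` are hypothetical helper names.

```python
def backoff_delays(base_s, attempts, cap_s=None):
    """Delays for consecutive retries when the delay doubles after each error."""
    delays = []
    for attempt in range(attempts):
        delay = base_s * 2 ** attempt
        if cap_s is not None:
            # A cap keeps the wait bounded no matter how many errors occur.
            delay = min(delay, cap_s)
        delays.append(delay)
    return delays


def attempts_until(base_s, threshold_s):
    """Zero-based index of the first retry whose uncapped delay reaches threshold_s."""
    attempt = 0
    while base_s * 2 ** attempt < threshold_s:
        attempt += 1
    return attempt


# Assumed base delay of 0.1 s; 600 s is the 10 minutes of silence observed above.
print(attempts_until(0.1, 600))
print(backoff_delays(1, 6, cap_s=5))
```

Under these assumed numbers, an uncapped delay exceeds 10 minutes after about a dozen consecutive errors in one thread, which would look like a hang. A cap on the delay would rule this case out and leave the race/deadlock hypothesis.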

I propose giving this issue high priority, since any new user who wants to try YDB may hit it and never come back.

Labels: area/cli (CLI related issues), bug (Something isn't working)
