Issues with List API reliability #156

@storema3

Description

The POST /api/v1/list API is the main method for getting message data out of a Kiwi News node. Having worked with it over a longer time span, I have run into more and more issues that make it hard to work with.

The API allows retrieving messages page by page, by specifying a start index and the number of messages to return. To retrieve all messages of a node, the method must be called repeatedly.
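
For reference, the download loop looks roughly like the following sketch. Only the `from` parameter is named in this issue; the `amount` parameter name, the response shape, and the node URL are assumptions.

```ts
// Minimal pagination sketch (Node 18+, global fetch). Only `from` is named
// in this issue; `amount`, the response shape, and the URL are assumptions.
const BASE_URL = "http://localhost:8000"; // hypothetical node address
const PAGE_SIZE = 100;

async function listPage(from: number, amount: number): Promise<unknown[]> {
  const res = await fetch(`${BASE_URL}/api/v1/list`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ from, amount }),
  });
  if (!res.ok) throw new Error(`list failed: HTTP ${res.status}`);
  const body = await res.json();
  return Array.isArray(body.data) ? body.data : []; // assumed response shape
}

async function downloadAll(): Promise<unknown[]> {
  const messages: unknown[] = [];
  for (;;) {
    const page = await listPage(messages.length, PAGE_SIZE);
    if (page.length === 0) break; // stop when no more data is returned
    messages.push(...page);
    // Pause between invocations so small nodes (4 GB CAX11) don't hang.
    await new Promise((r) => setTimeout(r, 15_000));
  }
  return messages;
}
```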

Known limitations of the API

The API walks the trie and is expensive in terms of memory and computation. On smaller nodes (4 GB RAM, see the Hetzner CAX11 for a reference) it is necessary to pause 15–30 seconds between invocations of the API; otherwise the node might hang or crash.

The API also seems to be highly dependent on the state and load of the system. Sometimes clients get responses without any data, although the next invocation returns data. On other occasions, one client repeatedly gets HTTP 500 responses about missing leaf nodes, while another client happily retrieves the very data the first one asked for.

The order of the retrieved data items is not always guaranteed. After node crashes or re-initialization of a node, a complete re-download of all messages is therefore often necessary.

While the re-download was no problem when there were only a few thousand messages, as message counts grow this is becoming a problem for systems relying on the data.

ETL process to retrieve message data and missing data

The ETL process to retrieve data consists of two steps:

  1. Get all current messages by downloading them with repeated invocations of the list API, until no more data is returned.
  2. Periodically, check for new messages. The ETL process knows how many messages it has retrieved, and uses this as the from (start index) parameter.

Step (1) normally works; occasional server errors can be corrected by repeating the API calls, as in the sketch below. On a CAX11 VM this process can take an hour.
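
A sketch of that retry behavior, wrapping the hypothetical `listPage` helper from above (the attempt count and backoff are arbitrary choices, not part of the API):

```ts
// Retry a single page on transient errors, e.g. the HTTP 500 "missing leaf
// node" responses described above. Reuses listPage from the earlier sketch.
async function listPageWithRetry(
  from: number,
  amount: number,
  maxAttempts = 5,
): Promise<unknown[]> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await listPage(from, amount);
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Back off before retrying; the node often recovers after a pause.
      await new Promise((r) => setTimeout(r, 30_000));
    }
  }
}
```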

However, it has happened that the number of messages retrieved after a complete re-initialization of a node was lower than its previous message count. A system that had previously exported 25112 messages from a node got only 25048 after the node's data had been deleted and re-synced.

Step (2) is problematic. It gets all new messages from the node, but over time it also gets duplicate messages that should not be there:

[Figure: kn-import-duplicate]

The figure shows the number of new messages the ETL process exports per call (green line). The yellow line is the number of duplicates the process received. Here, at 10:00, the ETL process received a message with an already existing message index, although it should have gotten only new items! Is there new data somewhere that changed the sequence?

These duplicates never go away, and their number increases over time (1–4 messages per occasion). After 37 days of operation, there were 10 duplicates.

This behavior affects the download of amplify and comment messages.
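
Since the duplicates arrive with an already existing message index, an importer can at least detect and count them by keying on that index. A sketch under the same assumptions as above (the `index` field name is hypothetical):

```ts
// Step (2) with duplicate detection. The issue states messages arrive with
// an already existing message index; the `index` field name is an assumption.
interface ExportedMessage {
  index: string;
}

const seen = new Set<string>();
let duplicateCount = 0;

async function importNewMessages(): Promise<void> {
  // `from` = number of messages already retrieved, as the ETL process does.
  const page = await listPageWithRetry(seen.size, PAGE_SIZE);
  for (const msg of page as ExportedMessage[]) {
    if (seen.has(msg.index)) {
      duplicateCount++; // should never happen if `from` only yields new items
      continue;
    }
    seen.add(msg.index);
    // ...persist msg to the downstream store...
  }
}
```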
