State machines review #549

ndr-brt · 2022-01-25T13:50:56Z

ndr-brt
Jan 25, 2022
Collaborator

Over the last few days, some bugs and issues have popped up regarding the state machines (TransferProcessManager and *ContractNegotiationManager).
It seems we haven't defined a clear pattern for the state machines, and that is causing (potential) problems.

The state is not always updated

Problem
Some methods fetch entities with nextForState and then do not always update their state.
This can lead to situations where more than batchSize entities are in the same state and are not updated. In that case the same entities are fetched on every iteration, no other entities in that state are processed, and they go stale (problem mentioned in #393).
Solution
Every time an entity is fetched to be processed, its state should be updated, to make sure that every entity receives enough attention.
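A minimal sketch of this idea in Java, assuming (hypothetically) that the store returns candidates ordered by a `stateTimestamp` field and that every transition, even to the same state, refreshes it. `Entity`, `InMemoryStore`, `processBatch` and the `"INITIAL"` state are illustrative names, not the connector's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: names are illustrative, not the connector's API
public class StateTimestampSketch {

    static class Entity {
        final String id;
        String state;
        long stateTimestamp;
        int timesProcessed;

        Entity(String id, String state, long stateTimestamp) {
            this.id = id;
            this.state = state;
            this.stateTimestamp = stateTimestamp;
        }

        // Every transition, even to the same state, refreshes the timestamp
        void transitionTo(String newState, long now) {
            this.state = newState;
            this.stateTimestamp = now;
        }
    }

    static class InMemoryStore {
        final List<Entity> entities = new ArrayList<>();

        // At most batchSize entities in the given state, oldest stateTimestamp first
        List<Entity> nextForState(String state, int batchSize) {
            return entities.stream()
                    .filter(e -> e.state.equals(state))
                    .sorted(Comparator.comparingLong((Entity e) -> e.stateTimestamp))
                    .limit(batchSize)
                    .collect(Collectors.toList());
        }
    }

    // One loop iteration: every fetched entity gets its state updated, even if it cannot move forward yet
    static void processBatch(InMemoryStore store, String state, int batchSize, long now) {
        for (Entity e : store.nextForState(state, batchSize)) {
            e.timesProcessed++;
            // Nothing else to do yet: stay in the same state but refresh the timestamp,
            // so the entity goes to the back of the queue
            e.transitionTo(state, now);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        for (int i = 0; i < 3; i++) {
            store.entities.add(new Entity("e" + i, "INITIAL", i));
        }
        processBatch(store, "INITIAL", 2, 10);
        processBatch(store, "INITIAL", 2, 11);
        for (Entity e : store.entities) {
            System.out.println(e.id + " processed " + e.timesProcessed + " time(s)");
        }
    }
}
```

With three entities stuck in `INITIAL` and a `batchSize` of 2, the third entity is reached on the second iteration; without the timestamp refresh, the same two entities would be returned every time.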

The state is not moved to a "not to be processed" state before executing an async operation

Problem
This is the problem described in #538. When an entity is fetched by nextForState and an async operation is called, the entity should be moved to a state that is not fetched by the state machine loop; otherwise it can cause useless loops (as described in the issue).
Solution
Define three types of states:

To be processed state: an entity in this state is fetched by nextForState, processed, and moved to a processing state (async operation), to another to be processed state (sync operation), or to a final state (nothing else should be done).
Processing state: an entity in this state is waiting for an asynchronous response; it is not fetched by nextForState.
Final state: an entity in this state has reached the end and should stay there, e.g. states like COMPLETED, ENDED, ERROR, ...
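One possible way to encode the three categories, sketched in Java: each state carries its category, the loop only queries the "to be processed" ones, and a guard enforces that an async operation starts only after the entity has been moved to a processing state. The state names and the `beforeAsyncOperation` helper are hypothetical, not the actual connector enums.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: state names are illustrative, not the connector's enums
public class StateCategorySketch {

    enum Category { TO_BE_PROCESSED, PROCESSING, FINAL }

    enum State {
        INITIAL(Category.TO_BE_PROCESSED),     // picked up by the loop
        PROVISIONING(Category.PROCESSING),     // waiting for an async provisioning result
        PROVISIONED(Category.TO_BE_PROCESSED), // picked up by the loop again
        REQUESTING(Category.PROCESSING),       // waiting for the peer's response
        COMPLETED(Category.FINAL),
        ENDED(Category.FINAL),
        ERROR(Category.FINAL);

        final Category category;

        State(Category category) {
            this.category = category;
        }
    }

    // The only states the state machine loop should ever query for
    static Set<State> fetchableStates() {
        return Arrays.stream(State.values())
                .filter(s -> s.category == Category.TO_BE_PROCESSED)
                .collect(Collectors.toCollection(() -> EnumSet.noneOf(State.class)));
    }

    // An async operation may only start after moving from a to-be-processed state to a processing state
    static State beforeAsyncOperation(State current, State next) {
        if (current.category != Category.TO_BE_PROCESSED || next.category != Category.PROCESSING) {
            throw new IllegalStateException(current + " -> " + next + " is not a valid pre-async transition");
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(fetchableStates()); // [INITIAL, PROVISIONED]
        System.out.println(beforeAsyncOperation(State.INITIAL, State.PROVISIONING)); // PROVISIONING
    }
}
```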

The state is updated without fetching the entity again after an asynchronous operation

Problem
Sometimes, after evaluating the result of an async call (inside the whenComplete block), the state is updated and the entity is saved to the store using the instance that was fetched by nextForState. This can lead to problems if another async operation updates the entity in the meantime (e.g. a message is received from a peer).
Solution
After an async operation the entity should be fetched again before being updated and stored.
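A sketch of the re-fetch pattern with `CompletableFuture.whenComplete`, assuming a simple key-value store and immutable entity snapshots (`Store`, `Process`, `sendRequest` and the state strings are hypothetical). As an extra safeguard beyond the proposal, it only applies the result if the entity is still in the processing state it was moved to before the call.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names and states are illustrative, not the connector's API
public class RefetchAfterAsyncSketch {

    // Immutable snapshot of a transfer process
    static final class Process {
        final String id;
        final String state;

        Process(String id, String state) {
            this.id = id;
            this.state = state;
        }

        Process withState(String newState) {
            return new Process(id, newState);
        }
    }

    static final class Store {
        private final Map<String, Process> byId = new ConcurrentHashMap<>();

        Process find(String id) {
            return byId.get(id);
        }

        void save(Process process) {
            byId.put(process.id, process);
        }
    }

    // `remoteCall` stands in for any asynchronous operation (e.g. sending a request to a peer)
    static CompletableFuture<Boolean> sendRequest(Store store, Process fetched, CompletableFuture<Boolean> remoteCall) {
        store.save(fetched.withState("REQUESTING")); // processing state: not picked up by the loop
        return remoteCall.whenComplete((ok, error) -> {
            // Re-read the entity: `fetched` may be stale if something else updated it meanwhile
            Process current = store.find(fetched.id);
            if (current == null || !"REQUESTING".equals(current.state)) {
                return; // already moved on (e.g. by a message from the peer): don't overwrite it
            }
            boolean succeeded = error == null && Boolean.TRUE.equals(ok);
            store.save(current.withState(succeeded ? "REQUESTED" : "ERROR"));
        });
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.save(new Process("p1", "INITIAL"));

        CompletableFuture<Boolean> remote = new CompletableFuture<>();
        sendRequest(store, store.find("p1"), remote);

        // Meanwhile, a message from the peer moves the process forward
        store.save(store.find("p1").withState("CONFIRMED"));

        remote.complete(true);
        System.out.println(store.find("p1").state); // CONFIRMED
    }
}
```

In `main`, a peer message moves the process to `CONFIRMED` while the request is in flight; because the callback re-reads the entity, the newer state is not overwritten with a stale copy.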

`nextForState` does not scale well

Problem
Reading entities to be processed by their state can lead to poor performance:

with a small batchSize, there will be too many queries for a single loop iteration (useless database load)
with a big batchSize, too much time passes between processing one state and processing the next
(issue TransferProcessManager only acts on 5 transfers #393)

Solution
Do only one fetch per loop iteration that reads the oldest entities in any "to be processed" state in a single batch, and process each of them according to its state.
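A sketch of a single-fetch iteration, assuming an in-memory list in place of a database query: one call returns the oldest entities across every "to be processed" state, and each one is dispatched to a handler chosen by its state. The states, `nextToProcess` and `iterate` are illustrative only.

```java
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical sketch: states and method names are illustrative, not the connector's API
public class SingleFetchLoopSketch {

    enum State { INITIAL, PROVISIONING, PROVISIONED, REQUESTED, COMPLETED }

    // "To be processed" states; PROVISIONING (processing) and COMPLETED (final) are never fetched
    static final Set<State> TO_BE_PROCESSED = EnumSet.of(State.INITIAL, State.PROVISIONED);

    static final class Entity {
        final String id;
        State state;
        final long stateTimestamp;

        Entity(String id, State state, long stateTimestamp) {
            this.id = id;
            this.state = state;
            this.stateTimestamp = stateTimestamp;
        }
    }

    // A single query per iteration: the oldest entities in ANY to-be-processed state
    static List<Entity> nextToProcess(List<Entity> all, int batchSize) {
        return all.stream()
                .filter(e -> TO_BE_PROCESSED.contains(e.state))
                .sorted(Comparator.comparingLong((Entity e) -> e.stateTimestamp))
                .limit(batchSize)
                .collect(Collectors.toList());
    }

    // Each fetched entity is dispatched to the handler registered for its current state
    static void iterate(List<Entity> all, int batchSize, Map<State, Consumer<Entity>> handlers) {
        for (Entity entity : nextToProcess(all, batchSize)) {
            handlers.get(entity.state).accept(entity);
        }
    }

    public static void main(String[] args) {
        List<Entity> all = List.of(
                new Entity("a", State.INITIAL, 2),
                new Entity("b", State.PROVISIONED, 1),
                new Entity("c", State.PROVISIONING, 0),
                new Entity("d", State.COMPLETED, 0));

        Map<State, Consumer<Entity>> handlers = Map.of(
                State.INITIAL, e -> e.state = State.PROVISIONING,   // start async provisioning
                State.PROVISIONED, e -> e.state = State.REQUESTED); // send the request

        iterate(all, 10, handlers);
        for (Entity e : all) {
            System.out.println(e.id + " -> " + e.state);
        }
    }
}
```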

There are actions that don't follow the loop order

Problem
There are interface methods that allow entities to be modified outside the state machine/command queue loop (e.g. confirmed in ConsumerContractNegotiationManager). This can lead to errors when that entity is already leased by another thread.
Solution
Every operation on the entities should be executed by the loop. Each of those methods should create and enqueue a command that will be executed in the loop.
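A sketch of the command approach: the public method (`confirmed`, mirroring the method mentioned above but with a hypothetical signature) only enqueues a command, and the loop drains the queue before its regular state processing, so entities change only on the loop thread. `Command`, `NegotiationManager` and `runLoopIteration` are illustrative names, and a plain map stands in for the negotiation store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: names are illustrative, not the connector's API
public class CommandQueueSketch {

    // A unit of work that must run on the state machine loop thread
    interface Command {
        void execute(Map<String, String> negotiationStates);
    }

    static final class NegotiationManager {
        private final Queue<Command> commands = new ConcurrentLinkedQueue<>();
        // Entity states; apart from setup, they should only change inside runLoopIteration()
        final Map<String, String> negotiationStates = new HashMap<>();

        // Public API, callable from any thread: it enqueues a command instead of touching the entity
        void confirmed(String negotiationId) {
            commands.add(states -> states.computeIfPresent(negotiationId, (id, state) -> "CONFIRMED"));
        }

        // One loop iteration: drain pending commands, then do the usual state-based processing
        void runLoopIteration() {
            Command command;
            while ((command = commands.poll()) != null) {
                command.execute(negotiationStates);
            }
            // ... fetch and process entities by state (omitted)
        }
    }

    public static void main(String[] args) {
        NegotiationManager manager = new NegotiationManager();
        manager.negotiationStates.put("n1", "REQUESTED");

        manager.confirmed("n1");                                  // only enqueued
        System.out.println(manager.negotiationStates.get("n1")); // REQUESTED
        manager.runLoopIteration();                               // executed by the loop
        System.out.println(manager.negotiationStates.get("n1")); // CONFIRMED
    }
}
```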

I'd like to gather everyone's impressions so that we can eventually open some issues.

jimmarino · 2022-01-26T22:11:17Z

jimmarino
Jan 26, 2022
Collaborator

Thanks @ndr-brt for a great write-up. Let me look through the issues you outlined and I'll get back with responses ASAP.


ndr-brt · 2022-01-27T14:14:33Z

ndr-brt
Jan 27, 2022
Collaborator Author

Another issue: we lack idempotency. For example, when a provider confirms a negotiation twice, the consumer should not respond with a 500 but should assert "well, I already confirmed it, no problem with that". This was reported in #545.
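A sketch of an idempotent confirmation handler, assuming hypothetical `Negotiation`, `State` and `Result` types: a duplicate confirmation of an already confirmed negotiation is acknowledged as a success instead of producing an error, while a confirmation in an incompatible state still fails.

```java
// Hypothetical sketch: types and states are illustrative, not the connector's API
public class IdempotentConfirmSketch {

    enum State { REQUESTED, CONFIRMED, DECLINED }

    static final class Negotiation {
        State state;

        Negotiation(State state) {
            this.state = state;
        }
    }

    // Minimal result type; a real handler would map a failure to an error response
    static final class Result {
        final boolean succeeded;
        final String message;

        private Result(boolean succeeded, String message) {
            this.succeeded = succeeded;
            this.message = message;
        }

        static Result success() {
            return new Result(true, null);
        }

        static Result failure(String message) {
            return new Result(false, message);
        }
    }

    // Handles a confirmation message from the provider
    static Result onConfirmation(Negotiation negotiation) {
        switch (negotiation.state) {
            case REQUESTED:
                negotiation.state = State.CONFIRMED;
                return Result.success();
            case CONFIRMED:
                // Duplicate confirmation: already in the target state, so just acknowledge it
                return Result.success();
            default:
                return Result.failure("Cannot confirm a negotiation in state " + negotiation.state);
        }
    }

    public static void main(String[] args) {
        Negotiation negotiation = new Negotiation(State.REQUESTED);
        System.out.println(onConfirmation(negotiation).succeeded); // true
        System.out.println(onConfirmation(negotiation).succeeded); // true: duplicate is accepted
        System.out.println(negotiation.state);                     // CONFIRMED
    }
}
```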

3 replies

jimmarino Jan 27, 2022
Collaborator

That happens in a number of places, particularly in provisioning. We'll need to deal with it holistically throughout the codebase.

ndr-brt Jan 28, 2022
Collaborator Author

In my opinion, passing N times through a state shouldn't be a reason to set the entity to an error state; it would be better to track the errors themselves in another way. For example, on a failing provision, call a provisionFailed method on the transfer process that saves this information and that, after N failures, moves the entity to a failing state.

Saving a list of "facts" inside the entity could help evaluate the various operations and decide what should be done; it would also give us audit reports on the entity for free (as in an "event-sourced" environment).
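A sketch of recording failures as facts, using the `provisionFailed` method named above; the `facts` list, the string format of each fact and the threshold of three failures are assumptions for illustration. A single failure is only recorded; the entity moves to `ERROR` once the threshold is reached, and the list of facts doubles as an audit trail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: fields, states and the threshold are illustrative, not the connector's API
public class FailureFactsSketch {

    enum State { PROVISIONING, PROVISIONED, ERROR }

    static final class TransferProcess {
        static final int MAX_PROVISION_FAILURES = 3; // assumed value for "N"

        State state = State.PROVISIONING;
        private final List<String> facts = new ArrayList<>(); // append-only history of what happened

        void provisionFailed(String reason) {
            facts.add("provision failed: " + reason);
            long failures = facts.stream().filter(f -> f.startsWith("provision failed")).count();
            if (failures >= MAX_PROVISION_FAILURES) {
                state = State.ERROR;
                facts.add("moved to ERROR after " + failures + " provisioning failures");
            }
            // Otherwise the state is not changed: a single failure is just recorded
        }

        void provisioned() {
            facts.add("provisioned");
            state = State.PROVISIONED;
        }

        List<String> facts() {
            return Collections.unmodifiableList(facts);
        }
    }

    public static void main(String[] args) {
        TransferProcess process = new TransferProcess();
        process.provisionFailed("timeout");
        process.provisionFailed("timeout");
        System.out.println(process.state); // PROVISIONING
        process.provisionFailed("timeout");
        System.out.println(process.state); // ERROR
        process.facts().forEach(System.out::println);
    }
}
```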

jimmarino Jan 28, 2022
Collaborator

That would be one way but it also introduces issues we need to work through. Let's discuss.

DominikPinsel · 2022-01-28T08:32:56Z

DominikPinsel
Jan 28, 2022

Thanks for your review. I assume these issues arise from the complexity of the code, and I'd like to take the opportunity to point out another discussion I opened some time ago, where I suggest how we could reduce that complexity.

Dominik Pinsel, dominik.pinsel@daimler.com, Daimler TSS GmbH, legal info/Impressum

1 reply

ndr-brt Jan 28, 2022
Collaborator Author

Yes, I read your discussion when it was published, and I think applying the points of your proposal makes total sense.
