Rollover behaviour and returned id of .add() Question #23

Sebastian-Griesbach · 2022-07-13T20:07:29Z

Hey, I am currently building a wrapper around cpprb for my own purposes. The wrapper is supposed to handle redundancies before data is put into the actual replay buffer. It's not really important for my issue, but the rough idea is that I have an algorithm that saves all parameters of a given policy. To avoid saving the same policy 300 times for 300 timesteps of the environment, I built this wrapper.

My algorithm is based on the assumption that the cpprb .add() function works as follows. When .add() is called on a batch, the replay buffer writes the entries after the end of the last addition and returns the ID at which this batch started. When the end of the buffer is reached, the remainder of the batch rolls over to the beginning of the replay buffer, but the ID of the beginning of the batch (which would be near the end of the buffer) is still returned.
Since I track these IDs to handle redundancies dynamically, I need these assumptions to be right; however, I found that they are not. When a rollover happens, the returned ID always seems to be 0, and the data inside the replay buffer also seems to be shifted internally. I couldn't find anything about the exact mechanics of the .add() function in this regard in the documentation. Could you help me and tell me what is wrong with my assumptions, or how the .add() function handles batches that cause a rollover?
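For reference, the semantics assumed above can be written down as a tiny pure-Python model. This is only a sketch of the assumption, not cpprb's implementation; the class name `RingModel` and its attributes are made up for illustration.

```python
class RingModel:
    """Toy stand-in for a fixed-size replay buffer (hypothetical, not cpprb)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next_index = 0

    def add(self, batch):
        """Write `batch` starting at the current pointer, wrapping at the end.

        Returns the index of the first slot that was written.
        """
        start = self.next_index
        for offset, item in enumerate(batch):
            self.slots[(start + offset) % self.capacity] = item
        self.next_index = (start + len(batch)) % self.capacity
        return start


rb = RingModel(3)
returned = [rb.add([i, i + 1]) for i in range(5)]
print(returned)  # [0, 2, 1, 0, 2]
```

In this model a batch that crosses the end of the buffer still reports the slot where it started, and only the overflowing part lands at the beginning.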

Answered by ymd-h

ymd-h · 2022-07-13T22:51:35Z

ymd-h
Jul 13, 2022
Maintainer

@Sebastian-Griesbach
Thank you for your feedback.

Could you share some example code?

As far as I can tell from trying cpprb v10.6.4 on Google Colab, the replay buffer works correctly.

Single step addition

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})

for _ in range(5):
    print(rb.add(a=2))

# 0
# 1
# 2
# 0
# 1

Batch addition

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})

for _ in range(5):
    print(rb.add(a=[2, 2]))

# 0
# 2
# 1
# 0
# 2

More details

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})

for i in range(5):
    p_idx = rb.add(a=[i, i+1])
    n_idx = rb.get_next_index()
    buf = rb.get_all_transitions()
    print((p_idx, n_idx, buf))

# (0, 2, {'a': array([[0.], [1.]], dtype=float32)})
# (2, 1, {'a': array([[2.], [1.], [1.]], dtype=float32)})
# (1, 0, {'a': array([[2.], [2.], [3.]], dtype=float32)})
# (0, 2, {'a': array([[3.], [4.], [3.]], dtype=float32)})
# (2, 1, {'a': array([[5.], [4.], [4.]], dtype=float32)})
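For a wrapper that tracks IDs, the returned index can be turned into the list of slots a batch occupies. The helper below is a hypothetical sketch (`occupied_slots` is not part of cpprb) and assumes the batch is no larger than the buffer, as in the examples above.

```python
def occupied_slots(start, batch_size, capacity):
    """Slots written by a batch that starts at `start` and wraps at `capacity`."""
    return [(start + offset) % capacity for offset in range(batch_size)]


# Start indexes taken from the "Batch addition" output above (buffer size 3).
slots_per_add = [occupied_slots(start, 2, 3) for start in [0, 2, 1, 0, 2]]
for slots in slots_per_add:
    print(slots)
```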

5 replies

Sebastian-Griesbach Jul 14, 2022
Author

Thank you for taking a look!
While trying to build and extract a minimal code example, I found the actual problem. In my code I built a checkpoint function so that the whole state of the training process can be saved and resumed at a later point. For this I am using save_transitions(), which does not seem to save and load the current pointer of the replay buffer, so the buffer starts over at 0 afterwards.
Is there some way I can set the pointer at the correct position again afterwards? (I have the position saved for my redundancy handling; I just need a way to set it again. Unfortunately, I don't see any method in the replay buffer that seems to do this.)

Update:
I tried using the safe=False flag after looking into the source code, but that still doesn't fix it, as I don't use compression or next_of.

ymd-h Jul 14, 2022
Maintainer

Is there some way I can set the pointer at the correct position again afterwards?

No, unfortunately there isn't.

save_transitions() saves transitions, not the internal state of the replay buffer.
Moreover, the order of the transitions is not guaranteed when the buffer is full.

Maybe you can

1. get the next index with ReplayBuffer.get_next_index(),
2. dump the transitions manually in the correct order with the private method ReplayBuffer._encode_sample(indexes),
3. add dummy samples to the new replay buffer to adjust the index, and
4. finally add the real data to the new one.

Warning
The above procedure does not work with the Nstep feature.

Note
_encode_sample() is neither a public method nor a guaranteed API, but we know some users rely on it for special needs, so we don't want to change it without good reason.
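One way to read the procedure above is as a small planning step: work out which indexes to dump, oldest first, and how many dummy samples the new buffer needs before the real data goes in. The function below is a sketch of that reading only. `restore_plan` is a hypothetical helper, and it assumes that a buffer which is not yet full has its pointer at get_stored_size().

```python
def restore_plan(next_index, stored_size, buffer_size):
    """Return (indexes, n_dummy) for rebuilding a buffer with the same pointer.

    indexes: slots of the old buffer, oldest transition first.
    n_dummy: dummy samples to add to the fresh buffer before the real data.
    """
    if stored_size < buffer_size:
        # Assumption: the buffer has not wrapped yet, so slots
        # 0..stored_size-1 are already in insertion order.
        return list(range(stored_size)), 0
    # Full buffer: the slot at next_index is overwritten next, so it is the oldest.
    indexes = [(next_index + k) % buffer_size for k in range(buffer_size)]
    # n_dummy dummies move the fresh pointer to next_index; the real data
    # then overwrites every dummy and leaves the pointer at next_index again.
    return indexes, next_index


full_plan = restore_plan(next_index=1, stored_size=3, buffer_size=3)
partial_plan = restore_plan(next_index=2, stored_size=2, buffer_size=3)
print(full_plan)     # ([1, 2, 0], 1)
print(partial_plan)  # ([0, 1], 0)
```

With cpprb, the inputs would presumably come from get_next_index(), get_stored_size() and get_buffer_size() on the old buffer, and `indexes` would be passed to _encode_sample(). That wiring is untested here, and, as the warning says, it does not apply when the Nstep feature is used.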

Sebastian-Griesbach Jul 17, 2022
Author

Thank you for this workaround. Not pretty but should do the trick!

Sebastian-Griesbach Jul 17, 2022
Author

Sorry, one more question: in the second step you mention that I should use _encode_sample to dump the transitions in the correct order. Does that mean that save_transitions does not maintain the order of transitions?

ymd-h Jul 18, 2022
Maintainer

@Sebastian-Griesbach

save_transitions() takes transitions with get_all_transitions(), which uses contiguous indexes from 0 to get_stored_size().
If the buffer is full (i.e. get_stored_size() == get_buffer_size()), the oldest transition is not necessarily the first one.
https://github.com/ymd-h/cpprb/blob/v10.6.4/cpprb/PyReplayBuffer.pyx#L1217
https://github.com/ymd-h/cpprb/blob/v10.6.4/cpprb/PyReplayBuffer.pyx#L1194
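To make the ordering issue concrete, here is a pure-Python sketch that rotates storage-ordered columns into oldest-first order. `oldest_first` is a hypothetical helper working on plain lists, not a cpprb function; the sample values are taken from the last line of the "More details" output earlier in the thread.

```python
def oldest_first(columns, next_index, is_full):
    """Rotate each storage-ordered column so the oldest entry comes first."""
    if not is_full:
        # Nothing has been overwritten yet, so storage order is insertion order.
        return {key: list(values) for key, values in columns.items()}
    return {
        key: list(values[next_index:]) + list(values[:next_index])
        for key, values in columns.items()
    }


# Storage order after the fifth add: [5, 4, 4], with the next index at 1.
stored = {"a": [5.0, 4.0, 4.0]}
ordered = oldest_first(stored, next_index=1, is_full=True)
print(ordered)  # {'a': [4.0, 4.0, 5.0]}
```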
