Rollover behaviour and returned id of .add() (Question #23)
Hey, I am currently building a wrapper around cpprb. The wrapper is supposed to remove redundancies before entries are put into the actual replay buffer. It's not really important for my issue, but the rough idea is that I have an algorithm that saves all parameters of a given policy. I built this wrapper so that the same policy is not saved 300 times for 300 timesteps of the environment.

My algorithm is based on the assumption that the cpprb .add() function works as follows: when .add() is called on a batch, the replay buffer appends the entries directly after the last addition and returns the index at which the batch starts. When the end of the replay buffer is reached, the remainder of the batch rolls over to the beginning of the buffer, but the returned id is still the index where the batch began (which would be near the end of the buffer).
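To make the assumption concrete, here is a minimal pure-Python sketch of the index bookkeeping described above. `RingIndexModel` is a hypothetical name used only for illustration; it models the assumed behaviour and is not cpprb's implementation.

```python
class RingIndexModel:
    """Toy model of the assumed index behaviour; not cpprb's implementation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_index = 0  # slot the next entry will be written to

    def add(self, batch_size):
        """Reserve batch_size slots and return the index of the first one."""
        start = self.next_index
        # Entries that run past the end wrap around to the beginning.
        self.next_index = (start + batch_size) % self.capacity
        return start


model = RingIndexModel(capacity=3)
starts = [model.add(batch_size=2) for _ in range(5)]
print(starts)  # [0, 2, 1, 0, 2]
```

Under this model, a batch that starts at the last slot still reports that last slot as its id, even though the rest of the batch lands at index 0.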
@Sebastian-Griesbach Thank you for your feedback. Could you share some example code? As far as I can tell from testing with cpprb v10.6.4 on Google Colab, the replay buffer works correctly.

Single step addition

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})  # buffer with 3 slots
for _ in range(5):
    print(rb.add(a=2))  # add() returns the index the entry was stored at
# 0
# 1
# 2
# 0
# 1

Batch addition

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})
for _ in range(5):
    print(rb.add(a=[2, 2]))  # returned index is where the batch of 2 starts
# 0
# 2
# 1
# 0
# 2

More details

import cpprb
rb = cpprb.ReplayBuffer(3, {"a": {}})
for i in range(5):
    p_idx = rb.add(a=[i, i+1])        # start index of this batch
    n_idx = rb.get_next_index()       # index the next add() will start at
    buf = rb.get_all_transitions()    # current buffer contents
    print((p_idx, n_idx, buf))
# (0, 2, {'a': array([[0.], [1.]], dtype=float32)})
# (2, 1, {'a': array([[2.], [1.], [1.]], dtype=float32)})
# (1, 0, {'a': array([[2.], [2.], [3.]], dtype=float32)})
# (0, 2, {'a': array([[3.], [4.], [3.]], dtype=float32)})
# (2, 1, {'a': array([[5.], [4.], [4.]], dtype=float32)})
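For a wrapper that needs to know every slot a batch occupied, the slots can be derived from the returned start index with modular arithmetic. The sketch below uses a hypothetical helper, `batch_slots`, which is not part of cpprb, and assumes the batch is no larger than the buffer.

```python
def batch_slots(start, batch_size, capacity):
    """Return the buffer slots a batch occupies, wrapping at the end.

    Hypothetical helper for illustration; assumes batch_size <= capacity.
    """
    return [(start + offset) % capacity for offset in range(batch_size)]


# A batch of 2 starting at the last slot of a 3-slot buffer wraps to slot 0.
print(batch_slots(start=2, batch_size=2, capacity=3))  # [2, 0]
```

This is consistent with the second output line above: the batch returned index 2, and its two values ended up in slots 2 and 0.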