Hi @pokaxpoka,
The function that relabels the data in the buffer with an updated reward function is defined here: `relabel_with_predictor`. `self.idx` is used to compute `total_iter` here. Once the replay buffer has been filled to capacity, `self.idx` starts again from 0 (the buffer is cyclic). However, we would still want to relabel all the samples in the buffer with the updated reward function. The current code (line 72) does not allow this: because `total_iter` is derived from `self.idx` alone, after the buffer wraps around only roughly the first `self.idx` slots get relabeled.
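To make the wrap-around concrete, here is a toy sketch. `ToyCyclicBuffer` and `batches_from_idx` are hypothetical names, and the way `full` is set is an assumption; only the `idx`/`full`/`capacity` fields and the batch-count formula come from the code discussed above. The capacity of 1000 is illustrative.

```python
import numpy as np


class ToyCyclicBuffer:
    """Hypothetical minimal ring buffer with idx/full/capacity fields."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.idx = 0
        self.full = False

    def add(self, reward):
        self.rewards[self.idx] = reward
        self.idx = (self.idx + 1) % self.capacity
        # Assumption: once idx wraps back to 0, every slot holds a valid sample.
        self.full = self.full or self.idx == 0


def batches_from_idx(buf, batch_size):
    """Batch count derived from idx alone, mirroring the current computation."""
    total_iter = int(buf.idx / batch_size)
    if buf.idx > batch_size * total_iter:
        total_iter += 1
    return total_iter


buf = ToyCyclicBuffer(capacity=1000)
for t in range(1050):  # 50 samples past capacity, so idx has wrapped
    buf.add(float(t))

print(buf.idx, buf.full)                      # 50 True
print(batches_from_idx(buf, batch_size=200))  # 1
```

All 1000 slots hold valid samples, yet a single batch of 200 is scheduled, so at most a fifth of the buffer would be relabeled.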
Maybe this should work:
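```python
import math

def relabel_with_predictor(self, predictor):
    batch_size = 200
    if self.full:  # if the buffer is full
        # every slot holds a valid sample, so cover the whole capacity
        total_iter = math.ceil(self.capacity / batch_size)  # line added
    else:
        # only the first self.idx slots are filled
        total_iter = int(self.idx / batch_size)
        if self.idx > batch_size * total_iter:
            total_iter += 1  # one extra batch for the leftover samples
    # ... rest of the function unchanged
```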
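One thing worth checking with this change: if the loop clamps each batch's end index to `self.idx`, changing `total_iter` alone would not be enough, because after a wrap-around the later batches would be clamped to `self.idx` as well. A minimal standalone sketch that uses a single bound for both the batch count and the slice end is shown below. `relabel_all`, `predict_rewards`, and the `(obs, action)` concatenation as predictor input are assumptions for illustration, not the repository's API.

```python
import math

import numpy as np


def relabel_all(obses, actions, rewards, idx, full, capacity,
                predict_rewards, batch_size=200):
    """Hypothetical standalone version of relabel_with_predictor.

    n_valid is the number of slots holding real samples: the whole buffer once
    it has wrapped, otherwise the first idx slots. It bounds both the number of
    batches and each slice end, so no batch reads past valid data.
    """
    n_valid = capacity if full else idx
    total_iter = math.ceil(n_valid / batch_size)
    for index in range(total_iter):
        start = index * batch_size
        end = min(start + batch_size, n_valid)
        # Assumption: the reward model scores concatenated (obs, action) rows.
        inputs = np.concatenate([obses[start:end], actions[start:end]], axis=-1)
        rewards[start:end] = predict_rewards(inputs)
    return total_iter


def toy_predictor(x):
    """Hypothetical stand-in for the learned reward model."""
    return x.sum(axis=-1, keepdims=True)


capacity, obs_dim, act_dim = 1000, 3, 2
rng = np.random.default_rng(0)
obses = rng.normal(size=(capacity, obs_dim)).astype(np.float32)
actions = rng.normal(size=(capacity, act_dim)).astype(np.float32)
rewards = np.zeros((capacity, 1), dtype=np.float32)

# The buffer has wrapped (full=True) and idx is back to 50.
n_batches = relabel_all(obses, actions, rewards, idx=50, full=True,
                        capacity=capacity, predict_rewards=toy_predictor)
print(n_batches)  # 5
```

With `full=True` every one of the 1000 rewards is rewritten in 5 batches; with `full=False` and `idx=450`, only slots 0 to 449 are touched, in 3 batches (the last one partial).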