Relabeling the buffer with updated reward - potential bug? #7

Hi,

@pokaxpoka 
The function that relabels the data in the buffer with an updated reward function is defined here: [relabel_with_predictor](https://github.com/rll-research/BPref/blob/main/replay_buffer.py#L70). It uses `self.idx` to compute `total_iter`. Once the replay buffer is filled to capacity, `self.idx` wraps around and starts again from 0. We would still want to relabel all the samples in the buffer with the updated reward function, but the current code ([line 72](https://github.com/rll-research/BPref/blob/main/replay_buffer.py#L72)) derives the number of batches from `self.idx` alone, so it does not allow this.

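Maybe something like this would work:
```
import math

def relabel_with_predictor(self, predictor):
    batch_size = 200
    if self.full:
        # Buffer has wrapped around: every slot holds a valid sample.
        total_iter = math.ceil(self.capacity / batch_size)  # line added
    else:
        # Buffer is still filling: only the first self.idx slots are valid.
        # math.ceil replaces the int(...) division plus the "+= 1" remainder check.
        total_iter = math.ceil(self.idx / batch_size)
```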
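
To make the wrap-around case concrete, here is a minimal, self-contained sketch. `ToyReplayBuffer` and `SumPredictor` are hypothetical stand-ins written for this issue, not the BPref classes, and the predictor method name `r_hat_batch` is an assumption about the reward model's interface. The sketch also bounds each slice by the number of valid samples rather than by `self.idx`; if the loop in the linked function clips its slices at `self.idx`, that bound would presumably need the same change.

```python
import math

import numpy as np


class ToyReplayBuffer:
    """Hypothetical stand-in for a cyclic replay buffer (not the BPref class)."""

    def __init__(self, obs_dim, action_dim, capacity):
        self.capacity = capacity
        self.obses = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros((capacity, 1), dtype=np.float32)
        self.idx = 0
        self.full = False

    def add(self, obs, action, reward):
        self.obses[self.idx] = obs
        self.actions[self.idx] = action
        self.rewards[self.idx] = reward
        # The write index wraps back to 0 once the buffer is full.
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def relabel_with_predictor(self, predictor, batch_size=200):
        # All slots are valid after a wrap-around; otherwise only the first idx.
        num_valid = self.capacity if self.full else self.idx
        total_iter = math.ceil(num_valid / batch_size)
        for index in range(total_iter):
            start = index * batch_size
            end = min(start + batch_size, num_valid)
            inputs = np.concatenate(
                [self.obses[start:end], self.actions[start:end]], axis=-1
            )
            self.rewards[start:end] = predictor.r_hat_batch(inputs)


class SumPredictor:
    """Hypothetical reward model: the reward is the sum of the inputs."""

    def r_hat_batch(self, inputs):
        return inputs.sum(axis=-1, keepdims=True)


# Fill a capacity-5 buffer with 7 transitions so that idx wraps around to 2.
buf = ToyReplayBuffer(obs_dim=2, action_dim=1, capacity=5)
for t in range(7):
    buf.add(obs=[t, t], action=[1.0], reward=0.0)

buf.relabel_with_predictor(SumPredictor(), batch_size=2)
print(buf.idx, buf.full)      # 2 True
print(buf.rewards[:, 0])      # every slot relabeled, not only the first two
```

With `total_iter` computed from `self.idx` alone, only the first two slots would be relabeled in this example, and the three older samples would keep their stale rewards.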

