
Relabeling the buffer with updated reward - potential bug? #7

@vrn25


Hi @pokaxpoka,

The function that relabels the data in the buffer with an updated reward function is relabel_with_predictor. It uses self.idx to compute total_iter. Once the replay buffer is filled to capacity, self.idx wraps around and starts again from 0. At that point we still want to relabel every sample in the buffer with the updated reward function, but because total_iter is derived from self.idx, the current code (line 72) only covers the samples up to the wrapped index.

Maybe this should work:

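import math

def relabel_with_predictor(self, predictor):
    batch_size = 200
    if self.full:
        # Buffer is full: every slot holds data, so cover the whole capacity
        total_iter = math.ceil(self.capacity / batch_size)  # line added
    else:
        # Buffer not yet full: only the first self.idx slots hold data
        total_iter = int(self.idx / batch_size)
        if self.idx > batch_size * total_iter:
            total_iter += 1
    # (remainder of the function unchanged)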
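
To make the intended behaviour easier to check outside the repository, here is a minimal standalone sketch of the same batch-count logic. The helper names relabel_iterations and batch_ranges are hypothetical and do not exist in the codebase. The sketch assumes the buffer tracks a write cursor (idx), a fixed capacity, and a full flag, as the snippet above suggests.

```python
import math


def relabel_iterations(idx, capacity, full, batch_size=200):
    """Number of batches needed to cover every valid sample (hypothetical helper)."""
    # Once the buffer has wrapped around, all `capacity` slots hold valid data,
    # regardless of where the write cursor `idx` currently points.
    num_valid = capacity if full else idx
    return math.ceil(num_valid / batch_size)


def batch_ranges(idx, capacity, full, batch_size=200):
    """Yield (start, end) index pairs that together cover all valid samples."""
    num_valid = capacity if full else idx
    for i in range(relabel_iterations(idx, capacity, full, batch_size)):
        start = i * batch_size
        # Clamp the last batch so it never reads past the valid region.
        yield start, min(start + batch_size, num_valid)
```

With a full buffer of capacity 1000 and a wrapped cursor at 50, relabel_iterations(50, 1000, True) gives 5 batches, whereas deriving the count from the cursor alone would give 1. Using math.ceil in both branches also replaces the separate "add one if there is a remainder" step.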
