Inconsistency of the EpsilonGreedyExplorer selection function  #520

@3rdCore

While working on my package, I noticed that the output of the EpsilonGreedyExplorer behaves strangely.

Here is the related function :

function (s::EpsilonGreedyExplorer{<:Any,false})(values, mask)
    ϵ = get_ϵ(s)
    s.is_training && (s.step += 1)
    rand(s.rng) >= ϵ ? findmax(values, mask)[2] : rand(s.rng, findall(mask))
end

It seems that, depending on whether the explorer returns the greedy choice (left branch of the ternary) or a random choice (right branch), the output is respectively:

  • for the greedy choice : the index of the selected value in the subset of the authorized values.
  • for the random choice : the index of the selected value in the original set of values.

Let me explain the problem with a little example :

values = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]
rng =  MersenneTwister()

It's clear that the highest value is at index 4. Let's simulate the output of the explorer:

  1. If the explorer decides to return the index of the highest authorized value (greedy choice), it returns 3, which is the position of the selected value within the subset of authorized indices, not its index in the full set of values:
julia> findmax(values, mask)[2]
3

This is exactly the expected behavior of the RLCore function:
Base.findmax(A::AbstractVector, mask::AbstractVector{Bool}) = findmax(i -> A[i], view(keys(A), mask))

  2. If the explorer decides to return a random index, it returns the index of the selected value in the original set of values:
julia> rand(rng, findall(mask))
4
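To make the mismatch easier to reproduce outside the package, here is a minimal standalone sketch. It assumes Julia 1.7 or later (for `findmax(f, domain)`). The name `masked_findmax` is hypothetical: it is a local copy of the one-line definition quoted above, so that no Base method is redefined. The variable is called `vals` rather than `values` only to avoid shadowing `Base.values` in a script.

```julia
# Local stand-in for the masked findmax quoted above
# (`masked_findmax` is a hypothetical name, not part of the package).
masked_findmax(A::AbstractVector, mask::AbstractVector{Bool}) =
    findmax(i -> A[i], view(keys(A), mask))

vals = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]

# Indices of the authorized actions in the original indexing.
allowed = findall(mask)

# Greedy branch: the returned index is a position inside the masked view.
best_value, subset_idx = masked_findmax(vals, mask)

# Mapping that position back through `allowed` gives the original index.
original_idx = allowed[subset_idx]

println("position in the masked subset: ", subset_idx)
println("index in the original values:  ", original_idx)
```

The random branch, `rand(rng, findall(mask))`, draws directly from `allowed`, so it always yields an index in the original indexing, whereas the greedy branch yields `subset_idx`.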

The meaning of the output is thus inconsistent between the two branches. I am still discovering the package, so please let me know if I made a mistake. If this behavior turns out to be a bug, I can propose a simple fix for it.
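One possible shape for such a fix, sketched here as standalone code rather than as a patch to the package: compute the greedy choice so that it also returns an index in the original indexing. The names `greedy_index` and `select_action` are hypothetical, and `ϵ` is passed in explicitly instead of being read from the explorer, so this only illustrates the idea and is not the library's implementation.

```julia
using Random

# Hypothetical helper: index of the largest authorized value,
# expressed in the indexing of the full `values` vector.
function greedy_index(values::AbstractVector, mask::AbstractVector{Bool})
    allowed = findall(mask)                        # original indices of authorized actions
    return allowed[argmax(view(values, allowed))]  # map subset position back to original index
end

# Hypothetical stand-in for the explorer call: both branches now
# return an index in the original indexing.
function select_action(rng::AbstractRNG, ϵ::Real, values, mask)
    return rand(rng) >= ϵ ? greedy_index(values, mask) : rand(rng, findall(mask))
end

vals = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]
rng = MersenneTwister(1)

greedy = select_action(rng, 0.0, vals, mask)   # ϵ = 0: always the greedy branch
explore = select_action(rng, 1.0, vals, mask)  # ϵ = 1: always the random branch
```

With this approach, `greedy` and `explore` are directly comparable, since both refer to positions in `vals`. The alternative would be to make the random branch return a subset position instead, but that would require callers to know the mask to interpret the action.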
