Inconsistency of the EpsilonGreedyExplorer selection function 

While working on my package, I noticed that the EpsilonGreedyExplorer behaves strangely with respect to its output.

Here is the relevant function:
```julia
function (s::EpsilonGreedyExplorer{<:Any,false})(values, mask)
    ϵ = get_ϵ(s)
    s.is_training && (s.step += 1)
    rand(s.rng) >= ϵ ? findmax(values, mask)[2] : rand(s.rng, findall(mask))
end
```

It seems that, depending on whether the explorer returns the greedy choice (left side of the ternary) or a random choice (right side), the output is respectively:
- _for the greedy choice_: the index of the selected value within the **subset** of authorized values.
- _for the random choice_: the index of the selected value in the original set of values.

Let me illustrate the problem with a small example:
```julia
using Random  # provides MersenneTwister

values = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]
rng = MersenneTwister()
```
It's clear that the highest value is at index 4. Let's simulate the output of the explorer:
1. If the explorer decides to return the index of the highest authorized value (greedy choice), it returns **3**, which is the position of the selected value within the **subset** of authorized indices, not its index in the full set of values:
```julia
julia> findmax(values, mask)[2]
3
```
This is exactly the expected behavior of the RLCore function:
 `Base.findmax(A::AbstractVector, mask::AbstractVector{Bool}) = findmax(i -> A[i], view(keys(A), mask))`

2. If the explorer decides to return a random index, it returns the index of the selected value in the original set of values:
```julia
julia> rand(rng, findall(mask))
4
```
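
To see both conventions side by side, here is a minimal sketch. `masked_findmax` is a hypothetical local stand-in that copies the one-line definition quoted above, so the snippet runs without loading the package. It assumes Julia 1.7 or later, where `findmax(f, domain)` is available.

```julia
using Random

# Hypothetical local copy of the quoted definition (avoids extending Base here).
masked_findmax(A, mask) = findmax(i -> A[i], view(keys(A), mask))

values = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]

allowed = findall(mask)                      # original indices of the authorized values
greedy = masked_findmax(values, mask)[2]     # position within the authorized subset
random = rand(MersenneTwister(0), allowed)   # always an index into `values`

println("greedy = ", greedy, ", allowed[greedy] = ", allowed[greedy])
```

Here `greedy` only becomes an index into `values` once it is looked up in `allowed`, whereas `random` can already be used directly.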

The meaning of the output is thus inconsistent. I am still discovering the package, so please let me know if I made a mistake. If this behavior turns out to be a bug, I can propose a simple fix.
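
As a rough idea of what such a fix could look like, here is a sketch only. `select_action` is a hypothetical standalone helper, not the package's actual method, and it takes `ϵ` as a plain argument instead of reading it from the explorer. The point is that both branches return an index into the original `values`.

```julia
using Random

# Hypothetical helper, not part of the package: both branches return an
# index into the original `values` vector.
function select_action(rng::AbstractRNG, values, mask, ϵ)
    allowed = findall(mask)                  # original indices of the authorized values
    if rand(rng) >= ϵ
        # Greedy branch: findmax over the authorized values gives a position
        # in `allowed`, which is then mapped back to an original index.
        allowed[findmax(view(values, allowed))[2]]
    else
        rand(rng, allowed)                   # random branch
    end
end

values = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]
rng = MersenneTwister(123)

greedy_action = select_action(rng, values, mask, 0.0)   # ϵ = 0: always greedy
random_action = select_action(rng, values, mask, 1.0)   # ϵ = 1: always random
```

With `ϵ = 0` the greedy branch is always taken, and with `ϵ = 1` the random branch is always taken, because `rand(rng)` lies in `[0, 1)`. Whether the mapping belongs in the explorer or in the masked `findmax` itself is a design choice for the maintainers.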

Inconsistency of the EpsilonGreedyExplorer selection function #520