
It's not feasible to update the Q-based value agent in large steps for the RandomWalk1D() environment. #1068


Description

@Van314159

I followed the RandomWalk1D() example in the tutorial and wanted to train the agent. However, when I use a TDLearner, the run function throws BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]. My code is:

using ReinforcementLearning

envRW = RandomWalk1D()
NS = length(state_space(envRW))
NA = length(action_space(envRW))

agentRW = Agent(
    policy = QBasedPolicy(
        learner = TDLearner(
            TabularQApproximator(
                n_state = NS,
                n_action = NA,
            ),
            :SARS,
        ),
        explorer = EpsilonGreedyExplorer(0.1),
    ),
    trajectory = Trajectory(
        ElasticArraySARTSTraces(;
            state = Int64 => (),
            action = Int64 => (),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    ),
)

run(agentRW, envRW, StopAfterNEpisodes(10), TotalRewardPerEpisode())

It returns

BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]
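
For context, RandomWalk1D() has 7 states and 2 actions by default, which matches the 2×7 shape of the Q-table in the error. Since Julia arrays are 1-indexed, the message implies the TD update is looking up the table with an action index of 0. A minimal sketch of the failing access (assuming the learner stores the table as Q[action, state], per the shape reported above):

# Sketch of the failing lookup, assuming a 2×7 Q-table (n_action × n_state).
Q = zeros(2, 7)

# Julia arrays are 1-indexed, so an action index of 0 (e.g., from a dummy or
# not-yet-filled transition in the trajectory) raises the same error:
Q[0, 1]  # BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]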

The reproduction above works if I stop the simulation early, e.g., with StopAfterNSteps(3).
It also works for RandomPolicy().
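
For comparison, a sanity check that runs without error under the same stop condition and hook (a sketch; RandomPolicy() samples uniformly from the legal actions and needs no learner or trajectory):

# Works: same environment, stop condition, and hook, but with a RandomPolicy.
# That the error only appears with the TDLearner points at the TD update path,
# not at the environment itself.
run(RandomPolicy(), envRW, StopAfterNEpisodes(10), TotalRewardPerEpisode())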
