
It's not feasible to update the Q-based value agent in large steps for the RandomWalk1D() environment. #1068


Description

@Van314159

I followed the RandomWalk1D() example in the tutorial and wanted to train the agent. However, when I use a TDLearner, the run function throws BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]. My code is:

using ReinforcementLearning

envRW = RandomWalk1D()
NS = length(state_space(envRW))
NA = length(action_space(envRW))

agentRW = Agent(
    policy = QBasedPolicy(
        learner = TDLearner(
            TabularQApproximator(
                n_state = NS,
                n_action = NA,
            ),
            :SARS,
        ),
        explorer = EpsilonGreedyExplorer(0.1),
    ),
    trajectory = Trajectory(
        ElasticArraySARTSTraces(;
            state = Int64 => (),
            action = Int64 => (),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    ),
)

run(agentRW, envRW, StopAfterNEpisodes(10), TotalRewardPerEpisode())

It returns

BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]
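
For context, RandomWalk1D() has 7 states and 2 actions by default, which matches the 2×7 shape of the Q-table in the error. Since Julia arrays are 1-indexed, the message implies the TD update is looking up the table with an action index of 0. A minimal sketch of the failing access (assuming the learner stores the table as Q[action, state], per the shape reported above):

# Sketch of the failing lookup, assuming a 2×7 Q-table (n_action × n_state).
Q = zeros(2, 7)

# Julia arrays are 1-indexed, so an action index of 0 (e.g., from a dummy or
# not-yet-filled transition in the trajectory) raises the same error:
Q[0, 1]  # BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]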

The reproduction above works if I stop the simulation early, e.g., with StopAfterNSteps(3).
It also works for RandomPolicy().
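
For comparison, a sanity check that runs without error under the same stop condition and hook (a sketch; RandomPolicy() samples uniformly from the legal actions and needs no learner or trajectory):

# Works: same environment, stop condition, and hook, but with a RandomPolicy.
# That the error only appears with the TDLearner points at the TD update path,
# not at the environment itself.
run(RandomPolicy(), envRW, StopAfterNEpisodes(10), TotalRewardPerEpisode())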
