
[RL] Improve reward function #11

@mjuchli

Instead of using (p_0 - vwap_t) as the reward, compare against p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2, normalized to the range [-1, 1]. That way the reward stays stable under any kind of price fluctuation.
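A minimal sketch of what this could look like, under two assumptions that the issue does not spell out: [p_0; p_t] is read as the series of prices observed from step 0 to step t, and the normalization divides by half the observed price range. The function name `midrange_reward` is hypothetical and not part of the existing code base.

```python
from typing import Sequence


def midrange_reward(prices: Sequence[float]) -> float:
    """Hypothetical reward: p_0 minus the midrange of prices[0..t], scaled to [-1, 1].

    Assumes prices[0] is p_0 and prices[-1] is p_t.
    """
    if not prices:
        raise ValueError("prices must contain at least p_0")

    p_0 = prices[0]
    high, low = max(prices), min(prices)

    # Assumed normalization: divide by half the observed range.
    # Since low <= p_0 <= high, the result always lies in [-1, 1].
    half_range = (high - low) / 2
    if half_range == 0:
        # Flat market: p_0 equals the midrange, so the reward is neutral.
        return 0.0

    midrange = (high + low) / 2
    return (p_0 - midrange) / half_range
```

With this scaling, p_0 at the top of the observed range gives 1, p_0 at the bottom gives -1, and p_0 exactly at the midrange gives 0. Other normalizations are possible; this one is only meant to illustrate the idea.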

[Image: evernote snapshot 20180310 225733]
