An implementation of the paper Social Attention for Autonomous Decision-Making in Dense Traffic, with custom modifications (experiments with double DQN, SAC, and other potential improvements).
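For reference, the core idea of the paper is an attention layer in which the ego vehicle's embedding queries the embeddings of the surrounding vehicles. Below is a minimal single-head sketch of that idea (an illustration only, not the exact module used in this repo; the feature dimension should match the per-vehicle features of the Kinematics observation):

```python
import torch
import torch.nn as nn

class EgoAttention(nn.Module):
    """Single-head sketch of ego-attention: the ego vehicle's embedding
    attends over the embeddings of all observed vehicles."""

    def __init__(self, feature_dim=5, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feature_dim, embed_dim), nn.ReLU())
        self.q = nn.Linear(embed_dim, embed_dim, bias=False)
        self.k = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v = nn.Linear(embed_dim, embed_dim, bias=False)
        self.scale = embed_dim ** 0.5

    def forward(self, obs):
        # obs: (batch, n_vehicles, feature_dim); by convention row 0 is the ego vehicle
        h = self.encoder(obs)                      # (B, N, E)
        q = self.q(h[:, :1])                       # ego query        (B, 1, E)
        k, v = self.k(h), self.v(h)                # keys and values  (B, N, E)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)
        return (attn @ v).squeeze(1)               # ego context      (B, E)
```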
The first attempt uses the low-level continuous action space, which may be harder for the agent to learn (RGB rendering shows the agent behaving erratically, with few other vehicles appearing in view). The next step is to try SAC over a discrete action space (with the DiscreteMetaAction type).
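Switching between the two action types is a highway-env configuration change; a sketch along these lines (exact config keys and the gym vs. gymnasium import depend on the installed highway-env version):

```python
import gymnasium as gym
import highway_env  # noqa: F401  -- registers the highway environments

env = gym.make("highway-v0", render_mode="rgb_array")

# First attempt: low-level continuous control (throttle + steering).
env.unwrapped.configure({
    "action": {"type": "ContinuousAction"},
    "observation": {"type": "Kinematics", "vehicles_count": 15},
})
env.reset()

# Planned next step: discrete meta-actions
# (LANE_LEFT / IDLE / LANE_RIGHT / FASTER / SLOWER).
env.unwrapped.configure({"action": {"type": "DiscreteMetaAction"}})
env.reset()
```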
---
Try out DiscreteMetaAction (which avoids the problem of the agent behaving erratically), with rgb rendering enabled so that you can actually see how it performs.
- Issue 1: discrete SAC is not working (the entropy collapses to 0 quickly and the agent only learns to act FASTER); the target_entropy setting in my code is suspicious according to this paper. We'll use double DQN for the discrete case for simplicity (see the temperature-update sketch after this list).
- Issue 2: double DQN appears to be prone to catastrophic forgetting; the displayed result is one of many attempts, chosen because it does not keep declining after a peak. The plan is to try methods that explore and exploit more efficiently, such as prioritized experience replay or Bayesian DQN, so that the Q network does not end up updating only sub-optimal transitions after a period of training (see the replay-buffer sketch after this list).
- Issue 3: the agent sometimes dies quickly (see the large variance in the returns); some safe RL methods may be worth considering.
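Regarding Issue 1, here is a hedged sketch of the temperature update for a discrete (categorical) SAC policy. The 0.98 * log(|A|) target is a commonly used heuristic for discrete SAC, not a confirmed fix for this repo, and the values are illustrative:

```python
import numpy as np
import torch

n_actions = 5  # DiscreteMetaAction exposes 5 meta-actions by default

# The continuous-control convention target_entropy = -dim(action_space) is
# far too low for a small discrete action set and can drive the temperature
# (and hence the policy entropy) towards 0.  A common discrete-SAC choice is
# a fraction of the maximum entropy log(|A|):
target_entropy = 0.98 * np.log(n_actions)

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(action_probs, log_action_probs):
    """action_probs / log_action_probs: (batch, n_actions) from the policy."""
    # Exact entropy of the categorical policy -- no sampling needed.
    entropy = -(action_probs * log_action_probs).sum(dim=-1)
    alpha_loss = (log_alpha.exp() * (entropy - target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()
```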
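For Issue 2, a minimal sketch of a proportional prioritized replay buffer (no sum-tree, so sampling is O(N)); the double DQN loss would be weighted by the returned importance weights, and the priorities updated with the absolute TD errors:

```python
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.pos = [], 0
        self.priorities = np.zeros(capacity)

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen at least once.
        self.priorities[self.pos] = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idx]) ** (-beta)  # importance-sampling weights
        return [self.buffer[i] for i in idx], idx, weights / weights.max()

    def update_priorities(self, idx, td_errors, eps=1e-5):
        self.priorities[idx] = np.abs(td_errors) + eps
```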
---
Incorporate DRQN and test its performance with the velocity information removed from the state space (perhaps referring to the paper Playing FPS Games with Deep Reinforcement Learning); a rough network sketch is included at the end of this README.
- The implementations of SAC, double DQN, and rl_utils.py are adapted from the Hands-on-RL tutorial. On top of these I added some small features, such as a linear learning-rate scheduler, epsilon decay, and rendering the environment during training, and modified the policy and Q-value network architectures to better fit the social attention module and highway-env (see the sketches below).
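A rough sketch of what the planned DRQN variant could look like (shapes, names, and the reuse of an ego-attention encoder like the one sketched at the top of this README are all assumptions, not the repo's final design):

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network sketch: per-step encoder + LSTM + Q head.
    Intended for observations with the velocity features removed, so the
    recurrence has to infer speed from the history of positions."""

    def __init__(self, obs_encoder, embed_dim=64, hidden_dim=128, n_actions=5):
        super().__init__()
        self.obs_encoder = obs_encoder   # e.g. an ego-attention encoder
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, n_vehicles, feature_dim)
        b, t = obs_seq.shape[:2]
        feats = self.obs_encoder(obs_seq.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        return self.q_head(out), hidden  # per-step Q-values and the recurrent state
```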
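The linear learning-rate scheduler and epsilon decay mentioned above can be done along these lines (hyperparameter values are illustrative, not the ones used in this repo):

```python
import torch
import torch.nn as nn

q_net = nn.Linear(64, 5)  # placeholder network
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)

# Linear learning-rate decay: LambdaLR multiplies the base lr by the returned factor.
total_updates = 100_000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(1.0 - step / total_updates, 0.05))

# Linear epsilon decay for the DQN-style agents.
def epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=20_000):
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```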