The policy-based version of Mortal has been open-sourced #91
-
Excellent work! Policy-based methods were what I started with at the very beginning (pre-v1). I experimented with various common methods and practices and spent a lot of time (several months) tuning them. However, they all failed to outperform the baseline. In the end, surprisingly and somewhat counterintuitively, simple value-based methods turned out to be effective, so I went on to build Mortal on them.
-
Have you tried to compare the model with Akochan or Akagi? I think mjai.app doesn't provide enough samples to compare the performance of models. Also, what's the size of your model?
-
I believe RL algorithms matter more than model architecture. I've tested Transformer architectures, CNN architectures, and hybrid architectures combining CNN with self-attention. Personally, I favor distributional Q-value methods. They not only allow for easy policy switching, but also enable the assessment of action risk variability, thereby reinforcing more deterministic strategies. Moreover, they make it easier to simulate opponents with diverse playing styles in an online training environment.
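To make the point about risk assessment and policy switching concrete, here is a minimal sketch of a quantile-style distributional Q head. This is illustrative only, not Mortal's actual code; the `DistributionalQHead` class, `N_QUANTILES`, and the `risk_weight` knob are assumptions for the example.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not Mortal's actual code): a quantile-based
# distributional Q head. Each action gets N_QUANTILES estimates of the
# return distribution instead of a single scalar Q-value.
N_QUANTILES = 32

class DistributionalQHead(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.num_actions = num_actions
        self.fc = nn.Linear(feature_dim, num_actions * N_QUANTILES)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # -> (batch, num_actions, N_QUANTILES)
        return self.fc(features).view(-1, self.num_actions, N_QUANTILES)

def select_action(quantiles: torch.Tensor, risk_weight: float = 0.0) -> torch.Tensor:
    """Risk-adjusted greedy action selection.

    risk_weight = 0.0 recovers the usual mean-Q policy; a positive value
    penalizes actions whose return distribution is wide (risk-averse),
    a negative value rewards them (risk-seeking).
    """
    mean = quantiles.mean(dim=-1)    # expected return per action
    spread = quantiles.std(dim=-1)   # variability of the return, i.e. action risk
    score = mean - risk_weight * spread
    return score.argmax(dim=-1)
```

With a single scalar like `risk_weight`, the policy can be switched between risk-neutral, risk-averse, and risk-seeking play at inference time without retraining, which is the kind of policy switching described above.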
-
I have tested the policy-based version for several months, so I want to share some thoughts. The most evident improvement of the policy-based version, in my opinion, is the sampling. I never managed to figure out the optimal hyperparameters for the Boltzmann sampler of the value-based Mortal, but in the policy-based Mortal, exploration can be easily controlled by the entropy weight. It also seems to me that the policy-based Mortal has better numerical stability, though that may just come down to random seeds. As for performance, the policy-based Mortal seems to train more slowly than the value-based Mortal. However, due to the stability issues I encountered, I was not able to take the value-based Mortal's training any further, and in the end the policy-based Mortal outperformed it after many more training steps (~50M games for the policy-based version versus ~10M for the value-based one).
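As a rough illustration of the difference described above (a sketch under assumptions, not the actual Mortal training code): exploration in the value-based setup hinges on a Boltzmann temperature that has to be hand-tuned, while in the policy-based setup a single entropy weight in the loss controls how exploratory the policy stays.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch (not the actual Mortal training code): the two
# exploration mechanisms being compared in this comment.

def boltzmann_sample(q_values: torch.Tensor, temperature: float) -> torch.Tensor:
    # Value-based exploration: the temperature has to be tuned (and usually
    # annealed) by hand, which is the hyperparameter problem described above.
    probs = F.softmax(q_values / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

def policy_loss(logits: torch.Tensor,
                actions: torch.Tensor,
                advantages: torch.Tensor,
                entropy_weight: float) -> torch.Tensor:
    # Policy-based exploration: a single entropy_weight added to the
    # policy-gradient loss directly controls how exploratory the policy stays.
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)
    pg = -(log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1) * advantages)
    return (pg - entropy_weight * entropy).mean()
```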
-
https://github.com/Nitasurin/Mortal-Policy
It is a complete, original implementation based on MortalV2 and has been used in certain scenarios since 2022.