mlpack blog
Deep Reinforcement Learning Methods - Summary
This blog post summarizes my GSoC project: the implementation of popular deep reinforcement learning methods. During this project, I implemented deep (double) Q-learning, asynchronous one/n-step Q-learning, asynchronous one-step SARSA, and asynchronous advantage actor-critic (A3C, in progress), as well as two classical control problems, mountain car and cart pole, to test my implementations.
Introduction
My work mainly lives in the `methods/reinforcement_learning` folder:

- `q_learning.hpp`: the main entrance for (double) Q-learning
- `async_learning.hpp`: the main entrance for async methods
- `training_config.hpp`: wrapper for hyper-parameters
- `environment`: implementation of two classical control problems, i.e. mountain car and cart pole
- `policy`: implementation of several behavior policies
- `replay`: implementation of experience replay
- `network`: wrapper for non-standard networks (e.g. an actor-critic network without shared layers)
- `worker`: implementation of async RL methods
Refactoring of existing neural network components is another important part of my work:

- Detachment of `module` and `optimizer`: this influences all the optimizers and most test cases.
- PVS convention: many mlpack components currently use pass-by-reference, which is less flexible. I proposed pass-by-value in combination with `std::move`. This is a very large change, so for now only newly added components adopt the convention; Ryan is working on the old codebase.
- Exposure of `Forward` and `Backward`: before this we only had `Predict` and `Train`, which may lead to duplicate computation in some cases. Exposing `Forward` and `Backward` addresses this issue.
- Support for shared layers: this is still in progress, but I think it is very important for A3C to work with Atari. We proposed the `Alias` layer to address this issue. This is also a large change, which will influence all the visitors.
- Miscellaneous updates of old APIs.
Detailed usage can be found in the two test cases, `async_learning_test.cpp` and `q_learning_test.cpp`. You can run them with `bin/mlpack_test -t QLearningTest` and `bin/mlpack_test -t AsyncLearningTest`.
In total, I contributed the following PRs:
- Implementation of Alias layer
- Async n-step q-learning and one step sarsa
- Implement a framework of DQN and asynchronous learning methods
- Implementation of async one step q-learning
- Add aggregated policy for async rl methods
- Support batched forward and backward for FFN
- Update Optimizer API
- Add new API for some optimizers
- Basic DQN
- Add epsilon greedy policy for DQN
- Add random replay for DQN
- Fix a bug in gaussian init
- Refactor FFN
- Implement two classical control problems for testing reinforcement learning method
- Fix bug of variadic template parameters of Optimizer
Highlights
The most challenging parts were:

- Making the number of threads independent of the number of workers in async RL methods: this is a really fantastic idea. To the best of my knowledge, there is no other public implementation of it; the implementations available on the Internet simply assume the two are equal. To achieve this, we need to build a worker pool and use a step, instead of an episode, as the job unit.
- The `Alias` layer: this blocked me the most and is still blocking me. It requires a deep understanding of Armadillo memory management, `boost::variant`, and `#include` directives.
Future Work
Apparently mlpack's RL support is far from complete. Supporting classical control problems is an important milestone, and we are almost there. However, we are still far from the next milestone: Atari games. At a minimum, we need GPU support, infrastructure for basic image processing, and an efficient way to communicate with popular simulators (e.g. OpenAI Gym, ALE).
Acknowledgment
I thank Marcus for his mentorship and detailed code reviews during the project, and Ryan for his thoughtful suggestions. I also appreciate the generous funding from Google.