Thanks for open-sourcing this, it's very good stuff. However, the code doesn't seem to work on the MountainCar env. Could it be because I only have 2600 expert state/action pairs?