TIDBD: Adapting temporal-difference step-sizes through stochastic meta-descent

Type
Publication
arXiv preprint arXiv:1804.03334