Posted on 2023-10-29, 11:25, authored by Herke Van Hoof, Jan Peters, Gerhard Neumann
<p>Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn nonparametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.</p>
History
School affiliated with
School of Computer Science (Research Outputs)
Publication Title
Journal of Machine Learning Research: Workshop and Conference Proceedings
Volume
38
Pages/Article Number
995-1003
Publisher
MIT Press
ISSN
1532-4435
eISSN
1533-7928
Date Submitted
2017-02-24
Date Accepted
2015-05-12
Date of First Publication
2015-05-12
Date of Final Publication
2015-05-12
Event Name
18th International Conference on Artificial Intelligence and Statistics (AISTATS)