Paper notes: "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation" 02-26