NI Kun, LIU Yun-long, YU Dan-ning. Model-based deep reinforcement learning algorithm based on memory exploration strategy[J]. Microelectronics & Computer, 2021, 38(4): 23-28.


Model-based deep reinforcement learning algorithm based on memory exploration strategy

Abstract: Deep reinforcement learning has shown great potential in many fields, but existing deep reinforcement learning algorithms need a large number of samples to learn a good policy, while in real-world scenarios samples are usually scarce and costly to obtain. Improving sample efficiency is therefore key to broadening the range of applications of deep reinforcement learning. Besides model-based methods, the agent's exploration strategy is another important factor affecting sample efficiency. This paper introduces a memory-based exploration method into the agent's behavior policy: by searching past memories, it can quickly generate high-return samples for the state-value network to learn from, thereby speeding up training. The proposed algorithm is evaluated on benchmark tasks in a simulated environment, verifying its effectiveness.
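To make the mechanism concrete, below is a minimal Python sketch of one way a memory-based exploration policy could work for a discrete-action task; it is an illustrative assumption, not the authors' implementation. An episodic memory records, for each visited (hashable, e.g. discretized) state, the action associated with the highest episode return seen so far, and the behavior policy replays that remembered action with some probability; the class names and the p_memory parameter are hypothetical.

import random

class EpisodicMemory:
    """Maps each hashable state to the best (action, episode return)
    pair observed so far across completed episodes."""
    def __init__(self):
        self.best = {}  # state -> (action, episode_return)

    def store_episode(self, trajectory, episode_return):
        # trajectory: list of (state, action) pairs from one finished episode
        for state, action in trajectory:
            if state not in self.best or episode_return > self.best[state][1]:
                self.best[state] = (action, episode_return)

    def lookup(self, state):
        entry = self.best.get(state)
        return entry[0] if entry is not None else None

class MemoryExplorationPolicy:
    """Behavior policy: with probability p_memory, replay the remembered
    high-return action for the current state; otherwise act uniformly at
    random. Samples gathered this way can then be used to train the
    state-value network."""
    def __init__(self, n_actions, memory, p_memory=0.5):
        self.n_actions = n_actions
        self.memory = memory
        self.p_memory = p_memory

    def act(self, state):
        remembered = self.memory.lookup(state)
        if remembered is not None and random.random() < self.p_memory:
            return remembered
        return random.randrange(self.n_actions)

In use, each completed episode would be written back with store_episode, so states lying on high-return trajectories increasingly bias the behavior policy toward actions that previously paid off, which is how searching past memories can supply high-return training samples quickly.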

