Abstract:
Deep reinforcement learning has shown great potential in many fields, but existing algorithms require large numbers of samples to learn a good policy, whereas in real-world scenarios samples are typically scarce and costly to obtain. Improving sample efficiency is therefore key to broadening the applicability of deep reinforcement learning. Besides model-based approaches, the agent's exploration strategy is another important factor affecting sample efficiency. This paper introduces a memory-based exploration method into the agent's behavior policy: by searching past memories, the agent quickly generates high-return samples that supply the learning of the state-value network, accelerating the training process. The effectiveness of the proposed algorithm is verified on benchmark tasks in a simulation environment.
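The core idea of memory-based exploration can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes discretizable states and an undiscounted return, and the class and method names (`EpisodicMemory`, `update`, `act`) are hypothetical.

```python
import random


class EpisodicMemory:
    """Maps discretized states to the best (return, action) seen so far.

    A sketch of memory-based exploration: past trajectories are searched
    (here, via a lookup table of the highest return-to-go per state) so the
    agent can quickly replay actions that previously led to high returns.
    """

    def __init__(self):
        self.best = {}  # state -> (best return-to-go, action taken)

    def update(self, trajectory):
        """Record each (state, action, reward) step with its return-to-go."""
        g = 0.0
        for state, action, reward in reversed(trajectory):
            g += reward  # undiscounted return-to-go, for simplicity
            if state not in self.best or g > self.best[state][0]:
                self.best[state] = (g, action)

    def act(self, state, n_actions, epsilon=0.1):
        """Replay the best remembered action; otherwise explore randomly."""
        if state in self.best and random.random() > epsilon:
            return self.best[state][1]
        return random.randrange(n_actions)
```

High-return transitions retrieved this way could then be fed to a state-value network as additional training targets, which is how such a memory speeds up value learning.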