Abstract:
To balance exploration and exploitation, this paper proposes an algorithm based on VDBE (Value-Difference Based Exploration). The algorithm uses a state-based control strategy in which the exploration rate of each state depends on the value difference, i.e. how much the agent's value estimates change during learning. To achieve the desired exploration/exploitation behavior, the agent explores the environment actively in the early stage of learning, when it is still unfamiliar with its surroundings. As learning proceeds and the agent becomes more familiar with its surroundings, it gradually reduces its exploration rate.
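As an illustration only, the following is a minimal sketch of a state-dependent exploration rate driven by value differences. It assumes the VDBE update rule as commonly stated by Tokic (2010) on top of tabular Q-learning, not necessarily the specific variant proposed in this paper. The class name `VDBEAgent` and the parameters `alpha`, `gamma`, `sigma` and `eps_init` are hypothetical. Large changes in Q-values push epsilon(s) up (more exploration), while small changes let it decay toward zero (more exploitation).

```python
import math
import random
from collections import defaultdict


class VDBEAgent:
    """Hypothetical tabular Q-learning agent with a per-state epsilon(s).

    Assumption: epsilon(s) follows the VDBE rule as commonly stated
    (Tokic, 2010); this paper's own variant may differ.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, sigma=1.0, eps_init=1.0):
        self.n_actions = n_actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.sigma = sigma            # inverse sensitivity to value differences
        self.delta = 1.0 / n_actions  # mixing weight 1/|A|
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.eps = defaultdict(lambda: eps_init)  # every state starts exploratory

    def act(self, state):
        # Epsilon-greedy action selection with a state-specific epsilon.
        if random.random() < self.eps[state]:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, s, a, r, s_next, done):
        # Standard Q-learning target.
        target = r if done else r + self.gamma * max(self.q[s_next])
        change = self.alpha * (target - self.q[s][a])  # value difference
        self.q[s][a] += change
        # Map |value difference| into [0, 1): 0 when nothing changed.
        x = math.exp(-abs(change) / self.sigma)
        f = (1.0 - x) / (1.0 + x)
        # Blend into the running per-state exploration rate.
        self.eps[s] = self.delta * f + (1.0 - self.delta) * self.eps[s]


if __name__ == "__main__":
    agent = VDBEAgent(n_actions=2)
    # A "familiar" state whose values no longer change: epsilon decays.
    for _ in range(10):
        agent.update("familiar", 0, 0.0, None, done=True)
    print(agent.eps["familiar"])
```

With two actions the mixing weight is 0.5, so in a state where the value estimates stop changing, epsilon(s) halves on every update. A surprising reward, which produces a large value difference, raises epsilon(s) again for that state only.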