Double Delayed Q-learning

Bilal H. Abed-alguni, Mohammad Ashraf Ottom

Abstract



Delayed Q-learning is an efficient model-free reinforcement-learning algorithm that is guaranteed to converge in polynomial time to near-optimal policies in Markov decision processes. However, Delayed Q-learning performs very poorly in some stochastic environments because it overestimates action values. The overestimation is caused by a positive bias that results from using the maximum of the estimated action values to update the estimate of the maximum expected action value. This paper applies the double-estimator method to Delayed Q-learning to construct a new algorithm called Double Delayed Q-learning (2D Q-learning). 2D Q-learning was tested on the gambling game of roulette. The experimental results showed that 2D Q-learning converges to an optimal policy and that it performs better than Delayed Q-learning in some settings where Delayed Q-learning performs poorly because of its large overestimation.
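
The positive bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the 2D Q-learning algorithm from the paper: it only compares a single estimator (the maximum of the sample means) with a double estimator (one half of the samples selects the action, the other half evaluates it). The function name `estimate_max`, the Gaussian reward noise, and all parameter values are assumptions chosen for illustration.

```python
import random
from statistics import mean


def estimate_max(n_actions=10, n_samples=20, n_trials=2000, seed=0):
    """Compare single and double estimators of the maximum expected value.

    Hypothetical setup: every action has true expected reward 0, so the
    true maximum expected value is 0 and any average above 0 is bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    half = n_samples // 2
    for _ in range(n_trials):
        # Noisy reward samples for each action (true mean 0, std 1).
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_actions)
        ]
        # Single estimator: maximum over the sample means.
        single.append(max(mean(s) for s in samples))
        # Double estimator: two disjoint sample sets per action.
        mu_a = [mean(s[:half]) for s in samples]
        mu_b = [mean(s[half:]) for s in samples]
        # Set A selects the action, set B evaluates the selected action.
        best = max(range(n_actions), key=mu_a.__getitem__)
        double.append(mu_b[best])
    return mean(single), mean(double)


if __name__ == "__main__":
    single_bias, double_bias = estimate_max()
    print(f"single estimator average: {single_bias:.3f}")
    print(f"double estimator average: {double_bias:.3f}")
```

In this setup the single estimator's average should sit clearly above the true value of 0, while the double estimator's average should stay close to 0, because the noise that drives the action selection is independent of the noise in the evaluation.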

Keywords


Reinforcement Learning, Double Q-learning, Delayed Q-learning, Markov Decision Process, PAC-MDP


