Assumed Density Filtering Q-learning

Authors: Heejin Jeong, Clark Zhang, George J. Pappas, Daniel D. Lee

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical results demonstrate that ADFQ outperforms comparable algorithms on various Atari 2600 games, with drastic improvements in highly stochastic domains or domains with a large action space."
Researcher Affiliation | Academia | "University of Pennsylvania, Philadelphia, PA 19104; Cornell Tech, New York, NY 10044"
Pseudocode | Yes | "Algorithm 1: ADFQ algorithm"
Open Source Code | Yes | "Example source code is available online: https://github.com/coco66/ADFQ"
Open Datasets | Yes | "We tested on six Atari games, Enduro (|A| = 9), Boxing (|A| = 18), Pong (|A| = 6), Asterix (|A| = 9), Kung-Fu Master (|A| = 14), and Breakout (|A| = 4), from the OpenAI Gym simulator [Brockman et al., 2016]."
Dataset Splits | No | "Each learning run was greedily evaluated 3 times at every epoch (= TH/100), and the averaged results are presented in Fig. 5. The entire experiment was repeated for 3 random seeds."
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments are provided in the paper.
Software Dependencies | No | "For baselines, we used DQN and Double DQN with prioritized experience replay as implemented in OpenAI Baselines."
Experiment Setup | Yes | "We used prioritized experience replay [Schaul et al., 2015] and a combined Huber loss function of mean and variance. (...) We used an ϵ-greedy action policy with ϵ annealed from 1.0 to 0.01 for the baselines as well as ADFQ. (...) Rewards were normalized to {-1, 0, 1} and differ from the raw scores of the games."
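The Open Datasets and Experiment Setup rows describe the training loop in enough detail to sketch it. Below is a minimal, hypothetical Python sketch of that setup, not the authors' implementation (their code is at https://github.com/coco66/ADFQ): the six Atari games from the OpenAI Gym simulator, an ϵ-greedy policy with ϵ annealed from 1.0 to 0.01, and rewards normalized to {-1, 0, 1}. The environment IDs, the linear shape of the annealing schedule, the step budget, and the random placeholder Q-values are assumptions; the snippet also assumes the classic (pre-0.26) gym reset/step API and that gym's Atari environments are installed.

```python
import random

import gym
import numpy as np

# The six games quoted in the Open Datasets row; "NoFrameskip-v4" IDs are an assumption.
GAMES = ["Enduro", "Boxing", "Pong", "Asterix", "KungFuMaster", "Breakout"]

EPS_START, EPS_END = 1.0, 0.01   # epsilon annealed from 1.0 to 0.01 (as quoted)
ANNEAL_STEPS = 100_000           # assumed annealing horizon; not restated in the quotes


def epsilon_at(step: int, anneal_steps: int = ANNEAL_STEPS) -> float:
    """Linear annealing from EPS_START to EPS_END (the schedule shape is assumed)."""
    frac = min(step / anneal_steps, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def clip_reward(raw_reward: float) -> float:
    """Normalize raw game scores to {-1, 0, 1}, as stated in the Experiment Setup row."""
    return float(np.sign(raw_reward))


def run(env_id: str, steps: int = 1_000) -> None:
    """Roll out an epsilon-greedy policy with placeholder Q-values (pre-0.26 gym API)."""
    env = gym.make(env_id)
    obs = env.reset()
    episode_return = 0.0
    for t in range(steps):
        if random.random() < epsilon_at(t):
            action = env.action_space.sample()            # explore
        else:
            # Placeholder greedy choice; a real agent would use its learned Q-values here.
            q_values = np.random.randn(env.action_space.n)
            action = int(np.argmax(q_values))
        obs, raw_reward, done, _info = env.step(action)
        episode_return += clip_reward(raw_reward)         # training signal in {-1, 0, 1}
        if done:
            print(f"{env_id}: clipped episode return = {episode_return}")
            episode_return = 0.0
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    # Short smoke test on each of the six games listed in the paper.
    for name in GAMES:
        run(f"{name}NoFrameskip-v4", steps=100)
```

Running the script performs a short ϵ-greedy rollout on each of the six games; swapping the random placeholder Q-values for a learned Q-function (for example, ADFQ's posterior means) would turn it into an actual training or evaluation loop.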