Assumed Density Filtering Q-learning
Authors: Heejin Jeong, Clark Zhang, George J. Pappas, Daniel D. Lee
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that ADFQ outperforms comparable algorithms on various Atari 2600 games, with drastic improvements in highly stochastic domains or domains with a large action space. |
| Researcher Affiliation | Academia | ¹University of Pennsylvania, Philadelphia, PA 19104; ²Cornell Tech, New York, NY 10044 |
| Pseudocode | Yes | Algorithm 1 ADFQ algorithm |
| Open Source Code | Yes | Example source code is available online: https://github.com/coco66/ADFQ |
| Open Datasets | Yes | We tested on six Atari games, Enduro (\|A\| = 9), Boxing (\|A\| = 18), Pong (\|A\| = 6), Asterix (\|A\| = 9), Kung-Fu Master (\|A\| = 14), and Breakout (\|A\| = 4), from the Open AI gym simulator [Brockman et al., 2016]. (Environment setup is sketched below the table.) |
| Dataset Splits | No | Each learning was greedily evaluated at every epoch (= TH/100) for 3 times, and their averaged results are presented in Fig.5. The entire experiment was repeated for 3 random seeds. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided in the paper. |
| Software Dependencies | No | For baselines, we used DQN and Double DQN with prioritized experience replay implemented in Open AI baselines. (A hedged Baselines call is sketched below the table.) |
| Experiment Setup | Yes | We used prioritized experience replay [Schaul et al., 2015] and a combined Huber loss function of mean and variance. (...) We used an ϵ-greedy action policy with ϵ annealed from 1.0 to 0.01 for the baselines as well as ADFQ. (...) Rewards were normalized to {−1, 0, 1} and were different from the raw scores of the games. (Annealing and reward normalization are sketched below the table.) |
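
For the Pseudocode row: the paper's Algorithm 1 is not reproduced here, but the idea behind the method's name can be illustrated with a generic assumed-density-filtering projection, in which a Gaussian-mixture belief over a Q-value is collapsed back to a single Gaussian by moment matching. This is a minimal sketch of that projection only, not the paper's update rule; `moment_match` and the example numbers are illustrative.

```python
import numpy as np

def moment_match(weights, means, variances):
    """Project a Gaussian mixture sum_i w_i * N(mu_i, var_i) onto a single
    Gaussian N(mu, var) that matches its first two moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize mixture weights
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    mean = np.sum(w * mu)                 # first moment of the mixture
    second = np.sum(w * (var + mu ** 2))  # second moment of the mixture
    return mean, second - mean ** 2

# Hypothetical three-component belief for a single state-action pair
print(moment_match([0.5, 0.3, 0.2], [1.0, 0.5, 2.0], [0.2, 0.1, 0.3]))
```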
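
For the Open Datasets row: a minimal sketch that instantiates the six named Atari games through OpenAI Gym and prints the action-space sizes quoted above. The `NoFrameskip-v4` environment IDs and the Gym version are assumptions; the paper names only the games and their \|A\| values.

```python
import gym  # OpenAI Gym with the Atari environments installed

# Environment IDs are an assumption (the paper names only the games);
# the expected |A| values are taken from the quote in the table above.
GAMES = {
    "EnduroNoFrameskip-v4": 9,
    "BoxingNoFrameskip-v4": 18,
    "PongNoFrameskip-v4": 6,
    "AsterixNoFrameskip-v4": 9,
    "KungFuMasterNoFrameskip-v4": 14,
    "BreakoutNoFrameskip-v4": 4,
}

for env_id, reported in GAMES.items():
    env = gym.make(env_id)
    print(f"{env_id}: |A| = {env.action_space.n} (paper reports {reported})")
    env.close()
```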
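
For the Dataset Splits row: the quoted protocol evaluates the current policy greedily three times per epoch, averages the returns, and repeats the whole run over three seeds. Below is a sketch of the per-epoch evaluation step under the old Gym step/reset API; `greedy_action` is a hypothetical placeholder for the learned policy.

```python
import numpy as np

def evaluate_greedy(env, greedy_action, n_episodes=3):
    """Average undiscounted return of n_episodes greedy rollouts
    (old Gym API: reset() -> obs, step() -> (obs, reward, done, info))."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(greedy_action(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))
```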
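
For the Software Dependencies row: a hedged sketch of running the DQN baseline with prioritized experience replay through OpenAI Baselines. Argument names follow the refactored `baselines.deepq.learn` interface and may differ across Baselines versions; the hyperparameter values shown are placeholders, not the paper's settings.

```python
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

# Standard DeepMind-style Atari preprocessing provided by Baselines.
env = wrap_deepmind(make_atari("PongNoFrameskip-v4"), frame_stack=True)

# Placeholder hyperparameters; only prioritized replay and the epsilon floor
# are taken from the quotes in the table above.
model = deepq.learn(
    env,
    network="cnn",
    total_timesteps=int(1e6),
    exploration_final_eps=0.01,   # matches the quoted epsilon floor
    prioritized_replay=True,      # prioritized experience replay baseline
)
```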
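
For the Experiment Setup row: a minimal sketch of two quoted details, a linearly annealed ϵ for the ϵ-greedy policy (1.0 down to 0.01) and reward normalization to {−1, 0, 1}. The annealing length and the linear schedule shape are assumptions; the paper states only the endpoints.

```python
import numpy as np

def epsilon_at(step, anneal_steps, eps_start=1.0, eps_end=0.01):
    """Linearly interpolate epsilon from eps_start to eps_end over
    anneal_steps, then hold it at eps_end (schedule shape is an assumption)."""
    frac = min(1.0, step / max(1, anneal_steps))
    return eps_start + frac * (eps_end - eps_start)

def normalize_reward(raw_reward):
    """Map raw game score deltas to {-1, 0, +1}."""
    return float(np.sign(raw_reward))

print(epsilon_at(50_000, 100_000))  # 0.505, halfway through annealing
print(normalize_reward(400.0))      # 1.0
```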