Learning Uncertainty-Aware Temporally-Extended Actions
Authors: Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments. Our findings show that UTE outperforms existing action repetition algorithms, effectively mitigating their inherent limitations and significantly enhancing policy learning efficiency. |
| Researcher Affiliation | Collaboration | ¹Seoul National University, ²Samsung Research, ³Google DeepMind |
| Pseudocode | Yes | Algorithm 1: UTE: Uncertainty-Aware Temporal Extension (a hedged illustrative sketch of uncertainty-aware extension selection follows this table) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We used OpenAI Gym's Atari environment with 4 frame-skips (Bellemare et al. 2013). (An environment-setup sketch follows this table.) |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly mention using a separate validation set or provide details on how data was split for validation. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | To ensure a fair comparison, we explored a considerable range of hyperparameters to identify the most optimal value for each algorithm (refer to Tables 8, 9, 11, and 12 in the Appendix). Each algorithm is trained for a total of 2.5 × 10^6 training steps, which is only 10 million frames. All algorithms except B-DQN use a linearly decaying ϵ-greedy exploration schedule over the first 200,000 time-steps with a final ϵ fixed to 0.01. (A sketch of this schedule follows this table.) |
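
The Pseudocode row names Algorithm 1 (UTE: Uncertainty-Aware Temporal Extension) but does not reproduce it. The sketch below is only a loose, hedged illustration of one generic way an agent could pick an action-repetition length from an ensemble's uncertainty estimate; the names `select_extension`, `q_ensemble`, and `uncertainty_coef` are hypothetical and are not taken from the paper.

```python
import numpy as np

def select_extension(q_ensemble: np.ndarray, uncertainty_coef: float) -> int:
    """Pick an action-repetition length from ensemble Q-value estimates.

    q_ensemble: array of shape (n_heads, n_lengths); entry [k, j] is head k's
        value estimate for repeating the chosen primitive action j+1 times.
    uncertainty_coef: hypothetical trade-off weight; positive values favour
        uncertain (exploratory) extensions, negative values penalise them.
    """
    mean = q_ensemble.mean(axis=0)    # per-length mean value
    std = q_ensemble.std(axis=0)      # per-length disagreement (uncertainty proxy)
    scores = mean + uncertainty_coef * std
    return int(np.argmax(scores)) + 1  # repetition length, 1-indexed

# Toy usage with a random 5-head ensemble over repetition lengths 1..8
rng = np.random.default_rng(0)
print(select_extension(rng.normal(size=(5, 8)), uncertainty_coef=0.5))
```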
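
The Open Datasets row quotes the use of OpenAI Gym's Atari environment with 4 frame-skips. Below is a minimal setup sketch assuming the currently maintained Gymnasium + ale-py packages (the paper cites the original OpenAI Gym); the choice of Breakout is hypothetical, since no specific games are quoted here.

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # make the ALE/* Atari 2600 environments available

# Breakout is a placeholder game choice; frameskip=4 mirrors the quoted setting.
env = gym.make("ALE/Breakout-v5", frameskip=4)

obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```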
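
The Experiment Setup row quotes a linearly decaying ϵ-greedy schedule over the first 200,000 time-steps with a final ϵ of 0.01. A minimal sketch of that schedule follows; the starting value of 1.0 is an assumption, since the quoted text does not state the initial ϵ.

```python
def epsilon_schedule(step: int,
                     decay_steps: int = 200_000,
                     eps_start: float = 1.0,   # assumed initial value; not stated in the quote
                     eps_final: float = 0.01) -> float:
    """Linearly decay epsilon from eps_start to eps_final over the first
    `decay_steps` time-steps, then hold it fixed at eps_final."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_final - eps_start)

# Example: epsilon at a few points over the 2.5e6-step training run
for s in (0, 100_000, 200_000, 2_500_000):
    print(s, round(epsilon_schedule(s), 3))   # 1.0, 0.505, 0.01, 0.01
```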