Option Discovery in the Absence of Rewards with Manifold Analysis
Authors: Amitay Bar, Ronen Talmon, Ron Meir
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods. ... We empirically demonstrate that the learning performance obtained by our options outperforms competing options on three small-scale domains. |
| Researcher Affiliation | Academia | Viterbi Faculty of Electrical Engineering, Technion, Israel Institute of Technology. |
| Pseudocode | Yes | Algorithm 1 Diffusion Options |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We focus on three domains: a Ring domain, which is the 2D manifold of the placement of a 2-joint robotic arm (Verma, 2008), a Maze domain (Wu et al., 2019), and a 4Rooms domain (Sutton et al., 1999). |
| Dataset Splits | No | The paper describes the experimental setup for Q-learning (e.g., episodes, steps, alpha, gamma) but does not specify dataset splits (e.g., train/validation/test percentages or counts) for the environments used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing Q-learning but does not list any specific software libraries, frameworks, or solvers with version numbers. |
| Experiment Setup | Yes | We implement Q learning (Watkins & Dayan, 1992) with α = 0.1 and γ = 0.9 for 400 episodes, containing 100 steps each. ... The main hyperparameter of the algorithm is t. In our implementation, we set t = 4. |
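
The reported Q-learning settings (α = 0.1, γ = 0.9, 400 episodes of 100 steps each) can be illustrated with a minimal tabular sketch. This is not the authors' code, and it does not implement Diffusion Options or the hyperparameter t. The six-state chain environment, the ε-greedy exploration rate, the random tie-breaking, and the early stop on reaching the goal are all assumptions made for illustration; none of them is stated in the summary above.

```python
import random

ALPHA = 0.1       # learning rate, as reported
GAMMA = 0.9       # discount factor, as reported
EPISODES = 400    # number of episodes, as reported
STEPS = 100       # maximum steps per episode, as reported
EPSILON = 0.1     # ASSUMPTION: exploration rate is not given in the summary
N_STATES = 6      # ASSUMPTION: toy chain, not one of the paper's domains
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain dynamics: reward 1 on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick a maximizing action, breaking exact ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(seed=0):
    """Tabular Q-learning (Watkins & Dayan, 1992) on the toy chain."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(EPISODES):
        state = 0
        for _ in range(STEPS):
            # Epsilon-greedy action selection (an assumption, see above).
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            nxt, reward, done = step(state, action)
            # Terminal transitions contribute no bootstrapped value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:  # ASSUMPTION: the episode ends at the goal
                break
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after training.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Swapping `step` for one of the paper's domains (Ring, Maze, or 4Rooms) would require the state and action definitions from the paper itself, which this summary does not reproduce.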