Option Discovery in the Absence of Rewards with Manifold Analysis

Authors: Amitay Bar, Ronen Talmon, Ron Meir

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods. ... We empirically demonstrate that the learning performance obtained by our options outperforms competing options on three small-scale domains."
Researcher Affiliation | Academia | "Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology."
Pseudocode | Yes | "Algorithm 1 Diffusion Options"
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "We focus on three domains: a Ring domain, which is the 2D manifold of the placement of a 2-joint robotic arm (Verma, 2008), a Maze domain (Wu et al., 2019), and a 4Rooms domain (Sutton et al., 1999)."
Dataset Splits | No | The paper describes the experimental setup for Q-learning (e.g., episodes, steps, alpha, gamma) but does not specify dataset splits (e.g., train/validation/test percentages or counts) for the environments used.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions implementing Q-learning but does not list any specific software libraries, frameworks, or solvers with version numbers.
Experiment Setup | Yes | "We implement Q learning (Watkins & Dayan, 1992) with α = 0.1 and γ = 0.9 for 400 episodes, containing 100 steps each. ... The main hyperparameter of the algorithm is t. In our implementation, we set t = 4."