C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Authors: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality.
Researcher Affiliation | Collaboration | Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell & Tong Li (Layer 6 AI, {panteha, gabriel, harry, anthony, jesse, tong}@layer6.ai); Animesh Garg (University of Toronto, Vector Institute, Nvidia, garg@cs.toronto.edu)
Pseudocode | Yes | Algorithm 1: Training C-learning Network
Open Source Code | Yes | Our code is available at https://github.com/layer6ai-labs/CAE
Open Datasets | Yes | 3. FetchPickAndPlace-v1 (Brockman et al., 2016) is a complex, higher-dimensional environment in which a robotic arm needs to pick up a block and move it to the goal location... 4. HandManipulatePenFull-v0 (Brockman et al., 2016) is a realistic environment known to be a difficult goal-reaching problem... (A loading sketch for these environments follows the table.)
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or percentages.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions.
Experiment Setup | Yes | For all methods, we train for 300 episodes, each of maximal length 50 steps; we use a learning rate of 10^-3, a batch size of 256, and train for 64 gradient steps per episode. We use a 0.1-greedy behavior policy. We use a neural network with two hidden layers of respective sizes 60 and 40 with ReLU activations. We use 15 fully random exploration episodes before we start training. We take p(s0) as uniform among non-hole states during training, and set it as a point mass at (1, 0) for testing. We set p(g) as uniform among states during training, and we evaluate at every goal during testing. For C-learning, we use κ = 3, and copy the target network every 10 steps. (A hedged training-setup sketch appears below.)
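
The two robotics tasks quoted in the Open Datasets row are standard OpenAI Gym goal-conditioned environments. A minimal loading sketch follows, assuming an older `gym` release (pre-0.26) with the robotics extras and MuJoCo installed; the paper itself does not list software versions.

```python
# Minimal sketch: instantiating the goal-conditioned Gym robotics environments
# named in the paper. Assumes gym < 0.26 with the robotics extras and MuJoCo;
# newer gymnasium releases use different IDs and a different reset() signature.
import gym

for env_id in ["FetchPickAndPlace-v1", "HandManipulatePenFull-v0"]:
    env = gym.make(env_id)
    obs = env.reset()  # dict with 'observation', 'achieved_goal', 'desired_goal'
    print(env_id, obs["observation"].shape, obs["desired_goal"].shape)
    env.close()
```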
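
The Experiment Setup row appears to describe the discrete gridworld experiment (non-hole states, a point-mass start at (1, 0)). The PyTorch sketch below only wires the reported hyperparameters together; the state/goal/action dimensions, the horizon input, the sigmoid output, the replay buffer, and the `c_learning_loss` placeholder are assumptions for illustration, not the authors' implementation (their repository contains the real code).

```python
# Hedged sketch of the reported training configuration, not the authors' code.
# Dimensions, the horizon input, the sigmoid output, and c_learning_loss are
# placeholders; only the numeric hyperparameters come from the quoted setup.
import copy
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, N_ACTIONS = 2, 2, 4  # assumed gridworld-like dimensions

class CNetwork(nn.Module):
    """Two hidden layers of sizes 60 and 40 with ReLU, as reported."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + GOAL_DIM + 1, 60), nn.ReLU(),  # +1: horizon input (assumed)
            nn.Linear(60, 40), nn.ReLU(),
            nn.Linear(40, N_ACTIONS),
            nn.Sigmoid(),  # assumed: cumulative accessibility is a probability
        )

    def forward(self, state, goal, horizon):
        return self.net(torch.cat([state, goal, horizon], dim=-1))

c_net = CNetwork()
target_net = copy.deepcopy(c_net)
optimizer = torch.optim.Adam(c_net.parameters(), lr=1e-3)  # learning rate 10^-3

EPISODES, MAX_EPISODE_LEN = 300, 50
GRAD_STEPS_PER_EPISODE, BATCH_SIZE = 64, 256
EPSILON, RANDOM_WARMUP_EPISODES = 0.1, 15
KAPPA, TARGET_COPY_EVERY = 3, 10  # target-copy interval read as gradient steps (assumed)

grad_step = 0
for episode in range(EPISODES):
    # ... roll out at most MAX_EPISODE_LEN steps with a 0.1-greedy behavior
    #     policy (fully random for the first RANDOM_WARMUP_EPISODES episodes)
    #     and push the transitions into a replay buffer ...
    for _ in range(GRAD_STEPS_PER_EPISODE):
        # loss = c_learning_loss(buffer.sample(BATCH_SIZE), c_net, target_net, KAPPA)
        # optimizer.zero_grad(); loss.backward(); optimizer.step()
        grad_step += 1
        if grad_step % TARGET_COPY_EVERY == 0:
            target_net.load_state_dict(c_net.state_dict())
```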