C-Learning: Horizon-Aware Cumulative Accessibility Estimation
Authors: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. |
| Researcher Affiliation | Collaboration | Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell & Tong Li (Layer 6 AI) {panteha, gabriel, harry, anthony, jesse, tong}@layer6.ai; Animesh Garg (University of Toronto, Vector Institute, Nvidia) garg@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: Training C-learning Network |
| Open Source Code | Yes | Our code is available at https://github.com/layer6ai-labs/CAE |
| Open Datasets | Yes | 3. FetchPickAndPlace-v1 (Brockman et al., 2016) is a complex, higher-dimensional environment in which a robotic arm needs to pick up a block and move it to the goal location... 4. HandManipulatePenFull-v0 (Brockman et al., 2016) is a realistic environment known to be a difficult goal-reaching problem... (see the environment-loading sketch below the table) |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions. |
| Experiment Setup | Yes | For all methods, we train for 300 episodes, each of maximal length 50 steps; we use a learning rate of 10⁻³, a batch size of 256, and 64 gradient steps per episode. We use an ε-greedy behavior policy with ε = 0.1. We use a neural network with two hidden layers of sizes 60 and 40 with ReLU activations. We use 15 fully random exploration episodes before we start training. We take p(s0) as uniform among non-hole states during training, and set it as a point mass at (1, 0) for testing. We set p(g) as uniform among states during training, and we evaluate at every goal during testing. For C-learning, we use κ = 3 and copy the target network every 10 steps. (See the configuration sketch after the table.) |
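The environments quoted in the Open Datasets row are standard OpenAI Gym robotics tasks. Below is a minimal loading sketch, assuming a pre-0.26 `gym` release with the MuJoCo-backed robotics suite installed (the paper's excerpts do not pin library versions):

```python
import gym

# Goal-conditioned robotics environments cited in the Open Datasets row.
# Assumes the MuJoCo-based gym robotics environments are installed.
for env_id in ["FetchPickAndPlace-v1", "HandManipulatePenFull-v0"]:
    env = gym.make(env_id)
    obs = env.reset()  # older gym API: reset() returns the observation dict directly
    # Observations are dicts with 'observation', 'achieved_goal', 'desired_goal'.
    print(env_id, obs["observation"].shape, obs["desired_goal"].shape)
    env.close()
```

With newer `gym`/`gymnasium` releases the environment IDs and reset signature differ, so this snippet is tied to the 2021-era API the paper would have used.

The Experiment Setup row pins down most of the reported hyperparameters. The sketch below collects them into a PyTorch configuration; the optimizer choice (Adam), the input/output dimensions, and the framework itself are assumptions not confirmed by the quoted text, and the C-learning loss is omitted since only the name of Algorithm 1 is quoted:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
LEARNING_RATE = 1e-3
BATCH_SIZE = 256
GRAD_STEPS_PER_EPISODE = 64
EPISODES = 300
MAX_EPISODE_LEN = 50
EPSILON = 0.1            # epsilon-greedy behavior policy
KAPPA = 3                # C-learning kappa parameter
TARGET_COPY_EVERY = 10   # hard target-network copy interval (steps)

def make_network(input_dim: int, output_dim: int) -> nn.Module:
    """Two hidden layers of sizes 60 and 40 with ReLU activations, as quoted."""
    return nn.Sequential(
        nn.Linear(input_dim, 60), nn.ReLU(),
        nn.Linear(60, 40), nn.ReLU(),
        nn.Linear(40, output_dim),
    )

# Dimensions below are placeholders for illustration only.
c_net = make_network(input_dim=8, output_dim=4)
target_net = make_network(input_dim=8, output_dim=4)
target_net.load_state_dict(c_net.state_dict())  # hard copy, repeated every TARGET_COPY_EVERY steps

optimizer = torch.optim.Adam(c_net.parameters(), lr=LEARNING_RATE)  # Adam is an assumption
```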