BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images

Authors: Sandesh Adhikary, Anqi Li, Byron Boots

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that when added as a drop-in modification, BeigeMaps improve the policy performance of prior behavioral distance based RL algorithms." and "We train and evaluate all algorithms on the Deep Mind Control (DMC) suite (Tassa et al., 2018)."
Researcher Affiliation | Collaboration | Sandesh Adhikary (1), Anqi Li (1, 2), Byron Boots (1). (1) Computer Science and Engineering, University of Washington, Seattle, WA, USA; (2) NVIDIA. Work done while AL was affiliated with the University of Washington. Correspondence to: Sandesh Adhikary <adhikary@cs.washington.edu>.
Pseudocode | Yes | Algorithm 1: Behavioral Distance Representation Learning (an illustrative sketch of this style of update appears after this table).
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | "We train and evaluate all algorithms on the Deep Mind Control (DMC) suite (Tassa et al., 2018), a set of continuous control tasks that has been used as a benchmark for all prior behavioral distance algorithms."
Dataset Splits | No | The paper mentions "random evaluation seeds" and "random training seeds" but does not specify explicit percentages or counts for training, validation, or test splits. It also states that "All environments are truncated at 1000 steps and have dense rewards bounded between [0, 1], except for ball in cup catch which has sparse binary rewards," but this describes the environment setup rather than a data split (see the environment sketch after this table).
Hardware Specification | No | "All models were trained using the Hyak computing cluster at the University of Washington. Each model was trained on a single GPU, which was assigned by the cluster's scheduling system." No specific GPU model, CPU, or other hardware details are provided.
Software Dependencies | No | The paper mentions a Docker container, a Soft Actor Critic (SAC) implementation, and the rliable library, but it does not provide specific version numbers for any software dependency (a version-logging sketch follows this table).
Experiment Setup | Yes | "In Table 3, we list all model hyperparameter choices; the only difference from Zhang et al. (2022) is that we use a smaller replay buffer size (100,000 instead of 1M) due to computational constraints." Table 3 provides detailed hyperparameter values such as a batch size of 128, a discount γ of 0.99, and various learning rates (the quoted values are collected in the config sketch after this table).
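
The paper's Algorithm 1 is not reproduced in this report. As a point of reference, below is a minimal PyTorch sketch of a generic behavioral-distance representation update in the style of deep bisimulation metrics: latent distances are regressed onto a reward-difference-plus-discounted-next-state target over shuffled batch pairs. Everything here is an assumption for illustration, not the paper's BeigeMaps objective; in particular, the paper learns from images, so a convolutional encoder would replace the toy MLP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy MLP encoder (assumption: the paper's image encoder would be a CNN)."""
    def __init__(self, obs_dim: int = 32, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def behavioral_distance_loss(encoder, obs, rewards, next_latents, gamma=0.99):
    """Regress latent L1 distances onto a sampled behavioral-distance target
    |r_i - r_j| + gamma * d(z_i', z_j') over shuffled batch pairs."""
    z = encoder(obs)
    perm = torch.randperm(obs.shape[0])
    latent_dist = torch.norm(z - z[perm], p=1, dim=-1)
    reward_dist = (rewards - rewards[perm]).abs()
    with torch.no_grad():  # bootstrap the next-state term, as in TD targets
        next_dist = torch.norm(next_latents - next_latents[perm], p=1, dim=-1)
    return F.mse_loss(latent_dist, reward_dist + gamma * next_dist)

# Shape check on random tensors (no environment needed):
enc = Encoder()
loss = behavioral_distance_loss(
    enc, torch.randn(128, 32), torch.rand(128), torch.randn(128, 16)
)
loss.backward()
```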
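
The Dataset Splits row above quotes the environment setup rather than a train/validation/test split. For context, this is roughly how a DMC task is loaded and rolled out with the dm_control package; the cheetah run task and the random policy are placeholders, and the 1000-step episode length matches the truncation quoted above.

```python
import numpy as np
from dm_control import suite

# Load one DMC benchmark task (the domain/task choice is illustrative).
env = suite.load(domain_name="cheetah", task_name="run")
spec = env.action_spec()

time_step = env.reset()
steps, total_reward = 0, 0.0
while not time_step.last():
    # Random policy as a stand-in; the paper trains SAC-based agents.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward
    steps += 1

# Consistent with the quoted setup: the episode truncates at 1000 steps,
# and per-step dense rewards are bounded in [0, 1] for most tasks.
print(f"episode length: {steps}, return: {total_reward:.2f}")
```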
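
Because the Software Dependencies row flags missing version numbers, a lightweight mitigation when reproducing such a setup is to log installed package versions from inside the Docker container at train time. The package list below is an assumption; the paper names only Docker, a SAC implementation, and the rliable library, without versions.

```python
import importlib.metadata as metadata

# Hypothetical dependency list; none of these versions come from the paper.
for pkg in ["torch", "numpy", "dm-control", "rliable"]:
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```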
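
Finally, the Table 3 values quoted in the Experiment Setup row can be collected in a small config object. Only the three values quoted in this report are grounded; the rest of Table 3 (e.g. the individual learning rates) is deliberately left out rather than guessed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Values quoted from the paper's Table 3 in the row above.
    batch_size: int = 128
    discount: float = 0.99              # γ
    replay_buffer_size: int = 100_000   # reduced from 1M in Zhang et al. (2022)

print(TrainConfig())
```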