Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images

Authors: Sandesh Adhikary, Anqi Li, Byron Boots

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that when added as a drop-in modification, BeigeMaps improve the policy performance of prior behavioral distance based RL algorithms." and "We train and evaluate all algorithms on the DeepMind Control (DMC) suite (Tassa et al., 2018)".
Researcher Affiliation | Collaboration | Sandesh Adhikary (1), Anqi Li (1, 2), Byron Boots (1). (1) Computer Science and Engineering, University of Washington, Seattle, WA (USA); (2) NVIDIA. Work done while AL was affiliated with the University of Washington. Correspondence to: Sandesh Adhikary <EMAIL>.
Pseudocode | Yes | Algorithm 1: Behavioral Distance Representation Learning.
Open Source Code | No | The paper does not provide any explicit statement about releasing the source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | "We train and evaluate all algorithms on the DeepMind Control (DMC) suite (Tassa et al., 2018), a set of continuous control tasks that has been used as a benchmark for all prior behavioral distance algorithms."
Dataset Splits | No | The paper mentions evaluating on "random evaluation seeds" and "random training seeds" but does not specify explicit percentages or counts for training, validation, or test splits. It also states that "All environments are truncated at 1000 steps and have dense rewards bounded between [0, 1], except for ball in cup catch which has sparse binary rewards," but this describes environment setup, not data splitting.
Hardware Specification | No | "All models were trained using the Hyak computing cluster at the University of Washington. Each model was trained on a single GPU, which was assigned by the cluster's scheduling system." No specific GPU model, CPU, or other hardware details are provided.
Software Dependencies | No | The paper mentions using a Docker container and components such as Soft Actor Critic (SAC) and the rliable library, but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "In Table 3, we list all model hyperparameter choices; the only difference from Zhang et al. (2022) is that we use a smaller replay buffer size (100,000 instead of 1M) due to computational constraints." Table 3 provides detailed hyperparameter values such as batch size 128, discount γ = 0.99, and various learning rates.
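The "Pseudocode" row refers to the paper's Algorithm 1 (Behavioral Distance Representation Learning), which is not reproduced in this report. As a rough, hypothetical illustration of what behavioral-distance representation learning generally involves (in the spirit of bisimulation-metric methods such as Zhang et al., 2022, not the authors' actual algorithm), one common ingredient is a loss that pushes latent distances toward a reward-difference-plus-discounted-next-state-distance target:

```python
import math

def behavioral_distance_loss(z_i, z_j, r_i, r_j, z_next_i, z_next_j, gamma=0.99):
    """Hypothetical sketch: penalize the gap between the latent distance
    d(z_i, z_j) and a behavioral target |r_i - r_j| + gamma * d(z'_i, z'_j).
    This is a generic bisimulation-style objective, NOT the paper's Algorithm 1.
    """
    d = math.dist(z_i, z_j)  # current latent distance
    target = abs(r_i - r_j) + gamma * math.dist(z_next_i, z_next_j)
    return (d - target) ** 2  # squared error against the behavioral target
```

In practice such a loss is minimized over minibatches of transition pairs sampled from a replay buffer, jointly with an actor-critic objective; all function and variable names above are illustrative.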
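The hyperparameters quoted from the paper's Table 3 can be collected into a small configuration sketch. Only the three values explicitly mentioned above are grounded in the report; the dictionary keys and structure are illustrative, not the authors' actual configuration:

```python
# Hypothetical config sketch; only the values below are quoted from Table 3
# of the paper (via the report above). Key names are assumptions.
config = {
    "batch_size": 128,              # Table 3: Batch Size 128
    "discount": 0.99,               # Table 3: Discount gamma 0.99
    "replay_buffer_size": 100_000,  # reduced from 1M due to compute constraints
}
```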