BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images
Authors: Sandesh Adhikary, Anqi Li, Byron Boots
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We empirically demonstrate that when added as a drop-in modification, BeigeMaps improve the policy performance of prior behavioral distance based RL algorithms." and "We train and evaluate all algorithms on the DeepMind Control (DMC) suite (Tassa et al., 2018)." |
| Researcher Affiliation | Collaboration | Sandesh Adhikary (1), Anqi Li (1, 2), Byron Boots (1); (1) Computer Science and Engineering, University of Washington, Seattle, WA (USA); (2) NVIDIA. Work done while AL was affiliated with the University of Washington. Correspondence to: Sandesh Adhikary <adhikary@cs.washington.edu>. |
| Pseudocode | Yes | "Algorithm 1: Behavioral Distance Representation Learning" (an illustrative sketch of this style of objective follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing the source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | "We train and evaluate all algorithms on the DeepMind Control (DMC) suite (Tassa et al., 2018), a set of continuous control tasks that has been used as a benchmark for all prior behavioral distance algorithms." (A minimal loading example follows the table.) |
| Dataset Splits | No | The paper mentions evaluating on "random evaluation seeds" and "random training seeds" but does not specify explicit counts or percentages for training, validation, or test splits. It also states that "All environments are truncated at 1000 steps and have dense rewards bounded between [0, 1], except for ball in cup catch which has sparse binary rewards," but this describes environment setup rather than data splitting. |
| Hardware Specification | No | "All models were trained using the Hyak computing cluster at the University of Washington. Each model was trained on a single GPU, which was assigned by the cluster's scheduling system." (No specific GPU model, CPU, or other hardware details are provided.) |
| Software Dependencies | No | The paper mentions using a "Docker container", the Soft Actor Critic (SAC) algorithm, and the "rliable" library, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | "In Table 3, we list all model hyperparameter choices; the only difference from Zhang et al. (2022) is that we use a smaller replay buffer size (100,000 instead of 1M) due to computational constraints." (Table 3 provides detailed hyperparameter values such as batch size 128, discount γ = 0.99, and various learning rates; see the config sketch after the table.) |
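The paper's Algorithm 1 ("Behavioral Distance Representation Learning") is not reproduced in this report. As a rough illustration of the general pattern behind behavioral-distance representation learning, in the spirit of the bisimulation-metric objectives (Zhang et al.) that the paper builds on, a minimal PyTorch sketch might look like the following. The encoder architecture, loss form, and all names here are assumptions for illustration, not the authors' exact algorithm.

```python
# Hedged sketch of a bisimulation-style behavioral distance objective.
# All names, the encoder, and the exact loss form are illustrative
# assumptions; they are not the paper's Algorithm 1.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps (flattened) observations to a low-dimensional representation."""
    def __init__(self, obs_dim: int, z_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, z_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def behavioral_distance_loss(encoder, obs_i, obs_j, r_i, r_j,
                             next_z_i, next_z_j, gamma: float = 0.99):
    """Regress latent distances onto a reward-difference target plus a
    discounted next-state distance, with gradients stopped on the target."""
    z_i, z_j = encoder(obs_i), encoder(obs_j)
    latent_dist = (z_i - z_j).abs().sum(dim=-1)   # ||phi(s_i) - phi(s_j)||_1
    with torch.no_grad():
        target = (r_i - r_j).abs() + gamma * (next_z_i - next_z_j).abs().sum(dim=-1)
    return ((latent_dist - target) ** 2).mean()
```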
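The DMC benchmark itself is openly available via the `dm_control` package. Purely as a usage illustration (the specific domain/task below is an arbitrary choice, not necessarily one of the paper's evaluation tasks), a DMC task can be loaded and stepped as follows:

```python
# Minimal example of loading a DeepMind Control suite task (Tassa et al., 2018),
# the benchmark the paper trains and evaluates on. The cheetah/run choice is
# an arbitrary illustration.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Random policy for illustration; DMC episodes run for 1000 steps,
    # matching the truncation the paper describes.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
```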
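For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small config object. Only the values explicitly quoted in this report (batch size 128, discount 0.99, replay buffer 100,000, 1000-step truncation) appear below; anything further would have to come from the paper's Table 3.

```python
# Sketch of an experiment config carrying only the hyperparameters quoted
# above from the paper's Table 3; all other settings are in the paper itself.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    batch_size: int = 128                # "Batch Size 128"
    discount: float = 0.99               # "Discount γ 0.99"
    replay_buffer_size: int = 100_000    # reduced from 1M due to compute limits
    max_episode_steps: int = 1000        # all environments truncated at 1000 steps

config = ExperimentConfig()
```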