Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite. |
| Researcher Affiliation | Collaboration | Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare. Google Research, Brain Team. {rishabhagarwal, marlosm, psc, bellemare}@google.com. Also at Mila, Université de Montréal. Now at DeepMind. |
| Pseudocode | Yes | Algorithm 1: Contrastive Metric Embeddings (CMEs), and the section "J. Pseudo Code", which includes functions such as `metric_fixed_point` and `contrastive_loss`. |
| Open Source Code | No | We use the open-source code released by Sonar et al. (2020) for our experiments. |
| Open Datasets | Yes | Jumping task from pixels (Tachet des Combes et al., 2018), LQR with spurious correlations (Song et al., 2019), and Distracting DM Control Suite (Stone et al., 2021; Zhang et al., 2018b). |
| Dataset Splits | Yes | We split the problem into 18 seen (training) and 268 unseen (test) tasks... For hyperparameter selection, we evaluate all agents on a validation set containing 54 unseen tasks in the wide grid (Figure 2a) and pick the parameters with the best validation performance. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | All agents are built on top of SAC (Haarnoja et al., 2018) combined with DrQ (Kostrikov et al., 2020)... |
| Experiment Setup | Yes | Table G.2: Common hyperparameters across all methods for all jumping task experiments. Table G.3: Optimal hyperparameters for reporting results in Table 1. Table G.4: Optimal hyperparameters for reporting results in Figure 5.3. Table G.5: Optimal hyperparameters for reporting ablation results in Table 2. |
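
The Pseudocode row names a `metric_fixed_point` function without showing what such a routine computes. The following is a minimal sketch, not the authors' code. It assumes deterministic dynamics and a metric defined recursively as a local action-difference cost plus a discounted distance between successor states. The names `metric_fixed_point_sketch` and `action_cost_matrix`, the 0/1 action cost, and the handling of trajectories of unequal length (the shorter one stays at its final state) are all assumptions made for illustration.

```python
def action_cost_matrix(actions_x, actions_y):
    """Hypothetical local cost: 0 if the two actions match, 1 otherwise."""
    return [[0.0 if a == b else 1.0 for b in actions_y] for a in actions_x]


def metric_fixed_point_sketch(cost, gamma=0.99):
    """Solve d[i][j] = cost[i][j] + gamma * d[i+1][j+1] by backward recursion.

    Assumptions (not taken from the paper): transitions are deterministic,
    so the successor of state pair (i, j) is (i+1, j+1); a trajectory that
    has ended stays at its last index; the pair of final states has no
    successor term.
    """
    n, m = len(cost), len(cost[0])
    d = [[0.0] * m for _ in range(n)]
    # Fill from the end of both trajectories so successors are ready first.
    for i in reversed(range(n)):
        for j in reversed(range(m)):
            d[i][j] = float(cost[i][j])
            if i < n - 1 or j < m - 1:
                ni, nj = min(i + 1, n - 1), min(j + 1, m - 1)
                d[i][j] += gamma * d[ni][nj]
    return d


# Usage: two short action sequences from different (hypothetical) tasks.
cost = action_cost_matrix([0, 1], [0, 1])
distances = metric_fixed_point_sketch(cost, gamma=0.5)
```

Because every entry depends only on entries at later indices, a single backward pass reaches the fixed point exactly; an iterative operator applied until convergence would give the same result under these assumptions.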