Learning Domain Invariant Representations in Goal-conditioned Block MDPs
Authors: Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael Zhang, Jimmy Ba
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical evaluation shows that our goal-conditioned RL agent can perform well in various unseen test environments, improving by 50% over baselines. Empirically, our experiments for a Sawyer arm robot simulation with visual observations and goals demonstrate that our proposed method achieves state-of-the-art performance compared to data augmentation and bisimulation baselines at generalizing to unseen test environments in goal-conditioned tasks (Section 4). |
| Researcher Affiliation | Academia | Beining Han (IIIS, Tsinghua University; bouldinghan@gmail.com); Chongyi Zheng (Carnegie Mellon University; chongyiz@andrew.cmu.edu); Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba (University of Toronto & Vector Institute; {hchan, keirp, michael, jba}@cs.toronto.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository. |
| Open Datasets | Yes | We evaluate PA-SF and all baselines on a set of GBMDP tasks based on the multiworld benchmark (Pong et al. [2018]), which is widely used to evaluate the performance of visual-input goal-conditioned algorithms. We use the following four basic tasks (Nair et al. [2018], Pong et al. [2020]): Reach, Door, Pickup and Push. The multiworld benchmark is cited as Pong et al. [2018] with a GitHub URL: 'multiworld. https://github.com/vitchyr/multiworld, 2018.' |
| Dataset Splits | No | The paper refers to 'training environments' (Etrain) and 'test environments' (Etest), and mentions aligned data amounting to '15% of all data collected', but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper mentions 'sawyer arm robot simulation' and resources provided by funding bodies (Province of Ontario, Government of Canada through CIFAR, Vector Institute), but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. |
| Software Dependencies | No | The paper mentions several software components like β-VAE, SAC, Skew-Fit, RAD, MISA, DBC, and multiworld, but it does not specify version numbers for these software components or any programming languages used for implementation. |
| Experiment Setup | No | The paper states, 'Please refer to Appendix E for a full description of our experiment setup and implementation details of the baselines and our algorithm.' However, the provided text does not contain specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or system-level training settings. |