Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
Authors: Sumedh A Sontakke, Arash Mehrjou, Laurent Itti, Bernhard Schölkopf
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work has 2 main thrusts: the discovered experimental behaviors and the representations obtained from the outcomes of those behaviors in environments. We visualize these learnt behaviors and verify that they are indeed semantically meaningful and interpretable. We quantify the utility of the learned behaviors by using them as pre-training for a downstream task. In our experimental setup, we verify that these behaviors are indeed invariant to all other causal factors except one. We visualize the representations obtained using these behaviors and verify that they are indeed the binary quantized representations for each of the ground-truth causal factors that we manipulated in our experiments. Finally, we verify that knowledge of the representation does indeed aid transfer learning and zero-shot generalizability in downstream tasks. |
| Researcher Affiliation | Academia | Sumedh A Sontakke (1), Arash Mehrjou (2), Laurent Itti (1), Bernhard Schölkopf (2). (1) University of Southern California; (2) Max Planck Institute for Intelligent Systems. Correspondence to: Sumedh A Sontakke <ssontakk@usc.edu>. |
| Pseudocode | Yes | Algorithm 1 Recursive Training Scheme |
| Open Source Code | No | The paper provides a general website link in the abstract ("Visit here for website") and for videos ("See videos of discovered behaviors here (website under construction)"), but no explicit statement or direct link to the source code for the methodology described in the paper. |
| Open Datasets | No | The paper mentions using the "Causal World Simulation" and the "Mujoco Control Suite", which are simulation environments for generating data rather than pre-existing datasets; no access information such as a DOI or repository link is provided. |
| Dataset Splits | No | The paper discusses training on one set of environments and testing on unseen environments, but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for a fixed dataset, since the data is generated dynamically through agent interaction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the "Pybullet Physics engine" and cites "Stable baselines", but does not provide version numbers for these or for any other software dependencies crucial to reproducibility. |
| Experiment Setup | No | The paper describes general experimental setups, such as the sets of causal factors varied (Mass; Size and Mass; Shape, Size, and Mass) and the policy type ("PPO-optimized Actor-Critic Policy"), but specific hyperparameters (e.g., learning rate, batch size, number of epochs) and detailed training configurations are explicitly deferred to supplementary material ("See Supplementary Material A for implementation details."). |