Counterfactual Data Augmentation using Locally Factored Dynamics
Authors: Silviu Pitis, Elliot Creager, Animesh Garg
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments evaluate CoDA in the online, batch, and goal-conditioned settings, in each case finding that CoDA significantly improves agent performance as compared to non-CoDA baselines. Since CoDA only modifies an agent's training data, we expect these improvements to extend to other off-policy task settings in which the state space can be accurately disentangled. Below we outline our experimental design and results, deferring specific details and additional results to Appendix B. |
| Researcher Affiliation | Collaboration | Silviu Pitis, Elliot Creager, Animesh Garg Department of Computer Science, University of Toronto, Vector Institute {spitis, creager, garg}@cs.toronto.edu. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partners). EC is a student researcher at Google Brain in Toronto. |
| Pseudocode | Yes | Algorithm 1: Mask-based Counterfactual Data Augmentation (CoDA) |
| Open Source Code | Yes | Code available at https://github.com/spitis/mrl |
| Open Datasets | Yes | We extend Spriteworld [89] to construct a bouncing ball environment... For this experiment we use a continuous control Pong environment based on Roboschool Pong [40]. ... obtains state-of-the-art results in FetchPush-v1 [71]. The citations provide concrete access information to these established public datasets/environments. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., TD3, HER, transformer, MBPO) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For each task, we use CoDA to expand the replay buffer of a TD3 agent [22] by about 8 times. We train the same TD3 agent on the expanded datasets in batch mode for 500,000 optimization steps. For fair comparison, we use the same transformer used for CoDA masks for Dyna, which we pretrain using approximately 42,000 samples from a random policy. For this experiment, we specified a heuristic mask using domain knowledge ("objects are disentangled if more than 10cm apart") that worked in both FetchPush and Slide2 despite different dynamics. |
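The heuristic mask and the counterfactual swap described in the setup row can be illustrated with a minimal sketch. This is not the paper's implementation (the official code is in the `mrl` repository); the function names, the per-object state layout, and the dict-based transition format are assumptions made here for illustration. Only the 10 cm distance heuristic and the idea of swapping locally independent factors between two real transitions come from the source.

```python
import numpy as np

def heuristic_mask(obj_positions, threshold=0.10):
    """Pairwise interaction mask: objects are treated as locally
    disentangled when more than `threshold` metres apart (a sketch of
    the paper's "more than 10cm apart" domain-knowledge heuristic)."""
    n = len(obj_positions)
    mask = np.eye(n, dtype=bool)  # each object always affects itself
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(obj_positions[i] - obj_positions[j]) <= threshold:
                mask[i, j] = mask[j, i] = True  # too close: treat as entangled
    return mask

def coda_swap(t1, t2, mask_fn):
    """Form a counterfactual transition by swapping one independent
    object's (state, next-state) slice between two real transitions.
    t1, t2: dicts with 'obj_states' and 'obj_next_states' of shape (n, d)
    (a hypothetical transition format, not the paper's).
    Returns None when no object is independent in both transitions."""
    m1 = mask_fn(t1['obj_states'])
    m2 = mask_fn(t2['obj_states'])
    for k in range(m1.shape[0]):
        # object k is swappable if it interacts with nothing else in both
        if m1[k].sum() == 1 and m2[k].sum() == 1:
            cf = {key: t1[key].copy() for key in t1}
            cf['obj_states'][k] = t2['obj_states'][k]
            cf['obj_next_states'][k] = t2['obj_next_states'][k]
            return cf
    return None
```

In the paper's pipeline such swaps are applied repeatedly to the replay buffer (expanding it by roughly 8x) before batch-mode TD3 training; a learned transformer mask can replace `heuristic_mask` where no domain knowledge is available.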