Counterfactual Data Augmentation using Locally Factored Dynamics

Authors: Silviu Pitis, Elliot Creager, Animesh Garg

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments evaluate CoDA in the online, batch, and goal-conditioned settings, in each case finding that CoDA significantly improves agent performance as compared to non-CoDA baselines. Since CoDA only modifies an agent's training data, we expect these improvements to extend to other off-policy task settings in which the state space can be accurately disentangled. Below we outline our experimental design and results, deferring specific details and additional results to Appendix B.
Researcher Affiliation | Collaboration | Silviu Pitis, Elliot Creager, Animesh Garg, Department of Computer Science, University of Toronto, Vector Institute; {spitis, creager, garg}@cs.toronto.edu. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partners). EC is a student researcher at Google Brain in Toronto.
Pseudocode | Yes | Algorithm 1: Mask-based Counterfactual Data Augmentation (CoDA). A sketch of this procedure is given after the table.
Open Source Code | Yes | Code available at https://github.com/spitis/mrl
Open Datasets | Yes | We extend Spriteworld [89] to construct a bouncing ball environment... For this experiment we use a continuous control Pong environment based on Roboschool Pong [40]. ... obtains state-of-the-art results in FetchPush-v1 [71]. The citations provide concrete access information to these established public datasets/environments; a usage sketch for instantiating the Fetch environment is given after the table.
Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various models and frameworks (e.g., TD3, HER, transformer, MBPO) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For each task, we use CoDA to expand the replay buffer of a TD3 agent [22] by about 8 times. We train the same TD3 agent on the expanded datasets in batch mode for 500,000 optimization steps. For fair comparison, we use the same transformer used for CoDA masks for Dyna, which we pretrain using approximately 42,000 samples from a random policy. For this experiment, we specified a heuristic mask using domain knowledge ("objects are disentangled if more than 10cm apart") that worked in both FetchPush and Slide2 despite different dynamics. A sketch of such a heuristic mask is given after the table.
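
The Pseudocode row above refers to Algorithm 1, mask-based CoDA: given two observed transitions whose local causal graphs share a disconnected subgraph, the components of that subgraph can be swapped between the transitions to produce a new, counterfactual transition. The following is a minimal sketch of that idea reconstructed from the paper's description rather than from the released code at https://github.com/spitis/mrl; the transition layout (per-component arrays, with the action folded into the agent component) and the helper name mask_fn are assumptions made for illustration.

```python
import numpy as np

def coda(t1, t2, mask_fn, swap_set):
    # t1, t2: transitions, each a pair (x, x_next) of length-n lists of
    #         per-component arrays (action assumed folded into component 0).
    # mask_fn(x): boolean (n, n) matrix; entry (i, j) is True when component j
    #             influences component i over one environment step.
    # swap_set: indices of the components to splice from t2 into t1.
    x1, y1 = t1
    x2, y2 = t2
    n = len(x1)
    in_swap = np.zeros(n, dtype=bool)
    in_swap[list(swap_set)] = True

    def disconnected(mask):
        # No causal edge may cross between the swapped set and its complement.
        return not (mask[in_swap][:, ~in_swap].any() or mask[~in_swap][:, in_swap].any())

    # Both source transitions must be locally factored with respect to swap_set ...
    if not (disconnected(mask_fn(x1)) and disconnected(mask_fn(x2))):
        return None

    # Splice: swapped components come from t2, everything else from t1, at both steps.
    x_cf = [x2[i] if in_swap[i] else x1[i] for i in range(n)]
    y_cf = [y2[i] if in_swap[i] else y1[i] for i in range(n)]

    # ... and the spliced input must itself respect the same factorization,
    # otherwise the counterfactual is not guaranteed to be dynamically valid.
    if not disconnected(mask_fn(x_cf)):
        return None
    return (x_cf, y_cf)
```

In the paper the masks come either from a learned transformer attention mask or from a domain-knowledge heuristic; repeatedly applying a procedure like this to random pairs of stored transitions is how the replay buffer is expanded.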
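
For the Open Datasets row, the Fetch robotics task is distributed with OpenAI Gym's robotics suite. The snippet below is a usage sketch assuming the classic Gym API (pre-0.26) and a working mujoco-py installation; it is not a statement about the authors' exact setup.

```python
import gym

# FetchPush-v1 is part of Gym's robotics suite and requires mujoco-py.
# Spriteworld and the Roboschool-based Pong environment cited in the paper
# are distributed as separate packages.
env = gym.make('FetchPush-v1')
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```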
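
The Experiment Setup row mentions a heuristic mask derived from domain knowledge ("objects are disentangled if more than 10cm apart"). Below is a minimal sketch of that kind of distance-based mask, assuming the relevant entity positions (gripper and objects, in meters) have already been extracted from the observation; the extraction step and the function name are illustrative, not the authors' code.

```python
import numpy as np

def heuristic_mask(positions, threshold=0.10):
    # positions: (n, 3) array of entity positions in meters, e.g. the gripper
    # followed by each object (reading these out of the observation is
    # environment-specific and not shown here).
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Entities may influence each other only when within `threshold` (10 cm);
    # the zero diagonal means every entity always influences itself.
    return dists < threshold
```

A matrix of this form could serve as the mask_fn output in the CoDA sketch above, with the learned transformer masks playing the same role in the non-heuristic experiments.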