Counterfactual Data Augmentation using Locally Factored Dynamics
Authors: Silviu Pitis, Elliot Creager, Animesh Garg
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments evaluate CoDA in the online, batch, and goal-conditioned settings, in each case finding that CoDA significantly improves agent performance as compared to non-CoDA baselines. Since CoDA only modifies an agent's training data, we expect these improvements to extend to other off-policy task settings in which the state space can be accurately disentangled. Below we outline our experimental design and results, deferring specific details and additional results to Appendix B. |
| Researcher Affiliation | Collaboration | Silviu Pitis, Elliot Creager, Animesh Garg Department of Computer Science, University of Toronto, Vector Institute {spitis, creager, garg}@cs.toronto.edu. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partners). EC is a student researcher at Google Brain in Toronto. |
| Pseudocode | Yes | Algorithm 1: Mask-based Counterfactual Data Augmentation (CoDA) |
| Open Source Code | Yes | Code available at https://github.com/spitis/mrl |
| Open Datasets | Yes | We extend Spriteworld [89] to construct a bouncing ball environment... For this experiment we use a continuous control Pong environment based on Roboschool Pong [40]. ... obtains state-of-the-art results in FetchPush-v1 [71]. The citations provide concrete access information to these established public datasets/environments. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., TD3, HER, transformer, MBPO) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For each task, we use CoDA to expand the replay buffer of a TD3 agent [22] by about 8 times. We train the same TD3 agent on the expanded datasets in batch mode for 500,000 optimization steps. For fair comparison, we use the same transformer used for CoDA masks for Dyna, which we pretrain using approximately 42,000 samples from a random policy. For this experiment, we specified a heuristic mask using domain knowledge ("objects are disentangled if more than 10cm apart") that worked in both FetchPush and Slide2 despite different dynamics. |
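The heuristic mask and the counterfactual swap described in the setup row can be illustrated with a minimal sketch. This is not the paper's implementation (the official code is in the `mrl` repository); the function names, the per-object state layout, and the dict-based transition format are assumptions made here for illustration. Only the 10 cm distance heuristic and the idea of swapping locally independent factors between two real transitions come from the source.

```python
import numpy as np

def heuristic_mask(obj_positions, threshold=0.10):
    """Pairwise interaction mask: objects are treated as locally
    disentangled when more than `threshold` metres apart (a sketch of
    the paper's "more than 10cm apart" domain-knowledge heuristic)."""
    n = len(obj_positions)
    mask = np.eye(n, dtype=bool)  # each object always affects itself
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(obj_positions[i] - obj_positions[j]) <= threshold:
                mask[i, j] = mask[j, i] = True  # too close: treat as entangled
    return mask

def coda_swap(t1, t2, mask_fn):
    """Form a counterfactual transition by swapping one independent
    object's (state, next-state) slice between two real transitions.
    t1, t2: dicts with 'obj_states' and 'obj_next_states' of shape (n, d)
    (a hypothetical transition format, not the paper's).
    Returns None when no object is independent in both transitions."""
    m1 = mask_fn(t1['obj_states'])
    m2 = mask_fn(t2['obj_states'])
    for k in range(m1.shape[0]):
        # object k is swappable if it interacts with nothing else in both
        if m1[k].sum() == 1 and m2[k].sum() == 1:
            cf = {key: t1[key].copy() for key in t1}
            cf['obj_states'][k] = t2['obj_states'][k]
            cf['obj_next_states'][k] = t2['obj_next_states'][k]
            return cf
    return None
```

In the paper's pipeline such swaps are applied repeatedly to the replay buffer (expanding it by roughly 8x) before batch-mode TD3 training; a learned transformer mask can replace `heuristic_mask` where no domain knowledge is available.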