MoCoDA: Model-based Counterfactual Data Augmentation
Authors: Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that MOCODA enables RL agents to learn policies that generalize to unseen states and actions. We use MOCODA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail. |
| Researcher Affiliation | Collaboration | Silviu Pitis¹, Elliot Creager¹, Ajay Mandlekar², Animesh Garg¹,² (¹University of Toronto and Vector Institute, ²NVIDIA) |
| Pseudocode | No | The paper describes the MOCODA framework and its steps (Figure 3), but it does not include a formally labeled "Algorithm" or "Pseudocode" block with structured steps. |
| Open Source Code | Yes | 1Visualizations & code available at https://sites.google.com/view/mocoda-neurips-22/ |
| Open Datasets | No | The paper describes the empirical data used (e.g., 'empirical training data consisting of left-to-right and bottom-to-top trajectories' for 2D Navigation, and 'dataset of 50000 transitions' for Hook Sweep2), but does not provide a specific link, DOI, or formal citation for public access to these datasets themselves. |
| Dataset Splits | Yes | The models are each trained on an empirical dataset of 35000 transitions for up to 600 epochs, which is early stopped using a validation set of 5000 transitions. |
| Hardware Specification | Yes | All our experiments were run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer [26]' but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, scikit-learn) used in the implementation. |
| Experiment Setup | Yes | We train all models for 600 epochs using the Adam optimizer [26] with a learning rate of 1e-4 and a batch size of 256. |
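The split and stopping protocol quoted above (35,000 training transitions, a 5,000-transition validation set, up to 600 epochs with early stopping) can be sketched as follows. This is a minimal illustration of that protocol, not the authors' code: the function names, the shuffling seed, and the `patience` threshold are assumptions, since the paper does not state its early-stopping criterion.

```python
import random


def split_transitions(transitions, n_val=5000, seed=0):
    """Shuffle transitions and hold out n_val of them for validation.

    With 40,000 transitions and n_val=5000 this reproduces the
    35,000 / 5,000 train/validation split reported in the table.
    """
    data = list(transitions)
    random.Random(seed).shuffle(data)
    return data[n_val:], data[:n_val]


def early_stopped_epochs(val_losses, max_epochs=600, patience=10):
    """Return the number of epochs run before early stopping triggers.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs (a hypothetical threshold; the paper
    only says training is early stopped), or at `max_epochs` at most.
    """
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best >= patience:
            return epoch
    return min(len(val_losses), max_epochs)


train, val = split_transitions(range(40000))
```

The quoted optimizer settings (Adam, learning rate 1e-4, batch size 256) would slot into the per-epoch training step that this sketch omits.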