Conservative Data Sharing for Multi-Task Offline Reinforcement Learning

Authors: Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to answer six main questions: (1) can CDS prevent performance degradation when sharing data as observed in Section 4?, (2) how does CDS compare to vanilla multi-task offline RL methods and prior data sharing methods? (3) can CDS handle sparse reward settings, where data sharing is particularly important due to scarce supervision signal? (4) can CDS handle goal-conditioned offline RL settings where the offline dataset is undirected and highly suboptimal? (5) Can CDS scale to complex visual observations? (6) Can CDS be combined with any offline RL algorithms? Besides these questions, we visualize CDS weights for better interpretation of the data sharing scheme learned by CDS in Figure 4 in Appendix D.2. ... Table 3: Results for multi-task locomotion (walker2d), robotic manipulation (Meta-World) and navigation environments (Ant Maze) with low-dimensional state inputs.
Researcher Affiliation | Collaboration | Tianhe Yu*,1,2, Aviral Kumar*,2,3, Yevgen Chebotar2, Karol Hausman1,2, Sergey Levine2,3, Chelsea Finn1,2 (*equal contribution; 1Stanford University, 2Google Research, 3UC Berkeley). tianheyu@cs.stanford.edu, aviralk@berkeley.edu
Pseudocode | Yes | We summarize the pseudocode of CDS in Algorithm 1 in Appendix A and include the practical implementation details of CDS in Appendix C.
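Algorithm 1 centers on CDS's sharing rule: in its practical form, a transition relabeled from another task's dataset is kept for task i only if its conservative Q-value clears the top k-percentile of conservative Q-values on task i's own data. The snippet below is a minimal NumPy sketch of that binary weighting, assuming a learned conservative Q-function `q_conservative(obs, act, task)` and batches stored as dicts of arrays; all names and the percentile default are illustrative, not taken from the authors' code.

```python
import numpy as np

def cds_share_mask(q_conservative, task_i, own_batch, relabeled_batch, k=90):
    """Binary CDS-style weights: share a relabeled transition with task i
    only if its conservative Q-value clears the k-th percentile of the
    conservative Q-values on task i's own data (a sketch, not the paper's code).
    """
    # Conservative Q-values of task i's original transitions set the threshold.
    q_own = q_conservative(own_batch["obs"], own_batch["act"], task_i)
    threshold = np.percentile(q_own, k)

    # Relabeled transitions from other tasks are kept only above that threshold.
    q_shared = q_conservative(relabeled_batch["obs"], relabeled_batch["act"], task_i)
    return (q_shared >= threshold).astype(np.float32)
```

The returned mask can multiply per-transition losses, so filtered-out transitions contribute nothing to task i's Q-update.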
Open Source Code | No | The paper does not provide a direct link to a code repository or an explicit statement about the public release of its source code.
Open Datasets | Yes | To assess the efficacy of data sharing, we experimentally analyze various multi-task RL scenarios created with the walker2d environment in Gym [5]. ... To answer question (4), we consider maze navigation tasks where the temporal stitching ability of an offline RL algorithm is crucial to obtain good performance. We create goal reaching tasks using the ant robot in the medium and hard mazes from D4RL [19].
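Both dataset sources named in this row follow standard open interfaces: walker2d ships with Gym, and the Ant Maze tasks come from D4RL. As a hedged illustration of the usual loading pattern (the exact dataset variants used in the paper are specified in its appendix; `antmaze-medium-play-v0` below is just one plausible choice):

```python
import gym
import d4rl  # importing d4rl registers its dataset-backed environments with gym

# One plausible Ant Maze variant; the paper's appendix lists the exact datasets.
env = gym.make("antmaze-medium-play-v0")
dataset = d4rl.qlearning_dataset(env)  # observations, actions, rewards, terminals, ...

print(dataset["observations"].shape, dataset["rewards"].shape)
```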
Dataset Splits | No | The paper describes different types of offline datasets (medium, expert, medium-replay) and their collection, but it does not specify explicit train/validation/test splits with percentages or sample counts for reproducibility.
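Because the paper leaves splits unspecified, reproducers must choose their own. A purely illustrative transition-level hold-out is sketched below (hypothetical helper; trajectory-level splits are often preferable in offline RL so that states from one trajectory do not leak across splits):

```python
import numpy as np

def split_offline_dataset(dataset, val_fraction=0.1, seed=0):
    """Hypothetical hold-out split over a dict of equal-length arrays,
    e.g. the output of d4rl.qlearning_dataset."""
    n = len(dataset["rewards"])
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(val_fraction * n)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    train = {k: v[train_idx] for k, v in dataset.items()}
    val = {k: v[val_idx] for k, v in dataset.items()}
    return train, val
```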
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'Gym', 'CQL', 'BRAC', and 'SAC' but does not specify their version numbers or other software dependencies required for reproducibility.
Experiment Setup | No | The paper states, 'We discuss evaluations of CDS with CQL in the main text and include the results of CDS with BRAC in Table 5 in Appendix D.1. For more details on setup and hyperparameters, see Appendix C.' Setup details and hyperparameters are thus deferred to Appendix C rather than provided in the main text of the paper.
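For context on the base learner referenced here: CQL augments the standard Bellman error with a conservative regularizer that pushes Q-values down on out-of-distribution actions and up on dataset actions. Below is a simplified PyTorch sketch of that penalty, assuming a Q-network `q_net(obs, act)` returning one value per state-action pair; the full CQL(H) objective also mixes in policy samples and importance weights, which are omitted here.

```python
import torch

def cql_penalty(q_net, obs, act_data, num_samples=10, action_dim=6):
    """Simplified CQL-style regularizer: logsumexp of Q over random actions
    minus the mean Q on dataset actions (a sketch of the idea, not the paper's code)."""
    batch, obs_dim = obs.shape
    # Uniform action samples stand in for the out-of-distribution action set.
    rand_act = torch.empty(batch, num_samples, action_dim).uniform_(-1.0, 1.0)
    obs_rep = obs.unsqueeze(1).expand(-1, num_samples, -1).reshape(-1, obs_dim)
    q_rand = q_net(obs_rep, rand_act.reshape(-1, action_dim)).view(batch, num_samples)
    # Push down Q on sampled actions, push up Q on actions actually in the data.
    return torch.logsumexp(q_rand, dim=1).mean() - q_net(obs, act_data).mean()
```

In practice this penalty is scaled by a coefficient alpha and added to the TD loss for each task.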