Offline Behavior Distillation
Authors: Shiye Lei, Sen Zhang, Dacheng Tao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple D4RL datasets reveal that Av-PBC offers significant improvements in OBD performance, fast distillation convergence speed, and robust cross-architecture/optimizer generalization. |
| Researcher Affiliation | Academia | Shiye Lei, School of Computer Science, The University of Sydney (shiye.lei@sydney.edu.au); Sen Zhang, School of Computer Science, The University of Sydney (sen.zhang@sydney.edu.au); Dacheng Tao, College of Computing & Data Science, Nanyang Technological University (dacheng.tao@ntu.edu.sg) |
| Pseudocode | Yes | Algorithm 1: Action-value weighted PBC |
| Open Source Code | Yes | The code is available at https://github.com/LeavesLei/OBD. |
| Open Datasets | Yes | We conduct offline behavior distillation on D4RL [Fu et al., 2020], a widely used offline RL benchmark. |
| Dataset Splits | No | The paper uses D4RL datasets but does not explicitly state how these datasets are split into training, validation, and test sets for the authors' specific experimental setup, beyond using the full dataset for Cal-QL and synthesizing Dsyn for BC training. |
| Hardware Specification | Yes | OBD process is still computationally expensive (25 hours for 50k distillation steps on a single NVIDIA V100 GPU) |
| Software Dependencies | No | The paper mentions using Cal-QL and Standard SGD but does not provide specific version numbers for these or other software dependencies like programming languages or deep learning frameworks. |
| Experiment Setup | Yes | A four-layer MLP serves as the default architecture for policy networks. The size of synthetic data Nsyn is set to 256. Standard SGD is employed in both inner and outer optimization, and learning rates α0 = 0.1 and α1 = 0.1 for the inner and outer loop, respectively, and corresponding momentum rates β0 = 0 and β1 = 0.9. |
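
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, as in the minimal sketch below. The names `OBDSetup` and `sgd_momentum_step` are hypothetical and do not come from the paper or its repository. The update rule assumes the common convention `v = momentum * v + grad`, `param = param - lr * v`; the paper's quoted text does not say which momentum convention is used, and the MLP hidden width is not quoted, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OBDSetup:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    n_syn: int = 256             # size of the synthetic dataset, Nsyn
    mlp_layers: int = 4          # four-layer MLP policy network
    inner_lr: float = 0.1        # alpha_0, inner-loop learning rate
    outer_lr: float = 0.1        # alpha_1, outer-loop learning rate
    inner_momentum: float = 0.0  # beta_0
    outer_momentum: float = 0.9  # beta_1


def sgd_momentum_step(params, grads, velocity, lr, momentum):
    """One SGD step with momentum on flat lists of floats.

    Assumed convention: v <- momentum * v + g, then p <- p - lr * v.
    Returns new lists and leaves the inputs unmodified.
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


if __name__ == "__main__":
    setup = OBDSetup()
    # Toy outer-loop step on a single scalar parameter with a made-up gradient.
    params, velocity = sgd_momentum_step(
        [1.0], [2.0], [0.0], setup.outer_lr, setup.outer_momentum
    )
    print(params, velocity)
```

With `inner_momentum = 0`, the inner-loop update reduces to plain gradient descent, while the outer loop accumulates past gradients with weight 0.9.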