Offline Behavior Distillation

Authors: Shiye Lei, Sen Zhang, Dacheng Tao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on multiple D4RL datasets reveal that Av-PBC offers significant improvements in OBD performance, faster distillation convergence, and robust cross-architecture/optimizer generalization. |
| Researcher Affiliation | Academia | Shiye Lei, School of Computer Science, The University of Sydney (shiye.lei@sydney.edu.au); Sen Zhang, School of Computer Science, The University of Sydney (sen.zhang@sydney.edu.au); Dacheng Tao, College of Computing & Data Science, Nanyang Technological University (dacheng.tao@ntu.edu.sg) |
| Pseudocode | Yes | Algorithm 1: Action-value weighted PBC (a hedged loss sketch follows the table). |
| Open Source Code | Yes | The code is available at https://github.com/LeavesLei/OBD. |
| Open Datasets | Yes | We conduct offline behavior distillation on D4RL [Fu et al., 2020], a widely used offline RL benchmark (a loading sketch follows the table). |
| Dataset Splits | No | The paper uses D4RL datasets but does not explicitly state how they are split into training, validation, and test sets for the authors' experimental setup, beyond using the full dataset for Cal-QL and synthesizing D_syn for BC training. |
| Hardware Specification | Yes | The OBD process is still computationally expensive (25 hours for 50k distillation steps on a single NVIDIA V100 GPU). |
| Software Dependencies | No | The paper mentions using Cal-QL and standard SGD but does not provide version numbers for these or for other software dependencies such as the programming language or deep learning framework. |
| Experiment Setup | Yes | A four-layer MLP serves as the default architecture for policy networks. The size of the synthetic data N_syn is set to 256. Standard SGD is employed in both inner and outer optimization, with learning rates α0 = 0.1 (inner) and α1 = 0.1 (outer), and corresponding momentum rates β0 = 0 and β1 = 0.9 (a configuration sketch follows the table). |