Off-Policy Imitation Learning from Observations
Authors: Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'Extensive empirical results on challenging locomotion tasks indicate that our approach is comparable with state-of-the-art in terms of both sample-efficiency and asymptotic performance.' (Abstract) and 'We compare OPOLO against state-of-the-art LfD and LfO approaches on MuJoCo benchmarks, which are locomotion tasks in continuous state-action space.' (Section 5, Experiments) |
| Researcher Affiliation | Collaboration | Zhuangdi Zhu (Michigan State University, zhuzhuan@msu.edu); Kaixiang Lin (Michigan State University, linkaixi@msu.edu); Bo Dai (Google Research, bodai@google.com); Jiayu Zhou (Michigan State University, jiayuz@msu.edu) |
| Pseudocode | Yes | Algorithm 1 Off-POlicy Learning from Observations (OPOLO). (A generic, hedged off-policy LfO skeleton is sketched after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We compare OPOLO against state-of-the-art LfD and LfO approaches on MuJoCo benchmarks, which are locomotion tasks in continuous state-action space. |
| Dataset Splits | No | The paper states that experiments are conducted on 'MuJoCo benchmarks' and that 'For each task, we collect 4 trajectories from a pre-trained expert policy', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper describes general aspects of the experimental setup, such as collecting 4 expert trajectories, removing the original rewards, and evaluating results across 5 random seeds; Algorithm 1 also lists a learning rate α as an input. However, it does not give specific numerical values for hyperparameters (e.g., learning rate, batch size, number of epochs) in the main text, stating instead that 'More experimental details can be found in the supplementary material'. (A hedged sketch of the stated protocol follows this table.) |
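The Pseudocode row above only confirms that Algorithm 1 is given in the paper. For orientation, the sketch below shows the generic adversarial learning-from-observations pattern that OPOLO belongs to: a discriminator trained on state-transition pairs (s, s') supplies a pseudo-reward to an off-policy RL learner. This is not the paper's exact OPOLO objective; the network sizes, the `-log(1 - D)` reward form, and the use of PyTorch are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionDiscriminator(nn.Module):
    """Scores (s, s') transition pairs. Generic adversarial-LfO component,
    not the exact objective used in the OPOLO paper."""
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

def discriminator_step(disc, opt, expert_batch, agent_batch):
    """One adversarial update: expert transitions labeled 1, agent transitions labeled 0."""
    bce = nn.BCEWithLogitsLoss()
    logits_e = disc(*expert_batch)   # expert_batch = (s, s') tensors from demonstrations
    logits_a = disc(*agent_batch)    # agent_batch  = (s, s') tensors from the replay buffer
    loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_a, torch.zeros_like(logits_a))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def pseudo_reward(disc, s, s_next):
    """Reward surrogate fed to an off-policy RL learner (e.g., a SAC/TD3-style agent).
    -log(1 - D(s, s')) is one common choice, used here purely for illustration."""
    with torch.no_grad():
        return -F.logsigmoid(-disc(s, s_next))
```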
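The Dataset Splits and Experiment Setup rows quote the paper's protocol: 4 trajectories collected from a pre-trained expert policy per task, original rewards removed, and results reported over 5 random seeds. Below is a minimal sketch of that protocol, assuming the classic Gym API (4-tuple `env.step`, `env.seed`), a MuJoCo task name such as "HalfCheetah-v2", and hypothetical `expert_policy` / `train_fn` callables; none of these specifics come from the paper.

```python
import numpy as np
import gym  # classic Gym API (obs, reward, done, info), as commonly used circa 2020

def collect_state_only_trajectories(env, expert_policy, n_traj=4, max_steps=1000):
    """Roll out a pretrained expert and keep only the states (no actions, no rewards),
    matching the learning-from-observations setting the paper describes."""
    trajectories = []
    for _ in range(n_traj):
        states = [env.reset()]
        for _ in range(max_steps):
            action = expert_policy(states[-1])                    # hypothetical expert
            next_state, _reward, done, _info = env.step(action)   # reward is discarded
            states.append(next_state)
            if done:
                break
        trajectories.append(np.asarray(states))
    return trajectories

def evaluate_over_seeds(train_fn, env_name="HalfCheetah-v2", seeds=(0, 1, 2, 3, 4)):
    """Repeat training and evaluation across 5 random seeds and report mean/std,
    mirroring the paper's statement that results are averaged over 5 seeds."""
    returns = []
    for seed in seeds:
        env = gym.make(env_name)
        env.seed(seed)
        np.random.seed(seed)
        returns.append(train_fn(env, seed))                        # hypothetical training routine
    return float(np.mean(returns)), float(np.std(returns))
```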