Operator Splitting Value Iteration
Authors: Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate both OS-VI and OS-Dyna in a finite MDP and compare them with existing methods. Here we present the results for the Control problem on a modified cliffwalk environment in a 6×6 grid with 4 actions (UP, DOWN, LEFT, RIGHT). The left plot in Figure 2 shows the convergence of OS-VI compared to VI and the solutions the model itself would lead to; it plots the normalized error ‖V_k − V*‖ w.r.t. ‖V*‖. (Right) Comparison of OS-Dyna with Dyna and Q-Learning in the RL setting. (A minimal value-iteration sketch illustrating this error metric appears after the table.) |
| Researcher Affiliation | Collaboration | Amin Rakhsha (1,2), Andrew Wang (1,2), Mohammad Ghavamzadeh (3), Amir-massoud Farahmand (2,1); affiliations: 1 Department of Computer Science, University of Toronto; 2 Vector Institute; 3 Google Research |
| Pseudocode | Yes | Algorithm 1 OS-Dyna |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The details are in the supplementary material. |
| Open Datasets | No | The paper mentions a 'modified cliffwalk environment in a 6x6 grid' but does not provide a link or citation to a public dataset, nor does it explicitly state its public availability. |
| Dataset Splits | No | The paper describes an RL setup where 'algorithms are given a sample (X_t, A_t, R_t, X'_t)' but does not specify traditional training, validation, or test dataset splits. |
| Hardware Specification | No | The experiments are simple and can be run on a personal computer. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers. |
| Experiment Setup | Yes | The learning rates are a constant α for iterations t ≤ N and then decay in the form of α_t = α/(t − N) afterwards. We have fine-tuned the learning rate schedule for each algorithm separately for the best results. (See this schedule implemented in the Q-learning sketch after the table.) |
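
The Research Type row quotes the paper's cliffwalk experiment and its normalized-error metric. As a rough illustration of that metric, here is a minimal sketch of plain value iteration (the VI baseline, not the paper's OS-VI) on a toy 6×6 cliffwalk-style grid. The grid layout, cliff placement, rewards, goal location, and discount factor below are all assumptions for illustration, not the paper's exact environment.

```python
import numpy as np

# Hypothetical toy setup: a 6x6 cliffwalk-style grid with 4 actions.
# Layout, rewards, and gamma are assumptions, not the paper's environment.
N, GAMMA = 6, 0.9
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # UP, DOWN, LEFT, RIGHT
GOAL = N * N - 1  # assumed goal: bottom-right corner, absorbing

def step(s, a):
    """Deterministic transition: move if in bounds, otherwise stay put."""
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return nr * N + nc

def reward(s2):
    """Assumed rewards: -100 for landing on the bottom-row 'cliff',
    -1 per ordinary step."""
    if s2 != GOAL and s2 // N == N - 1:
        return -100.0
    return -1.0

def bellman_optimality(V):
    """One sweep of the Bellman optimality operator T*."""
    V_new = np.empty_like(V)
    for s in range(N * N):
        if s == GOAL:
            V_new[s] = 0.0  # goal is absorbing with value 0
            continue
        best = -np.inf
        for a in range(4):
            s2 = step(s, a)
            best = max(best, reward(s2) + GAMMA * V[s2])
        V_new[s] = best
    return V_new

# Run VI to near-convergence to get a reference V*, then replay VI and
# record the normalized error ||V_k - V*|| / ||V*|| per iteration,
# the quantity plotted in the paper's left panel of Figure 2.
V_star = np.zeros(N * N)
for _ in range(500):
    V_star = bellman_optimality(V_star)

V = np.zeros(N * N)
for k in range(30):
    err = np.linalg.norm(V - V_star, np.inf) / np.linalg.norm(V_star, np.inf)
    print(f"iter {k:2d}  normalized error {err:.4f}")
    V = bellman_optimality(V)
```

Plain VI contracts at rate γ per sweep, which is the baseline convergence curve the paper's OS-VI is compared against.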
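The right panel compares OS-Dyna against Dyna and Q-Learning, where algorithms receive samples (X_t, A_t, R_t, X'_t). Below is a sketch of the standard tabular Q-learning baseline combined with the learning-rate schedule quoted in the Experiment Setup row. This is the Q-Learning baseline, not the paper's OS-Dyna; the values of α, N, γ, and the state/action sizes are placeholder assumptions (the paper tunes the schedule per algorithm).

```python
import numpy as np

def learning_rate(t, alpha, N):
    """Schedule from the Experiment Setup row: constant alpha for
    t <= N, then decaying as alpha / (t - N). alpha and N here are
    placeholders, not the paper's tuned values."""
    return alpha if t <= N else alpha / (t - N)

def q_learning_update(Q, sample, t, gamma=0.9, alpha=0.5, N=1000):
    """One tabular Q-learning update from a single sample
    (X_t, A_t, R_t, X'_t)."""
    x, a, r, x_next = sample
    target = r + gamma * np.max(Q[x_next])
    Q[x, a] += learning_rate(t, alpha, N) * (target - Q[x, a])
    return Q

# Usage with assumed sizes: 36 states (6x6 grid), 4 actions.
Q = np.zeros((36, 4))
Q = q_learning_update(Q, sample=(0, 3, -1.0, 1), t=1)
```

Note that the decayed rate is continuous at the switch point: at t = N + 1 it equals α, then shrinks as 1/(t − N), which is consistent with the "afterwards" phrasing in the quoted setup.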