Policy Optimization as Wasserstein Gradient Flows

Authors: Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the proposed WGF framework from two perspectives: i) the effectiveness of the proposed particle-approximation method for WGF, and ii) the advantages of the WGF framework for policy optimization. For i), we use a standard regression model to learn optimal parameter distributions, i.e., posterior distributions. For ii), we test our algorithms on several domains in OpenAI rllab and Gym (Duan et al., 2016). All experiments are conducted on a single Tesla P100. (See the particle-update sketch after this table.)
Researcher Affiliation | Academia | 1Duke University, 2SUNY at Buffalo.
Pseudocode | Yes | The full algorithm is given in Section G of the SM.
Open Source Code | No | The paper does not provide any statement about releasing its own source code, nor does it include a link to a code repository for the methodology described.
Open Datasets | Yes | For ii), we test our algorithms on several domains in OpenAI rllab and Gym (Duan et al., 2016). Table 1: Averaged predictions, with standard deviations, in terms of test log-likelihood. [Lists datasets: Boston, Concrete, Energy, Kin8nm, Naval, CCPP, Winequality, Yacht, Protein, Year Predict]
Dataset Splits | No | The paper refers to 'test log-likelihood' and 'learning curves' but does not explicitly provide specific train/validation/test dataset splits or their percentages in the main text.
Hardware Specification | Yes | All experiments are conducted on a single Tesla P100.
Software Dependencies | No | The paper mentions software such as OpenAI rllab and Gym, MuJoCo, and the RMSprop optimizer, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The policy is parameterized as a two-layer (25-16 hidden units) neural network with tanh activation function. The maximum horizon length is set to 500. A sample size of 5000 is used for policy gradient estimation. We use M = 16 particles to approximate parameter distributions, and h = 0.1 as the discretized stepsize. (See the configuration sketch after this table.)
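
To make the quoted particle-approximation idea concrete, below is a minimal sketch of a particle approximation to a Wasserstein gradient flow of KL(q || p), discretized as Langevin dynamics. This is a standard construction chosen for illustration, not the paper's exact scheme (which is given in Section G of its SM). The particle count M = 16 and stepsize h = 0.1 are the values quoted in the experiment-setup row; `grad_log_p` and the Gaussian target are assumptions.

```python
# A generic sketch of a particle approximation to the Wasserstein gradient
# flow of KL(q || p), discretized as Langevin dynamics. Illustration only --
# the paper's own algorithm (Section G of its SM) may differ.
import numpy as np

def wgf_particle_step(particles, grad_log_p, h=0.1, rng=None):
    """One discretized WGF step: drift toward high log-density plus diffusion."""
    rng = rng or np.random.default_rng()
    drift = h * grad_log_p(particles)                 # gradient-flow drift term
    noise = np.sqrt(2.0 * h) * rng.standard_normal(particles.shape)
    return particles + drift + noise

# Example: M = 16 particles flowing toward a standard Gaussian target
# (a stand-in posterior, assumed here for illustration).
M, dim = 16, 2
particles = np.random.default_rng(0).standard_normal((M, dim)) * 3.0
grad_log_gaussian = lambda x: -x                      # grad of log N(0, I)
for _ in range(1000):
    particles = wgf_particle_step(particles, grad_log_gaussian, h=0.1)
```

After enough steps, the particle cloud approximates samples from the target distribution (up to discretization bias in h), which is the role the particles play in approximating parameter posteriors.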
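
The experiment-setup row pins down the architecture precisely enough to sketch. The following is a minimal NumPy rendering of the two-layer (25-16 hidden units) tanh policy network, with the quoted horizon and sample-size constants attached. Input/output dimensions, the initialization scheme, and all function names are hypothetical; the paper's actual rllab implementation is not reproduced here.

```python
# Hedged sketch of the described policy architecture: two hidden layers
# (25, then 16 units) with tanh activations. Dimensions and names are
# assumptions for illustration, not the paper's code.
import numpy as np

HORIZON = 500        # maximum horizon length quoted in the paper
BATCH_SIZE = 5000    # sample size used for policy gradient estimation

def init_policy(obs_dim, act_dim, rng):
    """Initialize weights/biases for layer sizes obs_dim -> 25 -> 16 -> act_dim."""
    sizes = [obs_dim, 25, 16, act_dim]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy_mean(params, obs):
    """Forward pass: tanh on hidden layers, linear output as the action mean."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

# Usage: action mean for a random observation in a hypothetical 11-D task.
rng = np.random.default_rng(0)
params = init_policy(obs_dim=11, act_dim=3, rng=rng)
action_mean = policy_mean(params, rng.standard_normal(11))
```

Under the paper's setup, M = 16 copies of such a parameter vector would be maintained as particles and evolved with the discretized stepsize h = 0.1.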