Policy Optimization as Wasserstein Gradient Flows

Authors: Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the proposed WGF framework from two perspectives: i) the effectiveness of the proposed particle-approximation method for WGF, and ii) the advantages of the WGF framework for policy optimization. For i), we use a standard regression model to learn optimal parameter distributions, i.e., posterior distributions. For ii), we test our algorithms on several domains in OpenAI rllab and Gym (Duan et al., 2016). All experiments are conducted on a single Tesla P100. (See the particle-update sketch after this table.)
Researcher Affiliation | Academia | 1Duke University, 2SUNY at Buffalo.
Pseudocode | Yes | The full algorithm is given in Section G of the SM.
Open Source Code | No | The paper does not provide any statement about releasing its own source code, nor does it include a link to a code repository for the methodology described.
Open Datasets | Yes | For ii), we test our algorithms on several domains in OpenAI rllab and Gym (Duan et al., 2016). Table 1: Averaged predictions, with standard deviations, in terms of test log-likelihood. [Lists datasets: Boston, Concrete, Energy, Kin8nm, Naval, CCPP, Winequality, Yacht, Protein, Year Predict]
Dataset Splits | No | The paper refers to 'test log-likelihood' and 'learning curves' but does not explicitly provide specific train/validation/test dataset splits or their percentages in the main text.
Hardware Specification | Yes | All experiments are conducted on a single Tesla P100.
Software Dependencies | No | The paper mentions software such as OpenAI rllab and Gym, MuJoCo, and the RMSprop optimizer, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The policy is parameterized as a two-layer (25-16 hidden units) neural network with tanh activation function. The maximum horizon length is set to 500. A sample size of 5000 is used for policy gradient estimation. We use M = 16 particles to approximate parameter distributions, and h = 0.1 as the discretized stepsize. (See the configuration sketch after this table.)
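
To make the quoted particle-approximation idea concrete, below is a minimal sketch of a particle approximation to a Wasserstein gradient flow of KL(q || p), discretized as Langevin dynamics. This is a standard construction chosen for illustration, not the paper's exact scheme (which is given in Section G of its SM). The particle count M = 16 and stepsize h = 0.1 are the values quoted in the experiment-setup row; `grad_log_p` and the Gaussian target are assumptions.

```python
# A generic sketch of a particle approximation to the Wasserstein gradient
# flow of KL(q || p), discretized as Langevin dynamics. Illustration only --
# the paper's own algorithm (Section G of its SM) may differ.
import numpy as np

def wgf_particle_step(particles, grad_log_p, h=0.1, rng=None):
    """One discretized WGF step: drift toward high log-density plus diffusion."""
    rng = rng or np.random.default_rng()
    drift = h * grad_log_p(particles)                 # gradient-flow drift term
    noise = np.sqrt(2.0 * h) * rng.standard_normal(particles.shape)
    return particles + drift + noise

# Example: M = 16 particles flowing toward a standard Gaussian target
# (a stand-in posterior, assumed here for illustration).
M, dim = 16, 2
particles = np.random.default_rng(0).standard_normal((M, dim)) * 3.0
grad_log_gaussian = lambda x: -x                      # grad of log N(0, I)
for _ in range(1000):
    particles = wgf_particle_step(particles, grad_log_gaussian, h=0.1)
```

After enough steps, the particle cloud approximates samples from the target distribution (up to discretization bias in h), which is the role the particles play in approximating parameter posteriors.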
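
The experiment-setup row pins down the architecture precisely enough to sketch. The following is a minimal NumPy rendering of the two-layer (25-16 hidden units) tanh policy network, with the quoted horizon and sample-size constants attached. Input/output dimensions, the initialization scheme, and all function names are hypothetical; the paper's actual rllab implementation is not reproduced here.

```python
# Hedged sketch of the described policy architecture: two hidden layers
# (25, then 16 units) with tanh activations. Dimensions and names are
# assumptions for illustration, not the paper's code.
import numpy as np

HORIZON = 500        # maximum horizon length quoted in the paper
BATCH_SIZE = 5000    # sample size used for policy gradient estimation

def init_policy(obs_dim, act_dim, rng):
    """Initialize weights/biases for layer sizes obs_dim -> 25 -> 16 -> act_dim."""
    sizes = [obs_dim, 25, 16, act_dim]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy_mean(params, obs):
    """Forward pass: tanh on hidden layers, linear output as the action mean."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

# Usage: action mean for a random observation in a hypothetical 11-D task.
rng = np.random.default_rng(0)
params = init_policy(obs_dim=11, act_dim=3, rng=rng)
action_mean = policy_mean(params, rng.standard_normal(11))
```

Under the paper's setup, M = 16 copies of such a parameter vector would be maintained as particles and evolved with the discretized stepsize h = 0.1.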