Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Authors: Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate and quantify the benefit of the action-dependent baseline through both theoretical analysis as well as numerical results, including an analysis of the suboptimality of the optimal state-dependent baseline. Our experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand manipulation and synthetic tasks. |
| Researcher Affiliation | Collaboration | (1) Department of EECS, UC Berkeley; (2) Department of CSE, University of Washington; (3) OpenAI; (4) Institute for Transportation Studies, UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Policy gradient for factorized policies using action-dependent baselines (a hedged sketch of this estimator appears below the table) |
| Open Source Code | No | The paper states that videos and additional results are available at https://sites.google.com/view/ad-baselines, but this project page does not provide access to the source code for the methodology. |
| Open Datasets | No | The paper uses simulated environments (MuJoCo and synthetic tasks) that generate data dynamically through interaction, rather than pre-existing, publicly available datasets for which concrete access information could be provided. The notion of a 'publicly available dataset' therefore does not directly apply here. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or counts for training, validation, or testing sets). The experiments are conducted in simulated environments where data is generated through interaction. |
| Hardware Specification | No | The paper mentions using the 'MuJoCo 1.5 simulator' but does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'MuJoCo 1.5 simulator' but does not list other key software components (e.g., programming languages, deep learning frameworks, or libraries) with the specific version numbers required for reproducibility. |
| Experiment Setup | Yes | Parameters: unless otherwise stated, the following parameters are used in the experiments: γ = 0.995, λ_GAE = 0.97, KL_desired = 0.025. Policies: 2-layer fully connected networks with hidden sizes (32, 32). Initialization: Xavier initialization, except that the final-layer weights are scaled down by a factor of 100. Table 2 and Table 3 provide further per-experiment details, including trajectories, horizon, RBF features, and action dimensionality. A hedged configuration sketch appears below the table. |
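
For reference, the estimator behind Algorithm 1 subtracts, for each action factor i, a baseline b_i(s, a_{-i}) that may depend on the *other* action dimensions, and accumulates the per-factor score-function terms. Below is a minimal NumPy sketch of that computation; the function name, argument shapes, and averaging convention are illustrative assumptions, not the authors' (unreleased) code.

```python
import numpy as np

def factorized_pg_estimate(grad_log_pi, q_hat, baselines):
    """Sketch of a policy-gradient estimate with action-dependent
    factorized baselines (illustrative names, not the paper's code).

    grad_log_pi : (T, m, P) array
        Gradient of log pi_i(a_{i,t} | s_t) for each of the m action
        factors, w.r.t. the P policy parameters.
    q_hat : (T,) array
        Return / Q estimates Q_hat(s_t, a_t) (e.g., GAE-based).
    baselines : (T, m) array
        b_i(s_t, a_{-i,t}) -- each factor's baseline may condition on
        the other action dimensions, which makes it action-dependent.
    """
    # Per-factor advantage: Q_hat(s, a) - b_i(s, a_{-i})
    adv = q_hat[:, None] - baselines                      # (T, m)
    # Score-function estimator: sum over factors, average over timesteps
    return np.einsum("tmp,tm->p", grad_log_pi, adv) / len(q_hat)
```

Because each factor's baseline is independent of that factor's own action, subtracting it leaves the gradient estimate unbiased while (per the paper's analysis) reducing its variance relative to a purely state-dependent baseline.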
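The experiment-setup row can likewise be summarized as a small configuration sketch. The numeric values (γ = 0.995, λ_GAE = 0.97, KL_desired = 0.025, hidden sizes (32, 32), Xavier initialization with the final layer scaled down by 100x) are quoted from the paper; the dictionary keys, the `init_policy_params` helper, and the seed handling are hypothetical.

```python
import numpy as np

# Hyperparameters reported in the paper's experiment setup.
HPARAMS = {"gamma": 0.995, "gae_lambda": 0.97, "kl_desired": 0.025}

def xavier_uniform(fan_in, fan_out, rng):
    # Xavier/Glorot uniform initialization
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_policy_params(obs_dim, act_dim, hidden=(32, 32), seed=0):
    """Initialize a 2-layer fully connected policy with Xavier init,
    scaling the final-layer weights down by a factor of 100 as described
    in the paper. Helper and argument names are illustrative."""
    rng = np.random.default_rng(seed)
    sizes = (obs_dim, *hidden, act_dim)
    weights = [xavier_uniform(m, n, rng) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    weights[-1] *= 0.01  # scale down final-layer weights by 100x
    return weights, biases
```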