Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Authors: Samuel Neumann, Sungsu Lim, Ajin George Joseph, Yangchen Pan, Adam White, Martha White

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that our Greedy AC algorithm, that uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy-regularization."
Researcher Affiliation | Academia | "Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada {sfneuman,amw8,whitem}@ualberta.ca"
Pseudocode | Yes | Algorithm 1 Percentile Empirical Distribution(N, ρ); Algorithm 2 Conditional CEM for the Actor; Algorithm 3 Greedy Actor-Critic (a sketch of the CCEM actor update appears after this table).
Open Source Code | Yes | "Code available at https://github.com/samuelfneumann/GreedyAC."
Open Datasets | Yes | "We use the classic versions of Mountain Car (Sutton & Barto, 2018), Pendulum (Degris et al., 2012a), and Acrobot (Sutton & Barto, 2018). ... To demonstrate the potential of Greedy AC at scale, we also include experiments on Freeway and Breakout from MinAtar (Young & Tian, 2019) as well as on Swimmer-v3 from OpenAI Gym (Brockman et al., 2016)."
Dataset Splits | No | The paper states, "We sweep hyperparameters for 40 runs, tuning over the first 10 runs and reporting results using the final 30 runs for the best hyperparameters," which indicates a tuning/validation process, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts (see the seed-split sketch after this table).
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general setups like "All algorithms use neural networks."
Software Dependencies | No | The paper mentions that "All algorithms use the Adam optimizer (Kingma & Ba, 2014), experience replay, and target networks for the value functions." However, it does not provide specific version numbers for these software components or any other libraries/frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We sweep critic step size α = 10^x for x ∈ {−5, −4, …, −1}. We set the actor step size to be κα and sweep κ ∈ {10^−3, 10^−2, 10^−1, 1, 2, 10}. We sweep entropy scales τ = 10^y for y ∈ {−3, −2, −1, 0, 1}. For the classic control experiments, we used fixed batch sizes of 32 samples and a replay buffer capacity of 100,000 samples. For the MinAtar experiments, we used fixed batch sizes of 32 samples and a buffer capacity of 1 million. For the Swimmer experiments, we used fixed batch sizes of 100 samples and a buffer capacity of 1 million. For CCEM, we fixed ρ = 0.1 and sample N = 30 actions." (The sweep grid is enumerated in a sketch below.)
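
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of the conditional CEM (CCEM) actor update described by Algorithm 2: sample N actions per state from the current policy, rank them with the critic, and raise the log-likelihood of the top ρ-fraction. The `GaussianPolicy` class, the lambda critic, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Toy Gaussian policy standing in for the paper's actor network."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, states):
        return torch.distributions.Normal(self.mean(states), self.log_std.exp())

def ccem_actor_loss(policy, q_net, states, n=30, rho=0.1):
    """Conditional CEM step: sample n actions per state, keep the top
    max(1, int(rho * n)) ranked by the critic, and maximize their
    log-likelihood under the policy (a cross-entropy update)."""
    dist = policy(states)
    actions = dist.sample((n,))                   # (n, batch, act_dim), no gradient
    expanded = states.unsqueeze(0).expand(n, -1, -1)
    q = q_net(expanded, actions)                  # (n, batch) critic scores
    k = max(1, int(rho * n))                      # elite set size (3 for n=30, rho=0.1)
    idx = q.topk(k, dim=0).indices                # indices of elite actions per state
    elite = torch.gather(
        actions, 0, idx.unsqueeze(-1).expand(-1, -1, actions.shape[-1]))
    # Negative log-likelihood of the elite actions = cross-entropy loss.
    return -dist.log_prob(elite).sum(-1).mean()

# Toy usage: a random batch of states and a hypothetical stand-in critic.
policy = GaussianPolicy(obs_dim=4, act_dim=2)
critic = lambda s, a: -(a ** 2).sum(-1)           # prefers actions near zero
loss = ccem_actor_loss(policy, critic, torch.randn(8, 4))
loss.backward()                                   # gradient flows into the policy
```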
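The tuning protocol in the Dataset Splits row (tune on the first 10 runs, report on the final 30) reads as a split over random seeds rather than over data. A minimal sketch under that assumption, where `evaluate(config, seed)` is a hypothetical experiment runner returning a scalar performance score:

```python
ALL_SEEDS = list(range(40))                # 40 runs total, one per seed
TUNING_SEEDS = ALL_SEEDS[:10]              # hyperparameters selected on these
REPORT_SEEDS = ALL_SEEDS[10:]              # final results reported on these

def select_best_config(configs, evaluate):
    """Return the config with the highest mean score over the tuning seeds."""
    def mean_tuning_score(cfg):
        return sum(evaluate(cfg, seed) for seed in TUNING_SEEDS) / len(TUNING_SEEDS)
    return max(configs, key=mean_tuning_score)
```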
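Finally, the sweep in the Experiment Setup row can be enumerated directly. The sketch below reconstructs the grid for the classic-control configuration; the dictionary keys are hypothetical names, not identifiers from the paper's code.

```python
import itertools

CRITIC_STEPSIZES = [10.0 ** x for x in range(-5, 0)]   # alpha = 10^x, x in {-5, ..., -1}
ACTOR_MULTIPLIERS = [1e-3, 1e-2, 1e-1, 1, 2, 10]       # kappa; actor step size = kappa * alpha
ENTROPY_SCALES = [10.0 ** y for y in range(-3, 2)]     # tau = 10^y, y in {-3, ..., 1}

def classic_control_configs():
    """Yield every hyperparameter combination in the classic-control sweep."""
    for alpha, kappa, tau in itertools.product(
            CRITIC_STEPSIZES, ACTOR_MULTIPLIERS, ENTROPY_SCALES):
        yield {
            "critic_step_size": alpha,
            "actor_step_size": kappa * alpha,
            "entropy_scale": tau,          # only used by entropy-regularized agents
            "batch_size": 32,              # classic control: 32; Swimmer: 100
            "buffer_capacity": 100_000,    # classic control; MinAtar/Swimmer: 1 million
            "ccem_rho": 0.1,
            "ccem_num_actions": 30,
        }
```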