Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Authors: Samuel Neumann, Sungsu Lim, Ajin George Joseph, Yangchen Pan, Adam White, Martha White

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that our Greedy AC algorithm, that uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy-regularization."
Researcher Affiliation | Academia | "Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada {sfneuman,amw8,whitem}@ualberta.ca"
Pseudocode | Yes | Algorithm 1 Percentile Empirical Distribution(N, ρ); Algorithm 2 Conditional CEM for the Actor; Algorithm 3 Greedy Actor-Critic (a sketch of the CCEM actor update appears after this table).
Open Source Code | Yes | "Code available at https://github.com/samuelfneumann/GreedyAC."
Open Datasets | Yes | "We use the classic versions of Mountain Car (Sutton & Barto, 2018), Pendulum (Degris et al., 2012a), and Acrobot (Sutton & Barto, 2018). ... To demonstrate the potential of Greedy AC at scale, we also include experiments on Freeway and Breakout from MinAtar (Young & Tian, 2019) as well as on Swimmer-v3 from OpenAI Gym (Brockman et al., 2016)."
Dataset Splits | No | The paper states, "We sweep hyperparameters for 40 runs, tuning over the first 10 runs and reporting results using the final 30 runs for the best hyperparameters," which indicates a tuning/validation process, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts (see the seed-split sketch after this table).
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general setups like "All algorithms use neural networks."
Software Dependencies | No | The paper mentions that "All algorithms use the Adam optimizer (Kingma & Ba, 2014), experience replay, and target networks for the value functions." However, it does not provide specific version numbers for these software components or any other libraries/frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We sweep critic step size α = 10^x for x ∈ {−5, −4, …, −1}. We set the actor step size to be κα and sweep κ ∈ {10^−3, 10^−2, 10^−1, 1, 2, 10}. We sweep entropy scales τ = 10^y for y ∈ {−3, −2, −1, 0, 1}. For the classic control experiments, we used fixed batch sizes of 32 samples and a replay buffer capacity of 100,000 samples. For the MinAtar experiments, we used fixed batch sizes of 32 samples and a buffer capacity of 1 million. For the Swimmer experiments, we used fixed batch sizes of 100 samples and a buffer capacity of 1 million. For CCEM, we fixed ρ = 0.1 and sample N = 30 actions." (The sweep grid is enumerated in a sketch below.)
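
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of the conditional CEM (CCEM) actor update described by Algorithm 2: sample N actions per state from the current policy, rank them with the critic, and raise the log-likelihood of the top ρ-fraction. The `GaussianPolicy` class, the lambda critic, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Toy Gaussian policy standing in for the paper's actor network."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, states):
        return torch.distributions.Normal(self.mean(states), self.log_std.exp())

def ccem_actor_loss(policy, q_net, states, n=30, rho=0.1):
    """Conditional CEM step: sample n actions per state, keep the top
    max(1, int(rho * n)) ranked by the critic, and maximize their
    log-likelihood under the policy (a cross-entropy update)."""
    dist = policy(states)
    actions = dist.sample((n,))                   # (n, batch, act_dim), no gradient
    expanded = states.unsqueeze(0).expand(n, -1, -1)
    q = q_net(expanded, actions)                  # (n, batch) critic scores
    k = max(1, int(rho * n))                      # elite set size (3 for n=30, rho=0.1)
    idx = q.topk(k, dim=0).indices                # indices of elite actions per state
    elite = torch.gather(
        actions, 0, idx.unsqueeze(-1).expand(-1, -1, actions.shape[-1]))
    # Negative log-likelihood of the elite actions = cross-entropy loss.
    return -dist.log_prob(elite).sum(-1).mean()

# Toy usage: a random batch of states and a hypothetical stand-in critic.
policy = GaussianPolicy(obs_dim=4, act_dim=2)
critic = lambda s, a: -(a ** 2).sum(-1)           # prefers actions near zero
loss = ccem_actor_loss(policy, critic, torch.randn(8, 4))
loss.backward()                                   # gradient flows into the policy
```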
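The tuning protocol in the Dataset Splits row (tune on the first 10 runs, report on the final 30) reads as a split over random seeds rather than over data. A minimal sketch under that assumption, where `evaluate(config, seed)` is a hypothetical experiment runner returning a scalar performance score:

```python
ALL_SEEDS = list(range(40))                # 40 runs total, one per seed
TUNING_SEEDS = ALL_SEEDS[:10]              # hyperparameters selected on these
REPORT_SEEDS = ALL_SEEDS[10:]              # final results reported on these

def select_best_config(configs, evaluate):
    """Return the config with the highest mean score over the tuning seeds."""
    def mean_tuning_score(cfg):
        return sum(evaluate(cfg, seed) for seed in TUNING_SEEDS) / len(TUNING_SEEDS)
    return max(configs, key=mean_tuning_score)
```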
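Finally, the sweep in the Experiment Setup row can be enumerated directly. The sketch below reconstructs the grid for the classic-control configuration; the dictionary keys are hypothetical names, not identifiers from the paper's code.

```python
import itertools

CRITIC_STEPSIZES = [10.0 ** x for x in range(-5, 0)]   # alpha = 10^x, x in {-5, ..., -1}
ACTOR_MULTIPLIERS = [1e-3, 1e-2, 1e-1, 1, 2, 10]       # kappa; actor step size = kappa * alpha
ENTROPY_SCALES = [10.0 ** y for y in range(-3, 2)]     # tau = 10^y, y in {-3, ..., 1}

def classic_control_configs():
    """Yield every hyperparameter combination in the classic-control sweep."""
    for alpha, kappa, tau in itertools.product(
            CRITIC_STEPSIZES, ACTOR_MULTIPLIERS, ENTROPY_SCALES):
        yield {
            "critic_step_size": alpha,
            "actor_step_size": kappa * alpha,
            "entropy_scale": tau,          # only used by entropy-regularized agents
            "batch_size": 32,              # classic control: 32; Swimmer: 100
            "buffer_capacity": 100_000,    # classic control; MinAtar/Swimmer: 1 million
            "ccem_rho": 0.1,
            "ccem_num_actions": 30,
        }
```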