Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement
Authors: Samuel Neumann, Sungsu Lim, Ajin George Joseph, Yangchen Pan, Adam White, Martha White
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our Greedy AC algorithm, that uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy-regularization. |
| Researcher Affiliation | Academia | Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White Department of Computing Science University of Alberta Edmonton, Alberta, Canada {sfneuman,amw8,whitem}@ualberta.ca |
| Pseudocode | Yes | Algorithm 1 Percentile Empirical Distribution(N, ρ); Algorithm 2 Conditional CEM for the Actor; Algorithm 3 Greedy Actor-Critic |
| Open Source Code | Yes | Code available at https://github.com/samuelfneumann/GreedyAC. |
| Open Datasets | Yes | We use the classic versions of Mountain Car (Sutton & Barto, 2018), Pendulum (Degris et al., 2012a), and Acrobot (Sutton & Barto, 2018). ... To demonstrate the potential of Greedy AC at scale, we also include experiments on Freeway and Breakout from MinAtar (Young & Tian, 2019) as well as on Swimmer-v3 from OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper states, "We sweep hyperparameters for 40 runs, tuning over the first 10 runs and reporting results using the final 30 runs for the best hyperparameters," which indicates a tuning/validation process, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general setups like "All algorithms use neural networks." |
| Software Dependencies | No | The paper mentions that "All algorithms use the Adam optimizer (Kingma & Ba, 2014), experience replay, and target networks for the value functions." However, it does not provide specific version numbers for these software components or any other libraries/frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We sweep critic step size α = 10^x for x ∈ {−5, −4, ..., −1}. We set the actor step size to be κα and sweep κ ∈ {10^−3, 10^−2, 10^−1, 1, 2, 10}. We sweep entropy scales τ = 10^y for y ∈ {−3, −2, −1, 0, 1}. For the classic control experiments, we used fixed batch sizes of 32 samples and a replay buffer capacity of 100,000 samples. For the MinAtar experiments, we used fixed batch sizes of 32 samples and a buffer capacity of 1 million. For the Swimmer experiments, we used fixed batch sizes of 100 samples and a buffer capacity of 1 million. For CCEM, we fixed ρ = 0.1 and sample N = 30 actions. |
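The reported sweep (α = 10^x with x ∈ {−5, ..., −1}, actor step size κα with six κ values, τ = 10^y with y ∈ {−3, ..., 1}) and the CCEM setting (ρ = 0.1, N = 30) can be sketched concretely. This is a minimal illustration, not the authors' code: the grid-building structure and the `top_rho_actions` helper (name, signature, and the random placeholder Q-values) are assumptions; only the numeric ranges, ρ, and N come from the paper.

```python
import itertools
import numpy as np

# Hyperparameter grid as reported in the "Experiment Setup" row.
critic_step_sizes = [10.0 ** x for x in range(-5, 0)]   # alpha = 10^x, x in {-5,...,-1}
kappas = [1e-3, 1e-2, 1e-1, 1.0, 2.0, 10.0]             # actor step size = kappa * alpha
entropy_scales = [10.0 ** y for y in range(-3, 2)]      # tau = 10^y, y in {-3,...,1}

grid = [
    {"alpha": a, "actor_lr": k * a, "tau": t}
    for a, k, t in itertools.product(critic_step_sizes, kappas, entropy_scales)
]
print(len(grid))  # 5 * 6 * 5 = 150 configurations

# Top-rho (percentile) selection in the spirit of CCEM's actor update:
# sample N actions, keep the ceil(rho * N) with highest Q-values.
# Helper name/signature are illustrative assumptions.
def top_rho_actions(q_values: np.ndarray, actions: np.ndarray, rho: float = 0.1):
    k = int(np.ceil(rho * len(actions)))    # rho = 0.1, N = 30 -> k = 3 elites
    top_idx = np.argsort(q_values)[-k:]     # indices of the k highest Q-values
    return actions[top_idx]

rng = np.random.default_rng(0)
actions = rng.normal(size=(30, 1))          # N = 30 actions sampled from the policy
q_values = rng.normal(size=30)              # placeholder critic evaluations
elite = top_rho_actions(q_values, actions)
print(elite.shape)                          # (3, 1): the elite set the actor fits
```

With these settings the sweep covers 150 configurations per algorithm, and each CCEM actor update regresses the policy toward the 3 elite actions out of 30 sampled.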