Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Authors: Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad Harikandeh, Nicolas Le Roux
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems. |
| Researcher Affiliation | Collaboration | Sharan Vaswani (Simon Fraser University, vaswani.sharan@gmail.com); Amirreza Kazemi (Simon Fraser University, aka208@sfu.ca); Reza Babanezhad (Samsung SAIT AI Lab, Montreal, babanezhad@gmail.com); Nicolas Le Roux (Microsoft Research, Mila, nicolas@le-roux.name) |
| Pseudocode | Yes | Algorithm 1: Generic actor-critic algorithm |
| Open Source Code | Yes | Code to reproduce the experiments is available at https://github.com/amirrezakazemi/ACPG |
| Open Datasets | Yes | We consider two grid-world environments, namely Cliff World [53] and Frozen Lake [6] |
| Dataset Splits | No | The paper describes using Monte-Carlo rollouts and training settings but does not specify explicit train/validation/test dataset splits with percentages or counts, which is typical for supervised learning but less so for RL environments where continuous interaction is common. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "gym framework [6]" but does not specify version numbers for Python, PyTorch, or any other critical software libraries or dependencies. Tables 1 and 2 list parameter ranges but not software versions. |
| Experiment Setup | Yes | Table 1: Parameters for the Cliff World environment; Table 2: Parameters for the Frozen Lake environment. These tables include specific values/ranges for parameters like '# of samples', 'length of episode', 'mc', 'ma', 'Armijo max step-size', 'η', and 'c'. |
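
The 'Armijo max step-size' entry in the experiment setup suggests a backtracking line search whose initial step is capped at a maximum value. The following is a minimal sketch of generic Armijo backtracking on a plain gradient step, offered only to illustrate what such a parameter controls; it is not the authors' implementation. The function name `armijo_step`, the sufficient-decrease constant `decrease`, and the shrink factor `shrink` are all assumptions introduced here, and `decrease` is not meant to correspond to the parameter 'c' listed in the tables.

```python
def armijo_step(f, grad, x, max_step=1.0, decrease=1e-4, shrink=0.5, max_iters=50):
    """Take one gradient step with Armijo backtracking (illustrative sketch).

    f        : objective, maps a list of floats to a float
    grad     : gradient of f, maps a list of floats to a list of floats
    max_step : largest step size tried first (the "Armijo max step-size")
    decrease : hypothetical sufficient-decrease constant
    shrink   : hypothetical factor by which the step is reduced on failure
    """
    fx = f(x)
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)  # squared gradient norm
    step = max_step
    for _ in range(max_iters):
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        # Armijo condition: accept if the objective drops by enough.
        if f(x_new) <= fx - decrease * step * g_sq:
            return x_new, step
        step *= shrink  # otherwise backtrack and retry
    return list(x), 0.0  # no acceptable step found


# Usage on a simple quadratic, f(x) = sum of squares.
quadratic = lambda x: sum(xi * xi for xi in x)
quadratic_grad = lambda x: [2.0 * xi for xi in x]
new_x, accepted_step = armijo_step(quadratic, quadratic_grad, [1.0, -2.0])
```

Here the first trial step of 1.0 overshoots and leaves the objective unchanged, so the search halves the step once before accepting it.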