Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Authors: Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad Harikandeh, Nicolas Le Roux

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems." |
| Researcher Affiliation | Collaboration | Sharan Vaswani (Simon Fraser University, vaswani.sharan@gmail.com); Amirreza Kazemi (Simon Fraser University, aka208@sfu.ca); Reza Babanezhad (Samsung SAIT AI Lab, Montreal, babanezhad@gmail.com); Nicolas Le Roux (Microsoft Research, Mila, nicolas@le-roux.name) |
| Pseudocode | Yes | Algorithm 1: Generic actor-critic algorithm |
| Open Source Code | Yes | Code to reproduce the experiments is available at https://github.com/amirrezakazemi/ACPG |
| Open Datasets | Yes | "We consider two grid-world environments, namely Cliff World [53] and Frozen Lake [6]" |
| Dataset Splits | No | The paper describes Monte-Carlo rollouts and training settings but gives no explicit train/validation/test splits with percentages or counts; such splits are standard in supervised learning but less common in RL, where the agent interacts with the environment continuously. |
| Hardware Specification | No | The paper does not report hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions the "gym framework [6]" but does not give version numbers for Python, PyTorch, or other critical libraries; Tables 1 and 2 list parameter ranges, not software versions. |
| Experiment Setup | Yes | Table 1 (Cliff World) and Table 2 (Frozen Lake) give specific values/ranges for parameters such as '# of samples', 'length of episode', 'mc', 'ma', 'Armijo max step-size', 'η', and 'c'. |
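The pseudocode the paper provides is a generic actor-critic template (Algorithm 1). As a rough, hedged illustration of that generic pattern only — not the paper's decision-aware variant, and on a hypothetical one-state toy problem rather than Cliff World or Frozen Lake — a minimal tabular actor-critic sketch looks like this:

```python
import math
import random

random.seed(0)

# Hypothetical one-state, two-action toy problem: action 1 yields reward 1,
# action 0 yields reward 0. This is an illustrative stand-in, not an
# environment from the paper.
logits = [0.0, 0.0]        # actor parameters (softmax policy over 2 actions)
value = 0.0                # critic: scalar state-value baseline
lr_actor, lr_critic = 0.5, 0.5

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [x / s for x in e]

for step in range(200):
    probs = softmax(logits)
    a = 0 if random.random() < probs[0] else 1
    r = float(a == 1)                  # environment reward for the toy problem
    advantage = r - value              # critic-based advantage estimate
    # Actor step: policy-gradient update for a softmax policy,
    # d log pi(a)/d logits[i] = 1{i == a} - probs[i]
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr_actor * advantage * grad
    # Critic step: TD(0)-style update of the value baseline
    value += lr_critic * (r - value)

print(softmax(logits))  # policy should now strongly prefer action 1
```

The alternation above — act, estimate an advantage with the critic, then update both actor and critic — is the generic loop that decision-aware methods refine by choosing the critic's objective to match the actor's update.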