Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Provable Log Density Policy Gradient

Authors: Pulkit Katdare, Anant A. Joshi, Katherine Rose Driggs-Campbell

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also demonstrate a proof-of-concept for our log density gradient method on a gridworld environment, and observe that our method is able to improve upon the classical policy gradient method by a clear margin, thus indicating a promising novel direction to develop reinforcement learning algorithms that require fewer samples. ... Performance of the classical policy gradient (blue) algorithm as compared to the log density gradient (green) algorithm over an n × n gridworld environment, for n = 5, 10. We observe that the log density gradient algorithm consistently converges to better policy performance. Theoretically calculated solutions are used for implementation. ... We present a proof of concept for our log density gradient estimation on two sets of environments, a 5 × 5 and a 3 × 3 gridworld environment (Towers et al., 2023). Our results for 5 × 5 are in Figure 2 and for 3 × 3 in Figure 3."
Researcher Affiliation | Academia | Pulkit Katdare (University of Illinois at Urbana-Champaign); Anant A. Joshi (University of Illinois at Urbana-Champaign); Katherine Driggs-Campbell (University of Illinois at Urbana-Champaign)
Pseudocode | Yes | Algorithm 1: Projected Log Density Gradient; Algorithm 2: Linear Log Density Gradient
Open Source Code | No | The paper does not provide an explicit statement of code release, a link to a repository for the described methodology, or any mention of code in supplementary materials.
Open Datasets | No | The paper states: "gridworld environment (Towers et al., 2023)". While 'gridworld' refers to a common type of reinforcement learning environment, the citation points to 'Gymnasium', a toolkit for creating and running RL environments, not a specific pre-collected, publicly available dataset of interactions or states. The paper uses the environment for experiments but does not provide access information for a public dataset.
Dataset Splits | No | The paper uses a 'gridworld environment' for experiments, which is an online reinforcement learning setup where data is generated through interaction. It evaluates performance over 'Episodes' (Figures 1, 2, 3) but does not involve explicit training/test/validation dataset splits as would be typical for supervised learning tasks with pre-collected datasets.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper mentions using "linear function approximation (Algorithm 2)" for the gridworld experiments and defines features φ : S × A → ℝ^{|S|·|A|}. It also mentions a learning rate ε_t in Algorithm 1, whose details are deferred to Appendix 9.9, which is not provided in the main text. However, specific hyperparameter values such as ε_t, the batch size, the number of training episodes, or other concrete training configurations are not explicitly detailed in the main body of the paper.
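The feature map reported above pins down only the signature φ : S × A → ℝ^{|S|·|A|}. A one-hot (tabular) construction is the simplest map consistent with that signature; the sketch below is an illustration under that assumption, not the paper's actual implementation:

```python
import numpy as np

# Hedged sketch: with one-hot features of length |S|*|A|, linear function
# approximation reduces to the tabular case. The choice of one-hot features
# is an assumption; the paper only gives the signature of phi.
def one_hot_features(n_states: int, n_actions: int):
    """Return phi(s, a) as a one-hot vector of length |S|*|A|."""
    dim = n_states * n_actions

    def phi(s: int, a: int) -> np.ndarray:
        v = np.zeros(dim)
        v[s * n_actions + a] = 1.0  # index the (state, action) pair
        return v

    return phi

# Example: a 5 x 5 gridworld has |S| = 25 states and 4 actions.
phi = one_hot_features(25, 4)
```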
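For context on the "classical policy gradient" baseline that the paper's log density gradient method is compared against, a generic tabular REINFORCE agent on an n × n gridworld can be sketched as follows. The environment layout (4 actions, unit reward at a corner goal) and all hyperparameters are assumptions; this does not implement the paper's log density gradient method:

```python
import numpy as np

# Hedged sketch of a classical policy gradient (REINFORCE) baseline on an
# n x n gridworld. Environment details and hyperparameters are assumptions.
def run_reinforce(n=5, episodes=200, horizon=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = n * n, 4
    theta = np.zeros((n_states, n_actions))  # softmax policy logits
    goal = n_states - 1                      # goal in the opposite corner
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    returns = []
    for _ in range(episodes):
        s, traj, G = 0, [], 0.0
        for _ in range(horizon):
            logits = theta[s] - theta[s].max()
            p = np.exp(logits) / np.exp(logits).sum()
            a = rng.choice(n_actions, p=p)
            r, c = divmod(s, n)
            dr, dc = moves[a]
            r2 = min(max(r + dr, 0), n - 1)  # clip moves at the walls
            c2 = min(max(c + dc, 0), n - 1)
            s2 = r2 * n + c2
            rew = 1.0 if s2 == goal else 0.0
            traj.append((s, a, p))
            G += rew
            s = s2
            if s == goal:
                break
        returns.append(G)
        # REINFORCE update: grad log pi(a|s) scaled by the episode return
        for s_t, a_t, p_t in traj:
            grad = -p_t
            grad[a_t] += 1.0
            theta[s_t] += lr * G * grad
    return returns

rets = run_reinforce()
```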