Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
Authors: Gergely Neu, Nneka Okolo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Besides providing a set of general results, we also apply our algorithm to a specific problem in reinforcement learning, where it leads to performance guarantees for finding near-optimal policies in an average-reward MDP without prior knowledge of the bias span. |
| Researcher Affiliation | Academia | 1Universitat Pompeu Fabra, Barcelona, Spain. Correspondence to: Gergely Neu <gergely.neu@gmail.com>, Nneka Okolo <nnekamaureen.okolo@upf.edu>. |
| Pseudocode | Yes | Algorithm 1 COMIDA-MDP |
| Open Source Code | No | The paper does not include any statement about releasing code or a link to an open-source implementation of the described methodology. |
| Open Datasets | No | The paper discusses applying the method to Average-Reward Markov Decision Processes (AMDPs) and mentions using 'a simulator (or generative model) of the transition function P', but it does not identify or provide access to any public training dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning for validation, as it is a theoretical paper. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers. |
| Experiment Setup | No | While the paper defines parameters for its theoretical algorithms and bounds (e.g., ϱx, ϱy, ηx, ηy), it does not describe a concrete experimental setup with hyperparameters or system-level training settings for an empirical evaluation. |
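The paper is theoretical and releases no code, so the sketch below is purely illustrative and is not the paper's COMIDA-MDP algorithm. It shows a generic stochastic gradient descent-ascent loop for a saddle-point problem min_x max_y f(x, y) with separate primal and dual step sizes `eta_x` and `eta_y`, loosely analogous to the ηx, ηy parameters mentioned in the table; the function names, the toy bilinear-plus-regularization objective, and all constants are assumptions made for this example.

```python
import numpy as np


def stochastic_gda(grad_x, grad_y, x0, y0, eta_x, eta_y, n_steps, rng):
    """Minimal stochastic gradient descent-ascent sketch for min_x max_y f(x, y).

    This is an illustrative generic method, not the paper's COMIDA-MDP algorithm.
    grad_x / grad_y return noisy gradient estimates of f with respect to x and y
    (e.g., obtained from a simulator). Returns the averaged iterates, a common
    output choice for stochastic saddle-point methods.
    """
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(n_steps):
        gx = grad_x(x, y, rng)   # stochastic gradient in the primal variable
        gy = grad_y(x, y, rng)   # stochastic gradient in the dual variable
        x = x - eta_x * gx       # descent step on x
        y = y + eta_y * gy       # ascent step on y
        x_sum += x
        y_sum += y
    return x_sum / n_steps, y_sum / n_steps


if __name__ == "__main__":
    # Toy objective (an assumption for illustration):
    # f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2, with Gaussian gradient noise.
    rng = np.random.default_rng(0)
    A = np.array([[1.0, 0.5], [0.0, 1.0]])
    gx = lambda x, y, r: x + A @ y + 0.1 * r.standard_normal(2)
    gy = lambda x, y, r: A.T @ x - y + 0.1 * r.standard_normal(2)
    x_bar, y_bar = stochastic_gda(gx, gy, np.ones(2), np.ones(2),
                                  eta_x=0.05, eta_y=0.05, n_steps=5000, rng=rng)
    print("averaged iterates:", x_bar, y_bar)  # both should be close to the saddle point at 0
```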