On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
Authors: Elena Smirnova, Elvis Dohmatob
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical illustration: "We numerically confirm the implications of the smoothing technique on the convergence, provided by Proposition 4. We run experiments on a toy stochastic gridworld problem with the evaluation step error due to the sampling of state-transitions. We plot the performance loss over 30 runs with varying values of smoothing factor β. As can be seen from Figure 1, smaller values of β result in tighter confidence intervals, but slower convergence speed." Figure 1 caption: "Performance loss computed over 30 runs of the smooth AMPI (8) with sampling of environment transitions under varying smoothing degree β." |
| Researcher Affiliation | Collaboration | Elena Smirnova (esmirnovae@gmail.com); Elvis Dohmatob, Criteo AI Lab (e.dohmatob@criteo.com) |
| Pseudocode | No | The paper describes various algorithmic schemes mathematically (e.g., (MPI), (AMPI), (smooth AMPI)), but does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement or link providing access to open-source code for the described methodology. |
| Open Datasets | No | The paper mentions running experiments on a 'toy stochastic gridworld problem', but does not provide any concrete access information (link, citation, or repository) for this or any other public dataset. |
| Dataset Splits | No | The paper mentions a 'toy stochastic gridworld problem' but does not provide specific details on training, validation, or test dataset splits. It only refers to 'sampling of environment transitions'. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud resources). |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | No | The paper mentions 'varying values of smoothing factor β' for its numerical illustration but does not provide concrete hyperparameter values or detailed system-level training settings in the main text. |
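
Since the paper provides neither code nor pseudocode, the following is only a minimal sketch of the kind of experiment quoted in the Research Type row, not the authors' smooth AMPI scheme (8). It assumes a generic smoothed update V ← (1 − β)V + βT̂V, where T̂ is a greedy one-step backup estimated from a single sampled transition. The 5-state chain, the 0.8/0.2 transition probabilities, the reward, γ = 0.9, and every function name are hypothetical stand-ins for the unspecified gridworld. The distance to the optimal value function is used as a simple proxy for the paper's performance loss.

```python
import random

GAMMA = 0.9                  # assumed discount factor (not given in the source)
N_STATES, N_ACTIONS = 5, 2   # hypothetical chain standing in for the gridworld


def next_state_dist(s, a):
    """Assumed dynamics: action 0 moves left, action 1 moves right, with slip."""
    target = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return [(target, 0.8), (s, 0.2)]  # 80% intended move, 20% stay in place


def reward(s, a):
    """Assumed reward: 1 in the right-most state, 0 elsewhere."""
    return 1.0 if s == N_STATES - 1 else 0.0


def backup(v, s, a, rng=None):
    """One-step lookahead: exact expectation, or one sampled transition if rng is given."""
    dist = next_state_dist(s, a)
    if rng is None:
        nxt = sum(p * v[s2] for s2, p in dist)
    else:
        states, probs = zip(*dist)
        nxt = v[rng.choices(states, weights=probs)[0]]
    return reward(s, a) + GAMMA * nxt


def smooth_step(v, beta, rng=None):
    """Smoothed update: V <- (1 - beta) * V + beta * (greedy backup of V)."""
    return [
        (1 - beta) * v[s] + beta * max(backup(v, s, a, rng) for a in range(N_ACTIONS))
        for s in range(N_STATES)
    ]


def sup_dist(u, v):
    """Sup-norm distance between two value functions."""
    return max(abs(x - y) for x, y in zip(u, v))


def optimal_values(iters=2000):
    """Reference solution from exact (noise-free, beta = 1) value iteration."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = smooth_step(v, beta=1.0)
    return v


def final_losses(beta, iters=200, runs=30, seed=0):
    """Distance to V* after `iters` noisy smoothed updates, for each of `runs` runs."""
    v_star = optimal_values()
    rng = random.Random(seed)
    losses = []
    for _ in range(runs):
        v = [0.0] * N_STATES
        for _ in range(iters):
            v = smooth_step(v, beta, rng)
        losses.append(sup_dist(v, v_star))
    return losses
```

Calling `final_losses(beta)` for several values of β and comparing the spread of the returned losses is one way to probe the trade-off described in the quoted passage. Without noise, the smoothed update contracts toward V* with factor 1 − β(1 − γ), so a smaller β converges more slowly even before sampling error is considered.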