On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

Authors: Elena Smirnova, Elvis Dohmatob

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical illustration: We numerically confirm the implications of the smoothing technique on convergence, provided by Proposition 4. We run experiments on a toy stochastic gridworld problem with evaluation-step error due to the sampling of state transitions. We plot the performance loss over 30 runs with varying values of the smoothing factor β. As can be seen from Figure 1, smaller values of β result in tighter confidence intervals but slower convergence. Figure 1 caption: Performance loss computed over 30 runs of the smooth AMPI (8) with sampling of environment transitions under varying smoothing degree β. (An illustrative code sketch of this experimental setup follows the table.)
Researcher Affiliation | Collaboration | Elena Smirnova (esmirnovae@gmail.com); Elvis Dohmatob, Criteo AI Lab (e.dohmatob@criteo.com).
Pseudocode | No | The paper describes various algorithmic schemes mathematically (e.g., (MPI), (AMPI), (smooth AMPI)), but does not provide any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain any statement or link providing access to open-source code for the described methodology.
Open Datasets | No | The paper mentions running experiments on a 'toy stochastic gridworld problem', but does not provide any concrete access information (link, citation, or repository) for this or any other public dataset.
Dataset Splits | No | The paper mentions a 'toy stochastic gridworld problem' but does not provide specific details on training, validation, or test dataset splits; it only refers to 'sampling of environment transitions'.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud resources).
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper mentions 'varying values of smoothing factor β' for its numerical illustration but does not provide concrete hyperparameter values or detailed system-level settings in the main text.
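The Research Type row above describes the paper's only experiment: the performance loss of smooth AMPI on a toy stochastic gridworld, averaged over 30 runs, for several smoothing factors β. The sketch below is a minimal, hypothetical illustration of that kind of setup, not the authors' exact scheme (8): it assumes the smoothing takes the form of a convex mixture of the greedy policy and the previous policy with weight β, that the evaluation step is corrupted by sampling of state transitions, and that the plotted quantity is the sup-norm performance loss between the optimal values and the values of the current policy. All names and parameters (toy_gridworld, smoothed_avi, slip, n_samples, γ = 0.95) are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of a beta-smoothed approximate value-iteration loop on a
# toy stochastic gridworld; NOT the paper's exact smooth AMPI scheme (8).
import numpy as np


def toy_gridworld(n_states=16, n_actions=4, slip=0.2, seed=0):
    """Build a small stochastic MDP: transition tensor P[s, a, s'] and state rewards r[s]."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            intended = (s + a + 1) % n_states        # deterministic move on a ring of cells
            P[s, a, intended] += 1.0 - slip
            P[s, a, rng.integers(n_states)] += slip  # with probability `slip`, land in a random cell
    r = rng.uniform(0.0, 1.0, size=n_states)
    return P, r


def policy_values(P, r, gamma, pi):
    """Exact value of a stochastic policy: solve (I - gamma * P_pi) v = r."""
    P_pi = np.einsum("sa,san->sn", pi, P)
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r)


def optimal_values(P, r, gamma, tol=1e-10):
    """Exact optimal values via plain value iteration on the true model."""
    v = np.zeros(P.shape[0])
    while True:
        v_new = r + gamma * np.einsum("san,n->sa", P, v).max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


def smoothed_avi(P, r, beta, gamma=0.95, n_iters=200, n_samples=5, seed=0):
    """Approximate value iteration with a sampled (noisy) evaluation step and a
    beta-smoothed policy update: new policy = beta * greedy + (1 - beta) * old."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # start from the uniform policy
    v = np.zeros(n_states)
    v_star = optimal_values(P, r, gamma)
    losses = []
    for _ in range(n_iters):
        # Evaluation step with sampling error: estimate Q from a few sampled next states.
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                next_states = rng.choice(n_states, size=n_samples, p=P[s, a])
                q[s, a] = r[s] + gamma * v[next_states].mean()
        greedy = np.zeros_like(pi)
        greedy[np.arange(n_states), q.argmax(axis=1)] = 1.0
        pi = beta * greedy + (1.0 - beta) * pi           # smoothed policy update
        v = (pi * q).sum(axis=1)
        losses.append(np.max(v_star - policy_values(P, r, gamma, pi)))  # performance loss
    return np.array(losses)


if __name__ == "__main__":
    P, r = toy_gridworld()
    for beta in (0.1, 0.5, 1.0):
        loss = smoothed_avi(P, r, beta=beta)
        print(f"beta={beta}: final performance loss {loss[-1]:.4f}")
```

Under these assumptions, running the script for several seeds and β values is meant to illustrate the qualitative trade-off quoted in the table: a smaller β damps the effect of the noisy evaluation step (less run-to-run variability) at the price of a slower decrease of the performance loss.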