reproducibilityindex.ai

Reinforcement Learning with Logarithmic Regret and Policy Switches

Authors: Grigoris Velegkas, Zhuoran Yang, Amin Karbasi

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This is theoretical work and does not have any negative societal implications.
Researcher Affiliation	Collaboration	Grigoris Velegkas, Yale University, grigoris.velegkas@yale.edu; Zhuoran Yang, Yale University, zhuoran.yang@yale.edu; Amin Karbasi, Yale University, Google Research, amin.karbasi@yale.edu
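Pseudocode	Yes	Algorithm 1 Low Switching Cost Value Iteration (with parameters $\delta$, $K$). Require: failure probability $\delta \in (0, 1)$, number of episodes $K$, and setting of operation. 1: $\tilde{k} \leftarrow 1$ 2: $\hat{Z}_h^1 \leftarrow \emptyset, \forall h \in [H]$ 3: for $k \in [K]$ do 4: for $h = H, H-1, \ldots, 1$ do 5: if $k \geq 2$ then 6: $\hat{Z}_h^k \leftarrow$ Sample($\mathcal{F}$, $\hat{Z}_h^{k-1}$, $\delta$) 7: end if 8: end for 9: if $k = 1$ or $\exists h \in [H] : \hat{Z}_h^k$ … then 10: $\tilde{k} \leftarrow k$ 11: $Q_{H+1}^k(\cdot, \cdot) \leftarrow 0$, $V_{H+1}^k(\cdot) \leftarrow 0$ 12: for $h = H, H-1, \ldots, 1$ do 13: $\mathcal{T}_h^k \leftarrow$ history of execution 14: $Q_h^k(\cdot, \cdot) \leftarrow$ Q-Estimator($\mathcal{T}_h^k$, …) 15: $V_h^k(\cdot) = \max_{a \in \mathcal{A}} Q_h^k(\cdot, a)$ 16: $\pi_h^k(\cdot) \leftarrow \arg\max_{a \in \mathcal{A}} Q_h^k(\cdot, a)$ 17: end for 18: end if 19: Receive initial state $s_1$ of episode $k$ 20: for $h \in [H]$ do 21: Take action $a_h^k$ … 22: end for 23: end for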
Open Source Code	No	The paper states '[N/A]' for including code, data, and instructions needed to reproduce the main experimental results in the author checklist. No specific repository link or explicit code release statement is provided.
Open Datasets	No	This is a theoretical paper that focuses on algorithms and proofs for Reinforcement Learning within certain function approximation regimes (e.g., tabular setting, linear function approximation). It does not describe experiments using a specific, named, publicly available dataset that would require access information.
Dataset Splits	No	This paper is theoretical and does not involve empirical experiments with datasets that would require explicit training/validation/test splits.
Hardware Specification	No	The paper is theoretical and does not conduct experiments, therefore no hardware specifications are provided.
Software Dependencies	No	The paper is theoretical and does not describe empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup	No	This paper is theoretical and does not describe empirical experiments that would involve hyperparameter values, training configurations, or system-level settings.
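
The control flow of Algorithm 1 in the Pseudocode row can be illustrated with a small Python sketch. It is not the authors' implementation: `sample` and `q_estimator` are hypothetical stand-ins (a doubling rule and an empirical-mean tabular estimator), the toy environment is invented, and the re-planning test in step 9 is cut off in the extracted pseudocode, so the sketch assumes it means "some sub-sampled set changed since the last update". Only the loop structure (refresh the sub-sampled sets, re-plan lazily, then execute the current policy) follows the listing.

```python
import random


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def sample(prev, n):
    """Hypothetical stand-in for Sample: admit the newest index only when
    the history length n is a power of two (a simple doubling rule)."""
    return prev | {n - 1} if is_power_of_two(n) else prev


def q_estimator(transitions, v_next, n_states, n_actions, default):
    """Hypothetical stand-in for Q-Estimator: empirical mean of r + V(s')
    per (s, a), with an optimistic default for unseen pairs."""
    sums = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in transitions:
        sums[s][a] += r + v_next[s_next]
        counts[s][a] += 1
    return [[sums[s][a] / counts[s][a] if counts[s][a] else default
             for a in range(n_actions)] for s in range(n_states)]


def run(K, H, n_states=3, n_actions=2, seed=0):
    """Run K episodes of horizon H; return the number of planning updates."""
    rng = random.Random(seed)
    history = [[] for _ in range(H)]          # transitions observed at step h
    z_hat = [frozenset() for _ in range(H)]   # sub-sampled index sets
    z_at_update = list(z_hat)                 # sets as of the last update
    policy = [[0] * n_states for _ in range(H)]
    n_updates = 0
    for k in range(1, K + 1):
        if k >= 2:
            for h in reversed(range(H)):
                z_hat[h] = sample(z_hat[h], len(history[h]))
        # ASSUMPTION: re-plan iff some sub-sampled set changed since the
        # last update (the source condition is truncated).
        if k == 1 or any(z_hat[h] != z_at_update[h] for h in range(H)):
            z_at_update = list(z_hat)
            n_updates += 1
            v_next = [0.0] * n_states
            for h in reversed(range(H)):
                q = q_estimator(history[h], v_next, n_states, n_actions,
                                float(H - h))
                v_next = [max(row) for row in q]
                policy[h] = [max(range(n_actions), key=row.__getitem__)
                             for row in q]
        s = 0  # toy environment: fixed start, random rewards and transitions
        for h in range(H):
            a = policy[h][s]
            r, s_next = rng.random(), rng.randrange(n_states)
            history[h].append((s, a, r, s_next))
            s = s_next
    return n_updates
```

With this doubling stand-in, `run(K=100, H=3)` re-plans 8 times (episode 1 plus the episodes at which the history length reaches 1, 2, 4, ..., 64), which illustrates how lazy updates can keep the number of policy switches logarithmic in K. The count is a property of the stand-in rule, not a result from the paper.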