Reinforcement Learning with Logarithmic Regret and Policy Switches

Authors: Grigoris Velegkas, Zhuoran Yang, Amin Karbasi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This is theoretical work and does not have any negative societal implications.
Researcher Affiliation | Collaboration | Grigoris Velegkas, Yale University (grigoris.velegkas@yale.edu); Zhuoran Yang, Yale University (zhuoran.yang@yale.edu); Amin Karbasi, Yale University and Google Research (amin.karbasi@yale.edu)
Pseudocode | Yes | Algorithm 1: Low Switching Cost Value Iteration (with parameters $\delta$, $K$)
    Require: failure probability $\delta \in (0, 1)$, number of episodes $K$, and setting of operation
    1: $\tilde{k} \leftarrow 1$
    2: $\hat{Z}^1_h \leftarrow \emptyset, \ \forall h \in [H]$
    3: for $k \in [K]$ do
    4:     for $h = H, H-1, \ldots, 1$ do
    5:         if $k \geq 2$ then
    6:             $\hat{Z}^k_h \leftarrow \mathrm{Sample}(\mathcal{F}, \hat{Z}^{k-1}_h, \delta)$
    7:         end if
    8:     end for
    9:     if $k = 1$ or $\exists h \in [H] : \hat{Z}^k_h \neq \hat{Z}^{\tilde{k}}_h$ then
    10:        $\tilde{k} \leftarrow k$
    11:        $Q^k_{H+1}(\cdot, \cdot) \leftarrow 0$, $V^k_{H+1}(\cdot) \leftarrow 0$
    12:        for $h = H, H-1, \ldots, 1$ do
    13:            $T^k_h \leftarrow$ history of execution
    14:            $Q^k_h(\cdot, \cdot) \leftarrow \text{Q-Estimator}(T^k_h, \ldots)$
    15:            $V^k_h(\cdot) = \max_{a \in \mathcal{A}} Q^k_h(\cdot, a)$
    16:            $\pi^k_h(\cdot) \leftarrow \arg\max_{a \in \mathcal{A}} Q^k_h(\cdot, a)$
    17:        end for
    18:    end if
    19:    Receive initial state $s^k_1$ of episode $k$
    20:    for $h \in [H]$ do
    21:        Take action $a^k_h \leftarrow \pi^{\tilde{k}}_h(s^k_h)$
    22:    end for
    23: end for
Open Source Code | No | In the author checklist, the paper answers '[N/A]' to whether the code, data, and instructions needed to reproduce the main experimental results are included. No repository link or explicit code-release statement is provided.
Open Datasets | No | This is a theoretical paper focused on algorithms and proofs for reinforcement learning under specific function approximation regimes (e.g., the tabular setting and linear function approximation). It describes no experiments on a named, publicly available dataset that would require access information.
Dataset Splits | No | The paper is theoretical and involves no empirical experiments with datasets, so no training/validation/test splits are needed.
Hardware Specification | No | The paper is theoretical and conducts no experiments, so no hardware specifications are given.
Software Dependencies | No | The paper is theoretical and describes no empirical experiments that would require versioned software dependencies.
Experiment Setup | No | The paper is theoretical and describes no empirical experiments involving hyperparameter values, training configurations, or system-level settings.
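As a rough illustration of the control flow in the Algorithm 1 pseudocode, the Python sketch below recomputes the greedy policy only when some sub-sampled dataset changes, and otherwise keeps executing the policy from the last switch episode. The paper's Sample(F, Z, δ) procedure and its Q-Estimator are not specified in this summary, so both are replaced by hypothetical stand-ins: `sample_stand_in` keeps the largest power-of-two prefix of the step-h history (a simple doubling schedule, not the paper's sampling rule), and `q_estimator` is a plain empirical Bellman backup with optimistic values for unvisited pairs and no exploration bonus. The toy MDP, the function names, and all parameters are illustrative assumptions.

```python
import random


def sample_stand_in(history_h):
    """Hypothetical stand-in for Sample(F, Z, delta): keep the largest
    power-of-two prefix of the step-h history (a doubling schedule)."""
    n = len(history_h)
    if n == 0:
        return ()
    return tuple(history_h[: 1 << (n.bit_length() - 1)])


def q_estimator(dataset, v_next, S, A, H, h):
    """Hypothetical tabular Q-Estimator: empirical mean of r + V_{h+1}(s')
    per (s, a); unvisited pairs get the optimistic value H - h (0-indexed h,
    rewards in [0, 1]). No exploration bonus is added."""
    sums = [[0.0] * A for _ in range(S)]
    counts = [[0] * A for _ in range(S)]
    for s, a, r, s_next in dataset:
        sums[s][a] += r + v_next[s_next]
        counts[s][a] += 1
    return [[sums[s][a] / counts[s][a] if counts[s][a] else float(H - h)
             for a in range(A)] for s in range(S)]


def low_switching_value_iteration(K, S=3, A=2, H=4, seed=0):
    """Runs K episodes on a random toy MDP and returns the episodes at which
    the policy was recomputed (the values taken by k-tilde)."""
    rng = random.Random(seed)
    # Toy MDP (assumption): P[h][s][a] holds next-state weights, R[h][s][a] in [0, 1].
    P = [[[[rng.random() for _ in range(S)] for _ in range(A)]
          for _ in range(S)] for _ in range(H)]
    R = [[[rng.random() for _ in range(A)] for _ in range(S)] for _ in range(H)]
    history = [[] for _ in range(H)]  # every transition observed at step h
    z_hat = [() for _ in range(H)]    # sub-sampled datasets, initially empty
    z_at_switch = None                # datasets at the last switch episode
    policy, switches = None, []
    for k in range(1, K + 1):
        if k >= 2:
            z_hat = [sample_stand_in(history[h]) for h in range(H)]
        # Recompute the policy only if some sub-sampled dataset changed.
        if k == 1 or z_hat != z_at_switch:
            switches.append(k)
            z_at_switch = list(z_hat)
            v_next, policy = [0.0] * S, [None] * H  # V_{H+1} = 0
            for h in reversed(range(H)):
                q = q_estimator(z_hat[h], v_next, S, A, H, h)
                v_next = [max(q[s]) for s in range(S)]
                policy[h] = [max(range(A), key=lambda a: q[s][a]) for s in range(S)]
        # Execute the most recently computed greedy policy for one episode.
        s = rng.randrange(S)
        for h in range(H):
            a = policy[h][s]
            s_next = rng.choices(range(S), weights=P[h][s][a])[0]
            history[h].append((s, a, R[h][s][a], s_next))
            s = s_next
    return switches


print(low_switching_value_iteration(100))  # [1, 2, 3, 5, 9, 17, 33, 65]
```

Because each episode adds exactly one transition per step, this stand-in changes the datasets only when k − 1 is a power of two, so for K ≥ 2 the number of policy switches is ⌊log2(K − 1)⌋ + 2 (8 switches for K = 100). This logarithmic growth mirrors the policy-switch guarantee named in the paper's title, but the paper obtains its bound through its own sampling scheme, not through this schedule.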