Reinforcement Learning with Logarithmic Regret and Policy Switches
Authors: Grigoris Velegkas, Zhuoran Yang, Amin Karbasi
Venue: NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This is theoretical work and does not have any negative societal implications. |
| Researcher Affiliation | Collaboration | Grigoris Velegkas, Yale University, grigoris.velegkas@yale.edu; Zhuoran Yang, Yale University, zhuoran.yang@yale.edu; Amin Karbasi, Yale University and Google Research, amin.karbasi@yale.edu |
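| Pseudocode | Yes | Algorithm 1: Low Switching Cost Value Iteration (with parameters $\delta$, $K$). Require: failure probability $\delta \in (0, 1)$, number of episodes $K$, and setting of operation. 1: $\tilde{k} \leftarrow 1$ 2: $\hat{Z}_h^1 \leftarrow \emptyset, \ \forall h \in [H]$ 3: for $k \in [K]$ do 4: for $h = H, H-1, \ldots, 1$ do 5: if $k \geq 2$ then 6: $\hat{Z}_h^k \leftarrow \text{Sample}(\mathcal{F}, \hat{Z}_h^{k-1}, \delta)$ 7: end if 8: end for 9: if $k = 1$ or $\exists h \in [H] : \hat{Z}_h^k$ [...] 10: $\tilde{k} \leftarrow k$ 11: $Q_{H+1}^k(\cdot, \cdot) \leftarrow 0$, $V_{H+1}^k(\cdot) \leftarrow 0$ 12: for $h = H, H-1, \ldots, 1$ do 13: $\mathcal{T}_h^k \leftarrow$ history of execution 14: $Q_h^k(\cdot, \cdot) \leftarrow \text{Q-Estimator}(\mathcal{T}_h^k$ [...] 15: $V_h^k(\cdot) = \max_{a \in \mathcal{A}} Q_h^k(\cdot, a)$ 16: $\pi_h^k(\cdot) \leftarrow \arg\max_{a \in \mathcal{A}} Q_h^k(\cdot, a)$ 17: end for 18: end if 19: Receive initial state $s_1$ of episode $k$ 20: for $h \in [H]$ do 21: Take action $a_h^k$ [...] 22: end for 23: end for |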
| Open Source Code | No | The paper states '[N/A]' for including code, data, and instructions needed to reproduce the main experimental results in the author checklist. No specific repository link or explicit code release statement is provided. |
| Open Datasets | No | This is a theoretical paper that focuses on algorithms and proofs for Reinforcement Learning within certain function approximation regimes (e.g., tabular setting, linear function approximation). It does not describe experiments using a specific, named, publicly available dataset that would require access information. |
| Dataset Splits | No | This paper is theoretical and does not involve empirical experiments with datasets that would require explicit training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and does not describe empirical experiments that would involve hyperparameter values, training configurations, or system-level settings. |
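
The control flow of Algorithm 1 can be illustrated with a small tabular sketch. This is not the authors' implementation: the excerpt above does not specify `Sample` or `Q-Estimator`, so both are replaced here by assumed stand-ins (a doubling-level summary of visit counts, and optimistic backward induction on empirical estimates). The names `run_low_switching`, `toy_step`, and all parameters are hypothetical. The point is only the structure: the policy is recomputed when the data summary changes, not in every episode.

```python
import math
import random
from collections import defaultdict


def toy_step(s, a, rng):
    """Hypothetical 2-state environment: reward 1 when the action matches the state."""
    reward = 1.0 if a == s else 0.0
    return reward, rng.randrange(2)


def run_low_switching(K, H, n_states, n_actions, step, seed=0):
    """Re-plan only when a summary of the collected data changes.

    `step(s, a, rng) -> (reward, next_state)` is an assumed environment hook.
    Returns (number of policy recomputations, total collected reward).
    """
    rng = random.Random(seed)
    counts = defaultdict(int)                      # visits per (h, s, a)
    reward_sum = defaultdict(float)                # summed rewards per (h, s, a)
    trans = defaultdict(lambda: defaultdict(int))  # next-state counts per (h, s, a)
    policy = [[0] * n_states for _ in range(H)]
    last_summary = None
    switches = 0
    total_reward = 0.0

    for k in range(1, K + 1):
        # ASSUMED stand-in for Sample(...): the doubling level floor(log2 n)
        # of each visit count, which changes only rarely.
        summary = {key: n.bit_length() - 1 for key, n in counts.items() if n > 0}
        if k == 1 or summary != last_summary:
            last_summary = summary
            switches += 1
            # ASSUMED stand-in for Q-Estimator: optimistic backward induction.
            v_next = [0.0] * n_states
            for h in reversed(range(H)):
                v = [0.0] * n_states
                for s in range(n_states):
                    best_q, best_a = -1.0, 0
                    for a in range(n_actions):
                        key = (h, s, a)
                        n = counts[key]
                        if n == 0:
                            q = float(H - h)  # optimistic value for unseen pairs
                        else:
                            r_hat = reward_sum[key] / n
                            ev = sum(c * v_next[s2] for s2, c in trans[key].items()) / n
                            bonus = H * math.sqrt(1.0 / n)
                            q = min(float(H - h), r_hat + ev + bonus)
                        if q > best_q:
                            best_q, best_a = q, a
                    v[s] = best_q
                    policy[h][s] = best_a  # greedy policy w.r.t. the estimate
                v_next = v

        # Execute the current (possibly unchanged) policy for one episode.
        s = 0
        for h in range(H):
            a = policy[h][s]
            r, s2 = step(s, a, rng)
            key = (h, s, a)
            counts[key] += 1
            reward_sum[key] += r
            trans[key][s2] += 1
            total_reward += r
            s = s2

    return switches, total_reward


if __name__ == "__main__":
    n_switches, reward = run_low_switching(
        K=200, H=2, n_states=2, n_actions=2, step=toy_step
    )
    print(n_switches, reward)
```

In this sketch each `(h, s, a)` count can change its doubling level only about `log2(K)` times, so the number of recomputations grows with `log K` rather than with `K`. That mirrors the low-switching idea, under the stand-in assumptions stated above.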