Intelligent Switching for Reset-Free RL
Authors: Darshan Patil, Janarthanan Rajendran, Glen Berseth, Sarath Chandar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically analyze the performance of RISC. Specifically, we: (1) Investigate whether reverse curriculums are the best approach for reset-free RL; (2) Compare the performance of RISC to other reset-free methods on the EARL benchmark; (3) Evaluate the necessity of both timeout-nonterminal bootstrapping and early switching for RISC with an ablation study. (A sketch illustrating the timeout-bootstrapping distinction appears below the table.) |
| Researcher Affiliation | Academia | Darshan Patil (Mila, Université de Montréal); Janarthanan Rajendran (Dalhousie University); Glen Berseth (Mila, Université de Montréal; Canada CIFAR AI Chair); Sarath Chandar (Mila, École Polytechnique de Montréal; Canada CIFAR AI Chair) |
| Pseudocode | Yes | Algorithm 1: Reset-Free RL with Intelligently Switching Controller (RISC). Input: trajectory switching probability ζ. s, g = env.reset(); t = 0; check_switch = random() < ζ; while True do: a = agent.act(s, g); s', r = env.step(a); agent.update(s, a, r, s', g); t = t + 1; if should_switch(t, agent.Qf, s', g, check_switch) then g = switch_goals(); t = 0; check_switch = random() < ζ; end if; s = s'; end while. (A runnable Python sketch of this loop appears below the table.) |
| Open Source Code | Yes | Code available at https://github.com/chandar-lab/RISC. |
| Open Datasets | Yes | We evaluate our algorithm’s performance on the recently proposed EARL benchmark (Sharma et al., 2021b). |
| Dataset Splits | No | The paper does not specify explicit training/validation/test splits, but rather describes an evaluation protocol where agents are evaluated after certain timesteps in simulated environments. |
| Hardware Specification | No | All experiments were run as CPU jobs. The paper does not specify any particular CPU models, GPUs, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | All of our agents for the experiments on the EARL benchmark (Sharma et al., 2021b) use SAC (Haarnoja et al., 2018) as the base agent. ... For the 4 rooms experiments, all agents use a DQN (Mnih et al., 2015) agent as their base. The paper mentions the use of SAC and DQN as base agents but does not provide specific version numbers for software dependencies like PyTorch, TensorFlow, Python, or CUDA. |
| Experiment Setup | Yes | The other hyperparameters used for the base agent are described in Table 1. The experiments on the 4-rooms gridworld (Chevalier-Boisvert et al., 2018) use DQN (Mnih et al., 2015) as the base agent. The corresponding hyperparameters for those experiments are shown in Table 2. The additional hyperparameters for RISC are shown in Table 3. |
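
The pseudocode row above reduces to a single continuing interaction loop in which the agent periodically switches its current goal (for example, between the forward task and returning to the initial state) instead of relying on external resets. The following is a minimal Python sketch of that loop, assuming hypothetical `env` and `agent` interfaces (`env.reset`, `env.step`, `agent.act`, `agent.update`, `agent.q_value`, `agent.switch_goals`) and a simple Q-value-threshold reading of the `should_switch(t, agent.Qf, s', g, check_switch)` test. It is not the authors' implementation, which is available in the linked repository.

```python
import random


def risc_loop(env, agent, zeta, q_threshold=0.9,
              max_trajectory_steps=500, num_steps=100_000):
    """Sketch of Algorithm 1 (RISC) under assumed interfaces.

    zeta: probability that a trajectory is allowed to switch goals early.
    q_threshold, max_trajectory_steps: hypothetical stand-ins for the
    switching criterion encapsulated by should_switch() in the pseudocode.
    """
    state, goal = env.reset()            # single reset; reset-free afterwards
    t = 0
    check_switch = random.random() < zeta

    for _ in range(num_steps):
        action = agent.act(state, goal)
        next_state, reward = env.step(action)
        agent.update(state, action, reward, next_state, goal)
        t += 1

        # Switch goals if this trajectory was sampled for early switching and
        # the agent is confident it can reach the current goal, or if the
        # trajectory has run too long (timeout).
        confident = agent.q_value(next_state, goal) > q_threshold
        if (check_switch and confident) or t >= max_trajectory_steps:
            goal = agent.switch_goals()  # e.g., toggle forward <-> reset goal
            t = 0
            check_switch = random.random() < zeta

        state = next_state
```

Sampling `check_switch` once per trajectory rather than per step mirrors the pseudocode: whether a trajectory may switch early is decided when it starts.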
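
The ablation mentioned in the Research Type row refers to timeout-nonterminal bootstrapping: when a trajectory ends only because of a step limit, the final state is not a true terminal, so the TD target should still bootstrap from the next state's value. The snippet below is a generic illustration of that distinction, not the paper's code; the function name and the `done` / `timeout` flags are assumptions for illustration.

```python
def td_target(reward, next_q, done, timeout, gamma=0.99):
    """One-step TD target that bootstraps through timeouts.

    done:    the trajectory ended (for any reason).
    timeout: the end was an artificial step-limit cutoff rather than a
             true environment termination.
    """
    # Only a non-timeout termination is a true terminal; a timeout keeps
    # the bootstrap term because the episode could have continued.
    true_terminal = done and not timeout
    bootstrap = 0.0 if true_terminal else 1.0
    return reward + gamma * bootstrap * next_q


# Example: a timeout transition still bootstraps from the next state's value.
# td_target(reward=0.0, next_q=5.0, done=True, timeout=True) == 4.95
```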