Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents
Authors: Quentin Delfosse, Sebastian Sztwiertnia, Mark Rothermel, Wolfgang Stammer, Kristian Kersting
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results provide evidence of SCoBots' competitive performances, but also of their potential for domain experts to understand and regularize their behavior. Among other things, SCoBots enabled us to identify a previously unknown misalignment problem in the iconic video game Pong, and resolve it. |
| Researcher Affiliation | Academia | Quentin Delfosse,1 Sebastian Sztwiertnia,1 Mark Rothermel,1 Wolfgang Stammer,1,2 Kristian Kersting1,2,3,4. 1Computer Science Department, TU Darmstadt, Germany; 2Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany; 3Centre for Cognitive Science, TU Darmstadt, Germany; 4German Research Center for Artificial Intelligence (DFKI), Darmstadt, Germany. {firstname.lastname}@cs.tu-darmstadt.de |
| Pseudocode | No | The paper describes the architecture and processes verbally and with diagrams (e.g., Figure 2), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code available at https://github.com/k4ntz/SCoBots |
| Open Datasets | Yes | We evaluate SCoBots on 9 Atari games (cf. Fig. 3) from the Atari Learning Environments [Bellemare et al., 2012] (by far the most used RL framework, cf. App. A.1), as well as the HackAtari-modified [Delfosse et al., 2024a] Pong environments. |
| Dataset Splits | Yes | Each training seed's performance is evaluated every 500k frames on 4 differently seeded (42 + training seed) environments for 8 episodes each. After training, the best-performing checkpoint is then ultimately evaluated on 4 seeded (123, 456, 789, 1011) test environments. (A hedged evaluation sketch follows this table.) |
| Hardware Specification | Yes | All experiments were run on an AMD Ryzen 7 processor, 64GB of RAM, and one NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | All agents are trained for 20M frames under the Proximal Policy Optimization algorithm (PPO, [Schulman et al., 2017]), specifically the stable-baselines3 implementation [Raffin et al., 2021] and its default hyperparameters (cf. Tab. 2 in App. A.5). While 'stable-baselines3' is mentioned, a specific version number for the software package itself is not provided. |
| Experiment Setup | Yes | All agents are trained for 20M frames under the Proximal Policy Optimization algorithm (PPO, [Schulman et al., 2017]), specifically the stable-baselines3 implementation [Raffin et al., 2021] and its default hyperparameters (cf. Tab. 2 in App. A.5). (A hedged training sketch follows this table.) |
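
For reference, a minimal sketch of the reported training setup, assuming the stable-baselines3 PPO implementation with its default hyperparameters. The environment ID, policy choice, seed, and output path below are illustrative assumptions, not the authors' code (which is available at https://github.com/k4ntz/SCoBots); SCoBots in particular acts on extracted object-centric concepts rather than raw pixels.

```python
# Hypothetical sketch of the reported setup: PPO (stable-baselines3 defaults)
# trained for 20M frames. Env ID, policy, seed, and save path are assumptions.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

TRAINING_SEED = 0  # assumption: one of the paper's training seeds

# Placeholder environment; the paper uses object-centric Atari observations.
env = make_vec_env("PongNoFrameskip-v4", n_envs=8, seed=TRAINING_SEED)

model = PPO("MlpPolicy", env, seed=TRAINING_SEED, verbose=1)  # default hyperparameters
model.learn(total_timesteps=20_000_000)  # "trained for 20M frames"
model.save("scobot_pong_ppo")  # hypothetical output path
```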
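Similarly, a hedged sketch of the evaluation protocol quoted under Dataset Splits, using stable-baselines3's `EvalCallback` and `evaluate_policy`. The evaluation frequency, episode count, and seeds follow the quoted text; the environment ID and checkpoint paths are assumptions.

```python
# Hypothetical sketch of the reported evaluation protocol.
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

TRAINING_SEED = 0  # assumption

# During training: evaluate every 500k frames on 4 environments seeded with
# 42 + training seed, for 8 episodes each, keeping the best checkpoint.
eval_env = make_vec_env("PongNoFrameskip-v4", n_envs=4, seed=42 + TRAINING_SEED)
eval_callback = EvalCallback(
    eval_env,
    n_eval_episodes=8,
    eval_freq=500_000,  # note: SB3 counts eval_freq in per-env steps, not frames
    best_model_save_path="./checkpoints/",  # hypothetical path
)
# The callback would be passed to model.learn(..., callback=eval_callback)
# in the training sketch above.

# After training: evaluate the best checkpoint on 4 seeded test environments.
best_model = PPO.load("./checkpoints/best_model")
for test_seed in (123, 456, 789, 1011):
    test_env = make_vec_env("PongNoFrameskip-v4", n_envs=1, seed=test_seed)
    mean_reward, std_reward = evaluate_policy(best_model, test_env, n_eval_episodes=8)
    print(f"test seed {test_seed}: {mean_reward:.1f} +/- {std_reward:.1f}")
```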