Critic Sequential Monte Carlo
Authors: Vasileios Lioutas, Jonathan Wilder Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank Wood, Adam Scibior
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on collision avoidance in a high-dimensional simulated driving task show that Critic SMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios. |
| Researcher Affiliation | Collaboration | Vasileios Lioutas (1,2), J. Wilder Lavington (1,2), Justice Sefas (1,2), Matthew Niedoba (1,2), Yunpeng Liu (1,2), Berend Zwartsenberg (1), Setareh Dabiri (1), Frank Wood (1,2,3), Adam Scibior (1). Affiliations: (1) Inverted AI, (2) University of British Columbia, (3) Mila |
| Pseudocode | Yes | Algorithm 1 Critic Sequential Monte Carlo. Algorithm 2 Sequential Monte Carlo. |
| Open Source Code | Yes | We include in the supplementary material a demo code implementation of Critic SMC applied to the following linear Gaussian state-space model (LGSSM) with well-defined critic function |
| Open Datasets | Yes | The prior model is trained on the INTERACTION (Zhan et al., 2019) dataset and the task is that given 10 timesteps of observed behavior, predict the next 30 timesteps of future trajectories. |
| Dataset Splits | Yes | The evaluation is performed using the validation split of the INTERACTION dataset, which neither ITRA nor the critic saw during training. |
| Hardware Specification | Yes | We train the model using a single Nvidia RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions "stable-baselines3 (Brockman et al., 2016)" for SAC implementation, but does not provide specific version numbers for this or other software components. |
| Experiment Setup | Yes | We train the model using a single Nvidia RTX 2080Ti GPU. The prioritized experience replay buffer has a size of 1 million stored experiences. The discount factor is set to 0.99, the batch size to 256 and the learning rate to 0.001. Finally, we sample 1024 actions during running Critic SMC while training the critic model. |
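
The hyperparameters quoted in the Experiment Setup row can be collected in one place for anyone attempting a reproduction. The sketch below is illustrative only and is not taken from the authors' code: the class name `CriticTrainingConfig`, its field names, and the helper `to_sac_kwargs` are hypothetical. The dictionary keys it returns match the `SAC` constructor arguments in stable-baselines3 (`buffer_size`, `gamma`, `batch_size`, `learning_rate`), but whether the authors passed the values this way is an assumption. The prioritized replay buffer and the 1024 sampled actions are not covered by that mapping.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CriticTrainingConfig:
    """Hypothetical container for the values reported in the Experiment Setup row."""

    replay_buffer_size: int = 1_000_000  # prioritized experience replay buffer size
    discount_factor: float = 0.99
    batch_size: int = 256
    learning_rate: float = 1e-3
    num_sampled_actions: int = 1024  # actions sampled when running Critic SMC during critic training

    def to_sac_kwargs(self) -> dict:
        """Map the reported values onto stable-baselines3 SAC keyword names (assumed mapping)."""
        return {
            "buffer_size": self.replay_buffer_size,
            "gamma": self.discount_factor,
            "batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
        }


config = CriticTrainingConfig()
print(asdict(config))
print(config.to_sac_kwargs())
```

Because the paper gives no software version numbers, the keyword names should be checked against whichever stable-baselines3 release is installed.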