Locality Sensitive Sparse Encoding for Learning World Models Online
Authors: Zichen Liu, Chao Du, Wee Sun Lee, Min Lin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods. See also the sections "Empirical Results on Supervised Learning" and "Empirical Results on Reinforcement Learning". |
| Researcher Affiliation | Collaboration | Sea AI Lab National University of Singapore |
| Pseudocode | Yes | Algorithm 1 Online model learning and Algorithm 2 Sparse online model learning and Algorithm 3 Dyna MBRL with Losse-FTL |
| Open Source Code | Yes | We provide Jax-based Python codes in Listing 1. |
| Open Datasets | Yes | We test different encoding methods on an image denoising task, where the inputs are MNIST (Deng, 2012) images with Gaussian noise, and the outputs are clean images. and For continuous control tasks from Gym Mujoco (Todorov et al., 2012) |
| Dataset Splits | Yes | We add pixel-wise Gaussian noise with µ = 0, σ = 0.3 to normalized images from the MNIST dataset (Deng, 2012), and create train and test splits with a ratio of 9:1. Before the training loop starts, the data is split into train and holdout sets with a ratio of 4:1. The loss on the holdout set is measured after each epoch. We stop training if the loss does not improve over 5 consecutive epochs. (A hedged data-preparation sketch follows the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory) used for experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'Jax-based Python codes' in Appendix E and imports JAX in Listing 1, but does not provide specific version numbers for Python, JAX, or any other software dependencies. |
| Experiment Setup | Yes | The linear layer is optimized using Adam (Kingma & Ba, 2015) with a learning rate of 0.0001. The Adam optimizer is used with the best learning rate swept from {5×10⁻⁶, 1×10⁻⁵, 5×10⁻⁵, ..., 1×10⁻²}. For Losse-FTL, we use 2-d binning with 10 bins for each grid and use 10 grids to construct the final feature. We use κ = 30, ρ = 2, and λ = 10 for Losse-FTL, and 3-layer MLPs with 32 hidden units for neural networks. The DQN parameters are updated using real data with an interval of 4 interactions. In MBRL, 16 planning steps are conducted after each real data update. Both real data updates and planning updates use a mini-batch of 32. (Hedged sketches of the binning encoding and the Dyna update schedule follow the table.) |
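
The denoising data setup quoted in the Dataset Splits row can be reproduced roughly as below. This is a minimal sketch assuming MNIST images are already loaded and normalized to [0, 1]; the function name and signature are placeholders, not the authors' released code.

```python
# Hedged sketch of the image-denoising data setup: pixel-wise Gaussian
# noise (mu = 0, sigma = 0.3) added to normalized MNIST images, then a
# 9:1 train/test split. `make_denoising_splits` is a placeholder name.
import jax
import jax.numpy as jnp

def make_denoising_splits(images: jnp.ndarray, key: jax.Array,
                          sigma: float = 0.3, train_ratio: float = 0.9):
    """images: (N, 28*28) array already normalized to [0, 1]."""
    noise_key, perm_key = jax.random.split(key)
    noisy = images + sigma * jax.random.normal(noise_key, images.shape)
    perm = jax.random.permutation(perm_key, images.shape[0])
    n_train = int(train_ratio * images.shape[0])
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    # Inputs are noisy images, targets are the clean originals.
    return (noisy[train_idx], images[train_idx]), (noisy[test_idx], images[test_idx])
```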
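
The Losse-FTL feature quoted in the Experiment Setup row combines 10 grids of 2-d binning with 10 bins per axis. The sketch below shows one plausible reading of that construction, with randomly chosen dimension pairs and random grid offsets as assumptions for illustration; the paper's exact encoding (and the roles of κ, ρ, and λ) may differ.

```python
# Minimal sketch of a locality-sensitive sparse encoding built from 2-d
# binning, using the quoted hyperparameters (10 grids, 10 bins per axis).
# Random dimension pairs and random offsets are assumptions, not the
# paper's exact construction.
import jax
import jax.numpy as jnp

N_GRIDS, N_BINS = 10, 10  # quoted settings

def init_grids(key: jax.Array, input_dim: int):
    pair_key, offset_key = jax.random.split(key)
    # Each grid looks at a random pair of input dimensions.
    dims = jax.random.randint(pair_key, (N_GRIDS, 2), 0, input_dim)
    # Random offsets shift the bin boundaries of each grid.
    offsets = jax.random.uniform(offset_key, (N_GRIDS, 2)) / N_BINS
    return dims, offsets

def encode(x: jnp.ndarray, dims: jnp.ndarray, offsets: jnp.ndarray):
    """x: (input_dim,) with entries scaled to [0, 1]; returns a feature
    of length N_GRIDS * N_BINS**2 with one active bin per grid."""
    xy = x[dims] + offsets                                 # (N_GRIDS, 2)
    idx = jnp.clip(jnp.floor(xy * N_BINS), 0, N_BINS - 1).astype(jnp.int32)
    flat = idx[:, 0] * N_BINS + idx[:, 1]                  # bin index within each grid
    onehot = jax.nn.one_hot(flat, N_BINS * N_BINS)         # (N_GRIDS, N_BINS**2)
    return onehot.reshape(-1)                              # exactly N_GRIDS nonzeros
```

Under this reading, each grid activates exactly one bin, so the concatenated feature has at most 10 nonzero entries out of 1000, and nearby inputs share most of their active bins, which is what makes the encoding sparse and locality sensitive.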
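
The Dyna schedule in the Experiment Setup row (real DQN updates every 4 interactions, 16 planning updates per real update, mini-batches of 32) can be summarized as the loop below. All object names (`env`, `agent`, `world_model`, `replay_buffer`) and their methods are placeholders for illustration, not the paper's API or algorithm listing.

```python
# Hedged sketch of a Dyna-style update schedule: a real-data agent update
# every 4 environment steps, each followed by 16 planning updates on
# model-generated data, all with mini-batches of 32.
REAL_UPDATE_INTERVAL = 4
PLANNING_STEPS = 16
BATCH_SIZE = 32

def dyna_loop(env, agent, world_model, replay_buffer, total_steps: int):
    obs = env.reset()
    for t in range(total_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        # The world model is updated online from the incoming transition
        # (single pass, no replay), as described in the paper's abstract.
        world_model.update_online(obs, action, reward, next_obs)
        if (t + 1) % REAL_UPDATE_INTERVAL == 0:
            agent.update(replay_buffer.sample(BATCH_SIZE))      # real-data update
            for _ in range(PLANNING_STEPS):                     # planning updates
                agent.update(world_model.rollout(replay_buffer.sample(BATCH_SIZE)))
        obs = env.reset() if done else next_obs
```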