MICo: Improved representations via sampling-based state similarity for Markov decision processes
Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark. and Section 6 (Large-scale empirical evaluation) |
| Researcher Affiliation | Collaboration | Pablo Samuel Castro (Google Research, Brain Team), Tyler Kastner (McGill University), Prakash Panangaden (McGill University), Mark Rowland (DeepMind) |
| Pseudocode | No | The paper describes algorithms mathematically (e.g., Equation 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluated on all 60 Atari 2600 games over 5 seeds and report the results in Figure 1 (left), using the interquartile mean (IQM), proposed by Agarwal et al. [2021b] as a more robust and reliable alternative to mean and median (which are reported in Figure 6). and Additionally, we evaluated the MICo loss on twelve of the DM-Control suite from-pixels environments [Tassa et al., 2018]. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or detailed methodology for training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or TPU versions) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'JAX agents provided in the Dopamine library' and 'TensorFlow' without providing specific version numbers for these software components. |
| Experiment Setup | Yes | For all experiments we used the hyperparameter settings provided with Dopamine. We found that a value of α = 0.5 worked well with quantile-based agents (QR-DQN, IQN, and M-IQN), while a value of α = 0.01 worked well with DQN and Rainbow. and We found it important to use the Huber loss [Huber, 1964] to minimize L_MICo as this emphasizes greater accuracy for smaller distances as opposed to larger distances. (A minimal sketch of this loss follows the table.) |
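
The Experiment Setup row quotes the α-weighting of the MICo loss and the use of the Huber loss to minimize L_MICo. Below is a minimal JAX sketch of such a loss, not the authors' released code: the norm-plus-angle parameterization of the learned distance, the helper names (`mico_distance`, `angular_distance`), and the default `beta`, `gamma`, and Huber `delta` values are illustrative assumptions.

```python
import jax
import jax.numpy as jnp


def huber(x, delta=1.0):
    """Huber penalty: quadratic near zero, linear in the tails."""
    abs_x = jnp.abs(x)
    return jnp.where(abs_x <= delta, 0.5 * x ** 2, delta * (abs_x - 0.5 * delta))


def angular_distance(u, v, eps=1e-8):
    """Angle between two representation vectors (assumed base metric)."""
    cos = jnp.dot(u, v) / (jnp.linalg.norm(u) * jnp.linalg.norm(v) + eps)
    return jnp.arccos(jnp.clip(cos, -1.0, 1.0))


def mico_distance(phi_x, phi_y, beta=0.1):
    """Norm-plus-angle distance between two state representations.

    The exact parameterization (squared norms, beta weight) is an assumption
    here; consult the paper's Equation 3 for the authors' definition.
    """
    return 0.5 * (jnp.sum(phi_x ** 2) + jnp.sum(phi_y ** 2)) + \
        beta * angular_distance(phi_x, phi_y)


def mico_loss(phi_x, phi_y, r_x, r_y, phi_x_next, phi_y_next, gamma=0.99):
    """Huber-penalized TD error for the sampled MICo target
    |r_x - r_y| + gamma * U(x', y'), with the bootstrap term held fixed."""
    target = jnp.abs(r_x - r_y) + gamma * jax.lax.stop_gradient(
        mico_distance(phi_x_next, phi_y_next))
    return huber(mico_distance(phi_x, phi_y) - target)
```

Per the quote above, this per-sample term would be mixed with the agent's usual loss as (1 − α)·L_TD + α·L_MICo, with α = 0.01 for DQN and Rainbow and α = 0.5 for the quantile-based agents.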