Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
Authors: Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In empirical studies using various test domains, we show that OPE with in-sample learning using the kernel with optimized metric achieves significantly improved accuracy over other baselines. ... For empirical studies, we evaluate KMIFQE using a modified classic control domain sourced from OpenAI Gym (Brockman et al., 2016). This evaluation serves to verify that the metrics and bandwidths are learned as intended. Furthermore, we conduct experiments on a more complex MuJoCo domain (Todorov et al., 2012). The experimental results demonstrate the effectiveness of our metric learning approach. |
| Researcher Affiliation | Academia | Haanvid Lee1, Tri Wahyu Guntara1, Jongmin Lee2, Yung-Kyun Noh3,4, Kee-Eung Kim1 1KAIST, 2UC Berkeley, 3Hanyang Univ., 4KIAS |
| Pseudocode | Yes | The detailed procedure is in Algorithm 1 in Appendix B. |
| Open Source Code | No | The paper mentions using implementations for baselines (SR-DICE and FQE) from a public GitHub repository, but it does not provide an explicit statement or a link to code for its own proposed method (KMIFQE). |
| Open Datasets | Yes | For empirical studies, we evaluate KMIFQE using a modified classic control domain sourced from OpenAI Gym (Brockman et al., 2016). ... Furthermore, we conduct experiments on a more complex MuJoCo domain (Todorov et al., 2012). ... Lastly, KMIFQE and baselines are evaluated on D4RL (Fu et al., 2020) datasets... |
| Dataset Splits | Yes | The validation set is 10% of the data, and the rest of the data is used for training. |
| Hardware Specification | Yes | One i7 CPU with one NVIDIA Titan Xp GPU runs KMIFQE for two million train steps in 5 hours. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and implementations for SR-DICE and FQE, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, specific library versions). |
| Experiment Setup | Yes | All networks are trained with the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 3e-4. For mini-batch sizes, the encoder-decoder network and successor representation network of SR-DICE, as well as FQE, use a mini-batch size of 256. For the learning of the density ratio in SR-DICE and our algorithm, we use a mini-batch size of 1024. ... FQE and SR-DICE use update rate τ = 0.005... For our proposed method, the target critic network is hard-updated every 1000 iterations. ... The IS ratios are clipped to be in the range [0.001, 2], selected by grid search. (A configuration sketch follows the table.) |
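
The quoted split and training configuration map onto a short training-loop sketch. Since the paper releases no code, the following is a minimal illustration under stated assumptions: the network architecture, synthetic dataset, discount factor, and the placeholder target-policy action and IS ratio are hypothetical; only the Adam optimizer, the 3e-4 learning rate, the mini-batch sizes, the 90/10 train/validation split, the target-update rules, and the [0.001, 2] clipping range come from the paper's reported setup.

```python
import copy
import torch

# Hypothetical dimensions and dataset size; the paper does not release code,
# so the data below is a synthetic stand-in for an offline transition dataset.
STATE_DIM, ACTION_DIM = 17, 6
N = 100_000
states = torch.randn(N, STATE_DIM)
actions = torch.randn(N, ACTION_DIM)
rewards = torch.randn(N, 1)
next_states = torch.randn(N, STATE_DIM)

# Reported split: 10% of the data for validation, the rest for training.
perm = torch.randperm(N)
val_idx, train_idx = perm[: N // 10], perm[N // 10 :]

# Hypothetical critic architecture (not specified in the quoted setup).
critic = torch.nn.Sequential(
    torch.nn.Linear(STATE_DIM + ACTION_DIM, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)
target_critic = copy.deepcopy(critic)

# Reported: Adam optimizer with learning rate 3e-4.
optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4)

BATCH = 1024              # reported mini-batch size for density-ratio learning
HARD_UPDATE_EVERY = 1000  # reported: KMIFQE hard-updates its target critic
IS_CLIP = (0.001, 2.0)    # reported IS-ratio clipping range (grid-searched)
GAMMA = 0.99              # assumed discount factor (not stated in the quote)

for step in range(10_000):  # the paper reports two million train steps
    idx = train_idx[torch.randint(len(train_idx), (BATCH,))]
    s, a, r, s2 = states[idx], actions[idx], rewards[idx], next_states[idx]

    # Hypothetical stand-ins: the real method uses the deterministic target
    # policy's action at s' and an IS ratio from a learned-metric kernel.
    a2 = torch.zeros(BATCH, ACTION_DIM)
    is_ratio = torch.ones(BATCH, 1).clamp(*IS_CLIP)  # reported clipping

    with torch.no_grad():
        target = r + GAMMA * target_critic(torch.cat([s2, a2], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    loss = (is_ratio * (q - target).pow(2)).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Reported hard update of the target critic every 1000 iterations.
    if step % HARD_UPDATE_EVERY == 0:
        target_critic.load_state_dict(critic.state_dict())
```

For the FQE and SR-DICE baselines, the quoted soft-update rate τ = 0.005 would replace the hard update, blending target parameters as θ′ ← τθ + (1 − τ)θ′ after each step.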