Explanation-Guided Reward Alignment
Authors: Saaduddin Mahmud, Sandhya Saisubramanian, Shlomo Zilberstein
IJCAI 2023 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of learning aligned linear and non-linear rewards with REVEALE using three explanation generation techniques: gradients as explanations (GaE), LIME, and saliency maps (SM). Our empirical results on five proof-of-concept domains demonstrate that learning with REVEALE generalizes well and achieves higher prediction accuracy and average reward, often matching optimal performance. (The first sketch below the table illustrates the gradient-as-explanation pattern.) |
| Researcher Affiliation | Academia | Saaduddin Mahmud (University of Massachusetts Amherst, USA), Sandhya Saisubramanian (Oregon State University, USA), Shlomo Zilberstein (University of Massachusetts Amherst, USA); smahmud@umass.edu, sandhya.sai@oregonstate.edu, shlomo@umass.edu |
| Pseudocode | No | The paper describes the 'iterative learning and verification of reward (ILV)' algorithm in Section 4 but does not present it as structured pseudocode or an algorithm block. (A hedged sketch of one possible reading appears below the table.) |
| Open Source Code | No | The paper states 'We implemented all algorithms in Python' but does not provide any specific repository link or explicit statement about open-sourcing the code for the described methodology. |
| Open Datasets | No | The paper states 'Training data is generated by sub-optimally solving a set of training instances' and 'the generated dataset varies for each seed', indicating a custom-generated dataset without providing concrete access information (link, DOI, repository, or citation to an established public dataset). |
| Dataset Splits | No | The paper discusses 'training data' and 'test instances' but does not explicitly specify a separate validation split, with details such as percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | Yes | We implemented all algorithms in Python and tested them on an Ubuntu machine with 32GB RAM and 12GB GPU. |
| Software Dependencies | No | The paper states 'We implemented all algorithms in Python' but does not provide specific version numbers for Python or any key ancillary software libraries or solvers used. |
| Experiment Setup | Yes | We implemented all algorithms in Python and tested them on an Ubuntu machine with 32GB RAM and a 12GB GPU. The reported values are averaged over 60 different random seeds. It is important to note that the generated dataset varies for each seed. For non-linear rewards, we utilized a 4-layer neural network with ReLU activation. We observe that 64 explanations for linear rewards and 256 for non-linear rewards help reach maximal accuracy. Based on these two results, for all subsequent experiments we use 128 trajectories, 64 ranked feedback over pairs of explanations, and cosine similarity for domains with linear rewards. For non-linear rewards, we use 1024 trajectories, 256 ranked feedback over pairs of explanations, and L2 distance. (The final sketch below the table spells out these metrics and one possible network shape.) |
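
For context on the explanation techniques named in the Research Type row: gradients as explanations (GaE) attribute a model's output to its input features via the gradient of the output with respect to the input. The sketch below shows only this standard attribution pattern, not the paper's implementation; the use of PyTorch and the throwaway demo model are assumptions for illustration.

```python
import torch
import torch.nn as nn

def gradient_explanation(reward_net, state_features):
    """Attribute a predicted reward to input features via the gradient of
    the model's output with respect to its input (the generic GaE pattern)."""
    x = torch.tensor(state_features, dtype=torch.float32, requires_grad=True)
    reward_net(x).sum().backward()
    return x.grad.detach().numpy()

# Demo with a throwaway linear reward model (purely illustrative).
net = nn.Linear(4, 1)
print(gradient_explanation(net, [0.5, -1.0, 2.0, 0.25]))
```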
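Since the ILV procedure is described only in prose, the following is a minimal sketch of one plausible reading of an iterative learn-and-verify loop over ranked explanation feedback. Every name here (`fit_reward`, `verification_error`, `query_ranked_feedback`) and the perceptron-style update rule are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

def fit_reward(w0, ranked_pairs, lr=0.1, epochs=50):
    """Fit linear reward weights from ranked pairs of explanation vectors.

    Each item in ranked_pairs is (better, worse): the human ranked the
    explanation `better` above `worse`.
    """
    w = w0.copy()
    for _ in range(epochs):
        for better, worse in ranked_pairs:
            # Perceptron-style update: push the preferred explanation's
            # score above the dispreferred one's.
            if w @ better <= w @ worse:
                w += lr * (better - worse)
    return w

def verification_error(w, held_out_pairs):
    """Fraction of held-out ranked pairs that the learned reward mis-orders."""
    wrong = sum(1 for better, worse in held_out_pairs if w @ better <= w @ worse)
    return wrong / max(len(held_out_pairs), 1)

def ilv(query_ranked_feedback, n_features, tol=0.05, max_rounds=10):
    """Alternate between learning a reward and verifying it, querying more
    ranked feedback whenever verification fails."""
    w = np.zeros(n_features)
    feedback = []
    for _ in range(max_rounds):
        feedback += query_ranked_feedback()  # e.g., a batch of ranked pairs
        split = int(0.8 * len(feedback))
        w = fit_reward(w, feedback[:split])
        if verification_error(w, feedback[split:]) <= tol:
            break  # learned reward passes verification on held-out feedback
    return w
```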
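To make the reported experiment setup concrete, the sketch below spells out the two explanation-comparison metrics the paper names (cosine similarity for linear rewards, L2 distance for non-linear ones) and one possible shape of the reported 4-layer ReLU network. The input size, the hidden width of 64, and the choice of PyTorch are all assumptions; the paper states none of them.

```python
import numpy as np
import torch.nn as nn

def cosine_similarity(e1, e2):
    """Reported metric for comparing explanations in linear-reward domains."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12))

def l2_distance(e1, e2):
    """Reported metric for comparing explanations in non-linear-reward domains."""
    return float(np.linalg.norm(np.asarray(e1, dtype=float) - np.asarray(e2, dtype=float)))

def make_reward_net(n_features: int, hidden: int = 64) -> nn.Module:
    """One hypothetical instantiation of 'a 4-layer neural network with
    ReLU activation'; layer sizes are assumptions, not from the paper."""
    return nn.Sequential(
        nn.Linear(n_features, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),  # scalar reward estimate
    )
```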