Personalized Reward Learning with Interaction-Grounded Learning (IGL)
Authors: Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces. (Section 4, Empirical Evaluations) |
| Researcher Affiliation | Collaboration | Jessica Maghakian (Stony Brook University, jessica.maghakian@stonybrook.edu); Paul Mineiro (Microsoft Research NYC, pmineiro@microsoft.com); Kishan Panaganti (Texas A&M University, kpb@tamu.edu); Mark Rucker (University of Virginia, mr2an@virginia.edu); Akanksha Saran (Microsoft Research NYC, akanksha.saran@microsoft.com); Cheng Tan (Microsoft Research NYC, tan.cheng@microsoft.com) |
| Pseudocode | Yes | Algorithm 1 (IGL: Inverse Kinematics; 2 or 3 Latent States; On- or Off-Policy). A hedged sketch of the inverse-kinematics step appears after this table. |
| Open Source Code | Yes | Our code is available for all publicly replicable experiments (i.e., except production data). The code will be made publicly available at {url redacted}. |
| Open Datasets | Yes | We simulated using the Covertype (Blackard & Dean, 1999) dataset with M = N = 100, and an (inverse kinematics) model class which embedded both user and word ids into a 2-dimensional space. Our simulations are built on a dataset (Martinchek, 2016) of all posts by the official Facebook pages of 5 popular news outlets (ABC News, CBS News, CNN, Fox News and The New York Times) that span the political spectrum. |
| Dataset Splits | No | The paper does not provide explicit train/validation/test dataset splits (e.g., percentages or sample counts). It refers to simulation setups but lacks specific partitioning details. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as PyTorch and the Adam optimizer but does not specify version numbers for reproducibility. |
| Experiment Setup | Yes | Both CB and IK are linear logistic regression models implemented in PyTorch, trained using the cross-entropy loss. Both models used Adam to update their weights with a learning rate of 2.5e-3. All models used Adam to update their weights with a learning rate of 1e-3, batch size 100, and the cross-entropy loss function. (A minimal PyTorch sketch of this setup appears below.) |
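
For context on the Pseudocode row, here is a minimal sketch of the inverse-kinematics (IK) idea underlying Algorithm 1: fit a classifier p̂(a | x, y) that predicts the logged action from context and feedback, and treat concentration of that posterior well above the uniform rate 1/K as evidence of reward. The dimensions, the linear model class, and the `THRESHOLD` rule below are illustrative assumptions, not the paper's exact decoder.

```python
# Hedged sketch of IK-based reward decoding: learn p(a | x, y) from logged
# (context, action, feedback) tuples, then infer reward from how concentrated
# that posterior is. All shapes and the THRESHOLD constant are assumptions.
import torch
import torch.nn as nn

X_DIM, Y_DIM, NUM_ACTIONS = 16, 8, 10  # placeholder dimensions (assumed)
THRESHOLD = 2.0                        # assumed margin over the uniform rate 1/K

ik_model = nn.Linear(X_DIM + Y_DIM, NUM_ACTIONS)  # linear IK classifier (assumed)
optimizer = torch.optim.Adam(ik_model.parameters(), lr=2.5e-3)
loss_fn = nn.CrossEntropyLoss()

def ik_update(x, y, a):
    """One cross-entropy step predicting the logged action a from (x, y)."""
    optimizer.zero_grad()
    loss = loss_fn(ik_model(torch.cat([x, y], dim=-1)), a)
    loss.backward()
    optimizer.step()

def decode_reward(x, y, a):
    """Decode reward as p(a | x, y) concentrating above uniform: if feedback y
    carried no information about the action, the posterior would stay near 1/K
    under uniform exploration."""
    with torch.no_grad():
        probs = ik_model(torch.cat([x, y], dim=-1)).softmax(dim=-1)
    return (probs.gather(-1, a.unsqueeze(-1)).squeeze(-1)
            > THRESHOLD / NUM_ACTIONS).float()
```

The paper's on- versus off-policy variants and its 2- versus 3-latent-state settings change the decoder details; this sketch only shows the shared IK training signal.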
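
The Experiment Setup row quotes two configurations (Adam at 2.5e-3 for the CB and IK models; Adam at 1e-3 with batch size 100 for the other models). A minimal sketch of the first configuration follows; the feature and class counts are Covertype-like placeholders we assumed, and the dummy batch stands in for the real data loader.

```python
# Minimal sketch of the reported setup: a linear logistic-regression model in
# PyTorch trained with Adam and cross-entropy. Only the optimizer, learning
# rate, batch size, and loss follow the paper's stated configuration.
import torch
import torch.nn as nn

NUM_FEATURES, NUM_CLASSES, BATCH_SIZE = 54, 7, 100  # Covertype-like shapes (assumed)

model = nn.Linear(NUM_FEATURES, NUM_CLASSES)         # "linear logistic regression"
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-3)
loss_fn = nn.CrossEntropyLoss()                      # cross-entropy, as reported

# Dummy batch standing in for the real data loader.
x = torch.randn(BATCH_SIZE, NUM_FEATURES)
labels = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))

optimizer.zero_grad()
loss = loss_fn(model(x), labels)
loss.backward()
optimizer.step()
```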