An Information Theoretic Approach to Interaction-Grounded Learning
Authors: Xiaoyan Hu, Farzan Farnia, Ho-Fung Leung
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical results on several reinforcement learning settings indicating an improved performance compared to the existing IGL-based RL algorithm. |
| Researcher Affiliation | Academia | ¹Department of Computer Science and Engineering, The Chinese University of Hong Kong; ²Independent Researcher. Correspondence to: Xiaoyan Hu <xyhu21@cse.cuhk.edu.hk>. |
| Pseudocode | Yes | Algorithm 1 Variational Information-based IGL (VI-IGL) |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | a random image x_t (context), whose corresponding number is denoted by l_{x_t} ∈ {0, 1, ..., 9}, is drawn from the MNIST dataset (LeCun et al., 1998) |
| Dataset Splits | No | The paper specifies a training dataset of 60,000 samples and a test dataset of 10,000 samples, but does not mention explicit validation dataset splits or specific percentages for data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper does not specify the version numbers of any key software components or libraries used for the experiments. |
| Experiment Setup | Yes | For the f-variational estimators (functions T and G), the reward decoder ψ, and the linear policy π, we use a 2-layer fully-connected network to process each input image (i.e., the context or the feedback). The concatenated encodings then go through an additional linear layer, which outputs the final value. The same network structures are used to implement the reward decoder and the policy of the previous IGL algorithm (Xie et al., 2021b). In each experiment, we train the f-VI-IGL algorithm for 1,000 epochs with a batch size of 600. In particular, we alternately update the parameters of the f-MI estimators and the reward decoders (i.e., 500 epochs of training for each). To stabilize the training, we clip the gradient norm to be no greater than 1 and use an exponential moving average (EMA) with a rate of 0.99. |
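
The Experiment Setup row above refers to f-variational estimators (functions T and G) and f-MI estimators. As background, estimators of this kind are typically built on the variational representation of an f-divergence applied to mutual information; a standard form of that bound is sketched below. This is the generic Nguyen-Wainwright-Jordan bound, not necessarily the exact objective optimized in the paper, and the paper's use of two functions T and G (presumably one per information term in its objective) is not reflected in this single-function form.

```latex
% Generic variational (NWJ-style) lower bound on an f-mutual information,
% the kind of objective an f-variational estimator T is trained to tighten.
% Here f^{*} denotes the convex conjugate of the generator f.
\begin{aligned}
I_f(X;Y) &= D_f\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right) \\
         &\ge \sup_{T}\;
            \mathbb{E}_{(x,y)\sim P_{XY}}\!\big[T(x,y)\big]
            - \mathbb{E}_{x\sim P_X,\, y\sim P_Y}\!\big[f^{*}\big(T(x,y)\big)\big].
\end{aligned}
```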
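As a concrete illustration of the Experiment Setup row, here is a minimal PyTorch sketch of the described architecture (a 2-layer fully-connected encoder per input image, concatenation, then a final linear layer) together with the two stabilization tricks (gradient-norm clipping and parameter EMA at rate 0.99). The hidden width, the 28×28 MNIST input size, and all names (`TwoBranchScore`, `train_step`) are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the two-branch network and stabilized update described
# in the Experiment Setup row. Hidden width (128) and input size (28*28)
# are assumptions; the excerpt does not report them.
import torch
import torch.nn as nn


class TwoBranchScore(nn.Module):
    """Hypothetical stand-in for the f-variational estimators T/G and the
    reward decoder: two per-image MLP encoders followed by a linear head."""

    def __init__(self, in_dim: int = 28 * 28, hidden: int = 128):
        super().__init__()
        self.context_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.feedback_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.head = nn.Linear(2 * hidden, 1)  # concatenated encodings -> scalar

    def forward(self, context: torch.Tensor, feedback: torch.Tensor) -> torch.Tensor:
        z = torch.cat(
            [self.context_net(context.flatten(1)),
             self.feedback_net(feedback.flatten(1))],
            dim=-1,
        )
        return self.head(z)


def train_step(model, ema_model, loss, optimizer, ema_rate: float = 0.99):
    """One stabilized update: clip the gradient norm to 1 and track an
    exponential moving average (EMA) of the parameters, as in the excerpt."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_rate).add_(p, alpha=1.0 - ema_rate)
```

Under the alternating scheme described in the row, a step like `train_step` would be applied to the f-MI estimator parameters in one 500-epoch phase and to the reward decoder in the other, each with its own optimizer.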