An Information Theoretic Approach to Interaction-Grounded Learning

Authors: Xiaoyan Hu, Farzan Farnia, Ho-Fung Leung

ICML 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present numerical results on several reinforcement learning settings indicating an improved performance compared to the existing IGL-based RL algorithm.
Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, The Chinese University of Hong Kong; 2 Independent Researcher. Correspondence to: Xiaoyan Hu <xyhu21@cse.cuhk.edu.hk>.
Pseudocode | Yes | Algorithm 1: Variational Information-based IGL (VI-IGL)
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | A random image x_t (context), whose corresponding number is denoted by l_{x_t} ∈ {0, 1, ..., 9}, is drawn from the MNIST dataset (LeCun et al., 1998).
Dataset Splits | No | The paper specifies a training dataset of 60,000 samples and a test dataset of 10,000 samples, but does not mention an explicit validation split or specific partitioning percentages.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run its experiments.
Software Dependencies | No | The paper does not specify the version numbers of any key software components or libraries used for the experiments.
Experiment Setup | Yes | For the f-variational estimators (functions T and G), the reward decoder ψ, and the linear policy π, we use a 2-layer fully-connected network to process each input image (i.e., the context or the feedback). The concatenated representations then go through an additional linear layer, which outputs the final value. The same network structures are used to implement the reward decoder and the policy of the previous IGL algorithm (Xie et al., 2021b). In each experiment, we train the f-VI-IGL algorithm for 1,000 epochs with a batch size of 600. In particular, we alternately update the parameters of the f-MI estimators and the reward decoders (i.e., 500 epochs of training for each). To stabilize training, we clip the gradient norm to be no greater than 1 and use an exponential moving average (EMA) with a rate of 0.99.
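Since no source code is released, the following is a minimal, hedged sketch of the setup quoted in the Experiment Setup row, assuming PyTorch and 28×28 MNIST-sized context/feedback images. The names PairScorer, ema_update, train_alternating, mi_loss, and decoder_loss are hypothetical placeholders, not the authors' implementation, and the f-divergence objectives of VI-IGL are not reproduced here.

```python
# Hedged sketch of the quoted experiment setup (not the authors' code).
# Assumptions: PyTorch, 28x28 images, hidden width 128; loss callables
# mi_loss / decoder_loss stand in for the unspecified f-MI objectives.
import copy
import torch
import torch.nn as nn


class PairScorer(nn.Module):
    """2-layer MLP per input image, then a linear head over the concatenation.

    The quoted setup uses this shape for the f-variational estimators (T, G),
    the reward decoder psi, and the linear policy pi.
    """

    def __init__(self, in_dim: int = 28 * 28, hidden: int = 128, out_dim: int = 1):
        super().__init__()

        def mlp():
            return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())

        self.enc_context, self.enc_feedback = mlp(), mlp()
        self.head = nn.Linear(2 * hidden, out_dim)  # additional linear layer

    def forward(self, context: torch.Tensor, feedback: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.enc_context(context.flatten(1)),
                       self.enc_feedback(feedback.flatten(1))], dim=-1)
        return self.head(z)


@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, rate: float = 0.99) -> None:
    # EMA rate 0.99 as stated; the excerpt does not say what is averaged,
    # so parameter averaging of the reward decoder is assumed here.
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(rate).add_(p_o, alpha=1.0 - rate)


def train_alternating(estimators, decoder, loader, mi_loss, decoder_loss,
                      epochs: int = 1000):
    # 1,000 epochs total with batch size 600 (configured in `loader`); the
    # f-MI estimators and the reward decoder are updated in alternation
    # (500 epochs each in the paper; a per-epoch alternation is used here
    # purely for illustration).
    est_params = [p for m in estimators for p in m.parameters()]
    opt_est = torch.optim.Adam(est_params)
    opt_dec = torch.optim.Adam(decoder.parameters())
    decoder_ema = copy.deepcopy(decoder)

    for epoch in range(epochs):
        update_estimators = epoch % 2 == 0
        for context, feedback, action in loader:
            if update_estimators:
                loss = mi_loss(estimators, decoder, context, feedback, action)
                opt_est.zero_grad()
                loss.backward()
                nn.utils.clip_grad_norm_(est_params, max_norm=1.0)  # clip to 1
                opt_est.step()
            else:
                loss = decoder_loss(estimators, decoder, context, feedback, action)
                opt_dec.zero_grad()
                loss.backward()
                nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=1.0)
                opt_dec.step()
                ema_update(decoder_ema, decoder, rate=0.99)
    return decoder_ema
```

The gradient-norm clipping at 1 and the EMA rate of 0.99 follow the quoted text; everything else (hidden width, optimizer choice, alternation granularity) is an illustrative assumption.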