Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization
Authors: Junyi Liao, Zihan Zhu, Ethan X Fang, Zhuoran Yang, Vahid Tarokh
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate the framework's effectiveness in accurately recovering reward functions across various scenarios, offering new insights into decision-making in competitive environments. In this section, we implement our reward-learning algorithm and conduct numerical experiments in both entropy-regularized zero-sum matrix games and Markov games. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Duke University, Durham NC, USA 2Department of Statistics and Data Science, University of Pennsylvania, Philadelphia PA, USA 3Department of Statistics and Data Science, Yale University, New Haven CT, USA. Correspondence to: Junyi Liao <EMAIL>. |
| Pseudocode | No | The paper describes the steps of an algorithm in prose format and bullet points in Section 3.2, but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block with code-like formatting. For example: 'Next, we propose an algorithm to recover the feasible reward functions. For all h ∈ [H], the algorithm performs the following four steps: Recover the feasible set by solving the least squares problem...' |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. The text does not mention any plans for future release or code availability upon request either. |
| Open Datasets | No | The paper does not mention the use of any publicly available datasets. The numerical experiments described in Section 4 use a simulated setup with defined parameters. The text 'Given a dataset D = {D_h}_{h∈[H]} = {{(s_h^t, a_h^t, b_h^t)}_{t∈[T]}}_{h∈[H]}' suggests the generation of data within the experimental setup rather than using an external public dataset. |
| Dataset Splits | No | The paper conducts numerical experiments using simulated data, where sample size N is varied (e.g., from 10^4 to 10^5). However, it does not specify any training, validation, or test splits for this data. The entire dataset appears to be used for evaluating the algorithm's performance. |
| Hardware Specification | No | The paper states: 'All experiments are conducted in Google Colab.' While Google Colab provides hardware, the specific GPU or CPU models, memory, or other detailed specifications are not explicitly mentioned in the text. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. It mentions implementing the algorithm and conducting numerical experiments, but no programming languages, libraries, or tools with their specific versions are listed. |
| Experiment Setup | Yes | Setup. We define the kernel function ϕ : A × B → R^d with dimension d = 2, and set the true parameter specifying the reward functions to ω*_h = (0.8, 0.6) for all steps h ∈ [H]. We set the sizes of the action spaces to m = 5 and n = 5, the size of the state space \|S\| = 4, and the horizon H = 6. The entropy regularization term is η = 0.5. We implement the algorithm proposed in Section 3.2. In each experiment, our algorithm outputs a parameter θ̂_h in the confidence set Θ̂_h. We set the bound of feasible parameters θ_h to R = 10, and set the threshold κ_h = 10^3/N, where N is the sample size. The regularization term in ridge regression is λ = 0.01. |
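The reported setup can be instantiated in a short sketch. Note this is an illustrative simplification, not the authors' full inverse-game algorithm: the paper recovers rewards from observed equilibrium play, whereas below we regress on directly simulated linear rewards r = ϕ(a, b)ᵀω*. The random feature map and the noise level are our own assumptions; only d, m, n, ω*, N, and the ridge λ come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2                               # feature dimension (paper: d = 2)
m, n = 5, 5                         # action-space sizes (paper: m = n = 5)
omega_star = np.array([0.8, 0.6])   # true reward parameter (paper)
lam = 0.01                          # ridge regularization (paper: lambda = 0.01)
N = 10_000                          # sample size (paper varies N from 1e4 to 1e5)

# Hypothetical feature map phi: A x B -> R^d; the paper does not spell out
# its kernel, so fixed random features serve as a stand-in here.
phi = rng.normal(size=(m, n, d))

# Simulate N joint-action samples and noisy linear rewards (an assumption;
# the paper observes actions, not rewards).
a = rng.integers(0, m, size=N)
b = rng.integers(0, n, size=N)
X = phi[a, b]                                   # (N, d) design matrix
y = X @ omega_star + 0.1 * rng.normal(size=N)   # noisy rewards

# Ridge estimate: omega_hat = (X^T X + lam I)^{-1} X^T y
omega_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(omega_hat)  # should be close to (0.8, 0.6)
```

With N = 10^4 samples the ridge estimate recovers ω* to within a few thousandths, which is consistent with the paper's report that accuracy improves as N grows toward 10^5.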