Learning Parametric Closed-Loop Policies for Markov Potential Games

Authors: Sergio Valcarcel Macua, Javier Zazo, Santiago Zazo

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the theoretical contributions with an example by applying our approach to a noncooperative communications engineering game. We then solve the game with a deep reinforcement learning algorithm that learns policies that closely approximate an exact variational NE of the game. In this section, we show how to use the proposed MPGs framework to learn an equilibrium of a communications engineering application. As a proof of concept, we perform simulations with TRPO, approximating the policy with a neural network with 3 hidden layers of 32 neurons per layer and ReLU activation functions...
Researcher Affiliation | Collaboration | Sergio Valcarcel Macua (PROWLER.io, Cambridge, UK; sergio@prowler.io); Javier Zazo and Santiago Zazo (Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, Madrid, Spain; javier.zazo.ruiz@upm.es, santiago@gaps.ssr.upm.es)
Pseudocode | No | The paper describes its methods in text and mathematical formulations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code.
Open Source Code | No | The paper does not provide a link to source code or an explicit statement that the source code for the described methodology is publicly available.
Open Datasets | No | The paper states: 'To surmount this issue, we generated 100 independent sequences of samples of h_{k,i} and δ_{k,i} for all k ∈ N and length T = 100 time steps each, and obtain two solutions with them.' This indicates self-generated data, but no access information (link, citation, or repository) for a publicly available or open dataset is provided.
Dataset Splits | No | The paper describes generating sequences for benchmarking and training a DRL agent that learns by interacting with a simulator, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper mentions running simulations and training a neural network but does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions using the 'Trust Region Policy Optimization (TRPO) algorithm' and refers to 'CVX' for convex optimization, but it does not specify version numbers for these or any other software components.
Experiment Setup | Yes | As a proof of concept, we perform simulations with TRPO, approximating the policy with a neural network with 3 hidden layers of 32 neurons per layer and ReLU activation functions, and an output layer that is the mean of a Gaussian distribution. Each iteration of TRPO uses a batch of 4000 simulation steps (i.e., tuples of state transition, action and rewards). The step size is 0.01.
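
To make the reported experiment setup concrete, the sketch below shows a Gaussian policy with the architecture quoted above (3 hidden layers of 32 ReLU units, with the output layer producing the mean of a Gaussian). The paper does not state which framework was used; PyTorch, the placeholder state/action dimensions, and the state-independent log-std parameterization are assumptions made here purely for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianMLPPolicy(nn.Module):
    """Gaussian policy whose mean comes from a 3-layer, 32-unit ReLU MLP,
    matching the architecture reported in the paper. Framework, dimensions,
    and the log-std parameterization are assumptions, not from the paper."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 32):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),  # output layer: mean of the Gaussian
        )
        # State-independent log standard deviation, a common TRPO choice
        # (the paper does not specify how the variance is parameterized).
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor) -> Normal:
        mean = self.mean_net(state)
        return Normal(mean, self.log_std.exp())

# Example usage with hypothetical dimensions. A TRPO loop around this policy
# would collect batches of roughly 4000 simulation steps per iteration and
# update with the reported step size of 0.01.
policy = GaussianMLPPolicy(state_dim=8, action_dim=2)
action_dist = policy(torch.randn(8))
action = action_dist.sample()
```
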