reproducibilityindex.ai

Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning

Authors: Riccardo Zamboni, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we report the results of some illustrative numerical simulations, showing that the proposed algorithm matches the expected theoretical behavior and highlighting the relationship between aggregations and sample regimes.
Researcher Affiliation	Academia	Riccardo Zamboni, DEIB, Politecnico di Milano, Milan, Italy, riccardo.zamboni@polimi.it; Alberto Maria Metelli, DEIB, Politecnico di Milano, Milan, Italy, albertomaria.metelli@polimi.it; Marcello Restelli, DEIB, Politecnico di Milano, Milan, Italy, marcello.restelli@polimi.it
Pseudocode	Yes	Algorithm 1 Distributional Max-Ent Policy Evaluation... Algorithm 2 Distributional Max-Ent Progressive Factorization
Open Source Code	No	The paper does not include an explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets	No	The paper describes a custom-designed 'rectangular Grid World' for its simulations but does not provide concrete access information (link, DOI, repository, or formal citation) for it to be considered a publicly available or open dataset.
Dataset Splits	No	The paper does not provide specific dataset split information (percentages, sample counts, or methodology) for training, validation, or testing.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	Finally, the value of the partition splitting K is set to 2, to reduce the exponential search space of all possible uniform partitions; the discount factor γ is set to 0.98 and the confidence δ to 0.1; the results are averaged over 10 rounds with the respective standard deviation.
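
Since no source code is released, the reported setup can only be approximated. The following is a minimal sketch of how the stated hyperparameters (K = 2, γ = 0.98, δ = 0.1, 10 rounds) might be recorded and how per-round results could be averaged with a standard deviation. The names `ExperimentConfig`, `run_round`, and `summarize` are hypothetical, `run_round` is a placeholder that returns a random number rather than the paper's algorithm, and the use of the sample standard deviation is an assumption, since the paper excerpt does not say which estimator was used.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the Experiment Setup row."""
    partition_split_k: int = 2      # K: partition splitting value
    discount_gamma: float = 0.98    # γ: discount factor
    confidence_delta: float = 0.1   # δ: confidence level
    n_rounds: int = 10              # number of rounds averaged over


def run_round(config: ExperimentConfig, seed: int) -> float:
    """Hypothetical placeholder for one evaluation round.

    A real reproduction would run the paper's algorithm on the Grid World
    here; this stub only returns a seeded pseudo-random score.
    """
    rng = random.Random(seed)
    return rng.random()


def summarize(config: ExperimentConfig) -> tuple[float, float]:
    """Average the per-round scores and report their spread."""
    scores = [run_round(config, seed) for seed in range(config.n_rounds)]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.mean(scores), statistics.stdev(scores)


config = ExperimentConfig()
mean_score, std_score = summarize(config)
print(f"mean={mean_score:.3f} std={std_score:.3f}")
```

Seeding each round separately keeps the runs repeatable, which partly compensates for the missing hardware and software details noted in the table.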