Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning

Authors: Riccardo Zamboni, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we report the results of some illustrative numerical simulations, showing that the proposed algorithm matches the expected theoretical behavior and highlighting the relationship between aggregations and sample regimes.
Researcher Affiliation | Academia | Riccardo Zamboni, DEIB, Politecnico di Milano, Milan, Italy (riccardo.zamboni@polimi.it); Alberto Maria Metelli, DEIB, Politecnico di Milano, Milan, Italy (albertomaria.metelli@polimi.it); Marcello Restelli, DEIB, Politecnico di Milano, Milan, Italy (marcello.restelli@polimi.it)
Pseudocode | Yes | Algorithm 1 (Distributional Max-Ent Policy Evaluation) ... Algorithm 2 (Distributional Max-Ent Progressive Factorization)
Open Source Code | No | The paper does not include an explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | No | The paper describes a custom-designed 'rectangular Grid World' for its simulations but provides no concrete access information (link, DOI, repository, or formal citation), so the environment cannot be considered a publicly available dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Finally, the partition-splitting value K is set to 2 to reduce the exponential search space of all possible uniform partitions; the discount factor γ is set to 0.98 and the confidence δ to 0.1. The results are averaged over 10 rounds and reported with their standard deviation.
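The reported setup values can be gathered into a single configuration record, which makes a replication attempt easier to audit. The sketch below is hypothetical: the class `ExperimentConfig`, its field names, and the helper `summarize_rounds` are illustrative and do not come from the paper. It also derives two quantities implied by the reported values under common conventions (assumptions, not statements from the paper): the effective horizon 1/(1 − γ), which is about 50 steps for γ = 0.98, and the confidence level 1 − δ = 0.9 when δ is read as the failure probability of a high-probability bound. The paper does not say which standard-deviation estimator it uses; the sketch assumes the sample estimator.

```python
from dataclasses import dataclass
import statistics


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the experiment-setup values reported above."""
    partition_split_k: int = 2   # K: partition-splitting value (reported as 2)
    gamma: float = 0.98          # discount factor γ (reported as 0.98)
    delta: float = 0.1           # confidence parameter δ (reported as 0.1)
    n_rounds: int = 10           # number of rounds the results are averaged over

    def effective_horizon(self) -> float:
        # Common heuristic (assumption): horizon ≈ 1 / (1 - γ), about 50 for γ = 0.98.
        return 1.0 / (1.0 - self.gamma)

    def confidence_level(self) -> float:
        # Assumes δ is a failure probability, so bounds hold with probability ≥ 1 - δ.
        return 1.0 - self.delta


def summarize_rounds(values: list[float], config: ExperimentConfig) -> tuple[float, float]:
    """Return (mean, sample standard deviation) over exactly config.n_rounds values."""
    if len(values) != config.n_rounds:
        raise ValueError(f"expected {config.n_rounds} rounds, got {len(values)}")
    # statistics.stdev is the sample (n - 1) estimator; this choice is an assumption.
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    cfg = ExperimentConfig()
    # Placeholder per-round scores, used only to exercise the helper.
    mean, std = summarize_rounds([float(i) for i in range(10)], cfg)
    print(f"horizon={cfg.effective_horizon():.1f} mean={mean:.3f} std={std:.3f}")
```

Freezing the dataclass keeps the reported values from being mutated by accident during a run. The length check in `summarize_rounds` makes a mismatch with the reported 10 rounds fail loudly, so it cannot silently change the statistics.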