Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning
Authors: Riccardo Zamboni, Alberto Maria Metelli, Marcello Restelli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we report the results of some illustrative numerical simulations, showing that the proposed algorithm matches the expected theoretical behavior and highlighting the relationship between aggregations and sample regimes. |
| Researcher Affiliation | Academia | Riccardo Zamboni, DEIB, Politecnico di Milano, Milan, Italy, riccardo.zamboni@polimi.it; Alberto Maria Metelli, DEIB, Politecnico di Milano, Milan, Italy, albertomaria.metelli@polimi.it; Marcello Restelli, DEIB, Politecnico di Milano, Milan, Italy, marcello.restelli@polimi.it |
| Pseudocode | Yes | Algorithm 1 Distributional Max-Ent Policy Evaluation... Algorithm 2 Distributional Max-Ent Progressive Factorization |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes a custom-designed 'rectangular Grid World' for its simulations but does not provide concrete access information (link, DOI, repository, or formal citation) for it to be considered a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Finally, the value of the partition splitting K is set to 2 to reduce the exponential search space of all possible uniform partitions; the discount factor γ is set to 0.98 and the confidence δ to 0.1. The results are averaged over 10 rounds and reported with the respective standard deviation. |
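
Since no source code is released, the reported setup can only be captured as a configuration sketch. The snippet below is a minimal illustration and not the authors' implementation: it records the values quoted in the Experiment Setup row (K = 2, γ = 0.98, δ = 0.1, 10 rounds) and shows one way to average a per-round metric with its standard deviation. The names `ExperimentConfig`, `run_round`, `summarize`, and `run_experiment` are hypothetical, and the use of the population standard deviation is an assumption, since the paper excerpt does not say which estimator was used.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    # Values quoted in the Experiment Setup row; field names are hypothetical.
    partition_split_k: int = 2      # K, the partition splitting value
    discount_gamma: float = 0.98    # discount factor
    confidence_delta: float = 0.1   # confidence level
    n_rounds: int = 10              # number of rounds averaged over


def summarize(per_round: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of a metric across rounds.

    Assumption: population standard deviation; the source does not
    specify the estimator.
    """
    return mean(per_round), pstdev(per_round)


def run_experiment(
    run_round: Callable[[ExperimentConfig, int], float],
    config: ExperimentConfig = ExperimentConfig(),
) -> Tuple[float, float]:
    # run_round is a hypothetical callable: (config, seed) -> scalar metric.
    scores = [run_round(config, seed) for seed in range(config.n_rounds)]
    return summarize(scores)
```

Any function that maps a configuration and a seed to a scalar error can be passed as `run_round`; the environment itself (the rectangular Grid World) is not specified in enough detail here to reproduce.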