reproducibilityindex.ai

Autoregressive Policy Optimization for Constrained Allocation Tasks

Authors: David Winkel, Niklas Strauß, Maximilian Bernhard, Zongyue Li, Thomas Seidl, Matthias Schubert

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.
Researcher Affiliation	Academia	David Winkel, Niklas Strauß, Maximilian Bernhard, Zongyue Li, Thomas Seidl, Matthias Schubert; Munich Center for Machine Learning, LMU Munich; {winkel,strauss,bernhard,li,seidl,schubert}@dbs.ifi.lmu.de
Pseudocode	Yes	Algorithm 1 Maximum likelihood estimation of parameter de-biasing terms
Open Source Code	Yes	Our code is available at: https://github.com/niklasdbs/paspo.
Open Datasets	Yes	We use the environment of [27]... The financial market trajectories in this environment are sampled from a hidden Markov model, which was fitted based on real-world NASDAQ-100 data... The environment is based on the paper of [3].
Dataset Splits	No	The paper describes evaluation procedures during training (e.g., 'After every 5120 environment steps, we run eight parallel evaluations on 200 fixed trajectories') but does not specify a distinct validation dataset split in percentages or counts typically used for hyperparameter tuning, separate from a final test set.
Hardware Specification	Yes	Given the relatively small network sizes, training is conducted exclusively on CPUs. We used an internal CPU cluster with consumer machines and servers ranging from 8 to 90 cores and RAM between 32GB and 512GB.
Software Dependencies	No	The paper states 'We implement our algorithm and the baselines using RLlib and PyTorch' but does not provide specific version numbers for these software components.
Experiment Setup	Yes	In Table 5 we list the most important parameters and hyperparameters... We use a fully-connected MLP with two hidden layers of 32 units and ReLU non-linearities for each policy, cost, and value function.
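
The network shape quoted in the Experiment Setup row (a fully-connected MLP with two hidden layers of 32 units and ReLU non-linearities) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which uses RLlib and PyTorch. The function names, the input and output sizes (obs_dim, out_dim), the uniform weight initialization, and the linear output layer are all assumptions made for illustration only.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in values]


def init_linear(n_in, n_out, rng):
    """Create one dense layer as (weights, biases).

    The uniform initialization range is an assumption, not taken from the paper.
    """
    bound = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, values):
    """Apply a dense layer: weights @ values + biases."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def build_mlp(obs_dim, out_dim, hidden=32, seed=0):
    """Two hidden layers of `hidden` units, as quoted from the paper.

    obs_dim and out_dim are hypothetical; they depend on the task.
    """
    rng = random.Random(seed)
    sizes = [obs_dim, hidden, hidden, out_dim]
    return [init_linear(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(mlp, obs):
    """ReLU after each hidden layer; the linear output layer is an assumption."""
    h = obs
    for layer in mlp[:-1]:
        h = relu(linear(layer, h))
    return linear(mlp[-1], h)


def count_parameters(mlp):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in mlp)
```

For example, build_mlp(obs_dim=10, out_dim=1) would give a value-function-sized network under these assumed dimensions, and count_parameters shows how small such a network is, which is consistent with the CPU-only training reported in the Hardware Specification row.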