Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Transfer of Deep Reactive Policies for MDP Planning
Authors: Aniket (Nick) Bajpai, Sankalp Garg, Mausam
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three different benchmark domains underscore the value of our transfer algorithm. Compared against planning from scratch, and a state-of-the-art RL transfer algorithm, our transfer solution has significantly superior learning curves. |
| Researcher Affiliation | Academia | Aniket Bajpai, Sankalp Garg, Mausam Indian Institute of Technology, Delhi New Delhi, India EMAIL, EMAIL |
| Pseudocode | No | The paper describes the architecture and algorithms in detail, including components like the State Encoder, RL Module, Action Decoder, and Transition Transfer Module. However, it does not provide any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | We release the code of TORPIDO for future research.1 Available at https://github.com/dair-iitd/torpido |
| Open Datasets | Yes | Domains: We make all comparisons on three different RDDL domains used in IPPC, the International Probabilistic Planning Competition 2014 [Grzes et al., 2014]: Sys Admin, Game of Life and Navigation. |
| Dataset Splits | No | The paper mentions a "training phase" and that "the training phase uses four source problems" and "All problems are trained using the generators available for each domain." However, it does not explicitly specify distinct training/validation/test splits with percentages, sample counts, or references to predefined splits for their experiments. |
| Hardware Specification | No | The paper states: "We thank Microsoft Azure sponsorships, and the IIT Delhi HPC facility for computational resources." This mentions general computing environments but lacks specific hardware details such as GPU models, CPU types, or memory amounts. |
| Software Dependencies | No | The paper mentions software components like "RMSProp" (an optimizer) and "A3C" (an algorithm) and states "All layers use the exponential linear unit (ELU) activations". However, it does not provide specific version numbers for any programming languages, libraries (e.g., TensorFlow, PyTorch), or other software tools used. |
| Experiment Setup | Yes | All hyperparameters are kept constant for all problems in all domains. Our parameters are as follows. A3C's value network as well as policy network use two GCN layers (3, 7 feature maps) and two fully connected layers. The action decoder implements two fully connected layers. All layers use the exponential linear unit (ELU) activations [Clevert et al., 2015]. All networks are trained using RMSProp with a learning rate of 5e-5. For TORPIDO, we set N = 4, i.e., the training phase uses four source problems. |
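The quoted setup names the building blocks (GCN layers with 3 and 7 feature maps, ELU activations) but the paper's table entry includes no code. Below is a minimal NumPy sketch of a single graph-convolution layer with ELU, purely to illustrate the kind of layer the quote describes. The row-normalized adjacency aggregation, the toy graph, and the function names (`elu`, `gcn_layer`) are assumptions for illustration, not the authors' TORPIDO implementation.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Exponential linear unit [Clevert et al., 2015], as cited in the setup.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def gcn_layer(adj, feats, weight):
    # One illustrative graph-convolution step (assumed aggregation rule):
    # average features over neighbors plus self-loop, project, then activate.
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return elu(a_hat @ feats @ weight)

rng = np.random.default_rng(0)
n_nodes, in_dim = 5, 3  # toy graph; 3 and 7 feature maps follow the quote
adj = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)
h = gcn_layer(adj,
              rng.standard_normal((n_nodes, in_dim)),
              rng.standard_normal((in_dim, 7)))
print(h.shape)  # one layer maps 3 input features to 7 feature maps per node
```

The full setup in the quote stacks two such layers and two fully connected layers, trained with RMSProp at a learning rate of 5e-5; those details are omitted here for brevity.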