Self-Supervised Mixture-of-Experts by Uncertainty Estimation

Authors: Zhuobin Zheng, Chun Yuan, Xinrui Zhu, Zhihui Lin, Yangyang Cheng, Cheng Shi, Jiahui Ye

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach learns faster and achieves better performance by efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym's MuJoCo multi-task environments. ... We present empirical experiments to analyze our algorithm dealing with a series of continuous control tasks on extended MuJoCo environments (Henderson et al. 2017).
Researcher Affiliation | Academia | 1 Department of Computer Science and Technologies, Tsinghua University, Beijing, China; 2 Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; 3 Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen, China
Pseudocode | Yes | Algorithm 1: Self-Supervised Mixture-of-Experts
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository.
Open Datasets | Yes | We evaluate our approach on the continuous control environment MuJoCo (Todorov, Erez, and Tassa 2012) and its multi-task extension (Henderson et al. 2017) (see Figure 2).
Dataset Splits | No | The paper describes training procedures, including the use of a replay buffer and mini-batches, as is typical for RL (see the replay-buffer sketch after this table). However, it does not specify explicit training, validation, and test *dataset splits* (e.g., percentages or sample counts for a fixed dataset) as would be common in supervised learning.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions tools like "Adam (Kingma and Ba 2015)" (an optimizer), "MuJoCo (Todorov, Erez, and Tassa 2012)" and "OpenAI Gym" (environments), but it does not specify version numbers for these or other software components necessary for reproducibility.
Experiment Setup | Yes | In all cases, we use a fully-connected network (see Figure 1), where hidden layer and head layer sizes are denoted by (N, M). Unless otherwise stated, we adopt the same network structure and common hyperparameters as (Zheng et al. 2018): (256, 256, 128) for the critic and (256, 128) for the actor with Leaky ReLU activation. The gating network is (256, 128) with a softmax layer and is updated with a learning rate of 1e-4. These networks are trained by Adam (Kingma and Ba 2015) with a batch size n = 1024. Besides, we fix the decay rate for DMER λ = 0.9997.
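
For the Experiment Setup row above, the reported layer widths and hyperparameters can be laid out as a rough sketch. The paper does not state an implementation framework, so PyTorch is an assumption here; the `mlp` helper, the state/action dimensions, the number of experts, and the Q(s, a)-style critic input are hypothetical placeholders, not the authors' code. Only the layer sizes, Leaky ReLU activation, softmax gating output, Adam optimizer, gating learning rate, batch size, and DMER decay rate come from the quoted setup.

```python
# Hedged sketch of the reported architecture sizes; framework and dimensions are assumptions.
import torch
import torch.nn as nn

def mlp(sizes, out_dim, out_activation=None):
    """Fully-connected stack with Leaky ReLU after each hidden layer."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.LeakyReLU()]
    layers.append(nn.Linear(sizes[-1], out_dim))
    if out_activation is not None:
        layers.append(out_activation)
    return nn.Sequential(*layers)

# Example dimensions only; the paper's MuJoCo tasks vary in state/action size.
state_dim, action_dim, num_experts = 17, 6, 4

# Hidden sizes reported in the paper: (256, 256, 128) critic, (256, 128) actor,
# (256, 128) gating network with a softmax output layer.
critic = mlp([state_dim + action_dim, 256, 256, 128], out_dim=1)   # assumes a Q(s, a) critic
actor = mlp([state_dim, 256, 128], out_dim=action_dim)
gating = mlp([state_dim, 256, 128], out_dim=num_experts,
             out_activation=nn.Softmax(dim=-1))

# Reported training hyperparameters: Adam, gating learning rate 1e-4,
# batch size n = 1024, DMER decay rate lambda = 0.9997.
gating_optimizer = torch.optim.Adam(gating.parameters(), lr=1e-4)
batch_size = 1024
dmer_decay = 0.9997
```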
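
Relatedly, the Dataset Splits row notes that training relies on a replay buffer and mini-batches rather than fixed train/validation/test splits; the minimal Python sketch below illustrates that general pattern. The `ReplayBuffer` class and its capacity are illustrative assumptions, and only the batch size of 1024 is taken from the paper.

```python
# Minimal, illustrative uniform-sampling replay buffer; not the authors' implementation.
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and serves sampled mini-batches for off-policy RL updates."""

    def __init__(self, capacity=1_000_000):  # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=1024):  # batch size n = 1024 as reported in the paper
        # Uniformly sample a mini-batch of stored transitions for one gradient update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```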