Self-Supervised Mixture-of-Experts by Uncertainty Estimation

Authors: Zhuobin Zheng, Chun Yuan, Xinrui Zhu, Zhihui Lin, Yangyang Cheng, Cheng Shi, Jiahui Ye

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach learns faster and achieves better performance by efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym's MuJoCo multi-task environments. ... We present empirical experiments to analyze our algorithm dealing with a series of continuous control tasks on extended MuJoCo environments (Henderson et al. 2017).
Researcher Affiliation | Academia | 1 Department of Computer Science and Technologies, Tsinghua University, Beijing, China; 2 Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; 3 Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen, China
Pseudocode | Yes | Algorithm 1: Self-Supervised Mixture-of-Experts
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository.
Open Datasets | Yes | We evaluate our approach on the continuous control environment MuJoCo (Todorov, Erez, and Tassa 2012) and its multi-task extension (Henderson et al. 2017) (see Figure 2).
Dataset Splits | No | The paper describes training procedures, including the use of a replay buffer and mini-batches, as is typical for RL (see the replay-buffer sketch after this table). However, it does not specify explicit training, validation, and test *dataset splits* (e.g., percentages or sample counts for a fixed dataset) as would be common in supervised learning.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions tools like "Adam (Kingma and Ba 2015)" (an optimizer), "MuJoCo (Todorov, Erez, and Tassa 2012)" and "OpenAI Gym" (environments), but it does not specify version numbers for these or other software components necessary for reproducibility.
Experiment Setup | Yes | In all cases, we use a fully-connected network (see Figure 1), where hidden layer and head layer sizes are denoted by (N, M). Unless otherwise stated, we adopt the same network structure and common hyperparameters as (Zheng et al. 2018): (256, 256, 128) for the critic and (256, 128) for the actor with Leaky ReLU activation. The gating network is (256, 128) with a softmax layer and is updated with a learning rate of 1e-4. These networks are trained by Adam (Kingma and Ba 2015) with a batch size n = 1024. Besides, we fix the decay rate for DMER λ = 0.9997.
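
For the Experiment Setup row above, the reported layer widths and hyperparameters can be laid out as a rough sketch. The paper does not state an implementation framework, so PyTorch is an assumption here; the `mlp` helper, the state/action dimensions, the number of experts, and the Q(s, a)-style critic input are hypothetical placeholders, not the authors' code. Only the layer sizes, Leaky ReLU activation, softmax gating output, Adam optimizer, gating learning rate, batch size, and DMER decay rate come from the quoted setup.

```python
# Hedged sketch of the reported architecture sizes; framework and dimensions are assumptions.
import torch
import torch.nn as nn

def mlp(sizes, out_dim, out_activation=None):
    """Fully-connected stack with Leaky ReLU after each hidden layer."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.LeakyReLU()]
    layers.append(nn.Linear(sizes[-1], out_dim))
    if out_activation is not None:
        layers.append(out_activation)
    return nn.Sequential(*layers)

# Example dimensions only; the paper's MuJoCo tasks vary in state/action size.
state_dim, action_dim, num_experts = 17, 6, 4

# Hidden sizes reported in the paper: (256, 256, 128) critic, (256, 128) actor,
# (256, 128) gating network with a softmax output layer.
critic = mlp([state_dim + action_dim, 256, 256, 128], out_dim=1)   # assumes a Q(s, a) critic
actor = mlp([state_dim, 256, 128], out_dim=action_dim)
gating = mlp([state_dim, 256, 128], out_dim=num_experts,
             out_activation=nn.Softmax(dim=-1))

# Reported training hyperparameters: Adam, gating learning rate 1e-4,
# batch size n = 1024, DMER decay rate lambda = 0.9997.
gating_optimizer = torch.optim.Adam(gating.parameters(), lr=1e-4)
batch_size = 1024
dmer_decay = 0.9997
```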
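
Relatedly, the Dataset Splits row notes that training relies on a replay buffer and mini-batches rather than fixed train/validation/test splits; the minimal Python sketch below illustrates that general pattern. The `ReplayBuffer` class and its capacity are illustrative assumptions, and only the batch size of 1024 is taken from the paper.

```python
# Minimal, illustrative uniform-sampling replay buffer; not the authors' implementation.
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and serves sampled mini-batches for off-policy RL updates."""

    def __init__(self, capacity=1_000_000):  # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=1024):  # batch size n = 1024 as reported in the paper
        # Uniformly sample a mini-batch of stored transitions for one gradient update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```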