Self-Supervised Mixture-of-Experts by Uncertainty Estimation
Authors: Zhuobin Zheng, Chun Yuan, Xinrui Zhu, Zhihui Lin, Yangyang Cheng, Cheng Shi, Jiahui Ye (pp. 5933-5940)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach learns faster and achieves better performance by efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym's MuJoCo multi-task environments. ... We present empirical experiments to analyze our algorithm dealing with a series of continuous control tasks on extended MuJoCo environments (Henderson et al. 2017). |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technologies, Tsinghua University, Beijing, China; (2) Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; (3) Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1 Self-Supervised Mixture-of-Experts |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach on continuous control environment MuJoCo (Todorov, Erez, and Tassa 2012) and its multi-task extension (Henderson et al. 2017) (see Figure 2). |
| Dataset Splits | No | The paper describes training procedures, including the use of a replay buffer and mini-batches, typical for RL. However, it does not specify explicit training, validation, and test *dataset splits* (e.g., percentages or sample counts for a fixed dataset) as would be common in supervised learning. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions tools like "Adam (Kingma and Ba 2015)" (an optimizer), "MuJoCo (Todorov, Erez, and Tassa 2012)" and "OpenAI Gym" (environments), but it does not specify version numbers for these or other software components necessary for reproducibility. |
| Experiment Setup | Yes | In all cases, we use fully-connected networks (see Figure 1), where hidden layer and head layer sizes are denoted by (N, M). Unless otherwise stated, we adopt the same network structure and common hyperparameters as (Zheng et al. 2018): (256, 256, 128) for the critic and (256, 128) for the actor with Leaky ReLU activation. The gating network is (256, 128) with a softmax layer and is updated with a learning rate of 1e-4. These networks are trained by Adam (Kingma and Ba 2015) with a batch size n = 1024. Besides, we fix the decay rate for DMER λ = 0.9997. |
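Since the Experiment Setup row pins down concrete layer widths and optimizer settings, the following is a minimal PyTorch sketch of those sizes. Only the hidden widths (256, 256, 128) / (256, 128) / (256, 128), the Leaky ReLU activation, the softmax gating head, Adam, batch size 1024, the gating learning rate 1e-4, and the DMER decay rate λ = 0.9997 come from the quoted text; the class names, input/output dimensions, and forward passes are illustrative assumptions, not the authors' implementation (no code was released).

```python
# Minimal sketch (PyTorch assumed) of the network sizes quoted above.
# Layer widths, activation, optimizer, batch size, gating learning rate,
# and the DMER decay rate come from the paper; everything else
# (class names, dimensions, forward passes) is illustrative.
import torch
import torch.nn as nn

BATCH_SIZE = 1024        # "batch size n = 1024"
GATING_LR = 1e-4         # gating network learning rate
DMER_DECAY = 0.9997      # decay rate λ for DMER (usage not shown here)


def mlp(sizes, activation=nn.LeakyReLU):
    """Fully-connected stack with Leaky ReLU between the given widths."""
    layers = []
    for in_dim, out_dim in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(in_dim, out_dim), activation()]
    return nn.Sequential(*layers)


class Critic(nn.Module):
    # Hidden sizes (256, 256, 128) as stated in the paper.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.body = mlp([obs_dim + act_dim, 256, 256, 128])
        self.q = nn.Linear(128, 1)

    def forward(self, obs, act):
        return self.q(self.body(torch.cat([obs, act], dim=-1)))


class Actor(nn.Module):
    # Hidden sizes (256, 128) as stated in the paper.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.body = mlp([obs_dim, 256, 128])
        self.mu = nn.Linear(128, act_dim)

    def forward(self, obs):
        return torch.tanh(self.mu(self.body(obs)))


class Gating(nn.Module):
    # Hidden sizes (256, 128) followed by a softmax over experts.
    def __init__(self, obs_dim, num_experts):
        super().__init__()
        self.body = mlp([obs_dim, 256, 128])
        self.logits = nn.Linear(128, num_experts)

    def forward(self, obs):
        return torch.softmax(self.logits(self.body(obs)), dim=-1)


# Example instantiation with hypothetical MuJoCo-like dimensions.
obs_dim, act_dim, num_experts = 17, 6, 4
gating = Gating(obs_dim, num_experts)
gating_opt = torch.optim.Adam(gating.parameters(), lr=GATING_LR)
```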