Hierarchical Reinforcement Learning for Course Recommendation in MOOCs
Authors: Jing Zhang, Bowen Hao, Bo Chen, Cuiping Li, Hong Chen, Jimeng Sun
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Systematically, we evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users and 458,454 user enrollment behaviors, which were collected from XuetangX, one of the largest MOOC platforms in China. Experimental results show that the proposed model significantly outperforms the state-of-the-art recommendation models (improving 5.02% to 18.95% in terms of HR@10). |
| Researcher Affiliation | Academia | 1Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China 2Information School, Renmin University of China 3Computational Science and Engineering at College of Computing, Georgia Institute of Technology |
| Pseudocode | Yes | Algorithm 1: The Overall Training Process. Input: training data {E_1, E_2, ..., E_\|U\|}, a pre-trained basic recommendation model and a profile reviser, parameterized by Φ_0 and Θ_0 respectively. Initialize Θ = Θ_0, Φ = Φ_0; for episode l = 1 to L do: foreach E_u := (e^u_1, ..., e^u_{t_u}) and c_i do: sample a high-level action a^h with Θ_h; if a^h = 0 then R(s^h, a^h) = 0, else sample a sequence of low-level actions {a^l_1, a^l_2, ..., a^l_{t_u}} with Θ_l and compute R(a^l_{t_u}, s^l_{t_u}) and G(a^l_{t_u}, s^l_{t_u}); compute gradients by Eq. (??) and (??); end; end; update Θ by the gradients; update Φ in the basic recommendation model; end. (A second listing, "Algorithm 2: The Hierarchical Reinforcement Learning", is also given.) |
| Open Source Code | Yes | The code is online now3. 3https://github.com/jerryhao66/HRL |
| Open Datasets | No | We collect the dataset from XuetangX, one of the largest MOOC platforms in China. We unify the same courses offered in different years, such as Data Structure (2017) and Data Structure (2018), into one course and only select the users who enrolled in at least three courses from October 1st, 2016 to March 31st, 2018. The resulting dataset consists of 1,302 courses belonging to 23 categories, 82,535 users and 458,454 user-course pairs. The paper describes the collection process but does not provide specific access information (link, DOI, citation) for the processed dataset. |
| Dataset Splits | No | We select the enrolled behaviors from October 1st, 2016 to December 30th, 2017 as the training set, and those from January 1st, 2018 to March 31st, 2018 as the test set. No explicit mention of a separate validation dataset split was found. |
| Hardware Specification | Yes | We implement the model in TensorFlow and run the code on an Enterprise Linux server with 40 Intel Xeon E5-2630 CPU cores, 512GB of memory, and one NVIDIA TITAN V GPU (12GB memory). |
| Software Dependencies | No | The paper mentions 'Tensorflow' but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | For the profile reviser, the sampling time M is set to 3, and the learning rate is set to 0.001/0.0005 at the pre-training and joint-training stages respectively. In the policy function, the dimensions of the hidden layers d^l_2 and d^h_2 are both set to 8. For the basic recommender, the dimension of the course embeddings is set to 16, the learning rate is 0.01 at both the pre-training and joint-training stages, and the minibatch size is 256. The delayed coefficient λ for the joint-training is 0.0005. |
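To make the control flow of Algorithm 1 concrete, the following is a minimal Python sketch of the hierarchical training loop it describes: a high-level policy decides whether a user's course profile needs revision at all, and, if so, a low-level policy decides course-by-course which enrollments to drop before a delayed reward is computed. All function names, the toy reward, and the random policies here are hypothetical illustrations, not the authors' implementation (their actual code is at https://github.com/jerryhao66/HRL, and the real policies are learned networks updated by policy gradients).

```python
import random

random.seed(0)

def sample_high_level_action(profile):
    """Hypothetical high-level policy: revise this profile (1) or not (0)."""
    return random.randint(0, 1)

def sample_low_level_actions(profile):
    """Hypothetical low-level policy: per enrolled course, keep (0) or remove (1)."""
    return [random.randint(0, 1) for _ in profile]

def delayed_reward(profile, remove_mask, target_course):
    """Placeholder reward: fraction of courses kept. In the paper, the delayed
    reward instead reflects how the revised profile improves the basic
    recommender's prediction of the target course."""
    kept = [c for c, a in zip(profile, remove_mask) if a == 0]
    return len(kept) / max(len(profile), 1)

def train(profiles, targets, num_episodes=3):
    rewards = []
    for episode in range(num_episodes):
        for profile, target in zip(profiles, targets):
            a_h = sample_high_level_action(profile)
            if a_h == 0:
                # Profile judged clean: no revision, zero reward (R(s^h, a^h) = 0).
                rewards.append(0.0)
            else:
                a_l = sample_low_level_actions(profile)
                rewards.append(delayed_reward(profile, a_l, target))
                # In the paper, policy gradients for Θ_h and Θ_l are computed
                # here from the reward, and the recommender's parameters Φ are
                # updated jointly after each episode.
    return rewards

profiles = [["c1", "c2", "c3"], ["c4", "c5"]]
targets = ["c9", "c8"]
rewards = train(profiles, targets)
print(len(rewards))  # one reward per (profile, target) pair per episode
```

The two-level split is the key design point: the high-level action gates the expensive low-level revision, so profiles the high-level policy considers noise-free bypass the per-course decisions entirely, which is what the `a^h = 0` branch with zero reward encodes in Algorithm 1.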