Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location
Authors: Rasheed El-Bouri, David Eyre, Peter Watkinson, Tingting Zhu, David Clifton
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By validating on three datasets, not only do we show that our approach outperforms state-of-the-art methods on tabular data and performs competitively on image recognition, but also that novel curricula are learned by the teacher network. We demonstrate experimentally that the teacher network can actively learn about the student network and guide it to achieve better performance than if trained alone. |
| Researcher Affiliation | Academia | University of Oxford; Oxford University Hospitals Trust. |
| Pseudocode | Yes | Algorithm 1 (the student-teacher training routine for discrete batches using the DDPG algorithm). Data: training dataset organised into N batches of the Mahalanobis curriculum. Initialise the teacher critic network Q and actor network µ; initialise the target critic and actor, Q^T and µ^T; initialise a random process N for action exploration; initialise the replay buffer R; select the replay batch size m, the update-frequency value U, and the soft-update value τ. For x in X students: extract the state s_i of f_x; select the action a_i = µ(s_i \| θ^µ) + N_i according to the current policy and exploration noise; execute a_i, observe the reward r_i based on the improvement in performance on the validation set, and observe the new state s_{i+1}; store the transition (s_i, a_i, r_i, s_{i+1}) in R; sample a random minibatch of n transition tuples from R; set y_i = r_i + γ Q^T(s_{i+1}, µ^T(s_{i+1} \| θ^{µ^T}) \| θ^{Q^T}); update the critic by minimising L = (1/n) Σ_i (y_i − Q(s_i, a_i \| θ^Q))²; update the actor using the sampled policy gradient ∇_{θ^µ}J ≈ (1/n) Σ_i ∇_a Q(s, a \| θ^Q) ∇_{θ^µ}µ(s \| θ^µ); if i mod U = 0, update the target networks: θ^{Q^T} ← τθ^Q + (1 − τ)θ^{Q^T} and θ^{µ^T} ← τθ^µ + (1 − τ)θ^{µ^T}. (A runnable sketch of this update loop is given below the table.) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or include a link to a code repository for their methodology. |
| Open Datasets | Yes | In this study we considered the patient data collected in the electronic health records (EHR) of the Oxford University Hospitals Trust between January 2013 and April 2017. Specifically, we use the Infections in Oxfordshire Research Database (IORD). To validate the efficacy of the methodology we implement the algorithm on another classification problem from the MIMIC-III dataset in the next section (Johnson et al., 2016). To further validate our methodology we also report results on the CIFAR-10 image recognition dataset (Krizhevsky, 2009). |
| Dataset Splits | Yes | A training set of 60% of the dataset was used and was balanced, leaving 8,589 patients for training. The validation set was 20% of the dataset and the test set was also 20%, with the classes kept in the same distribution as the original dataset. (A sketch of this split is given below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as the ReLU function, stochastic gradient descent, and deep Q-networks, but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The architecture used for the teacher network is a fully-connected feedforward neural network with 3 hidden layers and 50 nodes in each layer. The hidden layer nodes are activated by the ReLU function and a dropout rate of 20% is used to prevent overfitting. Stochastic gradient descent with momentum (Sutskever et al., 2013) is used to train the network weights with a momentum value of 0.9. A discount factor of 0.95 is used and the target network is updated after every 20 times the non-target network is updated. An experience replay batch size of 10 is used after every 10 updates on the non-target network. We define the student to be a fully-connected feedforward neural network with 2 hidden layers, with nodes Mi = Mj = 50. Each node in the hidden layer is activated by the ReLU function apart from the final layer where a softmax function is used to classify. The student is trained using stochastic gradient descent with a fixed learning rate of 0.001. (Sketches of both networks are given below the table.) |
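
The DDPG routine quoted in the Pseudocode row maps directly onto a short training loop. The sketch below is a minimal, hedged reconstruction in PyTorch: the discount factor of 0.95, target-update frequency of 20, and replay batch size of 10 follow the paper's setup, while the state/action dimensions, the soft-update value τ, the exploration-noise scale, and the network sizes are illustrative assumptions (the paper defines the state from the student network and the reward from validation-set improvement).

```python
# Minimal sketch of Algorithm 1's DDPG teacher update. GAMMA=0.95,
# UPDATE_FREQ=20 and REPLAY_BATCH=10 follow the paper's setup;
# STATE_DIM, ACTION_DIM, TAU and the noise scale are assumptions.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4       # assumed: state summarises the student
GAMMA, TAU = 0.95, 0.01             # discount from the paper; tau assumed
UPDATE_FREQ, REPLAY_BATCH = 20, 10  # target-update frequency, replay batch size

def mlp(in_dim, out_dim):
    # Small fully-connected network; layer sizes are an assumption.
    return nn.Sequential(nn.Linear(in_dim, 50), nn.ReLU(),
                         nn.Linear(50, 50), nn.ReLU(),
                         nn.Linear(50, out_dim))

actor, critic = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t, critic_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t.load_state_dict(actor.state_dict())    # targets start as copies
critic_t.load_state_dict(critic.state_dict())
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-3, momentum=0.9)
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-3, momentum=0.9)
replay = deque(maxlen=10_000)                  # replay buffer R

def select_action(state, noise_std=0.1):
    # a_i = mu(s_i | theta_mu) + N_i; Gaussian noise stands in for the
    # paper's generic exploration process N.
    with torch.no_grad():
        return actor(state) + noise_std * torch.randn(ACTION_DIM)

def soft_update(target, source):
    # theta_T <- tau * theta + (1 - tau) * theta_T
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - TAU).add_(TAU * s.data)

def ddpg_step(step, state, action, reward, next_state):
    replay.append((state, action, reward, next_state))
    if len(replay) < REPLAY_BATCH:
        return
    batch = random.sample(list(replay), REPLAY_BATCH)
    s, a, r, s2 = zip(*batch)
    s, a, s2 = torch.stack(s), torch.stack(a), torch.stack(s2)
    r = torch.tensor(r, dtype=torch.float32).unsqueeze(1)
    # Critic target: y_i = r_i + gamma * Q_T(s_{i+1}, mu_T(s_{i+1}))
    with torch.no_grad():
        y = r + GAMMA * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    # Critic update: minimise the mean squared TD error L
    critic_loss = ((y - critic(torch.cat([s, a], dim=1))) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor update: ascend Q(s, mu(s)) via the sampled policy gradient
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    if step % UPDATE_FREQ == 0:  # i mod U = 0: soft-update the targets
        soft_update(critic_t, critic)
        soft_update(actor_t, actor)
```

In the paper's setting, `reward` would be the student's improvement on the validation set after training on the batch selected by the action, and `state` a summary of the student network.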
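
The Dataset Splits row describes a 60/20/20 split with a balanced training set and distribution-preserving validation and test sets. Below is a minimal sketch of one way to realise that split with scikit-learn; the undersampling strategy is an assumption, since the paper does not state how the training set was balanced.

```python
# Sketch of the reported 60/20/20 split: training portion balanced by
# undersampling (an assumed strategy); validation and test stratified to
# keep the original class distribution. X and y are assumed numpy arrays.
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_balance(X, y, seed=0):
    # 60% train, then split the remaining 40% evenly into validation
    # and test, stratifying on the labels.
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    # Balance the training set by undersampling every class down to the
    # size of the rarest class (one plausible reading of "balanced").
    rng = np.random.default_rng(seed)
    n_min = np.bincount(y_tr).min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y_tr == c), n_min, replace=False)
        for c in np.unique(y_tr)])
    return X_tr[keep], y_tr[keep], X_val, y_val, X_te, y_te
```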
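
Finally, the Experiment Setup row fully specifies both architectures. The sketch below encodes them in PyTorch; the input and output dimensions are placeholders, and the teacher's learning rate is an assumption since only its momentum (0.9) is quoted.

```python
# Sketches of the two networks as described: teacher = 3 hidden layers of
# 50 ReLU units with 20% dropout; student = 2 hidden layers of 50 ReLU
# units with a softmax classifier. N_FEATURES and N_CLASSES are assumed.
import torch
import torch.nn as nn

N_FEATURES, N_CLASSES = 32, 7  # placeholders; set from the task at hand

teacher = nn.Sequential(
    nn.Linear(N_FEATURES, 50), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(50, 50), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(50, 50), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(50, N_CLASSES),
)
# SGD with momentum 0.9 as reported; the teacher's learning rate is not
# quoted, so 0.001 is an assumption.
teacher_opt = torch.optim.SGD(teacher.parameters(), lr=0.001, momentum=0.9)

student = nn.Sequential(
    nn.Linear(N_FEATURES, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, N_CLASSES),  # logits; CrossEntropyLoss applies the softmax
)
student_opt = torch.optim.SGD(student.parameters(), lr=0.001)  # fixed lr from the paper
student_loss = nn.CrossEntropyLoss()
```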