reproducibilityindex.ai

Selective Sampling and Imitation Learning via Online Regression

Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments to verify our theory.
Researcher Affiliation	Academia	MIT, Cornell University
Pseudocode	Yes	Algorithm 1 Selective SAmplinG with Expert Feedback (SAGE); Algorithm 2 InteRActiVe ImitatiOn Learning VIa Active Expert Querying (RAVIOLI); Algorithm 3 Selective Sampling with Expert Feedback for Stochastic Contexts; Algorithm 4 InteRActiVe ImitatiOn Learning VIa Active Queries to M Experts (RAVIOLI-M)
Open Source Code	No	The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets	Yes	We first introduce the simulator, CartPole [Barto et al., 1983, Brockman et al., 2016]
Dataset Splits	No	The paper mentions using the CartPole environment but does not specify training, validation, or test dataset splits or percentages. It describes generating expert policies and using a neural network for the function class.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions using a neural network.
Software Dependencies	No	The paper mentions using a 'neural network (single hidden layer neural network, with 4 neurons in the hidden layer)' but does not name any software or version numbers (e.g., deep learning frameworks such as PyTorch or TensorFlow and their versions).
Experiment Setup	Yes	First, we use a neural network (single hidden layer neural network, with 4 neurons in the hidden layer) as our function class $\{\mathcal{F}^m_h\}_{h \in [H], m \in [M]}$. Second, we specify SelectAction to pick the action of the most confident expert, i.e., $\mathrm{SelectAction}(f^1_{t,h}(x), \dots, f^M_{t,h}(x)) = \mathrm{sign}(f^{\hat{i}}_{t,h}(x))$, where $\hat{i} = \arg\max_{i \in [M]} |f^i_{t,h}(x)|$. In our key experiments, we choose α = 50 when the number of experts is 1, 2 or 3, and choose α = 200 for 5-expert experiments.
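
The selection rule quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code, since the paper releases none. The function names `init_net`, `predict`, and `select_action` are hypothetical. The tanh activation, the 4-dimensional input (matching CartPole observations), and the tie-break at zero are assumptions. Reading "most confident" as "largest absolute score" is also an assumption.

```python
import numpy as np

def init_net(rng, in_dim=4, hidden=4):
    """Random weights for a single-hidden-layer network with 4 hidden units.

    in_dim=4 assumes CartPole-style observations; the initialization is illustrative.
    """
    return {
        "W1": rng.normal(size=(hidden, in_dim)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(size=hidden),
        "b2": 0.0,
    }

def predict(params, x):
    """Real-valued score f(x); tanh is an assumed activation, not stated in the source."""
    h = np.tanh(params["W1"] @ x + params["b1"])
    return float(params["w2"] @ h + params["b2"])

def select_action(scores):
    """Pick the expert with the largest |score| and return (sign of its score, its index).

    Assumption: confidence is measured by the magnitude of the score.
    Assumption: a score of exactly zero maps to action +1.
    """
    scores = np.asarray(scores, dtype=float)
    i_hat = int(np.argmax(np.abs(scores)))      # index of the most confident expert
    action = 1 if scores[i_hat] >= 0 else -1    # binary action in {-1, +1}
    return action, i_hat

# Example: three per-expert scores for one state; expert 1 has the largest magnitude.
action, expert = select_action([0.2, -0.9, 0.5])
```

With these scores the second expert (index 1) is selected and its negative score yields action -1. The threshold α from the same row governs when the expert is queried and is not modeled here.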