Learning with Relaxed Supervision
Authors: Jacob Steinhardt, Percy S. Liang
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now empirically explore our method's behavior. All of our code, data, and experiments may be found on the CodaLab worksheet for this paper at https://www.codalab.org/worksheets/0xc9db508bb80446d2b66cbc8e2c74c052/, which also contains more detailed plots beyond those shown here. We would like to answer the following questions: Fixed β: For a fixed β, how does the relaxation parameter β affect the learned parameters? What is the trade-off between accuracy and computation as we vary β? Adapting β: Does optimizing β affect performance? Is the per-coordinate adaptivity of our relaxation advantageous, or can we set all coordinates of β to be equal? How does the computational budget τ (from C0 and C1) impact the optimization? |
| Researcher Affiliation | Academia | Jacob Steinhardt, Stanford University, jsteinhardt@cs.stanford.edu; Percy Liang, Stanford University, pliang@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 (Minimizing L(θ, β) while guaranteeing tractable inference): Input: training data (x(i), y(i)) for i = 1, …, n. Initialize θ = 0 and βj = ϵ for j = 1, …, k. While not converged: (1) estimate φ(i), ψ(i), and A(θ̄, β̄; x(i), y(i)) for i = 1, …, n by sampling from p_{θ̄, β̄}(z | x(i), y(i)); (2) estimate the approximations Ã(θ, β; x(i), y(i)) using the output from the preceding step; (3) let (θ̂, β̂) be the solution to minimizing (1/n) Σᵢ [A(θ; x(i)) + A(β) − Ã(θ, β; x(i), y(i))] over (θ, β), subject to (C0) and βj ≥ ϵ for j = 1, …, k; (4) update (θ̄, β̄) ← (θ̂, β̂). After convergence, repeat the same loop with the constraint (C0) replaced by (C1). Output (θ̄, β̄). (A hedged Python sketch of this outer loop appears below the table.) |
| Open Source Code | Yes | All of our code, data, and experiments may be found on the CodaLab worksheet for this paper at https://www.codalab.org/worksheets/0xc9db508bb80446d2b66cbc8e2c74c052/, which also contains more detailed plots beyond those shown here. |
| Open Datasets | No | The paper states: 'In our experiments, we generated n = 100 sentences of length L = 20 with vocabulary size V = 102' for one task and 'We used n = 100 and sentence length L = 25' for another, indicating that the authors generated their own synthetic datasets and provide no separate access information for them. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'the solver SNOPT [15] for the inner optimization' but does not provide a specific version number for SNOPT or any other software dependencies. |
| Experiment Setup | Yes | For optimization, we used Algorithm 1, using S = 50 samples to approximate each φ(i) and ψ(i), and using the solver SNOPT [15] for the inner optimization. We ran Algorithm 1 for 50 iterations; when β is not fixed, we apply the constraint (C0) for the first 10 iterations and (C1) for the remaining 40 iterations; when it is fixed, we do not apply any constraint. We set the computational budget τ = 50 for the constraints C0 and C1, and ϵ = 1/L as the lower bound on β. (These values are gathered into the configuration sketch below the table.) |
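
The Pseudocode row above gives Algorithm 1 only as flattened prose, so a minimal Python sketch of its outer loop is included here as a reading aid. The helper callables `sample_posterior`, `build_tangent_bound`, and `solve_inner_problem` are hypothetical stand-ins for the sampling, tangent-approximation, and SNOPT-based inner-solve steps described in the paper; they are not the authors' released implementation.

```python
# Hedged sketch of the outer loop of Algorithm 1 (Steinhardt & Liang, 2015).
# All helper callables are hypothetical placeholders supplied by the caller;
# they stand in for the sampling, tangent-bound, and SNOPT inner-solve steps.

def minimize_relaxed_loss(data, dim, k, eps, num_iters, switch_iter,
                          sample_posterior, build_tangent_bound,
                          solve_inner_problem):
    """data: list of (x, y) pairs; dim: parameter dimension;
    k: number of relaxation coordinates beta_1..beta_k."""
    theta = [0.0] * dim            # initialize theta = 0
    beta = [eps] * k               # initialize beta_j = eps for j = 1..k

    for t in range(num_iters):
        # Step 1: estimate phi^(i), psi^(i), and A(theta, beta; x, y) for every
        # example by sampling from p_{theta,beta}(z | x, y).
        stats = [sample_posterior(theta, beta, x, y) for (x, y) in data]

        # Step 2: form the approximations A~(theta, beta; x, y) from those samples.
        bounds = [build_tangent_bound(s) for s in stats]

        # Step 3: solve the inner problem
        #   min_{theta,beta} (1/n) sum_i [A(theta; x_i) + A(beta) - A~(theta, beta; x_i, y_i)]
        #   subject to the tractability constraint and beta_j >= eps,
        # using (C0) for the first `switch_iter` iterations and (C1) afterwards,
        # as in the paper's experiment setup.
        constraint = "C0" if t < switch_iter else "C1"
        theta, beta = solve_inner_problem(data, bounds, constraint, eps)

    return theta, beta
```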
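
The Experiment Setup row reports the concrete hyperparameters used in the experiments. The snippet below simply collects them in one place and shows how they might be passed to the sketch above; the dictionary keys and the commented call are illustrative names chosen here, not an interface from the paper's CodaLab code, and L = 20 is taken from the first task's data description (L = 25 for the second).

```python
# Hyperparameter values as reported in the paper's experiment setup.
# The variable names are illustrative; only the numbers come from the paper.
L = 20                       # sentence length for the first task (25 for the second)

config = {
    "num_samples": 50,       # S = 50 samples to approximate each phi^(i), psi^(i)
    "num_iters": 50,         # Algorithm 1 run for 50 iterations
    "switch_iter": 10,       # (C0) for the first 10 iterations, (C1) for the remaining 40
    "budget_tau": 50,        # computational budget tau in constraints (C0)/(C1)
    "eps": 1.0 / L,          # lower bound on beta: epsilon = 1/L
}

# Illustrative use with the sketch above (helpers must be supplied by the caller):
# theta, beta = minimize_relaxed_loss(data, dim, k, config["eps"],
#                                     config["num_iters"], config["switch_iter"],
#                                     sample_posterior, build_tangent_bound,
#                                     solve_inner_problem)
```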