Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization
Authors: Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive numerical evaluations to assess the computational efficiency of MiSTP and compare its performance to other state-of-the-art methods by testing it on several machine learning tasks. |
| Researcher Affiliation | Academia | Soumia Boucherouite¹, Grigory Malinovsky², Peter Richtárik², El Houcine Bergou¹; ¹College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco; ²King Abdullah University of Science and Technology, Thuwal, Saudi Arabia |
| Pseudocode | Yes | Algorithm 1: Minibatch Stochastic Three Points (MiSTP) |
| Open Source Code | Yes | All codes for the experiments are available at: https://github.com/SoumiaBouch/Minibatch-STP. |
| Open Datasets | Yes | The experiments of this section are conducted using LIBSVM datasets (Chang and Lin 2011). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers. |
| Experiment Setup | Yes | For each minibatch size, we choose the learning rate α by performing a grid search on the values 1, 0.1, 0.01, ... and select the one that gives the best performance. [...] The architecture we used has three fully-connected layers of size 256, 128, 10, with ReLU activation after the first two layers and a Softmax activation function after the last layer. The loss function is the categorical cross entropy. [...] we generate an adversarial attack to a set of n = 10 images of class 1 using a minibatch size of τ = 5 and a fixed stepsize α = 2 for MiSTP, α = 5/d for ZO-SVRG, and α = 30/d for both RSGF and ZO-SVRG-Ave. We set the epoch length to 10, µ = 0.01, and c = 1. |
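
The network in the Experiment Setup row can be illustrated with a minimal NumPy sketch: three fully-connected layers of size 256, 128, 10, ReLU after the first two, Softmax after the last, and a categorical cross-entropy loss. This is not the authors' code. The input dimension of 784, the He-style initialization, the random inputs, and the helper names `init_params`, `forward`, and `categorical_cross_entropy` are all assumptions made for illustration.

```python
import numpy as np


def init_params(input_dim, rng, sizes=(256, 128, 10)):
    """Create (weight, bias) pairs for fully-connected layers of the given sizes.

    The He-style scaling is an assumption; the source does not state the initialization.
    """
    params = []
    fan_in = input_dim
    for fan_out in sizes:
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
        fan_in = fan_out
    return params


def forward(params, x):
    """Return class probabilities: ReLU after the first two layers, Softmax after the last."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # Softmax


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true class (labels are integer class ids)."""
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + eps)))


# Hypothetical usage: 5 random inputs of assumed dimension 784, 10 classes.
rng = np.random.default_rng(0)
params = init_params(input_dim=784, rng=rng)
x = rng.normal(size=(5, 784))
y = rng.integers(0, 10, size=5)
probs = forward(params, x)
loss = categorical_cross_entropy(probs, y)
```

Because a zeroth-order method such as MiSTP only needs loss values, a forward pass and a loss function like these would be sufficient to evaluate the objective on a minibatch; no gradient code is required.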