Flashlight: Enabling Innovation in Tools for Machine Learning

Authors: Jacob D Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the sections that follow, we compare Flashlight to two widely-used deep learning frameworks, PyTorch and TensorFlow, with metrics relevant to framework research velocity. We also evaluate framework performance to demonstrate the potential of our approach and the quality of the default implementations of all our components. We outline the steps needed to reproduce all our results in the Appendix. Table 3. Performance on common state-of-the-art models across frameworks. Values are the number of seconds needed to perform 100 iterations of the forward and backward passes, with data loading (unless indicated).
Researcher Affiliation | Industry | 1 Facebook AI Research, Menlo Park, CA, U.S.A.; 2 Zoom AI, San Jose, CA, U.S.A.; 3 Facebook AI Research, Paris, France; 4 Facebook, Menlo Park, CA, U.S.A. Currently at Apple, Cupertino, CA, U.S.A.
Pseudocode | Yes | Listing 1. The TensorAdapter interface for implementing operations on tensor metadata and storing tensor state for individual tensor instances. Listing 2. The TensorBackend interface for implementing operations on tensors and storing global backend state. Listing 3. An implementation of a memory manager using the memory management API. Listing 4. Defining a cosine autograd operator in Flashlight using Tensor operations and Variable. Listing 5. Part of Flashlight's distributed computation API. Listing 6. A Dropout layer implemented as a Flashlight module (a hedged sketch of a module in this style appears after the table). Listing 7. Loading MNIST data into a train and evaluation set. Listing 8. Constructing a CNN for MNIST training. Listing 9. A simple training loop. Listing 10. Evaluating a trained model on MNIST.
Open Source Code | Yes | Flashlight is available at this URL. Flashlight code can be found on GitHub at https://github.com/flashlight/flashlight.
Open Datasets | Yes | Flashlight offers built-in data loaders for standard computer vision benchmarks (such as ImageNet (Deng et al., 2009) and COCO (Lin et al., 2014)) along with a large set of efficient data augmentations and transformations. Listing 7. Loading MNIST data into a train and evaluation set.
Dataset Splits | Yes | const int kTrainSize = 60000; const int kValSize = 5000; auto val_x = train_x(span, span, range(0, kValSize)); train_x = train_x(span, span, range(kValSize, kTrainSize)); auto val_y = train_y(range(0, kValSize)); train_y = train_y(range(kValSize, kTrainSize)); // Make the training batch dataset BatchDataset trainset(std::make_shared<TensorDataset>(std::vector<Tensor>{train_x, train_y}), batch_size); // Make the validation batch dataset BatchDataset valset(std::make_shared<TensorDataset>(std::vector<Tensor>{val_x, val_y}), batch_size);
Hardware Specification | Yes | Times were measured for both from-scratch and incremental builds with Intel Xeon Gold 6138 CPUs with 80 cores and 750 GB of memory. Benchmarks are performed on Intel E5-2698 CPUs with 512 GB of RAM, and NVIDIA V100-32GB GPUs in a DGX-1 server. Inter-GPU interconnects in the 8 GPUs (1 node) setting are NVIDIA NVLink-based.
Software Dependencies | Yes | Flashlight v0.3.1 is used to reproduce results using ArrayFire 3.8 as the underlying tensor backend.
Experiment Setup | Yes | Listing 7. Loading MNIST data into a train and evaluation set. Listing 8. Constructing a CNN for MNIST training. Listing 9. A simple training loop. const int kImageDim = 28; auto pad = PaddingMode::SAME; Sequential model; model.add(View({kImageDim, kImageDim, 1, -1})); model.add(Conv2D(1 /* input channels */, 32 /* output channels */, 5 /* kernel width */, 5 /* kernel height */, 1 /* stride x */, 1 /* stride y */, pad /* padding mode */, pad /* padding mode */)); model.add(ReLU()); model.add(Pool2D(2 /* kernel width */, 2 /* kernel height */, 2 /* stride x */, 2 /* stride y */)); // Make the optimizer SGDOptimizer opt(model.params(), learning_rate); // The main training loop for (int e = 0; e < epochs; e++) (the quoted loop is truncated here; a hedged sketch of how it might continue appears after the table).
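
The Experiment Setup quote stops at the epoch loop header. Under stated assumptions, the sketch below shows one way that loop might continue, reusing the model, opt, trainset, and epochs objects from the quoted listings; fl::logSoftmax, fl::categoricalCrossEntropy, Sequential::forward, the optimizer's zeroGrad and step calls, and range-based iteration over BatchDataset yielding std::vector<Tensor> batches are assumed API names, not text taken verbatim from the paper.

#include <flashlight/fl/flashlight.h> // assumed umbrella header

// Hypothetical continuation of the quoted training loop: one SGD update per
// batch of (image, label) tensors produced by the BatchDataset.
void trainLoop(
    fl::Sequential& model,
    fl::SGDOptimizer& opt,
    fl::BatchDataset& trainset,
    int epochs) {
  for (int e = 0; e < epochs; e++) {
    for (auto& batch : trainset) {
      auto input = fl::Variable(batch[0], /* calcGrad = */ false);
      auto target = fl::Variable(batch[1], /* calcGrad = */ false);

      // Forward pass through the quoted CNN, then log-probabilities over classes.
      auto output = fl::logSoftmax(model.forward(input), 0);
      auto loss = fl::categoricalCrossEntropy(output, target);

      opt.zeroGrad();  // clear gradients from the previous step
      loss.backward(); // reverse-mode autograd through the recorded graph
      opt.step();      // SGD parameter update
    }
  }
}

Evaluation on the held-out valset (Listing 10) would follow the same pattern, with gradient computation disabled and predictions compared against val_y.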
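
The listing captions quoted under Pseudocode describe Flashlight's extension points only by title. As a rough illustration of the kind of code Listing 6 refers to, here is a minimal sketch of a dropout-style layer written as a Flashlight module. The class name MyDropout is hypothetical, and fl::UnaryModule, the train_ flag, fl::rand, and the Variable methods used below are assumptions about the Flashlight API rather than code reproduced from the paper.

#include <flashlight/fl/flashlight.h> // assumed umbrella header

// Hypothetical dropout-style module: zeroes activations with probability
// `ratio_` during training and passes inputs through unchanged at eval time.
class MyDropout : public fl::UnaryModule {
 public:
  explicit MyDropout(double ratio) : ratio_(ratio) {}

  fl::Variable forward(const fl::Variable& input) override {
    if (!train_ || ratio_ == 0.0) {
      return input; // identity when evaluating
    }
    // Bernoulli mask, rescaled by 1/(1 - ratio_) so the expected activation
    // magnitude matches evaluation-time behavior.
    auto mask = fl::Variable(
        (fl::rand(input.shape()) > ratio_).astype(input.type()) /
            (1.0 - ratio_),
        /* calcGrad = */ false);
    return input * mask;
  }

  std::string prettyString() const override {
    return "MyDropout";
  }

 private:
  double ratio_;
};

A module like this could be added to the Sequential model quoted under Experiment Setup, e.g. model.add(MyDropout(0.5)), assuming Sequential::add accepts user-defined modules.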