AutoChunk: Automated Activation Chunk for Memory-Efficient Deep Learning Inference

Authors: Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Haotian Zhou, Bin Jia, Yang You

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments demonstrate that AutoChunk can reduce over 80% of activation memory while keeping speed loss within 10%, extend the maximum sequence length by 3.2x to 11.7x, and outperform state-of-the-art methods by a large margin.
Researcher Affiliation | Collaboration | Xuanlei Zhao (1), Shenggan Cheng (1), Guangyang Lu (2), Haotian Zhou (2), Bin Jia (2), Yang You (1). (1) National University of Singapore; (2) HPC-AI Technology Inc.
Pseudocode | Yes | Algorithm 1: AutoChunk's chunk search algorithm (an illustrative sketch of the chunking idea it automates follows this table).
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code for AutoChunk, nor does it include a link to a code repository.
Open Datasets | No | The paper mentions models like GPT, ViT, AlphaFold, and UNet but does not specify the datasets used for the experiments, nor does it provide any information on their public availability or access (e.g., links or citations for standard datasets).
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | Yes | All experiments are carried out on the NVIDIA Tesla A100 80GB platform with PyTorch.
Software Dependencies | No | The paper mentions PyTorch as a software component but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | No | The paper states that the hyperparameters of the cost functions in Equations 8 and 9 are automatically tuned, but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings.
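
The paper's Algorithm 1 (the chunk search) is not reproduced here. As a rough illustration of the activation-chunking idea that AutoChunk automates, below is a minimal, hand-written PyTorch sketch. Everything in it is an assumption for illustration: the names `chunked_mlp` and `pick_chunk_size`, the candidate chunk sizes, and the memory-budget heuristic are hypothetical stand-ins, not the paper's method.

```python
import torch

def chunked_mlp(x, mlp, chunk_size):
    """Apply a position-wise module over the sequence dimension in chunks.

    Peak intermediate-activation memory scales with `chunk_size` instead
    of the full sequence length; the result is exact as long as `mlp`
    has no cross-token dependence.
    """
    outputs = [
        mlp(x[:, start:start + chunk_size])
        for start in range(0, x.size(1), chunk_size)
    ]
    return torch.cat(outputs, dim=1)

def pick_chunk_size(candidates, hidden_dim, budget_bytes, bytes_per_el=4):
    # Toy stand-in for a chunk search (NOT the paper's Algorithm 1):
    # among candidate chunk sizes, keep the largest one whose estimated
    # peak intermediate activation (chunk x 4*hidden floats) fits the
    # budget, since larger chunks launch fewer kernels and lose less speed.
    feasible = [c for c in sorted(candidates)
                if c * 4 * hidden_dim * bytes_per_el <= budget_bytes]
    return feasible[-1] if feasible else min(candidates)

if __name__ == "__main__":
    hidden = 256
    mlp = torch.nn.Sequential(              # intermediate activation is 4x wider
        torch.nn.Linear(hidden, 4 * hidden),
        torch.nn.GELU(),
        torch.nn.Linear(4 * hidden, hidden),
    )
    x = torch.randn(1, 8192, hidden)
    chunk = pick_chunk_size([256, 512, 1024, 2048], hidden,
                            budget_bytes=8 * 2**20)
    with torch.no_grad():                   # inference setting, as in the paper
        y = chunked_mlp(x, mlp, chunk)
    assert y.shape == x.shape
```

Note that this hand-written version is exact only because the MLP is position-wise; the point of AutoChunk is to discover such chunkable regions and suitable chunk sizes automatically rather than by hand.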