Deepcode: Feedback Codes via Deep Learning

Authors: Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we present the first family of codes obtained via deep learning, which significantly beats state-of-the-art codes designed over several decades of research. Our experiments focus on the setting of rate 1/3 and information block length of 50 for concreteness.
Researcher Affiliation | Collaboration | Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath; Samsung AI Centre Cambridge, University of Washington, University of Illinois at Urbana-Champaign
Pseudocode | No | No pseudocode or algorithm block is explicitly labeled or provided.
Open Source Code | Yes | Source code is available at https://github.com/hyejikim1/feedback_code (Keras) and https://github.com/yihanjiang/feedback_code (PyTorch).
Open Datasets | No | The paper states that 'AWGN channels are simulated for the channels from the encoder to the decoder and from decoder to the encoder.' This indicates that the training data was simulated, not obtained from a publicly available dataset with concrete access information. (An illustrative simulation sketch follows the table.)
Dataset Splits | No | No specific train/validation/test dataset splits (e.g., percentages or counts) are provided. The paper mentions training over 4x10^6 examples generated via simulation, but does not specify how these are partitioned into training, validation, and test sets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup.
Software Dependencies | No | The paper mentions Keras and PyTorch in relation to the source code availability, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Both the encoder and decoder are trained jointly using binary cross-entropy as the loss function over 4×10^6 examples, with batch size 200, via an Adam optimizer (β1=0.9, β2=0.999, ε=1e-8). During the training, we let K = 100. We also use a decaying learning rate and gradient clipping; we reduce the learning rate by 10 times after training with 10^6 examples, starting from 0.02. Gradients are clipped to 1 if the L2 norm of the gradient exceeds 1. (An illustrative sketch of this configuration follows the table.)
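
The "Open Datasets" and "Dataset Splits" rows note that training data is generated by simulating AWGN channels rather than drawn from a public dataset. Below is a minimal sketch of what such on-the-fly data generation could look like; the helper name simulate_training_batch, its parameters, and the rate-1/3 layout of the noise arrays are illustrative assumptions, not taken from the released code.

```python
import numpy as np

# Hedged sketch (assumption, not the authors' released pipeline): training data is
# simulated on the fly. Each example is a block of K random information bits plus AWGN
# noise realizations for the forward channel and, optionally, the feedback channel.
def simulate_training_batch(batch_size=200, block_length=100,
                            forward_snr_db=0.0, feedback_snr_db=None, rate_inverse=3):
    """Draw one batch of random message bits and AWGN noise (hypothetical helper)."""
    bits = np.random.randint(0, 2, size=(batch_size, block_length))

    # Noise std from SNR in dB, assuming unit transmit power: sigma^2 = 10^(-SNR_dB/10).
    forward_sigma = np.sqrt(10.0 ** (-forward_snr_db / 10.0))
    n_uses = rate_inverse * block_length          # rate 1/3 -> 3 channel uses per bit
    forward_noise = forward_sigma * np.random.randn(batch_size, n_uses)

    if feedback_snr_db is None:                   # noiseless feedback
        feedback_noise = np.zeros((batch_size, n_uses))
    else:                                         # noisy (AWGN) feedback
        feedback_sigma = np.sqrt(10.0 ** (-feedback_snr_db / 10.0))
        feedback_noise = feedback_sigma * np.random.randn(batch_size, n_uses)

    return bits, forward_noise, feedback_noise
```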
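
The "Experiment Setup" row lists concrete training hyperparameters. The PyTorch sketch below wires the reported values (batch size 200, Adam with β1=0.9, β2=0.999, ε=1e-8, initial learning rate 0.02, 10x decay, unit-norm gradient clipping, K=100, binary cross-entropy) into a runnable loop. The StandInDecoder module and the BPSK-plus-noise inputs are placeholders, not the Deepcode architecture, and decaying the learning rate after every 10^6 examples is one reading of the reported schedule.

```python
import torch
from torch import nn

K = 100            # block length used during training ("During the training, we let K = 100")
batch_size = 200

class StandInDecoder(nn.Module):
    """Trivial placeholder mapping noisy observations to K bit estimates (not Deepcode)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=50, batch_first=True, bidirectional=True)
        self.out = nn.Linear(100, 1)

    def forward(self, received):                  # received: (batch, K, 3)
        h, _ = self.rnn(received)
        return self.out(h).squeeze(-1)            # logits: (batch, K)

model = StandInDecoder()
criterion = nn.BCEWithLogitsLoss()                # binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.02,
                             betas=(0.9, 0.999), eps=1e-8)

total_examples = 4 * 10**6
lr_decay_every = 10**6                            # one reading of the reported LR schedule

for step in range(total_examples // batch_size):
    # Stand-in simulated data: BPSK-modulated bits repeated 3x (rate 1/3) plus unit-variance
    # AWGN, purely for illustration of the training loop.
    bits = torch.randint(0, 2, (batch_size, K)).float()
    received = (2 * bits - 1).unsqueeze(-1).repeat(1, 1, 3) + torch.randn(batch_size, K, 3)

    optimizer.zero_grad()
    loss = criterion(model(received), bits)
    loss.backward()
    # Clip gradients to 1 if the L2 norm exceeds 1, as reported.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

    # Reduce the learning rate by 10x after each 10^6 training examples.
    if (step + 1) * batch_size % lr_decay_every == 0:
        for group in optimizer.param_groups:
            group['lr'] /= 10.0
```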