| A Box-Constrained Approach for Hard Permutation Problems | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | 4 |
| A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 2 |
| A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| A Convolutional Attention Network for Extreme Summarization of Source Code | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| A Deep Learning Approach to Unsupervised Ensemble Learning | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| A Kernel Test of Goodness of Fit | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| A Kernelized Stein Discrepancy for Goodness-of-fit Tests | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| A Kronecker-factored approximate Fisher matrix for convolution layers | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| A Neural Autoregressive Approach to Collaborative Filtering | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| A New PAC-Bayesian Perspective on Domain Adaptation | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| A Random Matrix Approach to Echo-State Neural Networks | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | 6 |
| A Simple and Provable Algorithm for Sparse Diagonal CCA | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| A Simple and Strongly-Local Flow-Based Method for Cut Improvement | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| A Subspace Learning Approach for High Dimensional Matrix Decomposition with Efficient Column/Row Sampling | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| A Theory of Generative ConvNet | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| A Variational Analysis of Stochastic Gradient Algorithms | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| A ranking approach to global optimization | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| ADIOS: Architectures Deep In Output Space | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Accurate Robust and Efficient Error Estimation for Decision Trees | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 2 |
| Actively Learning Hemimetrics with Applications to Eliciting User Preferences | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Adaptive Algorithms for Online Convex Optimization with Long-term Constraints | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Adaptive Sampling for SGD by Exploiting Side Information | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Additive Approximations in High Dimensional Nonparametric Regression via the SALSA | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Algorithms for Optimizing the Ratio of Submodular Functions | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| An optimal algorithm for the Thresholding Bandit Problem | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Analysis of Deep Neural Networks with Extended Data Jacobian Matrix | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Analysis of Variational Bayesian Factorizations for Sparse and Low-Rank Estimation | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Anytime Exploration for Multi-armed Bandits using Confidence Information | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Anytime optimal algorithms in stochastic multi-armed bandits | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Ask Me Anything: Dynamic Memory Networks for Natural Language Processing | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Associative Long Short-Term Memory | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Asymmetric Multi-task Learning Based on Task Relatedness and Loss | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 3 |
| Asynchronous Methods for Deep Reinforcement Learning | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Autoencoding beyond pixels using a learned similarity metric | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Automatic Construction of Nonparametric Relational Regression Models for Multiple Time Series | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Auxiliary Deep Generative Models | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| BASC: Applying Bayesian Optimization to the Search for Global Minima on Potential Energy Surfaces | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | 4 |
| BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Barron and Cover's Theory in Supervised Learning and its Application to Lasso | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Benchmarking Deep Reinforcement Learning for Continuous Control | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Beyond CCA: Moment Matching for Multi-View Models | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Beyond Parity Constraints: Fourier Analysis of Hash Functions for Inference | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Bidirectional Helmholtz Machines | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | 5 |
| Binary embeddings with structured hashed projections | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Black-Box Alpha Divergence Minimization | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Black-box Optimization with a Politician | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Boolean Matrix Factorization and Noisy Completion via Message Passing | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Bounded Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Clustering High Dimensional Categorical Data via Topographical Features | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Collapsed Variational Inference for Sum-Product Networks | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Community Recovery in Graphs with Locality | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Complex Embeddings for Simple Link Prediction | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Compressive Spectral Clustering | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | 6 |
| Computationally Efficient Nyström Approximation using Fast Transforms | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Conditional Bernoulli Mixtures for Multi-label Classification | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Conditional Dependence via Shannon Capacity: Axioms, Estimators and Applications | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Conservative Bandits | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Contextual Combinatorial Cascading Bandits | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Continuous Deep Q-Learning with Model-based Acceleration | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Control of Memory, Active Perception, and Action in Minecraft | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Controlling the distance to a Kemeny consensus without computing it | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Convergence of Stochastic Gradient Descent for PCA | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Convolutional Rectifier Networks as Generalized Tensor Decompositions | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Correcting Forecasts with Multifactor Neural Attention | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 2 |
| Correlation Clustering and Biclustering with Locally Bounded Errors | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Cross-Graph Learning of Multi-Relational Associations | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 3 |
| DCM Bandits: Learning to Rank with Multiple Clicks | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Data-driven Rank Breaking for Efficient Rank Aggregation | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Dealbreaker: A Nonlinear Latent Variable Model for Educational Data | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| Deconstructing the Ladder Network Architecture | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Deep Gaussian Processes for Regression using Approximate Expectation Propagation | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| Deep Structured Energy Based Models for Anomaly Detection | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 2 |
| Dictionary Learning for Massive Matrix Factorization | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 6 |
| Differential Geometric Regularization for Supervised Learning of Classifiers | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Differentially Private Policy Evaluation | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Discrete Deep Feature Extraction: A Theory and New Architectures | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Discrete Distribution Estimation under Local Privacy | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Discriminative Embeddings of Latent Variable Models for Structured Data | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | 5 |
| Distributed Clustering of Linear Bandits in Peer to Peer Networks | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Diversity-Promoting Bayesian Learning of Latent Variable Models | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Domain Adaptation with Conditional Transferable Components | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Doubly Decomposing Nonparametric Tensor Regression | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Doubly Robust Off-policy Value Evaluation for Reinforcement Learning | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Dropout distillation | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Dueling Network Architectures for Deep Reinforcement Learning | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Dynamic Capacity Networks | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| Dynamic Memory Networks for Visual and Textual Question Answering | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Early and Reliable Event Detection Using Proximity Space Representation | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 6 |
| Efficient Algorithms for Adversarial Contextual Learning | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Efficient Multi-Instance Learning for Activity Recognition from Time Series Data Using an Auto-Regressive Hidden Markov Model | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Efficient Private Empirical Risk Minimization for High-dimensional Learning | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Energetic Natural Gradient Descent | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Epigraph projections for fast general convex programming | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 3 |
| Estimating Accuracy from Unlabeled Data: A Bayesian Approach | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Estimating Cosmological Parameters from the Dark Matter Distribution | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | 3 |
| Estimating Maximum Expected Value through Gaussian Approximation | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Estimating Structured Vector Autoregressive Models | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Estimation from Indirect Supervision with Linear Moments | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Evasion and Hardening of Tree Ensemble Classifiers | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | 5 |
| Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Exact Exponent in Optimal Rates for Crowdsourcing | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Experimental Design on a Budget for Sparse Linear Models and Applications | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | 5 |
| Exploiting Cyclic Symmetry in Convolutional Neural Networks | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Expressiveness of Rectifier Networks | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 2 |
| Extended and Unscented Kitchen Sinks | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Extreme F-measure Maximization using Sparse Probability Estimates | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Factored Temporal Sigmoid Belief Networks for Sequence Learning | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Fast Algorithms for Segmented Regression | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | 4 |
| Fast Constrained Submodular Maximization: Personalized Data Summarization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Fast DPP Sampling for Nystrom with Application to Kernel Methods | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Fast Parameter Inference in Nonlinear Dynamical Systems using Iterative Gradient Matching | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Fast Rate Analysis of Some Stochastic Optimization Algorithms | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Fast k-means with accurate bounds | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | 3 |
| Fast methods for estimating the Numerical rank of large matrices | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | 4 |
| Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Faster Eigenvector Computation via Shift-and-Invert Preconditioning | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Fixed Point Quantization of Deep Convolutional Networks | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| ForecastICU: A Prognostic Decision Support System for Timely Prediction of Intensive Care Unit Admission | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 2 |
| From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Gaussian process nonparametric tensor estimator and its minimax optimality | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Gaussian quadrature for matrix inverse forms with applications | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Generalization Properties and Implicit Regularization for Multiple Passes SGM | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 6 |
| Generalization and Exploration via Randomized Value Functions | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Generalized Direct Change Estimation in Ising Model Structure | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Generative Adversarial Text to Image Synthesis | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Geometric Mean Metric Learning | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | 6 |
| Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Graying the black box: Understanding DQNs | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Greedy Column Subset Selection: New Bounds and Distributed Algorithms | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Gromov-Wasserstein Averaging of Kernel and Distance Matrices | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Group Equivariant Convolutional Networks | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Hawkes Processes with Stochastic Excitations | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Heteroscedastic Sequences: Beyond Gaussianity | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Hierarchical Compound Poisson Factorization | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Hierarchical Decision Making In Electricity Grid Management | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Hierarchical Span-Based Conditional Random Fields for Labeling and Segmenting Events in Wearable Sensor Data Streams | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Hierarchical Variational Models | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Horizontally Scalable Submodular Maximization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| How to Fake Multiply by a Gaussian Matrix | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Hyperparameter optimization with approximate gradient | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 5 |
| Importance Sampling Tree for Large-scale Empirical Expectation | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Inference Networks for Sequential Monte Carlo in Graphical Models | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 1 |
| Interacting Particle Markov Chain Monte Carlo | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 3 |
| Interactive Bayesian Hierarchical Clustering | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Isotonic Hawkes Processes | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| K-Means Clustering with Distributed Dimensions | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| L1-regularized Neural Networks are Improperly Learnable in Polynomial Time | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Large-Margin Softmax Loss for Convolutional Neural Networks | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Learning Convolutional Neural Networks for Graphs | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Learning End-to-end Video Classification with Rank-Pooling | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| Learning Granger Causality for Hawkes Processes | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Learning Mixtures of Plackett-Luce Models | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | 4 |
| Learning Physical Intuition of Block Towers by Example | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | 3 |
| Learning Population-Level Diffusions with Generative RNNs | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Learning Representations for Counterfactual Inference | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Learning Simple Algorithms from Examples | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Learning Sparse Combinatorial Representations via Two-stage Submodular Maximization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Learning and Inference via Maximum Inner Product Search | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Learning from Multiway Data: Simple and Efficient Tensor Regression | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Learning privately from multiparty data | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Learning to Filter with Predictive State Inference Machines | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 3 |
| Learning to Generate with Memory | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Linking losses for density ratio and class-probability estimation | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Loss factorization, weakly supervised learning and label noise robustness | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 3 |
| Low-Rank Matrix Approximation with Stability | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Low-rank Solutions of Linear Matrix Equations via Procrustes Flow | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Low-rank tensor completion: a Riemannian manifold preconditioning approach | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Markov Latent Feature Models | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Markov-modulated Marked Poisson Processes for Check-in Data | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Matrix Eigen-decomposition via Doubly Stochastic Riemannian Optimization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Meta-Learning with Memory-Augmented Neural Networks | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Metadata-conscious anonymous messaging | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Meta–Gradient Boosted Decision Tree Model for Weight and Target Learning | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 3 |
| Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Minimizing the Maximal Loss: How and Why | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Minimum Regret Search for Single- and Multi-Task Optimization | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Mixing Rates for the Alternating Gibbs Sampler over Restricted Boltzmann Machines and Friends | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Mixture Proportion Estimation via Kernel Embeddings of Distributions | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Model-Free Imitation Learning with Policy Optimization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Model-Free Trajectory Optimization for Reinforcement Learning | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Multi-Bias Non-linear Activation in Deep Neural Networks | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Multi-Player Bandits – a Musical Chairs Approach | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Near Optimal Behavior via Approximate State Abstraction | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Network Morphism | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Neural Variational Inference for Text Processing | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| No Oops, You Won't Do It Again: Mechanisms for Self-correction in Crowdsourcing | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| No penalty no tears: Least squares in high-dimensional linear models | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 3 |
| No-Regret Algorithms for Heavy-Tailed Linear Bandits | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Noisy Activation Functions | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Non-negative Matrix Factorization under Heavy Noise | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Nonlinear Statistical Learning with Truncated Gaussian Graphical Models | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Nonparametric Canonical Correlation Analysis | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| On Graduated Optimization for Stochastic Non-Convex Problems | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| On collapsed representation of hierarchical Completely Random Measures | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| On the Consistency of Feature Selection With Lasso for Non-linear Targets | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| On the Iteration Complexity of Oblivious First-Order Optimization Algorithms | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| On the Power and Limits of Distance-Based Learning | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| On the Quality of the Initial Basin in Overspecified Neural Networks | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| On the Statistical Limits of Convex Relaxations | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| One-Shot Generalization in Deep Generative Models | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Online Learning with Feedback Graphs Without the Graphs | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Online Low-Rank Subspace Clustering by Basis Dictionary Pursuit | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Online Stochastic Linear Optimization under One-bit Feedback | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Opponent Modeling in Deep Reinforcement Learning | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Optimal Classification with Multivariate Losses | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 2 |
| Optimality of Belief Propagation for Crowdsourced Classification | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| PAC learning of Probabilistic Automaton based on the Method of Moments | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | 4 |
| PD-Sparse : A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| PHOG: Probabilistic Model for Code | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Parameter Estimation for Generalized Thurstone Choice Models | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Pareto Frontier Learning with Expensive Correlated Objectives | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Partition Functions from Rao-Blackwellized Tempered Sampling | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Persistence weighted Gaussian kernel for topological data analysis | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 2 |
| Persistent RNNs: Stashing Recurrent Weights On-Chip | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | 3 |
| Pixel Recurrent Neural Networks | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Pliable Rejection Sampling | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Power of Ordered Hypothesis Testing | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Preconditioning Kernel Matrices | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 6 |
| Predictive Entropy Search for Multi-objective Bayesian Optimization | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Pricing a Low-regret Seller | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Primal-Dual Rates and Certificates | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Principal Component Projection Without Principal Component Analysis | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Provable Algorithms for Inference in Topic Models | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 3 |
| Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Quadratic Optimization with Orthogonality Constraints: Explicit Lojasiewicz Exponent and Linear Convergence of Line-Search Methods | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Recommendations as Treatments: Debiasing Learning and Evaluation | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Recovery guarantee of weighted low-rank approximation via alternating minimization | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Recurrent Orthogonal Networks and Long-Memory Tasks | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Recycling Randomness with Structure for Sublinear time Kernel Expansions | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | 4 |
| Representational Similarity Learning with Application to Brain Networks | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | 2 |
| Revisiting Semi-Supervised Learning with Graph Embeddings | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Rich Component Analysis | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Robust Monte Carlo Sampling using Riemannian Nosé-Poincaré Hamiltonian Dynamics | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Robust Principal Component Analysis with Side Information | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Robust Random Cut Forest Based Anomaly Detection on Streams | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| SDCA without Duality, Regularization, and Individual Convexity | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Scalable Discrete Sampling as a Multi-Armed Bandit Problem | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Sequence to Sequence Training of CTC-RNNs with Partial Windowing | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Shifting Regret, Mirror Descent, and Matrices | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | 4 |
| Slice Sampling on Hamiltonian Trajectories | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Smooth Imitation Learning for Online Sequence Prediction | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 3 |
| Softened Approximate Policy Iteration for Markov Games | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Solving Ridge Regression using Sketched Preconditioned SVRG | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Sparse Nonlinear Regression: Parameter Estimation under Nonconvexity | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Sparse Parameter Recovery from Aggregated Data | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 2 |
| Speeding up k-means by approximating Euclidean distances via block vectors | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | 3 |
| Stability of Controllers for Gaussian Process Forward Models | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| Starting Small - Learning with Adaptive Sample Sizes | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Stochastic Block BFGS: Squeezing More Curvature out of Data | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| Stochastic Discrete Clenshaw-Curtis Quadrature | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | 4 |
| Stochastic Optimization for Multiview Representation Learning using Partial Least Squares | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Stochastic Quasi-Newton Langevin Monte Carlo | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Stochastic Variance Reduction for Nonconvex Optimization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Stratified Sampling Meets Machine Learning | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Strongly-Typed Recurrent Neural Networks | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 4 |
| Structure Learning of Partitioned Markov Networks | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 1 |
| Structured Prediction Energy Networks | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Tensor Decomposition via Joint Matrix Schur Decomposition | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 1 |
| Texture Networks: Feed-forward Synthesis of Textures and Stylized Images | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| The Arrow of Time in Multivariate Time Series | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 4 |
| The Information Sieve | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| The Information-Theoretic Requirements of Subspace Clustering with Missing Data | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| The Label Complexity of Mixed-Initiative Classifier Training | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| The Segmented iHMM: A Simple, Efficient Hierarchical Infinite HMM | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| The Sum-Product Theorem: A Foundation for Learning Tractable Models | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |
| The Teaching Dimension of Linear Learners | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | 2 |
| The Variational Nystrom method for large-scale spectral problems | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| The knockoff filter for FDR control in group-sparse and multitask regression | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | 3 |
| Towards Faster Rates and Oracle Property for Low-Rank Matrix Estimation | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1 |
| Train and Test Tightness of LP Relaxations in Structured Prediction | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 2 |
| Train faster, generalize better: Stability of stochastic gradient descent | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 3 |
| Training Deep Neural Networks via Direct Loss Minimization | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Training Neural Networks Without Gradients: A Scalable ADMM Approach | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | 5 |
| Truthful Univariate Estimators | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 0 |
| Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Unitary Evolution Recurrent Neural Networks | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Unsupervised Deep Embedding for Clustering Analysis | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Uprooting and Rerooting Graphical Models | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | 2 |
| Variable Elimination in the Fourier Domain | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| Variance Reduction for Faster Non-Convex Optimization | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | 4 |
| Variance-Reduced and Projection-Free Stochastic Optimization | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 3 |
| Variational Inference for Monte Carlo Objectives | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | 3 |
| Why Most Decisions Are Easy in Tetris - And Perhaps in Other Sequential Decision Problems, As Well | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 1 |
| Why Regularized Auto-Encoders learn Sparse Representation? | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | 2 |
| k-variates++: more pluses in the k-means++ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | 2 |