Awesome Papers: 2016-12-2

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex datasets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
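As a toy illustration of the forward process described above (our sketch, not the paper's released implementation), the snippet below repeatedly mixes a one-dimensional data distribution with Gaussian noise under a hypothetical fixed noise schedule; after enough steps the original structure is destroyed and the samples approach a standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffusion(x0, betas):
    """Iteratively add Gaussian noise: x <- sqrt(1-beta)*x + sqrt(beta)*eps.
    Each step slightly shrinks the signal and injects fresh noise, so the
    data distribution drifts toward N(0, 1)."""
    x = x0.copy()
    trajectory = [x.copy()]
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        trajectory.append(x.copy())
    return trajectory

x0 = rng.standard_normal(1000) * 0.1 + 3.0  # toy data: a narrow bump at 3
betas = np.full(200, 0.05)                  # hypothetical fixed noise schedule
traj = forward_diffusion(x0, betas)
# the sample mean is driven toward 0 and the standard deviation toward 1
print(traj[0].mean(), traj[-1].mean(), traj[-1].std())
```

The generative model of the paper is the learned *reverse* of this chain; the forward direction shown here needs no learning at all.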



Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
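The core idea of the saddle-free Newton method can be sketched in a few lines: rescale the gradient by the inverse of |H|, the Hessian with its eigenvalues replaced by their absolute values. The toy example below (our illustration on a 2D quadratic saddle, not the authors' full algorithm, which approximates the Hessian for large networks) shows how plain Newton is attracted to the saddle while the saddle-free step escapes it:

```python
import numpy as np

def saddle_free_newton_step(grad, hess):
    """Newton-like step using |H|: eigenvalues are replaced by their
    absolute values, so negative-curvature directions are descended
    rather than ascended (the failure mode of plain Newton at saddles)."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    abs_inv = eigvecs @ np.diag(1.0 / np.abs(eigvals)) @ eigvecs.T
    return -abs_inv @ grad

# saddle of f(x, y) = x^2 - y^2 at the origin; evaluate near (0.1, 0.1)
H = np.array([[2.0, 0.0], [0.0, -2.0]])
g = np.array([0.2, -0.2])
newton = -np.linalg.solve(H, g)   # plain Newton: steps straight into the saddle
sfn = saddle_free_newton_step(g, H)  # descends in x, escapes along y
print(newton, sfn)
```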



Exponential expressivity in deep neural networks through transient chaos

Ben Poole,Subhaneil Lahiri,Maithra Raghu,Jascha Sohl-Dickstein,Surya Ganguli

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this generic class of deep random functions cannot be efficiently computed by any shallow network, going beyond prior work restricted to the analysis of single functions. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.
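The order-to-chaos transition can be probed numerically through the mean-field variance (length) map for a random tanh network. The Monte Carlo sketch below is our illustration following that setup, with the weight and bias variances as the control parameters; in the ordered phase the signal variance decays to zero with depth, while in the chaotic phase it settles at a nontrivial fixed point:

```python
import numpy as np

def variance_map(q, sigma_w, sigma_b, n_mc=200_000, seed=0):
    """One layer of the mean-field length map for a random tanh network:
    q_{l+1} = sigma_w^2 * E[tanh(sqrt(q_l) * z)^2] + sigma_b^2, z ~ N(0,1).
    The expectation is estimated by Monte Carlo."""
    z = np.random.default_rng(seed).standard_normal(n_mc)
    return sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2

def depth_limit(sigma_w, sigma_b, depth=50):
    """Iterate the map to approximate the deep-network fixed point."""
    q = 1.0
    for _ in range(depth):
        q = variance_map(q, sigma_w, sigma_b)
    return q

# small weight variance: ordered phase, activity dies out with depth;
# large weight variance: chaotic phase, a nonzero fixed point survives
print(depth_limit(0.5, 0.0), depth_limit(2.0, 0.0))
```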




Continuous Control with Deep Reinforcement Learning

Timothy P. Lillicrap*, Jonathan J. Hunt*, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.
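One easily sketched ingredient of this algorithm (DDPG) is the "soft" target-network update used to stabilize learning: target copies of the actor and critic trail the online networks via Polyak averaging. The snippet below is our sketch; the value of tau and the parameter shapes are illustrative:

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta_target <- tau*theta + (1-tau)*theta_target.
    The target networks move slowly, which keeps the critic's bootstrap
    targets from chasing a fast-moving estimate."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

online = [np.ones((2, 2)), np.full(3, 4.0)]   # toy "network" parameters
target = [np.zeros((2, 2)), np.zeros(3)]
for _ in range(1000):
    target = soft_update(target, online, tau=0.01)
# the target parameters creep toward the online parameters
print(target[0][0, 0], target[1][0])
```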




High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(λ). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.
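The exponentially-weighted advantage estimator mentioned above (generalized advantage estimation, GAE) is short enough to sketch directly. The backward recursion below computes A_t = Σ_l (γλ)^l δ_{t+l} with δ_t = r_t + γV(s_{t+1}) − V(s_t); the reward and value numbers are illustrative:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """TD(lambda)-style advantage estimator. `values` carries one extra
    entry: the bootstrap value of the state after the final step."""
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(rewards)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        acc = deltas[t] + gamma * lam * acc
        advantages[t] = acc
    return advantages

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5, 0.0])   # V(s_0..s_2) plus bootstrap V(s_3)
adv = gae_advantages(rewards, values)
print(adv)
```

Setting lam=0 recovers the one-step TD residual (low variance, more bias); lam=1 recovers the Monte Carlo advantage (high variance, no bias from the recursion).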



Auto-Encoding Variational Bayes

Diederik P. Kingma,Max Welling

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
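The reparameterization at the heart of this paper is simple to sketch: instead of sampling z directly from N(μ, σ²), draw ε from a fixed N(0, I) and compute z = μ + σ·ε, so the randomness is external to the parameters and gradients flow through μ and σ. A minimal numpy illustration (the μ and log-variance values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I); a differentiable sample
    from N(mu, diag(sigma^2))."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([1.0, -2.0])
log_var = np.array([0.0, 2.0 * np.log(0.5)])   # sigmas 1.0 and 0.5
samples = np.stack([reparameterize(mu, log_var) for _ in range(50_000)])
# empirical moments recover the target Gaussian
print(samples.mean(axis=0), samples.std(axis=0))
```

In an actual VAE, μ and log_var are outputs of the encoder network and the same trick makes the evidence lower bound end-to-end differentiable.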



Improving Variational Inference with Inverse Autoregressive Flow

Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

We propose a simple and scalable method for improving the flexibility of variational inference through a transformation with autoregressive networks. Autoregressive networks, such as RNNs and MADE, are very powerful models; however, ancestral sampling in such networks is a sequential operation, therefore unappealing for direct use as approximate posteriors in variational inference on parallel hardware such as GPUs. We find that by inverting autoregressive networks we can obtain equally powerful data transformations that can often be computed in parallel. We show that such data transformations, inverse autoregressive flows (IAF), can be used to transform a simple distribution over the latent variables into a much more flexible distribution, while still allowing us to compute the resulting variables’ probability density function. The method is simple to implement, can be made arbitrarily flexible, and (in contrast with previous work) is naturally applicable to latent variables that are organized in multidimensional tensors, such as 2D grids or time series. The method is applied to a novel deep architecture of variational auto-encoders. In experiments we demonstrate that autoregressive flow leads to significant performance gains when applied to variational autoencoders for natural images.
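A single IAF transform can be sketched as follows. Each output dimension is a shifted and scaled copy of the corresponding input dimension, where the shift and log-scale depend only on *earlier* input dimensions; the whole transform is therefore parallel across dimensions and its Jacobian is triangular, so the log-determinant is just the sum of the log-scales. The "autoregressive networks" here are stubbed with masked linear maps for illustration (a real model would use MADE-style masked networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def iaf_step(z, shift_fn, log_scale_fn):
    """One inverse autoregressive flow transform:
    z_new = z * exp(s(z)) + m(z), where m and s are autoregressive
    (output i depends only on inputs < i). log-det Jacobian = sum(s)."""
    m, s = shift_fn(z), log_scale_fn(z)
    z_new = z * np.exp(s) + m
    log_det = s.sum(axis=-1)
    return z_new, log_det

# strictly lower-triangular mask enforces the autoregressive property
W = np.tril(rng.standard_normal((3, 3)), k=-1)
shift = lambda z: z @ W.T
log_scale = lambda z: 0.1 * (z @ W.T)

z = rng.standard_normal((4, 3))          # a batch of 4 latent samples
z_new, log_det = iaf_step(z, shift, log_scale)
print(z_new.shape, log_det.shape)
```

Because dimension 0 has no predecessors, it passes through unchanged here; stacking several such steps (with permutations between them) yields the flexible posteriors the paper describes.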



Sample Efficient Actor-Critic with Experience Replay

Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
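The truncated importance sampling with bias correction mentioned above can be sketched in isolation. The importance ratio ρ = π/μ is clipped at a constant c to bound variance, and a correction factor max(0, 1 − c/ρ) reweights a second term so the combined estimator remains unbiased. Our illustration, with toy probabilities:

```python
import numpy as np

def truncated_is_weight(pi, mu, c=10.0):
    """Split the importance ratio rho = pi/mu into a truncated part
    min(rho, c) and a bias-correction coefficient max(0, 1 - c/rho).
    The correction is nonzero only where truncation actually clipped."""
    rho = pi / mu
    truncated = np.minimum(rho, c)
    correction = np.maximum(0.0, 1.0 - c / rho)
    return truncated, correction

pi = np.array([0.9, 0.05])   # target-policy probabilities
mu = np.array([0.05, 0.9])   # behavior-policy probabilities
trunc, corr = truncated_is_weight(pi, mu, c=10.0)
print(trunc, corr)
```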



Reparameterization trick for discrete variables

Seiya Tokui, Issei Sato

Low-variance gradient estimation is crucial for learning directed graphical models parameterized by neural networks, where the reparameterization trick is widely used for those with continuous variables. While this technique gives low-variance gradient estimates, it has not been directly applicable to discrete variables, the sampling of which inherently requires discontinuous operations. We argue that the discontinuity can be bypassed by marginalizing out the variable of interest, which results in a new reparameterization trick for discrete variables. This reparameterization greatly reduces the variance, which is understood by regarding the method as an application of common random numbers to the estimation. The resulting estimator is theoretically guaranteed to have a variance not larger than that of the likelihood-ratio method with the optimal input-dependent baseline. We give empirical results for variational learning of sigmoid belief networks.
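The key move, marginalizing out the discrete variable, can be illustrated on a single categorical variable. Summing over the K outcomes removes the sampling discontinuity entirely, and the gradient with respect to the logits has the closed form π_j(f_j − E[f]) (our worked sketch, not the paper's full estimator for deep belief networks):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def marginalized_gradient(logits, f_values):
    """Exact gradient of E_{k ~ Cat(softmax(logits))}[f(k)] obtained by
    marginalizing: E = sum_k pi_k f(k).  Using
    d pi_k / d logit_j = pi_k (delta_kj - pi_j) gives
    dE/d logit_j = pi_j (f_j - E)."""
    pi = softmax(logits)
    expectation = pi @ f_values
    grad = pi * (f_values - expectation)
    return expectation, grad

logits = np.array([0.0, 1.0, -1.0])
f_values = np.array([1.0, 2.0, 3.0])
exp_f, grad = marginalized_gradient(logits, f_values)
print(exp_f, grad)
```

Because no sample of the discrete variable appears in the estimator, the variance contributed by that variable is zero, unlike the likelihood-ratio (REINFORCE) estimator.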



Multi-task learning with deep model based reinforcement learning

Asier Mujika

In recent years, model-free methods that use deep learning have achieved great success in many different reinforcement learning environments. Most successful approaches focus on solving a single task, while multi-task reinforcement learning remains an open problem. In this paper, we present a model-based approach to deep reinforcement learning which we use to solve different tasks simultaneously. We show that our approach not only does not degrade but actually benefits from learning multiple tasks. For our model, we also present a new kind of recurrent neural network inspired by residual networks that decouples memory from computation, allowing it to model complex environments that do not require large amounts of memory.
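One way to read "residual-inspired, decoupling memory from computation" is a recurrent update in which the state passes through unchanged (the memory path) and a computed increment is added on top, h_t = h_{t-1} + f(h_{t-1}, x_t). The sketch below is purely our illustration of that idea, not the paper's exact architecture; all shapes and the tanh cell are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_rnn_step(h, x, W_h, W_x):
    """Residual-style recurrence: the previous state is carried forward
    untouched and a computed residual is added, so information persists
    by default rather than having to be recomputed every step."""
    return h + np.tanh(h @ W_h + x @ W_x)

dim_h, dim_x = 4, 3
W_h = 0.1 * rng.standard_normal((dim_h, dim_h))
W_x = 0.1 * rng.standard_normal((dim_x, dim_h))
h = np.zeros(dim_h)
for x in rng.standard_normal((5, dim_x)):   # five time steps of input
    h = residual_rnn_step(h, x, W_h, W_x)
print(h.shape)
```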






Google open-sources Embedding Projector, a tool for visualizing high-dimensional data

Link: Google open-sources Embedding Projector, a tool for visualizing high-dimensional data



The Chinese edition of Deep Learning, translated by Professor Zhang Zhihua's team at Peking University with Yoshua Bengio's authorization, has been released

Link: Chinese edition of Deep Learning released


Experiments in Handwriting with a Neural Network

Link: Experiments in Handwriting with a Neural Network


A Brief Introduction to the Logistic Regression Model

Logistic regression is a classification model in machine learning that, thanks to its simplicity and efficiency, is very widely used in practice. This article, part of Meituan's machine learning "InAction" series, focuses on the mathematical model behind logistic regression and how its parameters are estimated, and closes with a brief discussion of the relationship between logistic regression and Bayesian classification, as well as its extension to multi-class problems.

Link: A Brief Introduction to the Logistic Regression Model
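The parameter estimation the article discusses can be sketched as plain gradient descent on the log-loss, whose gradient has the compact form Xᵀ(σ(Xw) − y)/n. A minimal self-contained example on toy separable data (learning rate and step count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient descent on the negative log-likelihood of the
    logistic model p(y=1|x) = sigmoid(w . x)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# toy linearly separable data, with a constant column acting as the bias
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = (X[:, 1] > 0).astype(float)
w = fit_logistic(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(float)
print((preds == y).mean())   # training accuracy
```

In practice one would add regularization and use a solver such as L-BFGS, but the gradient above is the core of every variant.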




National University of Defense Technology
Changsha, Hunan 410073