Awesome Papers: 2016-12-2

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
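
The forward half of this idea can be illustrated with a minimal sketch. It assumes a Gaussian diffusion step of the form x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise with a constant beta; the toy bimodal data, the schedule, and all function names are illustrative assumptions rather than the authors' reference implementation, and the learned reverse process is not shown.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Repeatedly apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise."""
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        keep = math.sqrt(1.0 - beta)
        spread = math.sqrt(beta)
        x = [keep * xi + spread * rng.gauss(0.0, 1.0) for xi in x]
        trajectory.append(x)
    return trajectory


def mean_of(values):
    return sum(values) / len(values)


def variance_of(values):
    m = mean_of(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)
# Toy "data distribution" (an assumption): two sharp modes at -2 and +2.
x0 = [rng.choice([-2.0, 2.0]) for _ in range(2000)]
# Assumed constant noise schedule; the paper treats the schedule more carefully.
betas = [0.02] * 300

trajectory = forward_diffusion(x0, betas, rng)
x_final = trajectory[-1]

# The bimodal structure is gradually destroyed: the samples end up close to
# a standard normal distribution (mean near 0, variance near 1).
print("initial variance:", variance_of(x0))
print("final mean:", mean_of(x_final))
print("final variance:", variance_of(x_final))
```

Because each step shrinks the signal slightly and adds a little noise, many small steps drive any starting distribution toward the same simple one, which is what makes it plausible to learn the reverse process one small step at a time.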


Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
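
A toy sketch can show the difference between a Newton step and a saddle-free Newton step. It assumes the step rescales the gradient by the inverse of |H|, the Hessian with its eigenvalues replaced by their absolute values. The function f(x, y) = x^2 - y^2 is chosen here (as an assumption, for illustration) because its Hessian is diagonal, so the eigenvalues can be read off directly; a practical implementation needs an eigendecomposition or a low-dimensional approximation, which is not shown.

```python
def f(p):
    x, y = p
    return x * x - y * y          # saddle point at the origin


def grad(p):
    x, y = p
    return [2.0 * x, -2.0 * y]    # gradient of f


# The Hessian of f is diag(2, -2); being diagonal, its entries are its eigenvalues.
HESSIAN_DIAG = [2.0, -2.0]


def newton_step(p):
    """Plain Newton: divide each gradient component by the signed curvature."""
    g = grad(p)
    return [pi - gi / hi for pi, gi, hi in zip(p, g, HESSIAN_DIAG)]


def saddle_free_newton_step(p):
    """Saddle-free variant: divide by the absolute value of the curvature."""
    g = grad(p)
    return [pi - gi / abs(hi) for pi, gi, hi in zip(p, g, HESSIAN_DIAG)]


start = [1.0, 0.1]
newton_result = newton_step(start)
sfn_result = saddle_free_newton_step(start)

# Newton jumps straight onto the saddle point, while the saddle-free step
# moves away from it along the negative-curvature direction y.
print("Newton:", newton_result, "f =", f(newton_result))
print("Saddle-free Newton:", sfn_result, "f =", f(sfn_result))
```

Along the positive-curvature direction both steps agree; they differ only in the sign of the move along the negative-curvature direction, which is what lets the saddle-free step keep decreasing the error.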


Exponential expressivity in deep neural networks through transient chaos

Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this generic class of deep random functions cannot be efficiently computed by any shallow network, going beyond prior work restricted to the analysis of single functions. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.


Continuous Control with Deep Reinforcement Learning

Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.


High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan and Pieter Abbeel

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(λ). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.
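
The exponentially-weighted advantage estimator can be sketched in a few lines. This assumes the usual form: a one-step TD residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), accumulated backwards with weight gamma * lambda. The function name, the toy rewards, and the toy value estimates are illustrative assumptions; the trust region optimization is not shown.

```python
def generalized_advantage_estimates(rewards, values, gamma, lam):
    """Compute A_t = sum_l (gamma * lam)^l * delta_{t+l} by a backward pass.

    `values` must hold one more entry than `rewards`: the final entry is the
    value estimate of the state reached after the last reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy episode (assumed numbers): three steps, terminal value 0.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
gamma = 0.9

# lam = 0 keeps only the one-step TD residual: low variance, more bias.
adv_td = generalized_advantage_estimates(rewards, values, gamma, lam=0.0)
# lam = 1 gives the discounted return minus the baseline: less bias, more variance.
adv_mc = generalized_advantage_estimates(rewards, values, gamma, lam=1.0)

print("lambda = 0:", adv_td)
print("lambda = 1:", adv_mc)
```

Values of lambda between 0 and 1 interpolate between these two extremes, which is the bias-variance trade-off the abstract refers to.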


Auto-Encoding Variational Bayes

Diederik P. Kingma,Max Welling

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
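
The reparameterization at the heart of this estimator can be sketched on a one-dimensional toy problem. The sketch assumes a Gaussian latent variable written as z = mu + sigma * eps with eps drawn from a standard normal, and uses f(z) = z^2 as a stand-in objective (an assumption chosen because the exact gradients, 2 * mu and 2 * sigma, are known). It is not the full variational lower bound, only the gradient trick.

```python
import random


def reparameterized_gradients(mu, sigma, num_samples, rng):
    """Monte Carlo estimates of d/dmu and d/dsigma of E[f(z)] for f(z) = z^2.

    Sampling z = mu + sigma * eps moves the randomness into eps, so the
    derivative can pass through the sample by the chain rule.
    """
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        df_dz = 2.0 * z               # derivative of the toy objective
        grad_mu += df_dz * 1.0        # dz/dmu = 1
        grad_sigma += df_dz * eps     # dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples


rng = random.Random(0)
grad_mu, grad_sigma = reparameterized_gradients(1.5, 0.5, 20000, rng)

# Since E[z^2] = mu^2 + sigma^2, the exact gradients are 2 * mu = 3.0
# and 2 * sigma = 1.0; the estimates should land close to these.
print("estimated d/dmu:", grad_mu)
print("estimated d/dsigma:", grad_sigma)
```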


Improving Variational Inference with Inverse Autoregressive Flow

Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

We propose a simple and scalable method for improving the flexibility of variational inference through a transformation with autoregressive networks. Autoregressive networks, such as RNNs and MADE, are very powerful models; however, ancestral sampling in such networks is a sequential operation, therefore unappealing for direct use as approximate posteriors in variational inference on parallel hardware such as GPUs. We find that by inverting autoregressive networks we can obtain equally powerful data transformations that can often be computed in parallel. We show that such data transformations, inverse autoregressive flows (IAF), can be used to transform a simple distribution over the latent variables into a much more flexible distribution, while still allowing us to compute the resulting variables’ probability density function. The method is simple to implement, can be made arbitrarily flexible, and (in contrast with previous work) is naturally applicable to latent variables that are organized in multidimensional tensors, such as 2D grids or time series. The method is applied to a novel deep architecture of variational auto-encoders. In experiments we demonstrate that autoregressive flow leads to significant performance gains when applied to variational autoencoders for natural images.
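
One inverse autoregressive flow step can be sketched as follows, assuming the form z_new_i = mu_i + sigma_i * z_i, where mu_i and sigma_i depend only on the earlier inputs z_1 .. z_{i-1}. The strictly lower-triangular weight matrices below are hypothetical stand-ins for an autoregressive network such as MADE; the numbers are arbitrary.

```python
import math

# Hypothetical autoregressive "network": row i only reads inputs j < i,
# which is what a strictly lower-triangular weight matrix enforces.
W_MU = [[0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0],
        [-0.3, 0.8, 0.0]]
W_LOG_SIGMA = [[0.0, 0.0, 0.0],
               [0.2, 0.0, 0.0],
               [0.1, -0.4, 0.0]]


def autoregressive(z, weights):
    """Output i is a function of z[0..i-1] only."""
    return [sum(weights[i][j] * z[j] for j in range(i)) for i in range(len(z))]


def iaf_step(z):
    """Transform z and return the log-determinant of the Jacobian.

    Every output dimension is computed from the *input* z, so all dimensions
    can be evaluated in parallel; the Jacobian is lower triangular with
    diagonal sigma_i, so its log-determinant is sum(log sigma_i).
    """
    mu = autoregressive(z, W_MU)
    log_sigma = autoregressive(z, W_LOG_SIGMA)
    z_new = [m + math.exp(s) * zi for m, s, zi in zip(mu, log_sigma, z)]
    return z_new, sum(log_sigma)


def log_standard_normal(z):
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)


z0 = [0.3, -1.2, 0.7]                 # sample from the simple base distribution
z1, log_det = iaf_step(z0)
# Change of variables: the density of the transformed sample stays computable.
log_q_z1 = log_standard_normal(z0) - log_det

print("z1:", z1)
print("log q(z1):", log_q_z1)
```

Stacking several such steps gives a more flexible posterior while the density remains the base log-density minus the sum of the per-step log-determinants.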


Sample Efficient Actor-Critic with Experience Replay

Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
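
The idea behind truncated importance sampling with bias correction can be sketched for a small discrete action set. The sketch assumes the decomposition in which the importance weight rho = pi / mu is clipped at a constant c for samples from the behaviour policy mu, and a correction term weighted by max(0, 1 - c / rho) is evaluated under the target policy pi. The probabilities, the per-action values, and the function name are illustrative assumptions, and exact expectations are used instead of samples so that the identity is easy to check.

```python
def truncated_is_terms(pi, mu, values, c):
    """Return (truncated term, bias-correction term) as exact expectations.

    truncated  = E_{a ~ mu}[ min(c, rho(a)) * values(a) ]
    correction = E_{a ~ pi}[ max(0, 1 - c / rho(a)) * values(a) ]
    Their sum equals E_{a ~ pi}[ values(a) ], so clipping adds no bias.
    """
    truncated = 0.0
    correction = 0.0
    for p, m, v in zip(pi, mu, values):
        rho = p / m
        truncated += m * min(c, rho) * v
        correction += p * max(0.0, 1.0 - c / rho) * v
    return truncated, correction


pi = [0.7, 0.2, 0.1]        # target policy (assumed)
mu = [0.1, 0.3, 0.6]        # behaviour policy that filled the replay buffer (assumed)
values = [1.0, -2.0, 3.0]   # stand-in for the per-action quantity being averaged
c = 2.0                     # clipping constant

truncated, correction = truncated_is_terms(pi, mu, values, c)
target = sum(p * v for p, v in zip(pi, values))

print("truncated term:", truncated)
print("correction term:", correction)
print("sum:", truncated + correction, "target:", target)
```

Clipping bounds the weight applied to replayed samples, which limits variance, while the correction term only becomes active for actions whose weight exceeded the clip.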


Reparameterization trick for discrete variables

Seiya Tokui, Issei Sato

Low-variance gradient estimation is crucial for learning directed graphical models parameterized by neural networks, where the reparameterization trick is widely used for those with continuous variables. While this technique gives low-variance gradient estimates, it has not been directly applicable to discrete variables, the sampling of which inherently requires discontinuous operations. We argue that the discontinuity can be bypassed by marginalizing out the variable of interest, which results in a new reparameterization trick for discrete variables. This reparameterization greatly reduces the variance, which is understood by regarding the method as an application of common random numbers to the estimation. The resulting estimator is theoretically guaranteed to have a variance not larger than that of the likelihood-ratio method with the optimal input-dependent baseline. We give empirical results for variational learning of sigmoid belief networks.


Multi-task learning with deep model based reinforcement learning

Asier Mujika

In recent years, model-free methods that use deep learning have achieved great success in many different reinforcement learning environments. Most successful approaches focus on solving a single task, while multi-task reinforcement learning remains an open problem. In this paper, we present a model-based approach to deep reinforcement learning which we use to solve different tasks simultaneously. We show that our approach not only does not degrade but actually benefits from learning multiple tasks. For our model, we also present a new kind of recurrent neural network inspired by residual networks that decouples memory from computation, allowing it to model complex environments that do not require lots of memory.


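Tools

Major Advances in Deep Learning in 2016: From Unsupervised Learning to Generative Adversarial Networks

Link: Major Advances in Deep Learning in 2016: From Unsupervised Learning to Generative Adversarial Networks

Google Open-Sources Embedding Projector for Visualizing High-Dimensional Data

Link: Google Open-Sources Embedding Projector for Visualizing High-Dimensional Data

Getting Started with Machine Learning

Link: Getting Started with Machine Learning

Chinese Edition of "Deep Learning" Released, Translated by Professor Zhang Zhihua's Team at Peking University with Authorization from Yoshua Bengio

Link: Chinese Edition of "Deep Learning" Released

Handwriting Modeling and Handwriting Prediction Visualization

"Experiments in Handwriting with a Neural Network"

Link: Experiments in Handwriting with a Neural Network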


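Other

An Introduction to the Logistic Regression Model

Logistic regression is a classification model in machine learning. Because the algorithm is simple and efficient, it is very widely used in practice. This article, part of Meituan's machine learning InAction series, focuses on the mathematical model of logistic regression and on methods for solving for its parameters. It closes with a brief discussion of the relationship between logistic regression and Bayesian classification, and of the extension to multi-class problems.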
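
As a rough illustration of the model and of one way to solve for its parameters, the sketch below fits a one-feature logistic regression by batch gradient descent on the average log-loss. The toy data, learning rate, and step count are assumptions for illustration, and gradient descent is only one of several possible solvers; the linked article may use a different one.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic_regression(xs, ys, learning_rate=0.5, steps=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-loss: (prediction - label) times the input.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy, deliberately non-separable data (an assumption): larger x mostly means class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic_regression(xs, ys)


def predict_probability(x):
    return sigmoid(w * x + b)


print("w =", w, "b =", b)
print("p(y=1 | x=2)  =", predict_probability(2.0))
print("p(y=1 | x=-2) =", predict_probability(-2.0))
```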

Link: An Introduction to the Logistic Regression Model
