Awesome Papers: 2016-12-4

Deep Reinforcement Learning with Averaged Target DQN

Oron Anschel, Nir Baram, Nahum Shimkin

The commonly used Q-learning algorithm combined with function approximation induces systematic overestimations of state-action values. These systematic errors might cause instability, poor performance and sometimes divergence of learning. In this work, we present the Averaged Target DQN (ADQN) algorithm, an adaptation to the DQN class of algorithms which uses a weighted average over past learned networks to reduce generalization noise variance. As a consequence, this leads to reduced overestimations, a more stable learning process, and improved performance. Additionally, we analyze ADQN variance reduction along trajectories and demonstrate the performance of ADQN on a toy Gridworld problem, as well as on several of the Atari 2600 games from the Arcade Learning Environment.
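
The central trick, replacing the single target network with an average over the K most recently learned Q-functions, is easy to see in a tabular setting like the paper's Gridworld experiment. Below is a minimal sketch in that spirit, not the authors' code; the environment interface (env.reset, env.step) and all hyperparameters are assumed for illustration.

```python
# Minimal tabular sketch of averaged-target Q-learning (ADQN-style).
# Hypothetical Gridworld interface: env.reset() -> state, env.step(a) -> (next_state, reward, done).
import random
from collections import deque

import numpy as np

def averaged_target_q_learning(env, n_states, n_actions, k=10,
                               episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    q = np.zeros((n_states, n_actions))
    past_q = deque(maxlen=k)          # snapshots of previously learned Q-tables
    past_q.append(q.copy())
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.randrange(n_actions) if random.random() < eps else int(np.argmax(q[s]))
            s2, r, done = env.step(a)
            # The target uses the average of the last k learned Q-functions,
            # which reduces the noise variance that drives overestimation.
            q_avg = np.mean(past_q, axis=0)
            target = r + (0.0 if done else gamma * np.max(q_avg[s2]))
            q[s, a] += alpha * (target - q[s, a])
            s = s2
        past_q.append(q.copy())       # snapshot once per episode (a simplification)
    return q
```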


Safe and Efficient Off-Policy Reinforcement Learning

Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of “off-policyness”; and (3) it is efficient as it makes the best use of samples collected from near on-policy behavior policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to Q∗ without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins’ Q(λ), which was an open problem since 1989. We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games.
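
Both the safety and the efficiency come from the truncated per-step trace coefficient c_s = λ min(1, π(a_s|x_s) / μ(a_s|x_s)): the importance ratio is clipped at 1, so traces cannot blow up however off-policy the data is, yet they are not needlessly cut when the behavior policy is close to the target policy. A rough NumPy sketch of the resulting per-trajectory correction follows; it assumes arrays of target- and behavior-policy probabilities and illustrates the operator rather than reproducing the authors' implementation.

```python
import numpy as np

def retrace_targets(q, pi, mu_a, actions, rewards, gamma=0.99, lam=1.0):
    """Sketch of Retrace(lambda) targets for one off-policy trajectory.

    q:       [T+1, A] array, current Q estimates along the trajectory
    pi:      [T+1, A] array, target-policy action probabilities
    mu_a:    [T] array, behavior-policy probability of each action taken
    actions: [T] int array, actions actually taken by the behavior policy
    rewards: [T] array
    Returns corrected targets for Q(x_t, a_t), t = 0..T-1.
    """
    T = len(rewards)
    q_taken = q[np.arange(T), actions]
    # Truncated importance weights: c_s = lam * min(1, pi(a_s|x_s) / mu(a_s|x_s)).
    # Clipping at 1 keeps products of traces bounded for arbitrarily off-policy data.
    c = lam * np.minimum(1.0, pi[np.arange(T), actions] / mu_a)
    # TD errors that bootstrap with the expectation of Q under the target policy.
    exp_q_next = np.sum(pi[1:] * q[1:], axis=1)
    delta = rewards + gamma * exp_q_next - q_taken
    # Backward recursion: Delta_t = delta_t + gamma * c_{t+1} * Delta_{t+1}.
    correction = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        carry = gamma * c[t + 1] * acc if t + 1 < T else 0.0
        acc = delta[t] + carry
        correction[t] = acc
    return q_taken + correction
```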


Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence

Emilio Jorge, Mikael Kågebäck, Emil Gustavsson

Learning your first language is an incredible feat and not easily duplicated. Doing this using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. As an alternative we propose to use situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks (DRQN) for learning a common language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who? The images from the game provide a non-trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that it is possible to learn this task using DRQN and, even more importantly, that the words the agents use correspond to physical attributes present in the images that make up the agents' environment.


Low Data Drug Discovery with One-shot Learning

Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, and Vijay Pande

Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds. However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amounts of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the residual LSTM embedding, that, when combined with graph convolutional neural networks, significantly improves the ability to learn meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep-learning in drug discovery.
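
The one-shot setting here is essentially metric learning: predict the activity of a query compound from a handful of labeled support compounds by comparing learned embeddings. The matching-network-style prediction step that such architectures build on can be sketched as follows (plain NumPy; the embeddings are assumed to come from some featurizer such as a graph convolutional network, and this is not the DeepChem API).

```python
import numpy as np

def one_shot_predict(query_emb, support_embs, support_labels, temperature=1.0):
    """Matching-network-style prediction (sketch).

    query_emb:      [d]    embedding of the query molecule
    support_embs:   [n, d] embeddings of the small labeled support set
    support_labels: [n]    binary activity labels (0/1)
    Returns the predicted probability that the query is active.
    """
    # Cosine similarity between the query and each support molecule.
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    s = support_embs / (np.linalg.norm(support_embs, axis=1, keepdims=True) + 1e-8)
    sims = s @ q
    # Softmax attention over the support set; the prediction is the
    # attention-weighted average of the support labels.
    a = np.exp(sims / temperature)
    a /= a.sum()
    return float(a @ np.asarray(support_labels, dtype=float))
```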


Importance Sampling with Unequal Support

Philip S. Thomas and Emma Brunskill

Importance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based estimates by orders of magnitude when the supports of the training and testing distributions differ. After motivating and presenting our new importance sampling estimator, we provide a detailed theoretical analysis that characterizes both its bias and variance relative to the ordinary importance sampling estimator (in various settings, which include cases where ordinary importance sampling is biased, while our new estimator is not, and vice versa). We conclude with an example of how our new importance sampling estimator can be used to improve estimates of how well a new treatment policy for diabetes will work for an individual, using only data from when the individual used a previous treatment policy.
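
For reference, the ordinary importance sampling baseline the paper improves on estimates E_p[f(X)] as the average of (p(x_i)/q(x_i)) f(x_i) over samples x_i drawn from q; when q's support is strictly larger than p's, many samples get weight zero and only dilute the estimate, which is the variance problem the new estimator targets. A small NumPy illustration with toy distributions chosen only for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def ordinary_is(f, p_pdf, q_pdf, samples):
    """Ordinary importance sampling estimate of E_p[f(X)] from samples drawn from q."""
    w = p_pdf(samples) / q_pdf(samples)   # weights are zero wherever p has no support
    return np.mean(w * f(samples))

# Toy example: estimate E_p[X] where p is uniform on [0, 1],
# but the samples come from q, uniform on [0, 4] (a strictly larger support).
samples = rng.uniform(0.0, 4.0, size=10_000)
p_pdf = lambda x: ((x >= 0) & (x <= 1)).astype(float)   # density 1 on [0, 1]
q_pdf = lambda x: np.full_like(x, 1.0 / 4.0)             # density 1/4 on [0, 4]
print(ordinary_is(lambda x: x, p_pdf, q_pdf, samples))
# Unbiased (close to 0.5), but about three quarters of the samples get weight zero;
# this is the unequal-support situation the paper's estimator exploits.
```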


SoK: Applying Machine Learning in Security - A Survey

Heju Jiang, Jasvir Nagra, Parvez Ahammad

The idea of applying machine learning (ML) to solve problems in security domains is almost 3 decades old. As information and communications grow more ubiquitous and more data become available, many security risks arise, as well as an appetite to manage and mitigate such risks. Consequently, research on applying and designing ML algorithms and systems for security has grown fast, ranging from intrusion detection systems (IDS) and malware classification to security policy management (SPM) and information leak checking. In this paper, we systematically study the methods, algorithms, and system designs in academic publications from 2008 to 2015 that applied ML in security domains. 98 percent of the surveyed papers appeared in the 6 highest-ranked academic security conferences and 1 conference known for pioneering ML applications in security. We examine the generalized system designs, underlying assumptions, measurements, and use cases in active research. Our examinations lead to 1) a taxonomy on ML paradigms and security domains for future exploration and exploitation, and 2) an agenda detailing open and upcoming challenges. Based on our survey, we also suggest a point of view that treats security as a game theory problem instead of a batch-trained ML problem.


Multi-Task Multiple Kernel Relationship Learning

Keerthiram Murugesan, Jaime Carbonell

This paper presents a novel multitask multiple-kernel learning framework that efficiently learns the kernel weights leveraging the relationship across multiple tasks. The idea is to automatically infer this task relationship in the RKHS space corresponding to the given base kernels. The problem is formulated as a regularization-based approach called Multi-Task Multiple Kernel Relationship Learning (MK-MTRL), which models the task relationship matrix from the weights learned from latent feature spaces of task-specific base kernels. Unlike in previous work, the proposed formulation allows one to incorporate prior knowledge for simultaneously learning several related tasks. We propose an alternating minimization algorithm to learn the model parameters, kernel weights and task relationship matrix. In order to tackle large-scale problems, we further propose a two-stage MK-MTRL online learning algorithm and show that it significantly reduces the computational time, and also achieves performance comparable to that of the joint learning framework. Experimental results on benchmark datasets show that the proposed formulations outperform several state-of-the-art multi-task learning methods. https://arxiv.org/abs/1611.03427
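
In multiple kernel learning, each task's effective kernel is a nonnegative weighted combination of shared base kernels; in MK-MTRL those weights are what the alternating minimization couples across tasks through the task relationship matrix. A minimal sketch of just the kernel-combination step, with assumed RBF and linear base kernels rather than the paper's full procedure:

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Squared pairwise distances, then the Gaussian kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def combined_kernel(X, weights, gammas=(0.1, 0.5, 1.0)):
    """Weighted combination of base kernels for one task (sketch).

    weights: nonnegative kernel weights for this task, one per base kernel;
             in MK-MTRL these are the quantities tied across tasks via the
             learned task relationship matrix.
    """
    bases = [rbf_kernel(X, g) for g in gammas] + [X @ X.T]   # RBF kernels plus a linear kernel
    assert len(weights) == len(bases)
    return sum(w * K for w, K in zip(weights, bases))
```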


Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

Karthik Narasimhan, Adam Yala, Regina Barzilay

Most successful information extraction systems operate with access to a large collection of documents. In this work, we explore the task of acquiring and incorporating external evidence to improve extraction accuracy in domains where the amount of training data is scarce. This process entails issuing search queries, extraction from new sources and reconciliation of extracted values, which are repeated until sufficient evidence is collected. We approach the problem using a reinforcement learning framework where our model learns to select optimal actions based on contextual information. We employ a deep Q-network, trained to optimize a reward function that reflects extraction accuracy while penalizing extra effort. Our experiments on two databases – of shooting incidents, and food adulteration cases – demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline.
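
The reward shaping described here (reward gains in extraction accuracy, penalize extra effort) is simple to write down. A hypothetical per-step reward along those lines, not the authors' exact function:

```python
def step_reward(new_values, old_values, gold, query_cost=0.1):
    """Hypothetical per-step reward for the querying agent (sketch).

    Rewards the improvement in how many extracted slot values match the gold
    annotations, minus a fixed cost for issuing another search query.
    """
    def accuracy(values):
        return sum(values.get(k) == v for k, v in gold.items()) / len(gold)

    return accuracy(new_values) - accuracy(old_values) - query_cost
```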


Reinforcement Learning with Unsupervised Auxiliary Tasks

Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
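
Concretely, the agent's objective is the usual on-policy RL loss plus weighted auxiliary losses (in the paper these include pixel control, reward prediction, and value-function replay) computed on the shared representation. A schematic of how such a combined objective is formed; the weights and loss values below are placeholders rather than the paper's exact terms:

```python
def unreal_style_objective(base_rl_loss, aux_losses, aux_weights):
    """Combine the base RL loss with weighted auxiliary-task losses (sketch).

    aux_losses:  dict mapping auxiliary-task name -> scalar loss
    aux_weights: dict mapping auxiliary-task name -> weight (placeholder values)
    """
    total = base_rl_loss
    for name, loss in aux_losses.items():
        total += aux_weights.get(name, 1.0) * loss
    return total

# Example with placeholder numbers only:
total = unreal_style_objective(
    base_rl_loss=1.7,
    aux_losses={"pixel_control": 0.9, "reward_prediction": 0.4, "value_replay": 0.6},
    aux_weights={"pixel_control": 0.05, "reward_prediction": 1.0, "value_replay": 1.0},
)
```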


Towards Deep Symbolic Reinforcement Learning

Marta Garnelo, Kai Arulkumaran, Murray Shanahan

Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.


Tools

Python Toolkits for Formatting and Cleaning Data

The Python community offers many libraries for getting data clean and tidy, from formatting DataFrames to anonymizing datasets.

Link: Python Toolkits for Formatting and Cleaning Data
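
As a tiny example of the kind of cleanup these toolkits automate, the same basic steps can also be written by hand in pandas (the file and column names here are hypothetical):

```python
import pandas as pd

# Hypothetical messy dataset.
df = pd.read_csv("survey_raw.csv")

# Normalize column names, trim whitespace, and drop exact duplicate rows.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df["email"] = df["email"].str.strip().str.lower()
df = df.drop_duplicates()

# Coerce a numeric column, turning unparseable entries into NaN, then drop them.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df = df.dropna(subset=["age"])
```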

CMU Convex Optimization Course (Videos + Lecture Notes + Chinese Notes)

Link: CMU Convex Optimization Course

Tie-Yan Liu (刘铁岩) of Microsoft Research Asia: Reading the Latest Technical Trends in Machine Learning in the AI Era

Link: Reading the Latest Technical Trends in Machine Learning in the AI Era

Top 10 Python Libraries of 2016

We left out already-successful projects such as Django and Flask, which have become today's standard libraries. Some libraries on this list were created before 2016, but their popularity exploded this year, or we simply think they are good enough to make the list.

Link: Top 10 Python Libraries of 2016

RankSys: An Open-Source Recommender Systems Framework (Java)

RankSys: Java 8 Recommender Systems framework for novelty, diversity and much more

Link: RankSys: An Open-Source Recommender Systems Framework (Java)


Other

EAI: Not Too Smart, but Very Practical

Popular topics in current discussions of artificial intelligence, such as machines having consciousness, machines passing the Turing test, and machines understanding the concept of a cat's face, are not necessarily constructive and do not effectively solve practical problems. 受教式人工智能 (EAI), "teachable" AI, instead emphasizes applied intelligence, with the aim of putting intelligent technology at the service of industry.

Link: EAI: Not Too Smart, but Very Practical

A Full Record of the AI Industry in 2016! A Roundup of Major AI Events

This article is 机器人2025's review of the Chinese and international AI industry in 2016. It picks out 15 fairly major AI events; while they do not solve the AI questions that have puzzled scientists, psychologists, and philosophers for decades, they are indelible marks in the development of AI technology.

Link: A Full Record of the AI Industry in 2016! A Roundup of Major AI Events

Academic: A Study of SDN Security Techniques

Software-defined networking (SDN) separates the control plane from the data plane of network devices: upward it exposes application programming interfaces to the application layer, creating an open, programmable network environment; downward it pushes routing policies to the routers, enabling centralized management of network devices.

Link: Academic: A Study of SDN Security Techniques
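
As an illustration of that description, an application asking the controller to install a flow rule on a switch often boils down to a small REST call to the controller's northbound API, which the controller then translates into a southbound rule installation. The endpoint URL and JSON schema below are invented for the sketch and do not belong to any particular controller:

```python
import json
import urllib.request

# Hypothetical controller endpoint and flow-rule schema, for illustration only.
flow_rule = {
    "switch": "00:00:00:00:00:00:00:01",
    "match": {"ipv4_dst": "10.0.0.2/32"},
    "actions": [{"type": "OUTPUT", "port": 2}],
    "priority": 100,
}
req = urllib.request.Request(
    "http://controller.example:8080/flows",
    data=json.dumps(flow_rule).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Would only succeed if a controller were actually listening at this address.
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```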

Machine Learning: A 2016 Year-End Knowledge Roundup

Link: Machine Learning: A 2016 Year-End Knowledge Roundup
