# Awesome Papers: 2017-01-1

### iCaRL: Incremental Classifier and Representation Learning

*Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Christoph H. Lampert*

A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data.In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on the CIFAR-100 and ImageNet ILSVRC 2012 datasets that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail.

### Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

*Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine*

Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.

### A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

*Felix Leibfried, Nate Kushman, Katja Hofmann*

Reinforcement learning is concerned with learning to interact with environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as DQN, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment dynamics or the reward structure.In this paper we take a step towards using model-based techniques in environments with high-dimensional visual state space when system dynamics and the reward structure are both unknown and need to be learned, by demonstrating that it is possible to learn both jointly. Empirical evaluation on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these positive results as opening up important directions for model-based RL in complex, initially unknown environments.

### Cascaded Neural Networks with Selective Classifiers and its evaluation using Lung X-ray CT Images

*Masaharu Sakamoto, Hiroki Nakano*

Lung nodule detection is a class imbalanced problem because nodules are found with much lower frequency than non-nodules. In the class imbalanced problem,conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We therefore propose cascaded convolutional neural networks to cope with the class imbalanced problem. In the proposed approach, cascaded convolutional neural networks that perform as selective classifiers filter out obvious non-nodules. Successively, a convolutional neural network trained with a balanced data set calculates nodule probabilities. The proposed method achieved the detection sensitivity of 85.3% and 90.7% at 1 and 4 false positives per scan in FROC curve, respectively.

### Limbo: A Fast and Flexible Library for Bayesian Optimization

*Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret*

Limbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt(another C++ library) for a similar accuracy.

### Deep Learning Approximation for Stochastic Control Problems

*Jiequn Han, E Weinan*

Many real world stochastic control problems suffer from the “curse of dimensionality”. To overcome this difficulty, we develop a deep learning approach that directly solves high-dimensional stochastic control problems based on Monte-Carlo sampling. We approximate the time-dependent controls as feedforward neural networks and stack these networks together through model dynamics. The objective function for the control problem plays the role of the loss function for the deep neural network. We test this approach using examples from the areas of optimal trading and energy storage. Our results suggest that the algorithm presented here achieves satisfactory accuracy and at the same time, can handle rather high dimensional problems.

### Title: Variational Intrinsic Control

*Karol Gregor, Danilo Jimenez Rezende, Daan Wierstra*

In this paper we introduce a new unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent. This set is learned by maximizing the number of different states an agent can reliably reach, as measured by the mutual information between the set of options and option termination states. To this end, we instantiate two policy gradient based algorithms, one that creates an explicit embedding space of options and one that represents options implicitly. The algorithms also provide an explicit measure of empowerment in a given state that can be used by an empowerment maximizing agent. The algorithm scales well with function approximation and we demonstrate the applicability of the algorithm on a range of tasks.

### Can Co-robots Learn to Teach?

*Harshal Maske, Emily Kieson, Girish Chowdhary, and Charles Abramson*

We explore beyond existing work on learning from demonstration by asking the question: Can robots learn to teach?, that is, can a robot autonomously learn an instructional policy from expert demonstration and use it to instruct or collaborate with humans in executing complex tasks in uncertain environments? In this paper we pursue a solution to this problem by leveraging the idea that humans often implicitly decompose a higher level task into several subgoals whose execution brings the task closer to completion. We propose Dirichlet process based non-parametric Inverse Reinforcement Learning (DPMIRL) approach for reward based unsupervised clustering of task space into subgoals. This approach is shown to capture the latent subgoals that a human teacher would have utilized to train a novice. The notion of action primitive is introduced as the means to communicate instruction policy to humans in the least complicated manner, and as a computationally efficient tool to segment demonstration data. We evaluate our approach through experiments on hydraulic actuated scaled model of an excavator and evaluate and compare different teaching strategies utilized by the robot.

### The Recycling Gibbs Sampler for Efficient Learning

*Luca Martino, Victor Elvira, Gustau Camps-Valls*

Monte Carlo methods are essential tools for Bayesian inference. Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning, and statistics, employed to draw samples from complicated high-dimensional posterior distributions. The key point for the successful application of the Gibbs sampler is the ability to draw efficiently samples from the full-conditional probability density functions. Since in the general case this is not possible, in order to speed up the convergence of the chain, it is required to generate auxiliary samples whose information is eventually disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. This novel scheme arises naturally after pointing out the relationship between the standard Gibbs sampler and the chain rule used for sampling purposes. Numerical simulations involving simple and real inference problems confirm the excellent performance of the proposed scheme in terms of accuracy and computational efficiency. In particular we give empirical evidence of performance in a toy example, inference of Gaussian processes hyperparameters, and learning dependence graphs through regression.

### Deep Transfer Learning for Person Re-identification

*Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian*

Person re-identification (Re-ID) poses a unique challenge to deep learning: how to learn a deep model with millions of parameters on a small training set of few or no labels. In this paper, a number of deep transfer learning models are proposed to address the data sparsity problem. First, a deep network architecture is designed which differs from existing deep Re-ID models in that (a) it is more suitable for transferring representations learned from large image classification datasets, and (b) classification loss and verification loss are combined, each of which adopts a different dropout strategy. Second, a two-stepped fine-tuning strategy is developed to transfer knowledge from auxiliary datasets. Third, given an unlabelled Re-ID dataset, a novel unsupervised deep transfer learning model is developed based on co-training. The proposed models outperform the state-of-the-art deep Re-ID models by large margins: we achieve Rank-1 accuracy of 85.4\%, 83.7\% and 56.3\% on CUHK03, Market1501, and VIPeR respectively, whilst on VIPeR, our unsupervised model (45.1\%) beats most supervised models.

### Challenges in Bayesian Adaptive Data Analysis

*Sam Elder*

Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is commonly performed in practice, as an iterative process of proposing hypotheses and interacting with the data set. Previous work has established a model with a rather strong lower bound on sample complexity in terms of the number of queries, n∼q√, arguing that adaptive data analysis is much harder than static data analysis, where n∼logq is possible. Instead, we argue that those strong lower bounds point to a shortcoming in the model, an informational asymmetry with no basis in applications. In its place, we propose a new Bayesian version of the problem without this unnecessary asymmetry. The previous lower bounds are no longer valid, which offers the possibility for stronger results. However, we show that a large family of methods, including all previously proposed algorithms, cannot achieve the static dependence of n∼logq even in this regime, establishing polylogarithmic lower bounds with a new family of lower bounds. These preliminary results suggest that adaptive data analysis is harder than static data analysis even without this information asymmetry, but still leave wide open the possibility that new algorithms can be developed to work with fewer samples than the previous best known algorithms.