Arthur Tenenhaus

(Laboratoire des Signaux et Systèmes, CentraleSupelec, Université Paris-Sud, France)

Structured data analysis with RGCCA

The challenges related to the use of massive amounts of data include identifying the relevant variables, reducing dimensionality, summarizing information in a comprehensible way, and displaying it for interpretation. Often, these data are intrinsically structured in blocks of variables, in groups of individuals, or as tensors. Classical statistical tools cannot be applied without altering this structure, at the risk of losing information. Analyzing the data in a way that respects their natural structure therefore appears essential, but it requires the development of new statistical techniques. In this context, a general framework for structured data analysis based on Regularized Generalized Canonical Correlation Analysis (RGCCA) will be presented.

Andre Martins

(Unbabel, Lisbon, Portugal and Instituto de Telecomunicacoes (IT), Instituto Superior Tecnico, Lisboa, Portugal)

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

The softmax transformation is a key component of several statistical learning models, including multinomial logistic regression, action selection in reinforcement learning, and neural networks for multi-class classification. Recently, it has also been used to design attention mechanisms in neural networks, with important achievements in machine translation, image caption generation, speech recognition, and various tasks in natural language understanding and computational learning.

In this talk, I will describe sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, I will show how its Jacobian can be efficiently computed, enabling its use in a neural network trained with backpropagation. Then, I will propose a new smooth and convex loss function which is the sparsemax analogue of the logistic loss. An unexpected connection between this new loss and the Huber classification loss will be revealed. We obtained promising empirical results in multi-label classification problems and in attention-based neural networks for natural language inference. For the latter, we achieved performance similar to that of the traditional softmax, but with a selective, more compact attention focus.
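As a rough illustration of the sparse outputs mentioned above, the following sketch assumes that sparsemax is computed as the Euclidean projection of the score vector onto the probability simplex, using a sort-based threshold search. The function names and the example scores are hypothetical and chosen only for illustration; this is not presented as the speaker's implementation.

```python
import numpy as np

def sparsemax(z):
    """Sketch: Euclidean projection of a 1-D score vector onto the simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cumsum = np.cumsum(z_sorted)           # running sums of the sorted scores
    k = np.arange(1, z.size + 1)
    # Largest k such that the k-th sorted score stays in the support.
    in_support = 1 + k * z_sorted > cumsum
    k_z = k[in_support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

def softmax(z):
    """Standard softmax, for comparison (always strictly positive)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([1.5, 1.0, -1.0])        # hypothetical example scores
p_sparse = sparsemax(scores)               # last entry is exactly zero
p_soft = softmax(scores)                   # every entry is positive
print(p_sparse, p_soft)
```

With these scores the sparsemax output assigns exactly zero probability to the lowest-scoring entry, whereas softmax keeps every entry strictly positive.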

Cajo J.F. ter Braak

(Biometris, Wageningen University & Research)

L-shaped data, GLM(M) and double constrained correspondence analysis

L-shaped data consist of a non-negative central matrix with associated matrices of predictors for its rows and columns. Formally, such data form a (weighted) bigraph with node predictors. Examples are preference data of consumers for products, with features of both consumers and products as predictors; supervisory boards of firms, with features of supervisors and firms as predictors of membership; and, in ecology, abundance data of species across sites, with species traits and environmental variables as predictors. We will discuss the statistical issues of analysing such data and why double constrained correspondence analysis and GLM(M) methods may give very similar results in terms of selecting important features.


Alfred Hero

(Dept. of Electrical Engineering and Computer Science, The College of Engineering, The University of Michigan, USA)

Tensor Graphical Lasso (TeraLasso)

We propose a new ultrasparse graphical model for representing multiway data based on a Kronecker sum representation of the inverse covariance matrix of the process. This statistical model decomposes the inverse covariance into a linear Kronecker sum representation with sparse Kronecker factors. Under the assumption that the multiway observations are matrix-normal, the l1 sparsity-regularized log-likelihood function is convex and admits significantly faster statistical rates of convergence than other sparse matrix-normal algorithms such as graphical lasso or Kronecker graphical lasso. We will illustrate the method on several real multiway datasets, showing that we can recover sparse graphical structures in high-dimensional data from few samples.
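To make the Kronecker sum structure concrete, the sketch below builds a small inverse covariance matrix from two sparse factors, using the standard definition A ⊕ B = A ⊗ I + I ⊗ B, and compares its sparsity with the corresponding Kronecker product. The factor matrices `psi1` and `psi2` are hypothetical toy values, and the code illustrates only the model structure, not the TeraLasso estimation procedure.

```python
import numpy as np

def kronecker_sum(a, b):
    """Kronecker sum of two square matrices: a (x) I_b + I_a (x) b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)

# Hypothetical sparse, positive definite factors (one per data mode).
psi1 = np.array([[ 2.0, -1.0,  0.0],
                 [-1.0,  2.0, -1.0],
                 [ 0.0, -1.0,  2.0]])
psi2 = np.array([[1.0, 0.5],
                 [0.5, 1.0]])

omega_sum = kronecker_sum(psi1, psi2)   # Kronecker sum precision matrix (6 x 6)
omega_prod = np.kron(psi1, psi2)        # Kronecker product, for comparison

nnz_sum = np.count_nonzero(omega_sum)
nnz_prod = np.count_nonzero(omega_prod)

# Eigenvalues of a Kronecker sum are all pairwise sums of the factor
# eigenvalues, so positive definite factors give a positive definite sum.
eig_sum = np.sort(np.linalg.eigvalsh(omega_sum))
eig_pairs = np.sort(np.add.outer(np.linalg.eigvalsh(psi1),
                                 np.linalg.eigvalsh(psi2)).ravel())
print(nnz_sum, nnz_prod)
```

In this toy case the Kronecker sum has fewer nonzero entries than the Kronecker product of the same factors, which gives some intuition for why the resulting graphical model is described as ultrasparse.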