Optimal transport (OT) is gradually establishing itself as a powerful and essential tool to compare probability measures, which in machine learning take the form of point clouds, histograms, bags-of-features, or more generally datasets to be compared with probability densities and generative models. OT can be traced back to early work by Monge, and later to Kantorovich and Dantzig during the birth of linear programming. The mathematical theory of OT has produced several important developments since the 1990s, crowned by Cédric Villani's Fields Medal in 2010. OT is now transitioning into more applied spheres, including recent applications to machine learning, because it can tackle challenging learning scenarios including dimensionality reduction, structured prediction problems that involve histograms, and estimation of generative models in highly degenerate, high-dimensional problems. This workshop follows the one organized three years ago (NIPS 2014) and will seek to amplify that trend. We will provide the audience with an update on the very recent successes brought forward by efficient solvers and innovative applications through a long list of invited talks. We will add to that a few contributed presentations (oral and, if needed, posters) and, finally, a panel for all invited speakers to take questions from the audience and formulate more nuanced opinions on this nascent field.
Sat 8:00 a.m. - 8:20 a.m.

Structured Optimal Transport (with T. Jaakkola, S. Jegelka)
(Contributed 1)

David Alvarez-Melis
Sat 8:20 a.m. - 9:00 a.m.

Approximate Bayesian computation with the Wasserstein distance
(Invited 1)
A growing range of generative statistical models prohibits the numerical evaluation of their likelihood functions. Approximate Bayesian computation has become a popular approach to overcome this issue, simulating synthetic data given parameters and comparing summaries of these simulations with the corresponding observed values. We propose to avoid these summaries and the ensuing loss of information through the use of Wasserstein distances between empirical distributions of observed and synthetic data. We describe how the approach can be used in the setting of dependent data such as time series, and how approximations of the Wasserstein distance allow the method to scale to large datasets in practice. In particular, we propose a new approximation to the optimal assignment problem using the Hilbert space-filling curve. The approach is illustrated on various examples including i.i.d. data and time series.
Pierre E Jacob 
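As a rough illustration of the idea in this abstract, the sketch below runs rejection-style approximate Bayesian computation in which synthetic and observed samples are compared with a Wasserstein distance instead of summary statistics. It is a minimal sketch under stated assumptions, not the speaker's implementation: the data are one-dimensional, so the distance reduces to comparing sorted samples (the Hilbert-curve approximation mentioned in the abstract is not used), and the names `wasserstein_1d`, `abc_wasserstein`, the Gaussian model, and the uniform prior are all hypothetical choices made for the example.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal assignment matches sorted values.
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))


def abc_wasserstein(observed, simulate, prior_sample, n_draws, keep, rng):
    """Rejection ABC: keep the `keep` prior draws whose synthetic data
    are closest to `observed` in Wasserstein distance."""
    thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
    dists = np.array(
        [wasserstein_1d(observed, simulate(t, len(observed), rng)) for t in thetas]
    )
    return thetas[np.argsort(dists)[:keep]]


# Hypothetical toy model (an assumption for this example):
# data ~ Normal(theta, 1), with a Uniform(-5, 5) prior on theta.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=200)
accepted = abc_wasserstein(
    observed,
    simulate=lambda theta, n, rng: rng.normal(theta, 1.0, size=n),
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    n_draws=2000,
    keep=50,
    rng=rng,
)
posterior_mean = float(accepted.mean())
```

The accepted draws act as an approximate posterior sample; keeping a smaller fraction tightens the approximation at the cost of more simulations.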
Sat 9:00 a.m. - 9:40 a.m.

Gradient flow in the Wasserstein metric
(Invited 2)
Optimal transport not only provides powerful techniques for comparing probability measures, but also for analyzing their evolution over time. For a range of partial differential equations arising in physics, biology, and engineering, solutions are gradient flows in the Wasserstein metric: each equation has a notion of energy for which solutions dissipate energy as quickly as possible, with respect to the Wasserstein structure. Steady states of the equation correspond to minimizers of the energy, and stability properties of the equation translate into convexity properties of the energy. In this talk, I will compare Wasserstein gradient flow with more classical gradient flows arising in optimization and machine learning. I’ll then introduce a class of particle blob methods for simulating Wasserstein gradient flows numerically. 
Katy Craig 
Sat 9:40 a.m. - 10:00 a.m.

Approximate inference with Wasserstein gradient flows (with T. Poggio)
(Contributed 2)

Charlie Frogner 
Sat 10:00 a.m. - 10:20 a.m.

6 x 3 minutes spotlights
(Poster Spotlights)

Rémi Flamary · Yongxin Chen · Napat Rujeerapaiboon · Jonas Adler · John Lee · Lucas R Roberts 
Sat 11:00 a.m. - 11:40 a.m.

Optimal planar transport in near-linear time
(Invited 3)
We show how to compute the Earth Mover Distance between two planar sets of size $N$ in $N^{1+o(1)}$ time. The algorithm is based on a generic framework that decomposes the natural Linear Programming formulation for the transport problem into a tree of smaller LPs, and recomposes it in a divide-and-conquer fashion. The main enabling idea is to use sketching, a generalization of the dimension reduction method, in order to reduce the size of the "partial computation" so that the conquer step is more efficient. We will conclude with some open questions in the area. This is joint work with Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev.
Alexandr Andoni 
Sat 11:40 a.m. - 12:20 p.m.

Laplacian operator and Brownian motions on the Wasserstein space
(Invited 4)
We endow the space of probability measures on $\mathbb{R}^d$ with $\Delta_w$, a Laplacian operator. A Brownian motion is shown to be consistent with the Laplacian operator. The smoothing effect of the heat equation is established for a class of functions. Special perturbations of the Laplacian operator, denoted $\Delta_{w,\epsilon}$, appearing in Mean Field Games theory, are considered (joint work with Y. T. Chow).

Wilfrid Gangbo 
Sat 1:40 p.m. - 2:20 p.m.

Geometrical Insights for Unsupervised Learning
(Invited 5)
After arguing that choosing the right probability distance is critical for achieving the elusive goals of unsupervised learning, we compare the geometric properties of the two currently most promising distances: (1) the earth-mover distance, and (2) the energy distance, also known as maximum mean discrepancy. These insights allow us to give a fresh viewpoint on reported experimental results and to risk a couple of predictions. Joint work with Leon Bottou, Martin Arjovsky, David Lopez-Paz, and Maxime Oquab.
Leon Bottou 
Sat 2:20 p.m. - 2:40 p.m.

Improving GANs Using Optimal Transport (with H. Zhang, A. Radford, D. Metaxas)
(Contributed 3)

Tim Salimans 
Sat 2:40 p.m. - 3:00 p.m.

Overrelaxed Sinkhorn-Knopp Algorithm for Regularized Optimal Transport (with L. Chizat, C. Dossal, N. Papadakis)
(Contributed 4)

Alexis THIBAULT 
Sat 3:30 p.m. - 4:10 p.m.

Domain adaptation with optimal transport: from mapping to learning with joint distribution
(Invited 6)
This presentation deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. After a short introduction to recent developments in domain adaptation and their relation to optimal transport, we will present a method that estimates a barycentric mapping between the feature distributions in order to adapt the training dataset prior to learning. Next, we propose a novel method that uses optimal transport to model the transformation between the joint feature/label distributions of the two domains. We aim at recovering an estimated target distribution $p_t^f = (X, f(X))$ by simultaneously optimizing the optimal coupling and f. We discuss the generalization of the proposed method, and provide an efficient algorithmic solution. The versatility of the approach, both in terms of hypothesis class and loss function, is demonstrated on real-world classification and regression problems and on large datasets where stochastic approaches become necessary. Joint work with Nicolas COURTY, Devis TUIA, Amaury HABRARD, and Alain RAKOTOMAMONJY.
Rémi Flamary 
Sat 4:10 p.m. - 4:50 p.m.

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance
(Invited 7)
The Wasserstein distance between two probability measures on a metric space is a measure of closeness with applications in statistics, probability, and machine learning. In this work, we consider the fundamental question of how quickly the empirical measure obtained from $n$ independent samples from μ approaches μ in the Wasserstein distance of any order. We prove sharp asymptotic and finite-sample results for this rate of convergence for general measures on general compact metric spaces. Our finite-sample results show the existence of multi-scale behavior, where measures can exhibit radically different rates of convergence as $n$ grows. See more details in: J. Weed, F. Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Technical Report, arXiv:1707.00087, 2017.
Francis Bach 
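The question posed in this abstract can be explored numerically in the simplest possible setting. The sketch below is only an illustration under stated assumptions and does not reproduce the results of the talk: it takes μ to be the uniform distribution on [0, 1], uses order 1, and relies on the one-dimensional identity that the Wasserstein-1 distance equals the integral of the absolute difference between the two cumulative distribution functions. The function name `w1_to_uniform` and the grid size are hypothetical choices made for the example.

```python
import numpy as np


def w1_to_uniform(samples, grid_size=10_000):
    """Approximate W1 between the empirical measure of `samples` and Uniform[0, 1].

    In one dimension, W1 is the integral of |F_n(t) - F(t)|, with F(t) = t here.
    The integral is approximated by a midpoint rule on a regular grid.
    """
    t = (np.arange(grid_size) + 0.5) / grid_size  # midpoints of the grid cells
    ecdf = np.searchsorted(np.sort(samples), t, side="right") / len(samples)
    return float(np.mean(np.abs(ecdf - t)))


rng = np.random.default_rng(0)
sizes = [100, 1_000, 10_000]
# Average over a few repetitions to smooth out sampling noise.
avg_w1 = [
    float(np.mean([w1_to_uniform(rng.uniform(size=n)) for _ in range(20)]))
    for n in sizes
]
```

In this one-dimensional case the averaged distance is expected to shrink roughly like the inverse square root of n; the abstract concerns how this rate changes for general measures on general compact metric spaces.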
Sat 4:50 p.m. - 5:10 p.m.

7 x 3 minutes spotlights
(Poster Spotlights)

Elsa Cazelles · Aude Genevay · Gonzalo Mena · Christoph Brauer · Asja Fischer · Henning Petzka · Vivien Seguy · Antoine Rolet · Sho Sonoda 
Sat 5:10 p.m. - 5:30 p.m.

short Q&A session with plenary speakers
(Roundtable)


Sat 5:30 p.m. - 6:30 p.m.

Closing session
(Poster Session)

Author Information
Olivier Bousquet (Google Brain (Zurich))
Marco Cuturi (Google Brain & CREST - ENSAE)
Marco Cuturi is a research scientist at Google AI, Brain team in Paris. He received his Ph.D. in 11/2005 from the Ecole des Mines de Paris in applied mathematics. Before that, he graduated from the National School of Statistics (ENSAE) with a master's degree (MVA) from ENS Cachan. He worked as a postdoctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 3/2007 and then in the financial industry between 4/2007 and 9/2008. After working at the ORFE department of Princeton University as a lecturer between 2/2009 and 8/2010, he was at the Graduate School of Informatics of Kyoto University between 9/2010 and 9/2016 as a tenured associate professor. He joined ENSAE in 9/2016 as a professor, where he now works part-time. Since 10/2018, his main employment has been with Google AI (Brain team in Paris), as a research scientist working on fundamental aspects of machine learning.
Gabriel Peyré (Université Paris Dauphine)
Fei Sha (University of Southern California (USC))
Justin Solomon (Stanford University)
More from the Same Authors

2021 : Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1D Frank-Wolfe »
Thibault Sejourne · Francois-Xavier Vialard · Gabriel Peyré
2021 : Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi 
2021 Workshop: Optimal Transport and Machine Learning »
Jason Altschuler · Charlotte Bunne · Laetitia Chapel · Marco Cuturi · Rémi Flamary · Gabriel Peyré · Alexandra Suvorikova 
2021 Poster: Smooth Bilevel Programming for Sparse Regularization »
Clarice Poon · Gabriel Peyré 
2021 Poster: The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation »
Thibault Sejourne · Francois-Xavier Vialard · Gabriel Peyré
2020 Poster: Projection Robust Wasserstein Distance and Riemannian Optimization »
Darren Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan 
2020 Poster: Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm »
Darren Lin · Nhat Ho · Xi Chen · Marco Cuturi · Michael Jordan 
2020 Spotlight: Projection Robust Wasserstein Distance and Riemannian Optimization »
Darren Lin · Chenyou Fan · Nhat Ho · Marco Cuturi · Michael Jordan 
2020 Poster: Learning with Differentiable Pertubed Optimizers »
Quentin Berthet · Mathieu Blondel · Olivier Teboul · Marco Cuturi · Jean-Philippe Vert · Francis Bach
2020 Memorial: In Memory of Olivier Chapelle »
Bernhard Schölkopf · Andre Elisseeff · Olivier Bousquet · Vladimir Vapnik · Jason E Weston 
2020 Poster: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form »
Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi 
2020 Poster: Linear Time Sinkhorn Divergences using Positive Features »
Meyer Scetbon · Marco Cuturi 
2020 Oral: Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form »
Hicham Janati · Boris Muzellec · Gabriel Peyré · Marco Cuturi 
2020 Session: Orals & Spotlights Track 21: Optimization »
Peter Richtarik · Marco Cuturi 
2020 Poster: Synthetic Data Generators - Sequential and Private »
Olivier Bousquet · Roi Livni · Shay Moran 
2020 Poster: What Do Neural Networks Learn When Trained With Random Labels? »
Hartmut Maennel · Ibrahim Alabdulmohsin · Ilya Tolstikhin · Robert Baldock · Olivier Bousquet · Sylvain Gelly · Daniel Keysers 
2020 Spotlight: What Do Neural Networks Learn When Trained With Random Labels? »
Hartmut Maennel · Ibrahim Alabdulmohsin · Ilya Tolstikhin · Robert Baldock · Olivier Bousquet · Sylvain Gelly · Daniel Keysers 
2020 Session: Orals & Spotlights Track 01: Representation/Relational »
Laurens van der Maaten · Fei Sha 
2019 Workshop: Optimal Transport for Machine Learning »
Marco Cuturi · Gabriel Peyré · Rémi Flamary · Alexandra Suvorikova 
2019 Poster: Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections »
Boris Muzellec · Marco Cuturi 
2019 Poster: Differentiable Ranking and Sorting using Optimal Transport »
Marco Cuturi · Olivier Teboul · Jean-Philippe Vert
2019 Spotlight: Differentiable Ranking and Sorting using Optimal Transport »
Marco Cuturi · Olivier Teboul · Jean-Philippe Vert
2019 Poster: Practical and Consistent Estimation of f-Divergences »
Paul Rubenstein · Olivier Bousquet · Josip Djolonga · Carlos Riquelme · Ilya Tolstikhin 
2019 Poster: Tree-Sliced Variants of Wasserstein Distances »
Tam Le · Makoto Yamada · Kenji Fukumizu · Marco Cuturi 
2018 Poster: Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport »
Theo Lacombe · Marco Cuturi · Steve OUDOT 
2018 Poster: Assessing Generative Models via Precision and Recall »
Mehdi S. M. Sajjadi · Olivier Bachem · Mario Lucic · Olivier Bousquet · Sylvain Gelly 
2018 Poster: Synthesize Policies for Transfer and Adaptation across Tasks and Environments »
Hexiang Hu · Liyu Chen · Boqing Gong · Fei Sha 
2018 Spotlight: Synthesize Policies for Transfer and Adaptation across Tasks and Environments »
Hexiang Hu · Liyu Chen · Boqing Gong · Fei Sha 
2018 Poster: Are GANs Created Equal? A Large-Scale Study »
Mario Lucic · Karol Kurach · Marcin Michalski · Sylvain Gelly · Olivier Bousquet 
2018 Poster: Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions »
Boris Muzellec · Marco Cuturi 
2017 Poster: Approximation and Convergence Properties of Generative Adversarial Learning »
Shuang Liu · Olivier Bousquet · Kamalika Chaudhuri 
2017 Spotlight: Approximation and Convergence Properties of Generative Adversarial Learning »
Shuang Liu · Olivier Bousquet · Kamalika Chaudhuri 
2017 Poster: An Empirical Study on The Properties of Random Bases for Kernel Methods »
Maximilian Alber · Pieter-Jan Kindermans · Kristof Schütt · Klaus-Robert Müller · Fei Sha
2017 Poster: AdaGAN: Boosting Generative Models »
Ilya Tolstikhin · Sylvain Gelly · Olivier Bousquet · Carl-Johann SIMON-GABRIEL · Bernhard Schölkopf
2017 Tutorial: A Primer on Optimal Transport »
Marco Cuturi · Justin Solomon 
2016 Workshop: Time Series Workshop »
Oren Anava · Marco Cuturi · Azadeh Khaleghi · Vitaly Kuznetsov · Sasha Rakhlin 
2016 Poster: A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization »
Jingwei Liang · Jalal Fadili · Gabriel Peyré 
2016 Poster: Wasserstein Training of Restricted Boltzmann Machines »
Grégoire Montavon · Klaus-Robert Müller · Marco Cuturi
2016 Poster: Sparse Support Recovery with Nonsmooth Loss Functions »
Kévin Degraux · Gabriel Peyré · Jalal Fadili · Laurent Jacques 
2016 Poster: Stochastic Optimization for Large-scale Optimal Transport »
Aude Genevay · Marco Cuturi · Gabriel Peyré · Francis Bach 
2015 : Do Shallow Kernel Methods Match Deep Neural Networks? »
Fei Sha 
2015 Poster: Biologically Inspired Dynamic Textures for Probing Motion Perception »
Jonathan Vacher · Andrew Isaac Meso · Laurent U Perrinet · Gabriel Peyré 
2015 Spotlight: Biologically Inspired Dynamic Textures for Probing Motion Perception »
Jonathan Vacher · Andrew Isaac Meso · Laurent U Perrinet · Gabriel Peyré 
2015 Poster: Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric »
Vivien Seguy · Marco Cuturi 
2014 Workshop: Optimal Transport and Machine Learning »
Marco Cuturi · Gabriel Peyré · Justin Solomon · Alexander Barvinok · Piotr Indyk · Robert McCann · Adam Oberman 
2014 Poster: Diverse Sequential Subset Selection for Supervised Video Summarization »
Boqing Gong · Wei-Lun (Harry) Chao · Kristen Grauman · Fei Sha
2014 Poster: Local Linear Convergence of Forward-Backward under Partial Smoothness »
Jingwei Liang · Jalal Fadili · Gabriel Peyré 
2013 Workshop: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks »
Urun Dogan · Marius Kloft · Tatiana Tommasi · Francesco Orabona · Massimiliano Pontil · Sinno Jialin Pan · Shai Ben-David · Arthur Gretton · Fei Sha · Marco Signoretto · Rajhans Samdani · Yun-Qian Miao · Mohammad Gheshlaghi Azar · Ruth Urner · Christoph Lampert · Jonathan How
2013 Poster: Reshaping Visual Datasets for Domain Adaptation »
Boqing Gong · Kristen Grauman · Fei Sha 
2013 Poster: Sinkhorn Distances: Lightspeed Computation of Optimal Transport »
Marco Cuturi 
2013 Spotlight: Sinkhorn Distances: Lightspeed Computation of Optimal Transport »
Marco Cuturi 
2013 Poster: Similarity Component Analysis »
Soravit Changpinyo · Kuan Liu · Fei Sha 
2012 Poster: Nonlinear Metric Learning »
Dor Kedem · Stephen Tyree · Kilian Q Weinberger · Fei Sha · Gert Lanckriet 
2012 Session: Oral Session 5 »
Fei Sha 
2012 Poster: Semantic Kernel Forests from Multiple Taxonomies »
Sung Ju Hwang · Kristen Grauman · Fei Sha 
2011 Poster: Learning a Tree of Metrics with Disjoint Visual Features »
Sung Ju Hwang · Kristen Grauman · Fei Sha 
2010 Workshop: Challenges of Data Visualization »
Barbara Hammer · Laurens van der Maaten · Fei Sha · Alex Smola 
2010 Poster: Unsupervised Kernel Dimension Reduction »
Meihong Wang · Fei Sha · Michael Jordan 
2009 Workshop: Statistical Machine Learning for Visual Analytics »
Guy Lebanon · Fei Sha 
2009 Poster: White Functionals for Anomaly Detection in Dynamical Systems »
Marco Cuturi · JeanPhilippe Vert · Alexandre d'Aspremont 
2008 Poster: DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification »
Simon Lacoste-Julien · Fei Sha · Michael Jordan
2008 Session: Oral session 1: Clustering »
Fei Sha 
2007 Workshop: Machine Learning for Systems Problems (Part 2) »
Archana Ganapathi · Sumit Basu · Fei Sha · Emre Kiciman 
2007 Workshop: Machine Learning for Systems Problems (Part 1) »
Archana Ganapathi · Sumit Basu · Fei Sha · Emre Kiciman 
2007 Session: Session 7: Systems and Applications »
Fei Sha 
2007 Poster: The Tradeoffs of Large Scale Learning »
Leon Bottou · Olivier Bousquet 
2006 Poster: Large Margin Gaussian Mixture Models for Automatic Speech Recognition »
Fei Sha · Lawrence Saul 
2006 Poster: Kernels on Structured Objects Through Nested Histograms »
Marco Cuturi · Kenji Fukumizu 
2006 Talk: Large Margin Gaussian Mixture Models for Automatic Speech Recognition »
Fei Sha · Lawrence Saul 
2006 Poster: Graph Regularization for Maximum Variance Unfolding with an Application to Sensor Localization »
Kilian Q Weinberger · Fei Sha · Qihui Zhu · Lawrence Saul