The Mathematical Theory of Optimal Transport and Its Applications
Introduction: The Intuitive Idea
At its heart, Optimal Transport (OT) is a theory about the most efficient way to move "stuff" from one place to another. The "stuff" can be anything: earth in a construction project, goods from factories to stores, or even probability mass in a statistical model.
Imagine you have a large pile of dirt (a source distribution) and you want to move it to fill a hole of the same volume (a target distribution). You want to do this with the minimum possible effort. The "effort" or cost might be, for each bit of dirt, the distance it travels multiplied by its mass, summed over all the dirt. Optimal Transport provides the mathematical framework to find the best plan for moving every particle of dirt from its starting position to its final position so as to minimize this total cost.
This simple, intuitive idea has blossomed into a rich mathematical theory with deep connections to partial differential equations (PDEs), geometry, and probability, and has recently exploded in popularity due to its powerful applications in machine learning, computer vision, economics, and biology.
Part 1: The Core Mathematical Problem
The theory has two main historical formulations.
1. Monge's Formulation (1781)
The problem was first posed by French mathematician Gaspard Monge. He was tasked by the military with finding the most cost-effective way to move soil for embankments and fortifications.
- Setup: We have two probability distributions (or measures), $\mu$ (the source, our pile of dirt) and $\nu$ (the target, the hole). We need to find a transport map $T(x)$ that tells us where to move a particle from location $x$ in the source to location $T(x)$ in the target.
- Constraint: The map $T$ must transform the source distribution $\mu$ into the target distribution $\nu$. This is written as $T_\# \mu = \nu$ (the push-forward of $\mu$ by $T$ is $\nu$). This simply means that if you move all the mass according to the map $T$, you end up with the target distribution $\nu$.
- Objective: We want to minimize the total transportation cost. If the cost of moving one unit of mass from $x$ to $y$ is $c(x, y)$, the total cost is:
$$ \inf_{T:\, T_\# \mu = \nu} \int_{\mathbb{R}^d} c(x, T(x)) \, d\mu(x) $$
Limitation of Monge's Formulation: This formulation is very rigid. It requires that each point $x$ in the source maps to a single point $T(x)$ in the target. This isn't always possible or optimal. What if you need to split a shovel of dirt from one location and use it to fill two different spots in the hole? Monge's formulation doesn't allow for this.
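A one-line example makes the failure concrete. Take a single unit of mass at the origin that must be split evenly between two destinations:
$$ \mu = \delta_0, \qquad \nu = \tfrac{1}{2}\delta_{-1} + \tfrac{1}{2}\delta_{1}. $$
Any map $T$ sends all of the mass at $0$ to the single point $T(0)$, so $T_\# \mu$ is itself a single point mass and can never equal $\nu$: no admissible map exists at all.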
2. Kantorovich's Relaxation (1940s)
The problem was largely dormant until the 1940s when Soviet mathematician and economist Leonid Kantorovich revisited it from a completely different perspective: resource allocation. His brilliant insight was to relax the problem.
- Setup: Instead of a deterministic map $T$, Kantorovich proposed a transport plan, denoted by $\gamma(x, y)$. This plan is a joint probability distribution on the product space of the source and target.
- Interpretation: $\gamma(x, y)$ represents the amount of mass that is moved from location $x$ to location $y$. It allows for mass from a single point $x$ to be split and sent to multiple destinations, and for a single point $y$ to receive mass from multiple sources.
- Constraint: The marginals of the transport plan $\gamma$ must be the original source and target distributions.
- $\int \gamma(x, y) \, dy = \mu(x)$ (summing all the mass leaving $x$ gives the original mass at $x$).
- $\int \gamma(x, y) \, dx = \nu(y)$ (summing all the mass arriving at $y$ gives the required mass at $y$). The set of all such valid transport plans is denoted $\Pi(\mu, \nu)$.
- Objective: The goal is to find the optimal plan $\gamma$ that minimizes the total cost:
$$ \inf_{\gamma \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y) \, d\gamma(x, y) $$
This is a linear programming problem, which is far better understood and easier to solve than Monge's original problem. Under mild assumptions (for example, a lower semicontinuous cost function), a solution to Kantorovich's problem always exists, unlike Monge's.
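To make this concrete, here is a minimal sketch that solves a small discrete Kantorovich problem with SciPy's general-purpose linear-programming solver. In the discrete case the plan $\gamma$ is just an $n \times m$ matrix with prescribed row and column sums; the two histograms and the cost matrix below are made-up toy values.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(mu, nu, C):
    """Solve the discrete Kantorovich problem:
    minimize sum_ij C[i, j] * gamma[i, j]
    subject to gamma >= 0 with row sums mu and column sums nu."""
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j gamma[i, j] = mu[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i gamma[i, j] = nu[j]
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None))
    return res.x.reshape(n, m), res.fun

# A tiny example: three source points, two target points on the line.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 2.5])
mu = np.array([0.5, 0.3, 0.2])        # source weights (sum to 1)
nu = np.array([0.6, 0.4])             # target weights (sum to 1)
C = np.abs(x[:, None] - y[None, :])   # cost c(x, y) = |x - y|
plan, cost = kantorovich_lp(mu, nu, C)
print(plan)   # the optimal coupling gamma
print(cost)   # the optimal transport cost
```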
3. The Wasserstein Distance (or Earth Mover's Distance)
When the cost function is a power of a distance, $c(x, y) = \|x-y\|^p$ with $p \ge 1$, the $p$-th root of the optimal transport cost becomes a distance metric between the two probability distributions. This is known as the p-Wasserstein distance:
$$ W_p(\mu, \nu) = \left( \inf_{\gamma \in \Pi(\mu, \nu)} \int \|x-y\|^p \, d\gamma(x, y) \right)^{1/p} $$
The 1-Wasserstein distance is also known as the Earth Mover's Distance (EMD), especially in computer science.
Why is this so important? The Wasserstein distance is a powerful way to compare distributions because it respects the geometry of the underlying space. Divergences like the Kullback-Leibler (KL) divergence only care about the probability values at each point, not how "far apart" the points are. For example, two distributions with disjoint supports that are slightly shifted versions of each other will have a small Wasserstein distance but an infinite KL divergence. This property makes OT incredibly useful for tasks involving physical or feature spaces.
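This behavior is easy to verify numerically in one dimension, where the optimal coupling simply matches the sorted samples in order. A minimal sketch (the sample size and shifts below are arbitrary choices):

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equally weighted 1-D samples:
    in 1-D the optimal plan pairs the sorted samples monotonically."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
a = rng.normal(size=100_000)
for shift in (0.1, 1.0, 10.0):
    # The distance grows linearly with the shift (here it equals it exactly),
    # tracking the geometry of the line rather than pointwise density ratios.
    print(shift, wasserstein_1d(a, a + shift))
```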
Part 2: Key Theoretical Results
The theory is not just about a minimization problem; it has a deep and elegant structure.
Kantorovich Duality: Like all linear programs, the Kantorovich problem has a dual formulation. The dual problem looks for two functions (potentials) $\phi(x)$ and $\psi(y)$ and maximizes an objective subject to a pointwise constraint (written out below). This duality is not only theoretically important but is also key to some computational algorithms and provides economic interpretations (e.g., market equilibrium prices).
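Concretely, the dual problem reads
$$ \sup_{\phi(x) + \psi(y) \le c(x, y)} \left( \int \phi \, d\mu + \int \psi \, d\nu \right), $$
and its optimal value equals the optimal transport cost. The constraint says that the combined "prices" at a source and a destination can never beat the cost of shipping directly between them.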
Brenier's Theorem (1991): This theorem provides a stunning connection back to Monge's problem. It states that if the cost is the squared Euclidean distance ($c(x,y) = \|x-y\|^2$) and the source measure $\mu$ is absolutely continuous (i.e., has a density), then the optimal Kantorovich transport plan $\gamma$ is not a diffuse plan after all. It is concentrated on the graph of a map $T$, meaning there is an optimal transport map just as in Monge's formulation. Furthermore, this optimal map $T$ is the gradient of a convex function, i.e., $T(x) = \nabla \Phi(x)$. This connects OT to convex analysis and to the Monge-Ampère equation, a fundamental nonlinear PDE.
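One of the few cases where Brenier's map is available in closed form is between two Gaussian distributions: the optimal map is affine, the gradient of a convex quadratic. The sketch below implements that known formula; the means and covariances are arbitrary test values.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def gaussian_brenier_map(m1, S1, m2, S2):
    """Brenier map between N(m1, S1) and N(m2, S2) for squared Euclidean cost:
    T(x) = m2 + A (x - m1), A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}.
    A is symmetric positive definite, so T is the gradient of the convex
    quadratic Phi(x) = m2.x + 0.5 (x - m1).A (x - m1)."""
    r = np.real(sqrtm(S1))
    A = inv(r) @ np.real(sqrtm(r @ S2 @ r)) @ inv(r)
    return lambda x: m2 + (x - m1) @ A   # A is symmetric, so no transpose

m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.ones(2), np.array([[2.0, 0.5], [0.5, 1.0]])
T = gaussian_brenier_map(m1, S1, m2, S2)
x = np.random.default_rng(1).multivariate_normal(m1, S1, size=1000)
pushed = T(x)   # pushed-forward samples approximately follow N(m2, S2)
print(pushed.mean(axis=0))
print(np.cov(pushed.T))
```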
Computational Breakthrough: Entropic Regularization & Sinkhorn Algorithm: For a long time, the practical use of OT was limited because solving the linear program was computationally expensive, especially for large-scale problems. A major breakthrough was the introduction of entropic regularization. By adding an entropy term to the objective function, the problem becomes strictly convex and can be solved with an incredibly simple, fast, and parallelizable iterative algorithm called the Sinkhorn-Knopp algorithm. This is the single biggest reason for the explosion of OT in machine learning.
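A minimal sketch of the Sinkhorn iteration, reusing the toy histograms from the linear-programming example above; the regularization strength eps and the iteration count are arbitrary choices, and smaller eps gives a sharper (but numerically touchier) approximation of the exact plan.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, n_iter=1000):
    """Sinkhorn-Knopp iterations for entropy-regularized optimal transport.
    Alternately rescales the rows and columns of the Gibbs kernel exp(-C/eps)
    until the coupling u * K * v has marginals mu and nu."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)            # enforce column marginals
        u = mu / (K @ v)              # enforce row marginals
    gamma = u[:, None] * K * v[None, :]
    return gamma, np.sum(gamma * C)

x = np.array([0.0, 1.0, 2.0]); y = np.array([0.5, 2.5])
mu = np.array([0.5, 0.3, 0.2]); nu = np.array([0.6, 0.4])
C = np.abs(x[:, None] - y[None, :])
gamma, cost = sinkhorn(mu, nu, C)
print(gamma)  # a slightly blurred version of the exact LP plan
print(cost)   # approaches the exact optimal cost as eps -> 0
```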
Part 3: Applications
The ability to compare distributions in a geometrically meaningful way has made OT a "killer app" in numerous fields.
1. Machine Learning & Data Science
- Generative Models (GANs): The Wasserstein GAN (WGAN) uses the Wasserstein distance as its loss function. This mitigates major problems of standard GANs, such as training instability and "mode collapse" (where the generator produces only a few types of outputs), typically leading to more stable training and higher-quality generated samples.
- Domain Adaptation: Imagine training a model on synthetic data (source domain) and wanting it to work on real-world data (target domain). OT can find an optimal mapping to align the feature distributions of the two domains, making the model more robust.
- Word Mover's Distance (WMD): To compare two text documents, WMD treats each document as a distribution of its word embeddings (vectors representing word meanings). The distance between the documents is then the minimum "cost" to move the words of one document to become the words of the other. This provides a semantically meaningful measure of document similarity; a toy sketch follows this list.
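A toy sketch of WMD, assuming fabricated 2-D "embeddings" in place of real pretrained word vectors (e.g., word2vec or GloVe), and reusing the Sinkhorn iteration from above to solve the small OT problem:

```python
import numpy as np

# Fabricated 2-D "embeddings" purely for illustration; real WMD uses
# pretrained word vectors of much higher dimension.
emb = {"king": [0.9, 0.8], "queen": [0.85, 0.9],
       "man": [0.7, 0.2], "woman": [0.65, 0.35], "banana": [-0.8, 0.1]}

def doc_to_dist(words):
    """Represent a document as a uniform distribution over its embeddings."""
    pts = np.array([emb[w] for w in words])
    return pts, np.full(len(words), 1.0 / len(words))

def wmd(doc_a, doc_b, eps=0.02, n_iter=2000):
    """Word Mover's Distance approximated with Sinkhorn iterations."""
    xa, wa = doc_to_dist(doc_a)
    xb, wb = doc_to_dist(doc_b)
    C = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    K = np.exp(-C / eps)
    u = np.ones_like(wa)
    for _ in range(n_iter):
        v = wb / (K.T @ u)
        u = wa / (K @ v)
    gamma = u[:, None] * K * v[None, :]
    return np.sum(gamma * C)

print(wmd(["king", "man"], ["queen", "woman"]))   # semantically close: small
print(wmd(["king", "man"], ["banana", "banana"])) # unrelated: larger
```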
2. Computer Vision & Graphics
- Color Transfer: The color palette of an image can be represented as a 3D distribution of (R,G,B) values. OT can find the optimal map to transfer the color style from a reference image to a target image, preserving the target's structure while adopting the reference's "mood" (a simplified sketch follows this list).
- Shape Matching & Interpolation: Shapes can be represented as point clouds or distributions. OT provides a natural way to define a correspondence between two shapes and a geodesic path (the "straightest line") between them in the "space of shapes." This allows for smooth and natural-looking morphing animations.
- Image Retrieval: The Earth Mover's Distance is used to compare image feature distributions (e.g., color, texture histograms) for more accurate content-based image retrieval.
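As a crude illustration of color transfer, here is a per-channel sketch: in one dimension the optimal map is the monotone rearrangement of sorted values, applied here to each channel independently. Real methods transport the full 3-D color distribution jointly; this per-channel simplification and the equal-image-size assumption are mine.

```python
import numpy as np

def transfer_channel(target, reference):
    """Exact 1-D optimal transport of one color channel: map the sorted
    target values onto the sorted reference values (monotone rearrangement).
    Assumes both channels have the same number of pixels."""
    order = np.argsort(target)
    out = np.empty_like(target)
    out[order] = np.sort(reference)
    return out

def color_transfer(target_img, reference_img):
    """Per-channel OT color transfer between same-sized float images
    of shape (H, W, 3)."""
    result = np.empty_like(target_img)
    for ch in range(3):
        result[..., ch] = transfer_channel(
            target_img[..., ch].ravel(),
            reference_img[..., ch].ravel()).reshape(target_img.shape[:2])
    return result
```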
3. Economics
- Matching Markets: This is close in spirit to Kantorovich's original motivation of efficient resource allocation. OT provides a framework for problems of stable matching, such as matching workers to jobs, students to schools, or partners in a market, in a way that maximizes overall social welfare or stability. The dual potentials can be interpreted as equilibrium wages or prices.
4. Biology & Medicine
- Single-Cell Biology: Single-cell RNA sequencing provides snapshots of cell populations at different time points. These populations can be viewed as distributions. OT can be used to infer developmental trajectories by finding the most likely path cells take from one time point to the next, a problem known as "trajectory inference."
- Medical Image Registration: OT can be used to align medical images, for instance, an MRI and a CT scan of a patient's brain. By treating the image intensities as mass distributions, OT finds a geometrically meaningful way to warp one image to match the other.
Conclusion
Optimal Transport began as a concrete engineering problem over 200 years ago. It was later transformed by Kantorovich into a powerful tool in linear programming and economics. For decades, it remained a beautiful but computationally challenging piece of mathematics. Today, thanks to theoretical insights like Brenier's theorem and computational breakthroughs like the Sinkhorn algorithm, it has become an indispensable and versatile tool.
Its core strength lies in its unique ability to provide a distance between distributions that honors the underlying geometry of the space they live in. From moving earth to shaping the frontier of artificial intelligence, Optimal Transport is a perfect example of how deep mathematical ideas can find powerful, real-world applications across science and technology.