Program M2 Mathematics for Data Sciences
Semester 1
Block Mathematical tools for data analysis
Teaching staff Lecturer: Grégory Ginot, Ian Morilla
Credits 4 ECTS
Teaching hours 21h of lectures + 21h of TA
Validation Continuous examination + final exam

Class overview

The goal of the lecture is to give an introduction to Topological Data Analysis (TDA), focusing on the ideas at the core of the field and more particularly on one of its most fundamental tools: persistent homology. Topological Data Analysis is a recent field whose goal is to make some tools of algebraic topology computable, easy to use and effective in practice. The core idea in TDA is to use the invariants of algebraic topology to classify data, compare them, and find characteristics or signatures of certain types of data. Among concrete applications, one can cite shape and pattern recognition and image analysis, which have, for instance, made it possible to classify certain bacteria and neurons (molecular spectroscopy) and even to help detect some forms of cancer (through MRI). Topics that may be covered during the class include:

  • Data sets and simplices: a key feature of TDA is to associate to a data set (a discrete set of points or measurements) a simplicial complex approximating the space that the data set approximates, that is, a collection of simplices and polytopes glued together along common subfaces. We will explain what simplicial complexes are, their geometric realization, and how to code or obtain them from data in practice; in particular, we will discuss Čech complexes and Rips complexes. We will also relate these various constructions to one another.
  • Homology of simplicial complexes: in order to understand a simplicial complex, or the space it approximates, up to small perturbation or deformation, we will study its topological invariants given by its homology groups. We will explain their construction and give their most fundamental properties, including homotopy invariance.
  • Barcodes and persistent homology: persistent homology is associated with filtered spaces or simplicial complexes, such as those given by data sets. We will explain how to combinatorially encode the invariants obtained by looking at how the homology classes vary along the filtration. These invariants are called barcodes.
  • Persistent homology associated with functions: we will explain how one can associate persistent homology groups, and hence barcodes, to functions on a compact metric space, using ideas from Morse theory. In practice, those functions arise from measures.
  • Bottleneck and interleaving distances: to compare various data sets, we use (pseudo)metrics defined on their barcodes. We will explain how these are constructed and give their fundamental properties, in particular the stability theorems.
  • Topological inference and interpretation of distances: we will give an overview of the practical use of the above tools to study data, differentiate and classify them.
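
To give a concrete flavour of the first and third topics, here is a minimal Python sketch, not taken from the course material, of how a Rips complex and a degree-0 barcode might be computed from a small point cloud. The function names `rips_complex` and `h0_barcode` and the sample points are hypothetical, and the sketch assumes the convention that a simplex enters the Rips complex at scale r when all pairwise distances between its vertices are at most r (some texts use 2r instead).

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def rips_complex(points, r, max_dim=2):
    """Rips complex at scale r (assumed convention: pairwise distances <= r).

    Returns simplices as tuples of point indices, up to dimension max_dim.
    """
    n = len(points)
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices span a simplex of dimension k - 1
        for combo in combinations(range(n), k):
            if all(dist(points[i], points[j]) <= r
                   for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


def h0_barcode(points):
    """Degree-0 barcode of the Rips filtration, computed with union-find.

    Every point is a connected component born at scale 0; a bar ends at the
    scale where its component merges with another one.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))  # one component dies at scale d
    bars.append((0.0, float("inf")))  # the last component never dies
    return bars


# Hypothetical sample: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(rips_complex(points, r=1.5))
print(h0_barcode(points))
```

On this sample, the three nearby points span a filled triangle at scale 1.5 while the fourth point stays isolated, and the barcode contains one long finite bar recording the scale at which the far point joins the cluster. This brute-force enumeration is only practical for very small data sets; higher-degree barcodes require the matrix reduction algorithms that dedicated TDA libraries implement.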