Université Côte d'Azur

UE Intro to Artificial Intelligence: Data Analysis and Machine Learning


Campus SophiaTech Les Lucioles, Campus Valrose
Master 1, Master 2
Odd semester


This course provides an introduction to Machine Learning by reviewing its fundamental principles and methods. Broadly speaking, Machine Learning (ML) is the scientific field that aims to build models and infer knowledge by applying algorithms to data. The process therefore involves the statistical analysis of data and the design of models, which may be predictive. During this course, we will be more interested in how the different methods are used in practice than in their mathematical foundations or their actual computer implementations.

This minor is open to students from the DS4H, LIFE and SPECTRUM graduate schools. Depending on their curriculum, students have different needs and their levels can vary considerably, so each session is divided into two modules:

  • One lecture for all students, lasting approximately 1h to 1h30;
  • One tutorial lasting approximately 2 hours, with 3 separate groups adapted to 3 different levels: Python (advanced), Python (beginner), Python (health and biological data).

Course manager(s)

Michel Riveill

In class

  • 12h of directed studies
  • 12h of lectures


Before the start of the course, I must ...
  • Hold a Licence (Bachelor's degree)


By the end of this course, I must be able to...
  • Know the principles of Machine Learning, the main classes of problems, and the main models
  • Know how to use the tools of the domain to analyze data that does not require pre-processing


  • General introduction (Rodrigo Cabral Farias)

    This lecture will introduce the main ingredients of ML, namely the different classes of problems, the data involved in such processes, the main classes of algorithms, and the learning process.

    Topics:

    • Data types
    • Supervised vs. unsupervised learning
    • Taxonomy of algorithms
    • Software platforms and languages
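    As a minimal sketch of a typical first step on a software platform of the domain, the following uses scikit-learn to load a toy dataset and split it for supervised learning (the dataset, split ratio and random seed are illustrative choices, not part of the course material):

    ```python
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # A small labeled dataset: 150 samples, 4 numeric features, 3 classes
    X, y = load_iris(return_X_y=True)

    # Hold out 30% of the data to evaluate a model on unseen samples
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    print(X_train.shape, X_test.shape)
    ```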
  • Regression with the linear model (Rodrigo Cabral Farias)

    Regression is the problem of predicting a response value from explanatory variables. This course will cover the basics of the method, including variable selection and the design of sparse models.

    Topics:

    • Linear regression and least squares
    • Errors and model adequacy
    • Sparse models
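    A short sketch of least-squares regression and a sparse alternative, using scikit-learn on synthetic data (the coefficients, noise level and penalty strength are illustrative assumptions): ordinary least squares fits all variables, while the Lasso's L1 penalty drives the coefficients of irrelevant variables to exactly zero.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    # Only the first two features actually influence the response
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

    ols = LinearRegression().fit(X, y)        # least squares on all 5 features
    lasso = Lasso(alpha=0.1).fit(X, y)        # L1 penalty -> sparse model
    print(ols.coef_.round(2), lasso.coef_.round(2))
    ```

    The Lasso recovers the two relevant variables and discards the rest, which is the point of sparse model design.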
  • Classification with the logistic regression (Rodrigo Cabral Farias)

    Logistic regression is a supervised classification algorithm used to model the probability that an observation belongs to a given class. To do so, a linear model of the log-odds is fitted to the data.

    Topics:

    • Classification using linear models
    • The logistic regression
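    A minimal sketch with scikit-learn on synthetic data (dataset parameters and seed are illustrative): the fitted model outputs, for each observation, a probability per class, and these probabilities sum to one.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Synthetic two-class problem with 4 features
    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    clf = LogisticRegression().fit(X, y)
    proba = clf.predict_proba(X[:1])   # class-membership probabilities
    print(clf.score(X, y), proba)
    ```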
  • Support Vector Machines (Rodrigo Cabral Farias)

    SVMs are a popular and robust class of models for supervised classification. The main difficulties are dealing with classes that are partially mixed (e.g. due to noise) and whose boundaries have a complex geometry.

    Topics:

    • Linear separability and support vectors
    • Soft margin separators
    • Kernels and non-linear separation
    • Multiclass classification
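    A sketch of the kernel idea using scikit-learn (the two-moons dataset and hyperparameters are illustrative choices): the data is not linearly separable, so a linear SVM struggles while an RBF-kernel SVM can bend the decision boundary around the two classes.

    ```python
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    # Two interleaved half-moons: mixed classes, complex boundary geometry
    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

    linear = SVC(kernel="linear", C=1.0).fit(X, y)   # soft-margin linear separator
    rbf = SVC(kernel="rbf", C=1.0).fit(X, y)         # kernel trick: non-linear boundary
    print(linear.score(X, y), rbf.score(X, y))
    ```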
  • Linear Discriminant Analysis / Naive Bayes (Lionel Fillatre)

    LDA is another supervised classification algorithm that uses a linear combination of features to define the boundaries separating two or more classes. This lecture will introduce LDA and compare it to the so-called Naive Bayes classifier.

    Topics:

    • Naive Bayes classifier
    • LDA
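    A side-by-side sketch of the two classifiers using scikit-learn (the iris dataset and training-set evaluation are illustrative simplifications): both fit the same labeled data and can then be compared on accuracy.

    ```python
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)

    lda = LinearDiscriminantAnalysis().fit(X, y)  # linear boundaries between classes
    nb = GaussianNB().fit(X, y)                   # assumes conditionally independent features
    print(lda.score(X, y), nb.score(X, y))
    ```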
  • CART / Decision Tree / Random Forest (Lionel Fillatre)

    Tree-based models partition the data space to exploit local properties of the data, and can be used for both regression and classification. Multiple trees can also be combined to compensate for the arbitrariness of the partitioning induced by a single tree.

    Topics:

    • Classification And Regression Trees
    • Decision tree based classification
    • Tree induction and split rules
    • Ensembles of decision trees and random forests
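    A sketch of the single-tree vs. ensemble comparison with scikit-learn (dataset, fold count and forest size are illustrative choices): cross-validation typically shows the random forest generalizing better than one decision tree.

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    tree = DecisionTreeClassifier(random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)  # ensemble of trees

    # 5-fold cross-validated accuracy for each model
    tree_acc = cross_val_score(tree, X, y, cv=5).mean()
    forest_acc = cross_val_score(forest, X, y, cv=5).mean()
    print(tree_acc, forest_acc)
    ```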
  • Clustering (k-means, hclust) (Michel Riveill)

    In an unsupervised context, clustering aims at grouping the data into homogeneous groups by minimizing the intra-group variance. This fundamental task is surprisingly challenging due to several difficulties: the (generally) unknown number of clusters, clusters whose boundaries have a complex geometry, overlapping clusters (due to noise), high-dimensional data, etc. This class will present two main clustering techniques:

    Topics:

    • k-means and k-means++
    • Hierarchical clustering
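    A sketch of both techniques with scikit-learn on synthetic blobs (the number of samples, of clusters, and the seed are illustrative; here the number of clusters is given, whereas in practice it is usually unknown):

    ```python
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs

    # Three well-separated Gaussian blobs, labels discarded (unsupervised setting)
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # k-means (k-means++ initialization by default) minimizes intra-cluster variance
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Agglomerative (hierarchical) clustering merges points bottom-up
    hc = AgglomerativeClustering(n_clusters=3).fit(X)
    print(km.inertia_, set(km.labels_), set(hc.labels_))
    ```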
  • Dimension reduction (PCA, t-SNE) (Michel Riveill)

    Dimensionality reduction methods aim at embedding high-dimensional data into a lower-dimensional space while preserving specific properties such as pairwise distances, the data spread, etc. Originating with the celebrated Principal Component Analysis method, recent methods have focused on data lying on non-linear spaces.

    Topics:

    • Principal Component Analysis
    • t-distributed Stochastic Neighbor Embedding (t-SNE)
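    A sketch of both embeddings with scikit-learn (the digits dataset, the 300-sample subset kept for speed, and the seed are illustrative choices): PCA is a linear projection, while t-SNE is a non-linear embedding that preserves local neighborhoods; both map the 64-dimensional images to 2 dimensions.

    ```python
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # 8x8 digit images flattened into 64-dimensional vectors
    X, _ = load_digits(return_X_y=True)
    X = X[:300]  # small subset to keep t-SNE fast

    X_pca = PCA(n_components=2).fit_transform(X)               # linear projection
    X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)  # non-linear embedding
    print(X_pca.shape, X_tsne.shape)
    ```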
This syllabus has no contractual value. Its content is subject to change throughout the year: please check for the latest updates.