UE Intro to Artificial Intelligence : Data Analysis and Mach

ECUE code: KMUIAIU

This course belongs to CHOIX Mineures DS4H - M1 LEA RFI, which contains 44 ECUE

Structure

EUR DS4H

Disciplinary field

Computer science

Teaching location

Campus SophiaTech Les Lucioles, Campus Valrose

Course level

Master 1, Master 2

Semester

Odd semester

Language

English

PRESENTATION

This course provides an introduction to Machine Learning by reviewing its fundamental principles and methods. Broadly speaking, Machine Learning (ML) is the scientific field that aims to build models and infer knowledge by applying algorithms to data. The process therefore involves the statistical analysis of data and the design of models, which may be predictive. In this course, we will focus on the context in which the different methods are used rather than on their mathematical foundations or their actual computer implementations.

This minor is open to students from the DS4H, LIFE and SPECTRUM graduate schools. Depending on their curriculum, students have different needs and their levels can be quite different. Each session is therefore divided into two modules:

One lecture for all students, lasting approximately 1h to 1h30;
One tutorial of approximately 2 hours, in 3 separate groups adapted to 3 different levels: Python (advanced), Python (beginner), Python (health and biological data).

Course instructor(s)

Michel Riveill

In person

12h of tutorials

Remote

12h of lectures

PREREQUISITES

Before the course starts, I must ...

Bachelor's degree (Licence)

OBJECTIVES

By the end of this course, I should be able to...

• Know the principles of Machine Learning, the main classes of problems, the main models
• Know how to use the tools of the domain to analyze data that do not require pre-processing

CONTENT

Session 1
General introduction (Rodrigo Cabral Farias)

This lecture will introduce the main ingredients of ML, namely the different classes of problems, the data involved in such processes, the main classes of algorithms, and the learning process.

Topics :
- Data type
- Supervised vs unsupervised learning
- Algorithms taxonomy
- Software platforms and languages
Session 2
Regression with the linear model (Rodrigo Cabral Farias)

Regression is the problem of predicting a response value from explanatory variables. This lecture will cover the basics of the method, including the selection of variables and the design of sparse models.

Topics :
- Linear regression and least squares
- Errors and model adequacy
- Sparse models
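
As an illustration of the least-squares topic, here is a minimal sketch of ordinary least squares with a single explanatory variable. The function names and the toy data are made up for this example and are not part of the course material; in practice a dedicated library would typically be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and variance (the common 1/n factor cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared residual, a simple measure of model adequacy."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


# Toy data (hypothetical) lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
mse = mean_squared_error(xs, ys, slope, intercept)
```

On data that lie exactly on a line, the fitted slope and intercept recover that line and the mean squared error is zero; with noisy data the error quantifies how well the linear model fits.
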
Session 3
Classification with the logistic regression (Rodrigo Cabral Farias)

Logistic regression is a supervised classification algorithm used to model the probability that an observation belongs to a given class. To do so, a linear combination of the features is mapped to a probability through the logistic function, and its parameters are estimated from the data.

Topics :
- Classification using linear models
- The logistic regression
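
The idea of classifying with a linear model passed through the logistic function can be sketched as follows. This is a minimal one-feature version fitted by plain gradient descent; the learning rate, the number of epochs, the function names and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): negative values are class 0, positive values are class 1
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, labels)

prob_high = sigmoid(w * 2.0 + b)   # estimated probability of class 1 at x = 2
prob_low = sigmoid(w * -2.0 + b)   # estimated probability of class 1 at x = -2
```
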
Session 4
Support Vector Machines (Rodrigo Cabral Farias)

SVMs are a popular and robust class of models for supervised classification. The main difficulties are dealing with classes that are partially mixed (e.g. due to noise) and classes whose boundaries have a complex geometry.

Topics :
- Linear separability and support vectors
- Soft margin separators
- Kernels and nonlinear separation
- Multiclass classification
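
To make the soft-margin idea concrete, the sketch below evaluates the usual soft-margin objective (a margin term plus a penalty on hinge losses) for a given linear separator. It only computes the objective and does not train an SVM; the function name, the penalty weight `C` and the toy points are assumptions made for this example.

```python
def soft_margin_objective(w, b, points, labels, C=1.0):
    """Return 0.5 * ||w||^2 + C * sum of hinge losses for labels in {-1, +1}."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, y in zip(points, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Points with margin >= 1 cost nothing; the others are penalized linearly
        hinge_total += max(0.0, 1.0 - margin)
    return regularizer + C * hinge_total


# Toy data (hypothetical): the third point lies inside the margin
points = [(2.0, 2.0), (-2.0, -2.0), (0.5, 0.0)]
labels = [1, -1, 1]
objective = soft_margin_objective((0.5, 0.5), 0.0, points, labels)
```

Here the first two points are beyond the margin and contribute no loss, while the third contributes a hinge loss of 0.75, which is how a soft margin tolerates partially mixed classes.
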
Session 5
Linear Discriminant Analysis / Naive Bayes (Lionel Fillatre)

LDA is another supervised classification algorithm using a linear combination of features defining boundaries separating two or more classes. This lecture will introduce LDA and compare it to the so-called Naive Bayes classifier.

Topics :
- Naive Bayes classifier
- LDA
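
A Gaussian Naive Bayes classifier can be sketched in a few lines: each feature is modeled independently within each class, and the class with the highest posterior score wins. The Gaussian assumption, the small variance floor, the function names and the toy data are choices made for this illustration only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant avoids division by zero when a feature is constant
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior under the naive assumption."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy data (hypothetical): two well-separated classes with two features
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
```

For example, `predict(model, (1.1, 2.1))` falls in the region of class "a" and `predict(model, (4.1, 4.9))` in that of class "b".
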
Session 6
CART / Decision Tree / Random Forest (Lionel Fillatre)

Tree-based models partition the data space to exploit local properties of the data, and can be used for both regression and classification. Multiple trees can also be combined to compensate for the arbitrariness of the partitioning induced by a single tree.

Topics :
- Classification And Regression Trees
- Decision tree based classification
- Tree induction and split rules
- Ensembles of decision trees and random forests
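
A split rule can be illustrated with the Gini impurity, one common criterion for classification trees. The sketch below searches a single numeric feature for the threshold that minimizes the weighted impurity of the two resulting groups; the function names and the toy data are hypothetical, and a full tree would apply this search recursively.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate two identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data (hypothetical): small values are "low", large values are "high"
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(values, labels)
```
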
Session 7
Clustering (k-means, hclust) (Michel Riveill)

In an unsupervised context, clustering aims at grouping the data into homogeneous groups by minimizing the intra-group variance. This fundamental task is surprisingly challenging due to several difficulties: the (generally) unknown number of clusters, clusters whose boundaries have a complex geometry, overlapping clusters (due to noise), high-dimensional data, etc. This class will present two main clustering techniques:

Topics :
- k-means and k-means++
- Hierarchical clustering
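
The k-means iteration alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The sketch below uses fixed, hand-picked initial centers rather than the k-means++ seeding, purely to keep the example deterministic; the function name, the iteration count and the toy points are assumptions.

```python
def kmeans(points, centers, iterations=10):
    """Run a fixed number of assignment/update steps of the k-means iteration."""
    for _ in range(iterations):
        # Assignment step: attach each point to the nearest center
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Toy data (hypothetical): two compact groups far from each other
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```
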
Session 8
Dimension reduction (PCA, t-SNE) (Michel Riveill)

Dimensionality reduction methods aim at embedding high-dimensional data into a lower-dimensional space while preserving specific properties such as pairwise distances, the data spread, etc. Starting with the celebrated Principal Component Analysis method, the field has more recently focused on data lying on nonlinear spaces.

Topics :
- Principal Component Analysis
- t-distributed Stochastic Neighbor Embedding (t-SNE)
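
As a small illustration of Principal Component Analysis, the sketch below finds the first principal direction of two-dimensional data and projects the points onto it, reducing them from two coordinates to one. It relies on the closed-form orientation of the major axis of a 2x2 covariance matrix, so it only applies to the 2D case; the function name and the toy data are hypothetical.

```python
import math


def first_principal_component(points):
    """Return the unit direction of largest variance and the 1D projections (2D data only)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    # Orientation of the leading eigenvector of a symmetric 2x2 matrix
    angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    direction = (math.cos(angle), math.sin(angle))
    projections = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projections


# Toy data (hypothetical): points spread along the diagonal y = x
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
direction, projections = first_principal_component(points)
```

For data spread along the diagonal, the principal direction is the diagonal itself, and the one-dimensional projections preserve the spacing between the points.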


Important

This syllabus has no contractual value. Its content may change during the year: watch for the latest modifications.