**Machine Learning and Uncertainty Quantification for Data Science**

MA59800-550, Fall 2016

Office hours: TR 04:20-05:00pm (tentative), or by appointment

Email: guanglin [at] purdue.edu

--------------------------------------------------------------------------------------------------------

**Lectures Time and Location**

TR 03:00-04:15pm

Classroom: UNIV 103

--------------------------------------------------------------------------------------------------------

**Syllabus**

--------------------------------------------------------------------------------------------------------

**Graduate Course Description**

This introductory course covers core concepts, models, algorithms, and Python code for machine learning and uncertainty quantification in data science. Topics include classical supervised learning (e.g., regression and classification), unsupervised learning (e.g., principal component analysis and K-means), uncertainty quantification algorithms (e.g., importance sampling and Markov chain Monte Carlo), and recent developments in machine learning and uncertainty quantification such as deep learning, variational Bayes, and Gaussian processes. While the course gives students the basic ideas, intuition, and hands-on practice behind modern machine learning and uncertainty quantification methods, its underlying theme is probabilistic inference for data science.
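
To give a taste of the hands-on style, here is a minimal sketch (illustrative only, not taken from the course repository) of principal component analysis with plain NumPy; the Jupyter notebooks linked below are the authoritative course examples.

```python
# Minimal PCA sketch via the SVD (illustrative; assumes only NumPy).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 samples, 5 features (toy data)
Xc = X - X.mean(axis=0)                  # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                      # top-2 principal directions
scores = Xc @ components.T               # project the data onto them
explained = s[:2] ** 2 / np.sum(s ** 2)  # fraction of variance explained
print(explained)
```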

**Tentative Topics**

1. Review of basic concepts in information theory and probability distributions

2. Unsupervised machine learning algorithms

a. Principal component analysis

3. Supervised machine learning algorithms

a. Active subspace algorithm

b. Sliced inverse regression

c. Localized sliced inverse regression

4. Dimension reduction algorithms and data compression

5. Compressive sensing algorithm

6. Linear regression and classification

7. Bayesian inference

8. Clustering analysis (K-means clustering, mixture models, and expectation maximization)

9. Stochastic gradient descent algorithm

10. Random forest

11. Hidden Markov models

12. Deterministic approximate inference: variational Bayes and expectation propagation

13. Support vector machine regression and classification

14. Gaussian process regression and Gaussian process classification

15. Uncertainty quantification algorithms (a short sampling sketch follows this list)

a. Monte Carlo

b. Latin hypercube sampling

c. Importance sampling

d. Polynomial chaos method

e. Gaussian process regression

f. Compressive sensing

16. Data assimilation (Kalman filter, particle filter)
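
As a flavor of topic 15, the sketch below (illustrative only, not course code; assumes NumPy and SciPy) compares plain Monte Carlo with importance sampling for a rare-event probability, where reweighting samples from a shifted proposal gives a far lower-variance estimate.

```python
# Plain Monte Carlo vs. importance sampling for P(X > 4), X ~ N(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000
true_p = stats.norm.sf(4.0)               # exact tail probability for reference

# Plain Monte Carlo: almost no samples land in the far tail.
x = rng.normal(size=n)
mc_est = np.mean(x > 4.0)

# Importance sampling: draw from a proposal centered in the tail, N(4, 1),
# and reweight each sample by the likelihood ratio p(y)/q(y).
y = rng.normal(loc=4.0, size=n)
w = stats.norm.pdf(y) / stats.norm.pdf(y, loc=4.0)
is_est = np.mean((y > 4.0) * w)

print(true_p, mc_est, is_est)
```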

--------------------------------------------------------------------------------------------------------

**Prerequisites**

Basic linear algebra,
calculus, and probability, or permission of instructor.

**Textbooks**

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006

Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, 2006

**Lecture Notes in Interactive Python Jupyter Notebooks:**

**https://github.com/PredictiveModelingMachineLearningLab/MA598**

**Assignments**

- Homework (links will be activated as homework is assigned). Copying will not be tolerated.
- Homework will be submitted through Blackboard.
- Review of recent research: Students will choose a subtopic of machine learning research, select three recent conference papers on that topic, write a 2-page report outlining the papers' main ideas and relating them to the course material, and give an in-class presentation on those ideas.
- Final project: You are required to complete a class project. The choice of topic is up to you, as long as it clearly pertains to the course material. To ensure that you are on the right track, you must submit a one-paragraph description of your project one month before the project is due. You are encouraged to collaborate on the project, but we expect a four-page write-up that clearly describes the project goal, methods, and results. Each group should submit only one copy of the write-up and include the names of all group members; the page limit grows with group size (a two-person group has 6 pages, a three-person group has 8 pages, and so on).

**Grading**

- 10% Participation
- 15% Review of recent research
- 40% Homework
- 35% Final Project