Introduction to the Mathematical Analysis of Data
This course explores the mathematics behind several important advanced methods, including Monte Carlo methods, the Expectation-Maximization (EM) algorithm, and Markov chain Monte Carlo (MCMC), among others.
Instructor: Prof. Chávez Casillas
Term: Spring
Course Overview
In our present world of automation, cloud computing, algorithms, artificial intelligence, and big data, few topics are as relevant as data science and machine learning. Their recent popularity lies not only in their applicability to real-life questions, but also in their natural blending of many different disciplines, including mathematics, statistics, computer science, engineering, science, and finance. While many practitioners may be satisfied with learning how to apply off-the-shelf recipes to practical situations, in this course we will examine what happens when the assumptions of a black-box recipe are violated: whether we can still trust the results and, if not, how the algorithm should be adapted.
Brief Course Description: The course has three main parts; three further parts will be covered if time allows:
- Overview of Statistical Learning: Here, we introduce some common concepts and themes in statistical learning. We will discuss the difference between supervised and unsupervised learning, and how we can assess the predictive performance of supervised learning. We also examine the central role that the linear and Gaussian properties play in the modeling of data. We conclude with a section on Bayesian learning. The required probability background is assumed to have been covered in MTH 451.
- Monte Carlo Methods: Many algorithms in machine learning and data science make use of Monte Carlo techniques. This part of the course provides an introduction to the three main uses of Monte Carlo simulation: to (1) simulate random objects and processes in order to observe their behavior, (2) estimate numerical quantities by repeated sampling, and (3) solve complicated optimization problems through randomized algorithms. The required probability background is assumed to have been covered in MTH 451.
- Unsupervised Learning: When there is no distinction between response and explanatory variables, unsupervised methods are required to learn the structure of the data. In this part of the course we will look at various unsupervised learning techniques, such as density estimation, clustering, and principal component analysis. Important tools in unsupervised learning include the cross-entropy training loss, mixture models, the Expectation-Maximization algorithm, and the Singular Value Decomposition. The required linear algebra background is assumed to have been covered in MTH 215.
- Linear Methods for Classification (if time allows): These models aim to predict a value from a finite (though still possibly large) set of classes or categories. That is, they try to assign to each point in the data set the best-fitting category. The main idea is to divide the input space into a collection of regions labeled according to the classification. When the decision boundaries between these regions are linear, the method is called a linear method for classification.
- Regularization and Kernel Methods (if time allows): The purpose of this part of the course is to familiarize students with two central concepts in modern data science and machine learning: regularization and kernel methods. Regularization provides a natural way to guard against overfitting, and kernel methods offer a broad generalization of linear models. Here, we discuss regularized regression (ridge, lasso) as a bridge to the fundamentals of kernel methods. We will introduce reproducing kernel Hilbert spaces and show that selecting the best prediction function in such spaces is in fact a finite-dimensional optimization problem. Applications to spline fitting, Gaussian process regression, and kernel PCA are given.
- Deep Learning (if time allows): In this part of the course, we will see how to construct a rich class of approximating functions called neural networks. The learners belonging to the neural-network class of functions have attractive properties that have made them ubiquitous in modern machine learning applications: their training is computationally feasible, and their complexity is easy to control and fine-tune.
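As an informal illustration of the second use of Monte Carlo simulation listed above (estimating a numerical quantity by repeated sampling), here is a minimal Python sketch. It is not taken from the course notes: the helper name `mc_estimate`, the chosen integrand, the sample size, and the seed are all assumptions made for this example.

```python
import math
import random


def mc_estimate(f, sampler, n, seed=0):
    """Estimate E[f(X)] by averaging f over n independent draws of X.

    `mc_estimate` is a hypothetical helper written for this illustration.
    Returns the sample mean and its estimated standard error.
    """
    rng = random.Random(seed)
    values = [f(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance (n - 1 denominator), used for the standard error.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# The integral of exp(-x^2) over [0, 1] equals E[exp(-U^2)] for U ~ Uniform(0, 1).
estimate, std_error = mc_estimate(
    lambda x: math.exp(-x * x),
    lambda rng: rng.random(),
    n=100_000,
)

# Closed form for comparison: (sqrt(pi) / 2) * erf(1).
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(f"estimate = {estimate:.4f} +/- {std_error:.4f}, exact = {exact:.4f}")
```

The standard error shrinks at rate 1/sqrt(n), so quadrupling the number of samples roughly halves the uncertainty of the estimate.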
Prerequisites
- MTH 451
- MTH 215
- CSC 310
Textbooks
There is no official textbook for this class, but class notes will be provided on a regular basis.
Grading
- Midterm Exam: 20%
- Homework: 40%
- Final Exam Research Presentation: 10%
- Cumulative Final Exam: 30%