Stat 6559 Statistical exploration of data

Description

Because of recent advances in statistical applications across many areas, there is a high demand for professionals qualified in statistical techniques for data analytics and big data analysis. The proposed course covers a set of multivariate techniques for visualizing data and identifying patterns in it. The course will equip students with the statistical and computational skills needed to turn large data sets into meaningful insights that support sound interpretation.

The course will first introduce multivariate random variables and the multivariate normal distribution, followed by multivariate likelihood ratio tests. Supervised learning methods such as linear discriminant analysis, logistic regression, neural networks, recursive partitioning and nearest neighbours will be included. The unsupervised learning methods covered include principal component analysis and cluster analysis. Other topics include methods for visualizing large data sets using the software R.

Tentative course outline

  1. Review of statistical estimation and hypothesis testing.
  2. Exploring multivariate data -- graphical methods and estimation of parameters.
  3. Multivariate random variables, multivariate normal distribution.
  4. Likelihood ratio statistics, Hotelling's T^2 statistic, and Mahalanobis distance.
  5. Unsupervised learning - Association Rules, Principal component analysis, Cluster analysis.
  6. Robust statistics.
  7. Anomaly detection methods.
  8. Data visualization using R - plotting large data sets, projection pursuit.
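
For orientation, the two quantities named in item 4 can be written as below. This is the standard textbook form, not notation taken from the course material; the symbols (sample mean \bar{x}, sample covariance S, hypothesized mean \mu_0, sample size n, dimension p) are assumptions introduced here for illustration.

```latex
% Mahalanobis distance of an observation x from a distribution
% with mean \mu and covariance matrix \Sigma
D_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}

% One-sample Hotelling T^2 statistic for testing H_0 : \mu = \mu_0
T^2 = n \, (\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0)

% Under H_0, with multivariate normal data and n > p,
\frac{n - p}{p \, (n - 1)} \, T^2 \sim F_{p,\, n - p}
```

In this form, T^2 is n times the squared Mahalanobis distance between the sample mean and \mu_0, with S used in place of \Sigma.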

Texts

  • An Introduction to Statistical Learning: With Applications in R, by G James, D Witten, T Hastie, and R Tibshirani. Springer, 2017.
  • Applied Multivariate Statistical Analysis, 6th ed., by RA Johnson and DW Wichern. Pearson, 2015.
  • The Elements of Statistical Learning, by T Hastie, R Tibshirani, and J Friedman. Springer, 2009.