# Major in Data Science

Note that the University Calendar is the definitive authority on program regulations. It is the student's responsibility to check that they have met all program, degree, and graduation requirements. Any errors in what is presented outside the Calendar (such as in the diagram below) will not be grounds for appeal if graduation requirements are not met.

## Program Map

The following will appear in the 2024-25 University Calendar:

**Major in Data Science**

As a component of the Degree Regulations for the General Degree of Bachelor of Science or the Degree Regulations for the General Degree of Bachelor of Arts, as appropriate, a student shall complete the following requirements:

1. Mathematics 1000, 1001, 2000, 2050, 2320

2. Statistics 1500, 2410, 3411, 3521, 3530, 3585, 4411, 4486, 4502

3. Computer Science 1001, 1003

4. Statistics 2500 or 2550. Statistics 2550 is recommended.

5. Statistics 2530 or 2560

6. Statistics 2485 or both Computer Science 2001 and 2002

7. Statistics 3486 or Computer Science 3202

8. Six further credit hours in Statistics, Mathematics or Computer Science courses numbered 3000 or higher including at least 3 credit hours in courses numbered 4000 or higher excluding Statistics 4581.

New Statistics Courses:

**STAT 2485 R for Data Science** provides a basic introduction to the programming language R. This course focuses on the foundations of coding, and development of basic programming skills for the effective handling of data structures and processes oriented towards the analysis of data.

CO: STAT 2500 or 2550

**STAT 2530 Statistical Data Analytics** builds up from the basic techniques of analysis and visualization of data presented in any of our introductory courses. It uses the programming language R as the basic computational device. Mainstream techniques of predictive analytics and statistical learning are presented in a hands-on approach.

PR: STAT 2550 or Mathematics 1000 or Mathematics 1005 or Mathematics 1006 and one of STAT 1500 or STAT 2500

**STAT 3486 Statistical Learning** introduces statistical learning, including a brief overview of linear regression, and other important topics in data science, such as classification, resampling and cross validation, linear model selection, nonlinear models, tree-based models and unsupervised learning.

CR: Computer Science 3202

PR: Mathematics 2000, STAT 2485, STAT 2530 or 2560

**STAT 3530 Analysis of Observational Data** introduces sampling concepts, probability sampling designs including simple random sampling and stratified random sampling, study designs, and methods for analysis of observational data including measures of risk and association, inference for measures of association, confounding and logistic regression modeling.

PR: STAT 2530 or 2560

**STAT 4411 Bayesian Data Analysis** is an introduction to Bayesian data analysis with applications. Topics include basic principles of Bayesian modeling and inference, methods and theoretical aspects of Bayesian analysis, Bayesian computation and applications, and special topics in Bayesian data analysis. The statistical computing software R will be used to explore data sets using these techniques.

PR: STAT 3411

**STAT 4486 Neural Networks and Deep Learning** presents the theoretical foundations of artificial neural networks. Topics include a mathematical derivation of basic architectures, regularization of neural networks, their stability, generalization abilities and their relation to various areas of mathematics and probability, including hidden Markov chains, stochastic dynamical systems, graph theory and numerical analysis.

PR: STAT 3521, STAT 3486 or Computer Science 3202

**STAT 4502 Applied Stochastic Processes** aims to provide students with a basic understanding of the probabilistic models and techniques underlying the most widely used classes of stochastic processes, such as Bernoulli processes, Poisson processes, renewal processes and Markov chains. The main focus is on modeling aspects, which are complemented by a description of some popular algorithms for simulation using R.

PR: STAT 3585