Statistics: Theory and Practice

Overview

Credit value: 30 credits at Level 6
Convenor: Haris Jameel
Assessment: three problem sets (10% each) and a three-hour examination (70%)

Module description

This module gives an overview of the main theoretical ideas that underpin statistical practice, in both routine and innovative applications.

If your focus is on statistics, this module will give you the knowledge you need for final-year undergraduate or postgraduate study. It can also serve as a ‘stopping-off’ point if you are a mathematician wishing to push your statistical knowledge beyond introductory level.

You will also gain a working knowledge of a high-level statistical programming language, such as R.

Indicative syllabus

Probability and distribution theory

Probability spaces, review of conditional probability and independence
Discrete and continuous random variables and their moments
Functions of random variables, with emphasis on generating functions
Collections of random variables, conditional distributions and expectation
The multivariate normal distribution, with emphasis on the bivariate normal

Introduction to statistical inference

Point and interval estimation (with examples relating to the normal distribution)
Introduction to hypothesis testing (with examples relating to the normal distribution)
Likelihood and sufficiency, the Factorization Theorem
Maximum likelihood estimators

Completely randomized one-way design

Introduction to R
Design and analysis of completely randomized one-way design (theory and practice in R)
The chi-squared and F distributions, and their relationship to analysis of variance techniques
Least squares estimators
Estimation and comparison of treatment effects
Analysis of residuals

Linear regression

Simple linear regression, analysis of residuals and prediction
Multiple linear regression, ANOVA, testing redundancy
Stepwise regression
Modelling linear regression using R

Learning objectives

By the end of this module, you will be able to:

set up and carry out a simple designed experiment that tests the influence of chosen factors using ANOVA techniques
collate and analyse data arising from a simple designed experiment in a statistical package such as R, and draw appropriate conclusions
specify and recognise the joint distribution of several random variables given appropriate assumptions on the marginal distributions and their dependence structure
specify and recognise the multivariate normal distribution, and some of its important properties, particularly in relation to specific graphical properties of the bivariate normal distribution
derive key results pertaining to the chi-squared and F distributions, and relate these to the theoretical basis for the ANOVA technique
formulate and derive maximum likelihood estimators (and appreciate how these differ from those based on the method of moments)
determine whether a statistic is sufficient for a given parameter
appreciate the theoretical underpinning of hypothesis testing and recognise how hypothesis tests are carried out under several different paradigms
determine whether a given data set is amenable to analysis using multiple linear regression
import or enter data into a statistical package such as R, and perform multiple linear regression principally using command-line functions (rather than menu-driven GUI operations)
interpret and draw conclusions from a statistical analysis, and present these conclusions so that they are either (i) well understood by a statistician or (ii) accessible, without being misleading, to an intelligent non-statistician who may be involved in policy development.