Introduction to Data Analytics using R
Overview
- Credit value: 15 credits at Level 6
- Convenor: Dr Cen Wan
- Assessment: problem-solving worksheets (20%) and a two-hour examination (80%)
Module description
In this module we cover the principal concepts and techniques of data analytics and how to apply them to large-scale data sets. You will develop the core skills and expertise needed by data scientists, including the use of techniques such as linear regression, classification and clustering. We will show you how to use R, a widely used language and environment for data analysis, to solve practical problems based on use cases drawn from real-world domains.
Indicative syllabus
- Introduction to big data analytics: big data overview, data pre-processing, concepts of supervised and unsupervised learning
- Basic statistics: mean, median, standard deviation, variance, correlation, covariance
- Simple linear regression
- Classification: logistic regression, decision trees
- Ensemble methods: bagging, random forests, boosting
- Clustering: K-means, K-medoids, hierarchical clustering
- Evaluation and validation: cross-validation, assessing the statistical significance of data mining results
- Tools: R
Learning objectives
By the end of this module, you will be able to:
- recognise the current state of practice in data analytics in industry
- demonstrate knowledge of linear regression, classification and clustering
- understand ensemble methods and use them to analyse large-scale data
- use the open-source tool R to perform the above tasks and apply the methods in real application scenarios
- validate and evaluate the data analysis results.