Machine Learning and Big Data
Study Course Description
Course Description Status: Approved
Course Description Version: 4.00
Study Course Accepted: 14.03.2024 11:41:03
Study Course Information | |||||||||
Course Code: | SL_120 | LQF level: | Level 7 | ||||||
Credit Points: | 2.00 | ECTS: | 3.00 | ||||||
Branch of Science: | Mathematics; Theory of Probability and Mathematical Statistics | Target Audience: | Life Science | ||||||
Study Course Supervisor | |||||||||
Course Supervisor: | Andrejs Ivanovs | ||||||||
Study Course Implementer | |||||||||
Structural Unit: | Statistics Unit | ||||||||
The Head of Structural Unit: | |||||||||
Contacts: | 23 Kapselu Street, 2nd floor, Riga, statistika[at]rsu[dot]lv, +371 67060897 | ||||||||
Study Course Planning | |||||||||
Full-Time - Semester No.1 | |||||||||
Lectures (count) | 6 | Lecture Length (academic hours) | 2 | Total Contact Hours of Lectures | 12 | ||||
Classes (count) | 6 | Class Length (academic hours) | 2 | Total Contact Hours of Classes | 12 | ||||
Total Contact Hours | 24 | ||||||||
Part-Time - Semester No.1 | |||||||||
Lectures (count) | 6 | Lecture Length (academic hours) | 1 | Total Contact Hours of Lectures | 6 | ||||
Classes (count) | 6 | Class Length (academic hours) | 2 | Total Contact Hours of Classes | 12 | ||||
Total Contact Hours | 18 | ||||||||
Study course description | |||||||||
Preliminary Knowledge: | Higher mathematics, probability, statistics, basic knowledge of R programming. | ||||||||
Objective: | Machine learning (ML) is the study of algorithms that automatically extract information and induce new knowledge from data. ML tasks often involve large datasets, which create challenges in data storage, organization and processing. These challenges are addressed by the discipline of big data analytics. The aim of this course is to introduce students to the most important methods of machine learning, namely variations of regression and classification algorithms, and to introduce the concepts of deep learning and big data analytics. The methods will be explored through case studies implemented in R. | ||||||||
Topic Layout (Full-Time) | |||||||||
No. | Topic | Type of Implementation | Number | Venue | |||||
1 | Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap). | Lectures | 1.00 | computer room | |||||
2 | R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R. | Classes | 1.00 | computer room | |||||
3 | Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression. | Lectures | 1.00 | computer room | |||||
4 | Implementing regression methods in R. Comparing the performance of various regression models. | Classes | 1.00 | computer room | |||||
5 | Classification methods I: KNN, classification trees, random forests. | Lectures | 1.00 | computer room | |||||
6 | Implementing simple classification models in R. Comparing the performance of various models. | Classes | 1.00 | computer room | |||||
7 | Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines. | Lectures | 1.00 | computer room | |||||
8 | Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models. | Classes | 1.00 | computer room | |||||
9 | Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradient. Layers, loss functions and optimizers. | Lectures | 1.00 | computer room | |||||
10 | Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R. | Classes | 1.00 | computer room | |||||
11 | Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R. | Lectures | 1.00 | computer room | |||||
12 | Setting up Spark for R. Analysing large data processing with R: comparing ease of use and computation times between base, data.table, parallel and Spark approaches. | Classes | 1.00 | computer room | |||||
Topic Layout (Part-Time) | |||||||||
No. | Topic | Type of Implementation | Number | Venue | |||||
1 | Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap). | Lectures | 1.00 | computer room | |||||
2 | R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R. | Classes | 1.00 | computer room | |||||
3 | Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression. | Lectures | 1.00 | computer room | |||||
4 | Implementing regression methods in R. Comparing the performance of various regression models. | Classes | 1.00 | computer room | |||||
5 | Classification methods I: KNN, classification trees, random forests. | Lectures | 1.00 | computer room | |||||
6 | Implementing simple classification models in R. Comparing the performance of various models. | Classes | 1.00 | computer room | |||||
7 | Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines. | Lectures | 1.00 | computer room | |||||
8 | Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models. | Classes | 1.00 | computer room | |||||
9 | Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradient. Layers, loss functions and optimizers. | Lectures | 1.00 | computer room | |||||
10 | Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R. | Classes | 1.00 | computer room | |||||
11 | Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R. | Lectures | 1.00 | computer room | |||||
12 | Setting up Spark for R. Analysing large data processing with R: comparing ease of use and computation times between base, data.table, parallel and Spark approaches. | Classes | 1.00 | computer room | |||||
Assessment | |||||||||
Unaided Work: | 1. Review of compulsory and additional literature to expand the knowledge acquired in lectures and classes. 2. Students are expected to hand in four R-based computer assignments related to the course topics. | ||||||||
Assessment Criteria: | Assessment on the 10-point scale according to the RSU Educational Order: • Computer assignments to be handed in – 70%. • Final exam – 30%. | ||||||||
Final Examination (Full-Time): | Exam (Written) | ||||||||
Final Examination (Part-Time): | Exam (Written) | ||||||||
Learning Outcomes | |||||||||
Knowledge: | • Selects appropriate resampling methods and criteria for assessing model accuracy. • Explains the most important regression and classification algorithms. • Identifies the Big Data concept. | ||||||||
Skills: | • Can independently implement regression and classification machine learning algorithms in R. • Analytically evaluates the computational limitations of R and selects strategies to overcome them. | ||||||||
Competencies: | • Can critically compare various machine learning strategies and choose the appropriate algorithm for the problem at hand. | ||||||||
Bibliography | |||||||||
No. | Reference | ||||||||
Required Reading | |||||||||
1 | Chollet, F., Allaire, J.J. (2018) Deep Learning with R. Manning Publications, Shelter Island. Parts I, II and III. | ||||||||
2 | Luraschi, J., Kuo, K., Ruiz, E. (2019) Mastering Spark with R. O’Reilly. Chapters 1-4. | ||||||||
Additional Reading | |||||||||
1 | James, G., Witten, D., Hastie, T., Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer-Verlag. | ||||||||
2 | Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning. Springer-Verlag. | ||||||||
3 | Walkowiak, S. (2016) Big Data Analytics with R: Utilize R to uncover hidden patterns in your Big Data. Packt Publishing, Birmingham. Chapters 3-7. | ||||||||
4 | Torgo, L. (2017) Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC.