Machine Learning and Big Data

Study Course Description

Course Description Status: Approved
Course Description Version: 4.00
Study Course Accepted: 14.03.2024 11:41:03
Study Course Information
Course Code: SL_120
LQF level: Level 7
Credit Points: 2.00
ECTS: 3.00
Branch of Science: Mathematics; Theory of Probability and Mathematical Statistics
Target Audience: Life Science
Study Course Supervisor
Course Supervisor: Andrejs Ivanovs
Study Course Implementer
Structural Unit: Statistics Unit
The Head of Structural Unit:
Contacts: 23 Kapselu Street, 2nd floor, Riga, statistika[at]rsu[dot]lv, +371 67060897
Study Course Planning
Full-Time - Semester No.1
Lectures (count): 6 | Lecture Length (academic hours): 2 | Total Contact Hours of Lectures: 12
Classes (count): 6 | Class Length (academic hours): 2 | Total Contact Hours of Classes: 12
Total Contact Hours: 24
Part-Time - Semester No.1
Lectures (count): 6 | Lecture Length (academic hours): 1 | Total Contact Hours of Lectures: 6
Classes (count): 6 | Class Length (academic hours): 2 | Total Contact Hours of Classes: 12
Total Contact Hours: 18
Study course description
Preliminary Knowledge:
Higher mathematics, probability, statistics, basic knowledge of R programming.
Objective:
Machine learning (ML) is the study of algorithms that automatically extract information and induce new knowledge from data. ML tasks often involve large datasets, which create challenges in data storage, organization and processing. These challenges are addressed by the discipline of big data analytics. The aim of this course is to introduce students to the most important methods of machine learning, namely variations of regression and classification algorithms, and to introduce the concepts of deep learning and big data analytics. The methods will be explored through case studies implemented in R.
Topic Layout (Full-Time)
No. | Topic | Type of Implementation | Number | Venue
1 | Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap). | Lectures | 1.00 | computer room
2 | R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R. | Classes | 1.00 | computer room
3 | Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression. | Lectures | 1.00 | computer room
4 | Implementing regression methods in R. Comparing the performance of various regression models. | Classes | 1.00 | computer room
5 | Classification methods I: KNN, classification trees, random forests. | Lectures | 1.00 | computer room
6 | Implementing simple classification models in R. Comparing the performance of various models. | Classes | 1.00 | computer room
7 | Classification methods II: ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines. | Lectures | 1.00 | computer room
8 | Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models. | Classes | 1.00 | computer room
9 | Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradients. Layers, loss functions and optimizers. | Lectures | 1.00 | computer room
10 | Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R. | Classes | 1.00 | computer room
11 | Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R. | Lectures | 1.00 | computer room
12 | Setting up Spark for R. Analysing large-scale data processing with R: comparing ease of use and computation times between the base, data.table, parallel and Spark approaches. | Classes | 1.00 | computer room
Topic Layout (Part-Time)
No. | Topic | Type of Implementation | Number | Venue
1 | Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap). | Lectures | 1.00 | computer room
2 | R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R. | Classes | 1.00 | computer room
3 | Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression. | Lectures | 1.00 | computer room
4 | Implementing regression methods in R. Comparing the performance of various regression models. | Classes | 1.00 | computer room
5 | Classification methods I: KNN, classification trees, random forests. | Lectures | 1.00 | computer room
6 | Implementing simple classification models in R. Comparing the performance of various models. | Classes | 1.00 | computer room
7 | Classification methods II: ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines. | Lectures | 1.00 | computer room
8 | Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models. | Classes | 1.00 | computer room
9 | Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradients. Layers, loss functions and optimizers. | Lectures | 1.00 | computer room
10 | Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R. | Classes | 1.00 | computer room
11 | Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R. | Lectures | 1.00 | computer room
12 | Setting up Spark for R. Analysing large-scale data processing with R: comparing ease of use and computation times between the base, data.table, parallel and Spark approaches. | Classes | 1.00 | computer room
Assessment
Unaided Work:
1. Review of compulsory and additional literature to expand the knowledge acquired in lectures and classes.
2. Students are expected to hand in four R-based computer assignments related to course topics.
Assessment Criteria:
Assessment on the 10-point scale according to the RSU Educational Order:
• Computer assignments to be handed in: 70%.
• Final exam: 30%.
Final Examination (Full-Time): Exam (Written)
Final Examination (Part-Time): Exam (Written)
Learning Outcomes
Knowledge:
• Selects resampling methods and criteria for assessing model accuracy.
• Explains the most important regression and classification algorithms.
• Identifies the Big Data concept.
Skills:
• Can independently implement regression and classification machine learning algorithms in R.
• Analytically evaluates the computational limitations of R and selects strategies to overcome them.
Competencies:
• Can critically compare various machine learning strategies and choose the appropriate algorithm for the problem at hand.
Bibliography
No. | Reference
Required Reading
1 | Chollet, F., Allaire, J.J. (2018) Deep Learning with R. Manning Publications, Shelter Island. Parts I, II and III.
2 | Luraschi, J., Kuo, K., Ruiz, E. (2019) Mastering Spark with R. O’Reilly. Chapters 1-4.
Additional Reading
1 | James, G., Witten, D., Hastie, T., Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer-Verlag.
2 | Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning. Springer-Verlag.
3 | Walkowiak, S. (2016) Big Data Analytics with R: Utilize R to uncover hidden patterns in your Big Data. Packt Publishing, Birmingham. Chapters 3-7.
4 | Torgo, L. (2017) Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC.