Managing Big Data
Course ID:
Semester: 1st
Year of Study: 1st Year
Category: Compulsory
For Erasmus Students: No
Learning Outcomes
The course aims to help students understand the role that big data plays today in economics, and especially in applied economics, and to extend their knowledge of quantitative methods with algorithms that can be applied, and work properly, in big data settings.
After successful completion of the course, students should be able to:
- Define the notion of big data
- Recognize environments that are characterized by big data
- Comprehend the role of computers when dealing with big data
- Understand the role of big data in economics
- Become familiar with case studies related to big data in economics
- Understand the stages that big data goes through, from preprocessing to the interpretation of the analytics
- Become familiar with the basic preprocessing and machine learning algorithms, both supervised and unsupervised, for regression, classification, clustering and association rule learning that can be applied in big data settings
- Gain experience in using multimedia data (such as text and images) as variables in econometric models
- Distinguish the situations in which each of the machine learning algorithms can and should be used
- Use machine learning algorithms for the study of economic questions when big data is involved
- Assess the results of machine learning algorithms and compare them with the results of traditional approaches in economics
- Interpret the results of analytics carried out in big data settings
- Use the programming languages R and Python and their library ecosystems to apply machine learning algorithms in big data contexts
Course Contents
- The concept of big data and its role in the field of economics today
- Use cases of big data analytics
Preprocessing methods for big data
- Data reduction: Principal Component Analysis (PCA)
- Data cleaning: Binning method
- Data transformation: Normalization, discretization
- Similarity and distance metrics
- Use cases of preprocessing methods for big data in economics
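The syllabus names these preprocessing topics without specifying them further. As a rough illustration of three of them (normalization, discretization and binning, and distance/similarity metrics), here is a minimal sketch in plain Python using only the standard library. It assumes the "smoothing by bin means" variant of binning; the function names and toy data are hypothetical, and PCA is not covered. In practice, library implementations in R or Python would normally replace hand-written code like this.

```python
import math

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def equal_width_bins(values, n_bins):
    """Discretize values into n_bins equal-width intervals; returns bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum would land in bin n_bins, so it is clamped into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def smooth_by_bin_means(values, n_bins):
    """Binning for data cleaning: sort, split into equal-frequency bins,
    and replace each value by its bin mean (result is in sorted order)."""
    ordered = sorted(values)
    size = math.ceil(len(ordered) / n_bins)
    smoothed = []
    for start in range(0, len(ordered), size):
        bin_vals = ordered[start:start + size]
        smoothed.extend([sum(bin_vals) / len(bin_vals)] * len(bin_vals))
    return smoothed

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative toy data
print(min_max_normalize(prices))
print(equal_width_bins(prices, 3))
print(smooth_by_bin_means(prices, 3))
print(euclidean([0, 0], [3, 4]), cosine_similarity([1, 0], [0, 1]))
```

Equal-width discretization is sensitive to outliers (one extreme value stretches every interval), which is why equal-frequency bins are often used for smoothing instead.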
Regression methods
- Batch Gradient Descent
- Mini Batch Gradient Descent
- Stochastic Gradient Descent
- Evaluating the estimates produced by the gradient descent methods and comparing them with the OLS estimates
- Use cases of regression methods for big data in economics
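To make the comparison in the regression unit concrete, here is a minimal sketch, assuming a simple linear model y = b0 + b1·x fitted by minimizing mean squared error. The same update rule gives batch, mini-batch or stochastic gradient descent depending only on how many observations are used per step, and the closed-form OLS estimates serve as the benchmark. The function names, learning rates, epoch counts and simulated data are illustrative assumptions, not values from the course.

```python
import random

def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def gradient_descent(x, y, lr, epochs, batch_size=None, seed=0):
    """Fit y = b0 + b1 * x by minimizing half the mean squared error.

    batch_size=None    -> batch GD (all n observations per update)
    batch_size=1       -> stochastic GD (one observation per update)
    1 < batch_size < n -> mini-batch GD
    """
    rng = random.Random(seed)
    n = len(x)
    size = n if batch_size is None else batch_size
    order = list(range(n))
    b0 = b1 = 0.0
    for _ in range(epochs):
        rng.shuffle(order)  # new pass over the data in random order
        for start in range(0, n, size):
            batch = order[start:start + size]
            m = len(batch)
            errors = [b0 + b1 * x[i] - y[i] for i in batch]
            # Gradient of (1 / 2m) * sum(error^2) with respect to b0 and b1
            g0 = sum(errors) / m
            g1 = sum(e * x[i] for e, i in zip(errors, batch)) / m
            b0 -= lr * g0
            b1 -= lr * g1
    return b0, b1

# Synthetic data (illustrative assumption): y = 2 + 3x + Gaussian noise
data_rng = random.Random(42)
x = [data_rng.random() for _ in range(200)]
y = [2.0 + 3.0 * xi + data_rng.gauss(0, 0.1) for xi in x]

print("OLS       ", ols_simple(x, y))
print("Batch GD  ", gradient_descent(x, y, lr=0.5, epochs=500))
print("Mini-batch", gradient_descent(x, y, lr=0.1, epochs=200, batch_size=20))
print("SGD       ", gradient_descent(x, y, lr=0.05, epochs=50, batch_size=1))
```

With a constant learning rate, the stochastic variant keeps fluctuating around the OLS solution rather than converging to it exactly, which is one of the differences such a comparison is meant to expose.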
Classification methods
- Decision trees
- k-nearest neighbours (k-NN)
- Naïve Bayes
- Use cases of classification methods for big data in economics
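Of the three classification methods listed, k-NN is the simplest to sketch without a library. The following minimal version, assuming numeric features, Euclidean distance and a plain majority vote, classifies query points from hypothetical toy data; decision trees and Naïve Bayes are not shown, and the function name is illustrative only.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric features per observation
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (1.8, 1.8)))  # → low
print(knn_predict(train_X, train_y, (6.2, 6.4)))  # → high
```

Because every prediction scans the whole training set, naive k-NN becomes expensive as data grow, which is one reason the choice of algorithm matters in big data settings.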
Clustering methods
- K-means / K-modes
- Hierarchical clustering
- Use cases of clustering methods for big data in economics
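A minimal sketch of k-means (Lloyd's algorithm), assuming numeric points and initialization with k randomly chosen data points. k-modes follows the same loop for categorical data, replacing the mean with the per-attribute mode and the Euclidean distance with a count of mismatching categories; hierarchical clustering is not shown. The function name and points are hypothetical toy examples.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the initial centroids, so in practice the algorithm is usually run several times from different starts and the solution with the lowest within-cluster sum of squares is kept.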
Association rule mining methods
- Apriori algorithm
- Use cases of association rule mining methods for big data in economics
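The Apriori algorithm grows frequent itemsets level by level, pruning any candidate that has an infrequent subset. The sketch below, assuming transactions given as sets of items and a minimum support expressed as a fraction, follows that idea in plain Python and adds rule confidence, support(A ∪ B) / support(A). The function names and basket data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent individual items
    items = {item for b in baskets for item in b}
    scored = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s for s, sup in scored.items() if sup >= min_support}
    frequent = {s: scored[s] for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        scored = {c: support(c) for c in candidates}
        current = {c for c, sup in scored.items() if sup >= min_support}
        frequent.update({c: scored[c] for c in current})
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent; both itemsets must be in `frequent`."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
print(confidence(freq, {"butter"}, {"bread"}))
```

Support is recounted with a full scan of the baskets for every candidate, which keeps the sketch short but is exactly the cost that scalable variants of Apriori try to reduce.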
Teaching Activities
Lectures (3 hours/week) and Tutorials (1 hour/week)
Teaching Organization
Activity | Semester Workload
--- | ---
Lectures (3 hours per week × 13 weeks) | 39 hours
Lab practice (1 hour per week × 13 weeks) | 13 hours
Self-study | 148 hours
Course Total | 200 hours
Assessment
Students' performance is evaluated through a written exam at the end of the semester (70% of the final grade) and three mandatory exercises (30% of the final grade).
The written exam assesses whether students understand the topics presented and discussed in lectures, while the exercises assess whether students can apply the algorithms discussed to real data and interpret the results.
All three exercises are carried out in groups and meet the following criteria:
- Cover all algorithms taught and discussed during lectures
- Are announced after each course unit
- Require writing R and Python programs that apply the algorithms taught onto real data to answer economic questions and discuss the results
- Require a report, written in LaTeX, that evaluates the results and discusses them in the context of the contemporary economics literature
Use of ICT
- Using notes and slides to deliver lectures; these are made available to students via the eclass.upatras.gr portal
- Using R and Python source code to demonstrate the machine learning algorithms in the course syllabus
- Using the e-learning portal eclass.upatras.gr in order to:
  - Organize and hand out notes
  - Announce weekly quizzes
  - Announce group exercises
  - Communicate with enrolled students
- Distributing open and freely available notes