The course provides an overview of recent advances in distributed systems for Big Data processing. It starts by presenting computational models for high-throughput batch processing, such as MapReduce. Next, we will introduce software engineering techniques for distributed systems, such as REST and component-based architectures. We will then cover low-latency, real-time stream processing and complex event processing. Finally, we will present advanced topics in distributed data-intensive systems, such as geodistribution and security. The course focuses on both the fundamental concepts and the concrete technologies, and on how these techniques apply to real-world case studies.
Please use the forum for your questions. Other students can benefit from the answers to your questions and can help you, too. Answers will remain as a reference for other people.
Concepts and Technologies for Distributed Systems and Big Data Processing
Lectures are held on Tuesdays, 9:50–11:30 in S202/C205.
April 12
Course information, Motivation, Introduction to distributed systems
Slides 1
Slides 2
Exercise 1
April 19
Slides 3
Slides 4
Exercise 2 + Code
Solution 2 + Code
MapReduce paper by Dean and Ghemawat
April 26
Slides 5
Slides 6
Slides for Hadoop Demo + Code for multi-stage MapReduce
Exercise 3
Solution 3
Google File System paper by Ghemawat et al.
May 4 – date and place changed
Wednesday, 11:40–13:20 in S101/A03
Slides 7
Exercise 4 + Code
Solution 4 + Code
May 10
Slides 7 updated
no exercise this week
May 17
Slides 8 + Slides 8 with notes [updated]
Exercise 5 + Code
Solution 5 + Code
May 24
Slides 9 [updated]
Exercise 6
Solution 6
May 31
Slides 10
no exercise this week
June 7
Slides 11
Exercise 7
Solution 7 + Code
June 14
Slides 12 [updated]
Exercise 8 + Code
Solution 8 + Code
Apache Spark Resilient Distributed Datasets paper by Zaharia et al.
June 21
Slides 13
Exercise 9
June 28 – slightly postponed to 10:00
Slides 14
Slides 15
Exercise 10
July 26
Exam