The course provides an overview of recent advances in distributed systems for Big Data processing. It starts by presenting computational models for high-throughput batch processing, such as MapReduce. Next, we will introduce software engineering techniques for distributed systems, such as REST and component-based architectures. We will then cover low-latency, real-time stream processing and complex event processing. Finally, we will present advanced topics in distributed data-intensive systems, such as geo-distribution and security. The course focuses both on the fundamental concepts and on the concrete technologies, applying these techniques to real-world case studies.
Please use the forum for your questions. Other students can benefit from the answers and can help you, too, and the answers remain available as a reference for everyone.
Concepts and Technologies for Distributed Systems and Big Data Processing
Lectures are held on Fridays, 9:50–11:30, in S202/C205.
April 21
Slides 1
Slides 2
Exercise 1
April 28
Slides 3
Slides 4
Exercise 2 + Code
Solution 2 + Code
MapReduce paper by Dean and Ghemawat
May 5
Slides 5
Slides 6
Slides for Hadoop Demo + Code for multi-stage MapReduce
Exercise 3
Solution 3
Google File System paper by Ghemawat et al.
May 12
Slides 7
May 19
Slides 8
Exercise 4
Solution 4
May 26
Exercise 5 + Code
Solution 5 + Code
June 2
Slides 9
Exercise 6 + Code
Solution 6 + Code
Apache Spark Resilient Distributed Datasets paper by Zaharia et al.