About this online course

Overview

Data streams are everywhere: they are produced by smartphones, IoT devices, cloud services, application logs, credit-card transactions, clickstreams, and more. Stream processing is currently a billion-dollar industry and is expected to quadruple in size in less than five years.

A company's goal today is not only to analyze Big Data, but also to deliver timely results from that analysis. Data streams need to be processed in real time and in a scalable fashion in order to have business value and offer operational insights.

Stream processing technology is used in various settings:

  • Data Engineer/Scientist teams implement scalable stream processing applications to ingest and analyze vast amounts of streamed data for monitoring and alerting systems.
  • Machine Learning Engineers deploy ML models on stream processing pipelines for fraud detection and risk assessment, and re-train those models in real time.

After finishing this course, learners will be able to identify data streaming use cases, perform analysis of data streams, and set up stream processing pipelines of different types.

This course is designed for data engineers, machine learning engineers, software engineers, and data scientists with a basic knowledge of scalable data processing techniques such as Hadoop and MapReduce.

What you'll learn:

  • In this course you will develop the skills to design real-time stream processing pipelines in a scalable and efficient manner, using Apache Flink, a state-of-the-art open-source technology for stream processing.
  • After taking this course, you will be able to apply your knowledge to set up enterprise pipelines for processing application logs, monitoring data centers, and deploying ML models for real-time pattern detection and predictive analytics.

Details

Course Syllabus:

Week 1:

Learn the basic concepts behind stream processing with examples from different industries. We will also cover various architectural patterns of stream processing technology. Practical assignments will help you install and experiment with Apache Flink.

In detail, the topics we will cover this week are:

  • The concept of stream processing as opposed to batch processing
  • Use cases from different industries, such as Internet-scale companies and the banking sector
  • Architectures and best practices for setting up streaming pipelines
  • Introduction to the Apache Flink stream processor
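To give a flavour of the hands-on work, the sketch below shows roughly what a first Flink program looks like in Java. It is a minimal illustration only, assuming Flink's DataStream API (1.x); the class name and input data are invented for the example.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class HelloFlink {
        public static void main(String[] args) throws Exception {
            // Every Flink job starts from an execution environment.
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // A tiny in-memory stream for local experimentation; a production
            // pipeline would read from a connector such as Kafka instead.
            DataStream<String> lines = env.fromElements("stream", "processing", "with", "flink");

            // Transform the stream record by record and print the results.
            lines.map(line -> line.toUpperCase()).print();

            // Nothing runs until the job graph is submitted for execution.
            env.execute("hello-flink");
        }
    }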

Week 2:

Learn the fundamental ideas behind parallel stream processing: how a massive data stream can be split into a set of smaller substreams, so that a streaming computation can be scaled out to a cluster of machines that filter and transform the streams in parallel.

In detail, the topics we will cover this week are:

  • Partitioning strategies for streaming topologies
  • Using partitioning to parallelize the processing of streams
  • Transforming and filtering streams in parallel using a cluster of machines
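As an illustration of these ideas, here is a minimal sketch (again assuming Flink's Java DataStream API, 1.x; the event values and job name are invented) of how keyBy partitions a stream so that filtering, transformation, and per-key aggregation run as parallel subtasks:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PartitionedCount {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4); // run each operator as four parallel subtasks

            DataStream<String> events =
                env.fromElements("click", "view", "click", "buy", "view", "click");

            events
                // Filtering happens in parallel: each subtask processes its own substream.
                .filter(e -> !e.equals("view"))
                // Map each event to an (event, 1) pair for counting.
                .map(e -> Tuple2.of(e, 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                // keyBy hash-partitions the stream: all events with the same key
                // are routed to the same parallel subtask (substream).
                .keyBy(t -> t.f0)
                // A per-key running count, maintained independently on each substream.
                .sum(1)
                .print();

            env.execute("partitioned-count");
        }
    }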

Week 3:

Learn the concepts of time, order, and streaming windows. This week we will cover how the notion of time affects stream processing and how it is used to create streaming windows and aggregate data, as well as strategies for dealing with streams in which events arrive late or out of order.

In detail, the topics we will cover this week are:

  • The concepts of event time and processing time and their fundamental difference
  • Streaming windows in different time dimensions
  • Aggregation of streaming windows
  • Strategies to deal with streams containing out-of-order events
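The sketch below (assuming Flink's Java DataStream API, 1.x; sensor names and timestamps are invented) shows how these ideas fit together: event-time timestamps are extracted from the records, a watermark strategy bounds how late out-of-order events may arrive, and a tumbling event-time window aggregates the readings.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class EventTimeWindowCount {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (sensorId, eventTimeMillis): the timestamp records when the reading
            // was produced (event time), not when Flink happens to see it.
            DataStream<Tuple2<String, Long>> readings = env
                .fromElements(
                    Tuple2.of("sensor-1", 1_000L),
                    Tuple2.of("sensor-1", 4_000L),
                    Tuple2.of("sensor-2", 2_000L),
                    Tuple2.of("sensor-1", 3_000L)) // arrives out of order
                // The watermark strategy tells Flink how long to wait for
                // out-of-order events: here, up to 5 seconds of lateness.
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((reading, ts) -> reading.f1));

            readings
                .map(r -> Tuple2.of(r.f0, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(r -> r.f0)
                // Group each sensor's events into 10-second event-time windows...
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                // ...and aggregate each window (here: a count per sensor per window).
                .sum(1)
                .print();

            env.execute("event-time-window-count");
        }
    }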

Week 4:

Join multiple data streams to gain insights from different streaming data sources. This week we will define the concept of joins on streams and see how joins can be used in conjunction with streaming windows.

In detail, the topics we will cover this week are:

  • Joining multiple data streams
  • How windows and joins can be used together
  • Different types of streaming joins
  • A use case from the banking industry
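To make the idea concrete, here is a minimal sketch of a windowed stream join (assuming Flink's Java DataStream API, 1.x; the transaction and login data, field layout, and job name are invented for illustration): transactions and logins of the same customer are paired up whenever they fall into the same 10-second event-time window.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowedStreamJoin {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Card transactions: (customerId, amount, eventTimeMillis).
            DataStream<Tuple3<String, Double, Long>> transactions = env
                .fromElements(
                    Tuple3.of("alice", 120.0, 1_000L),
                    Tuple3.of("bob", 15.5, 2_000L))
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple3<String, Double, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((t, ts) -> t.f2));

            // Login events: (customerId, country, eventTimeMillis).
            DataStream<Tuple3<String, String, Long>> logins = env
                .fromElements(
                    Tuple3.of("alice", "NL", 1_500L),
                    Tuple3.of("bob", "US", 2_500L))
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((l, ts) -> l.f2));

            // Window join: match records of the two streams that share a key and
            // fall into the same 10-second event-time window.
            transactions
                .join(logins)
                .where(t -> t.f0)    // key of the first stream: customerId
                .equalTo(l -> l.f0)  // key of the second stream: customerId
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .apply(new JoinFunction<Tuple3<String, Double, Long>, Tuple3<String, String, Long>, String>() {
                    @Override
                    public String join(Tuple3<String, Double, Long> txn, Tuple3<String, String, Long> login) {
                        return txn.f0 + " spent " + txn.f1 + " after logging in from " + login.f1;
                    }
                })
                .print();

            env.execute("windowed-stream-join");
        }
    }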

Qualifications

Certificates

If you successfully complete this course, you will earn a professional education certificate and be eligible to receive 2.0 Continuing Education Units (CEUs).


Admission

This course is primarily geared towards working professionals.

Prerequisites

  • Undergraduate degree in Computer Science or a related field
  • Basic knowledge of SQL and database concepts
  • Ability to write simple programs in Java, Scala or Python

In order to complete your enrollment, you will be asked to upload the following document:

  • A copy of your passport or ID card (no driver's license)

Contact

If you have any questions about this course or the TU Delft online learning environment, please visit our Help & Support page.

Enroll now

  • Starts: Jan 15, 2020
  • Fee: € 695
  • Enrollment open until: Jan 08, 2020
  • Length: 4 weeks
  • Effort: 4-5 hours per week
