Big Data Computing with Spark

Rating: 4.9 (7 reviews)

This course is part of Big Data Technology.

This course, adapted from HKUST's MSc Program in Big Data Technology, provides a thorough understanding of big data systems with a focus on Apache Spark. Students learn both theoretical concepts and practical implementations through extensive hands-on experience. The curriculum covers Spark programming using RDD and DataFrame APIs, advanced packages like ML and GraphX, and system internals for performance optimization. With over 20 hours of lectures and numerous coding exercises, participants gain practical skills in managing and processing massive datasets across distributed computing environments.

Instructors:

Ke Yi

English

This course includes

8 weeks of self-paced video lessons

Intermediate level

Completion certificate awarded on course completion

33,183

What you'll learn

Master Spark programming using RDD and DataFrame APIs

Implement machine learning solutions using Spark MLlib

Design efficient algorithms for big data processing

Optimize Spark performance through system understanding

Develop streaming applications for real-time data processing

Utilize GraphX for graph-based data analysis

Skills you'll gain

Apache Spark

Big Data Processing

Distributed Computing

MapReduce

Machine Learning

Data Analytics

Cloud Computing

Stream Processing

This course includes:

Pre-recorded video

Graded assignments, exams, 20 coding questions, 100+ multiple-choice questions

Access on mobile, tablet, and desktop

Limited access

Shareable certificate

Closed captions

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Created by

The Hong Kong University of Science and Technology

Provided by

edX

There are 7 modules in this course

This comprehensive course covers big data computing with Apache Spark, combining theoretical foundations with practical implementation skills. The curriculum progresses from basic concepts of MapReduce and Hadoop to advanced topics in Spark programming, including RDD and DataFrame APIs, machine learning libraries, and streaming data processing. Students learn system internals, performance optimization techniques, and algorithm design for distributed computing environments. The course features extensive hands-on practice through coding exercises and real-world applications.

Module 1: Overview, MapReduce, and Hadoop

Module 2: Spark Basics and RDD

Module 3: SparkSQL and MLlib

Module 4: Spark Internals

Module 5: Algorithm Design for Big Data

Module 6: GraphX/GraphFrames

Module 7: Spark Streaming

Fee Structure

Individual course purchase is not available. To enroll in this course with a certificate, you need to purchase the complete Professional Certificate Course. For enrollment and the detailed fee structure, see the Big Data Technology program.

Instructor

Ke Yi

A Distinguished Scholar in Database Systems and Big Data Computing

Ke Yi is a Professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology, where he also directs the MSc Program in Big Data Technology. After completing his BS at Tsinghua University in 2001 and his PhD at Duke University in 2006, he established himself as a leading expert in database theory, parallel computing, and data stream algorithms.

His research has been recognized with multiple awards, including two SIGMOD Best Paper Awards (2016, 2022), a PODS Test-of-Time Award (2022), a SIGMOD Best Demonstration Award (2015), and a Google Faculty Research Award (2010). An ACM Distinguished Member, he has made significant contributions to database systems and algorithms, particularly in the areas of data security, privacy, and distributed computing. His teaching has been recognized with multiple Best Teaching Awards for his course on Big Data Computing.

Beyond his academic work, Yi maintains active research collaborations with industry partners including Alibaba, Huawei, Microsoft, and Google, serves as an associate editor for journals in the field, and regularly chairs major conferences. His research spans theoretical computer science and practical database systems, with particular emphasis on designing algorithms that offer both theoretical guarantees and practical effectiveness.
