Big Data Computing with Spark

Master big data processing using Apache Spark through hands-on coding and practical applications in this comprehensive course from HKUST.

This course, adapted from HKUST's MSc Program in Big Data Technology, provides a thorough understanding of big data systems with a focus on Apache Spark. Students learn both theoretical concepts and practical implementations through extensive hands-on experience. The curriculum covers Spark programming using RDD and DataFrame APIs, advanced packages like ML and GraphX, and system internals for performance optimization. With over 20 hours of lectures and numerous coding exercises, participants gain practical skills in managing and processing massive datasets across distributed computing environments.

4.9 (7 ratings)

Instructor: Ke Yi

English

This course includes

8 weeks of self-paced video lessons

Intermediate level

Completion certificate awarded on course completion

33,183

Audit For Free

What you'll learn

  • Master Spark programming using RDD and DataFrame APIs

  • Implement machine learning solutions using Spark MLlib

  • Design efficient algorithms for big data processing

  • Optimize Spark performance through system understanding

  • Develop streaming applications for real-time data processing

  • Utilize GraphX for graph-based data analysis

Skills you'll gain

Apache Spark
Big Data Processing
Distributed Computing
MapReduce
Machine Learning
Data Analytics
Cloud Computing
Stream Processing

This course includes:

Prerecorded video

Graded assignments, Exams, 20 coding questions, 100+ multiple choice questions

Access on Mobile, Tablet, Desktop

Limited access

Shareable certificate

Closed caption

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Top companies offer this course to their employees

Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.

There are 8 modules in this course

This comprehensive course covers big data computing with Apache Spark, combining theoretical foundations with practical implementation skills. The curriculum progresses from basic concepts of MapReduce and Hadoop to advanced topics in Spark programming, including RDD and DataFrame APIs, machine learning libraries, and streaming data processing. Students learn system internals, performance optimization techniques, and algorithm design for distributed computing environments. The course features extensive hands-on practice through coding exercises and real-world applications.

Module 1: Overview, MapReduce, and Hadoop

Module 2: Spark Basics and RDD

Module 3: SparkSQL and MLlib

Module 4: SparkSQL and MLlib

Module 5: Spark Internals

Module 6: Algorithm Design for Big Data

Module 7: GraphX/GraphFrames

Module 8: Spark Streaming

Instructor

A Distinguished Scholar in Database Systems and Big Data Computing

Ke Yi is a Professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology, where he also directs the MSc Program in Big Data Technology. He received his BS from Tsinghua University in 2001 and his PhD from Duke University in 2006, and has since established himself as a leading expert in database theory, parallel computing, and data stream algorithms.

His research has been recognized with two SIGMOD Best Paper Awards (2022, 2016), a PODS Test-of-time Award (2022), a SIGMOD Best Demonstration Award (2015), and a Google Faculty Research Award (2010). An ACM Distinguished Member, he has made significant contributions to database systems and algorithms, particularly in data security, privacy, and distributed computing. His research spans theoretical computer science and practical database systems, with particular emphasis on designing algorithms that offer both theoretical guarantees and practical effectiveness.

He has received multiple Best Teaching Awards for his course on Big Data Computing. Beyond his academic work, he maintains active research collaborations with industry partners including Alibaba, Huawei, Microsoft, and Google, serves as associate editor for journals in the field, and regularly chairs major conferences.
