Master big data processing using Apache Spark through hands-on coding and practical applications in this comprehensive course from HKUST.
This course, adapted from HKUST's MSc Program in Big Data Technology, provides a thorough understanding of big data systems with a focus on Apache Spark. Students learn both theoretical concepts and practical implementations through extensive hands-on experience. The curriculum covers Spark programming using RDD and DataFrame APIs, advanced packages like ML and GraphX, and system internals for performance optimization. With over 20 hours of lectures and numerous coding exercises, participants gain practical skills in managing and processing massive datasets across distributed computing environments.
4.9
(7 ratings)
Instructor: Ke Yi
English
What you'll learn
Master Spark programming using RDD and DataFrame APIs
Implement machine learning solutions using Spark MLlib
Design efficient algorithms for big data processing
Optimize Spark performance through system understanding
Develop streaming applications for real-time data processing
Utilize GraphX for graph-based data analysis
This course includes:
Pre-recorded video
Graded assignments, exams, 20 coding questions, and 100+ multiple-choice questions
Access on Mobile, Tablet, Desktop
Limited access
Shareable certificate
Closed caption
Get a Completion Certificate
Share your certificate with prospective employers and your professional network on LinkedIn.
Provided by HKUST
Top companies offer this course to their employees
Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.
There are 8 modules in this course
This comprehensive course covers big data computing with Apache Spark, combining theoretical foundations with practical implementation skills. The curriculum progresses from basic concepts of MapReduce and Hadoop to advanced topics in Spark programming, including RDD and DataFrame APIs, machine learning libraries, and streaming data processing. Students learn system internals, performance optimization techniques, and algorithm design for distributed computing environments. The course features extensive hands-on practice through coding exercises and real-world applications.
Module 1: Overview, MapReduce, and Hadoop
Module 2: Spark Basics and RDD
Module 3: SparkSQL and MLlib
Module 4: SparkSQL and MLlib
Module 5: Spark Internals
Module 6: Algorithm Design for Big Data
Module 7: GraphX/GraphFrames
Module 8: Spark Streaming
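Module 1 starts from the MapReduce model, whose map, group-by-key, and reduce pattern Spark's RDD API also expresses. As a rough, non-authoritative illustration (not taken from the course materials), the sketch below runs a word count through the three MapReduce phases in a single Python process. The function names are hypothetical, and nothing here uses Hadoop or Spark.

```python
from collections import defaultdict
from itertools import chain


# Hypothetical helper names: a single-process illustration of the
# MapReduce phases, not Hadoop or Spark code.
def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all counts for one word.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Spark builds on MapReduce", "MapReduce maps then reduces"]
    print(sorted(word_count(docs).items()))
    # [('builds', 1), ('mapreduce', 2), ('maps', 1), ('on', 1), ('reduces', 1), ('spark', 1), ('then', 1)]
```

In PySpark the same pipeline would typically be written against an RDD, for example `sc.textFile(path).flatMap(lambda line: line.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a SparkContext and `path` is a placeholder. Spark then performs the shuffle across the cluster instead of inside one dictionary.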
Instructor
A Distinguished Scholar in Database Systems and Big Data Computing
Ke Yi is a Professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology, where he also directs the MSc Program in Big Data Technology. He received his BS from Tsinghua University in 2001 and his PhD from Duke University in 2006, and has since established himself as a leading expert in database theory, parallel computing, and data stream algorithms. His research has earned multiple prestigious awards, including two SIGMOD Best Paper Awards (2022, 2016), a PODS Test-of-Time Award (2022), a SIGMOD Best Demonstration Award (2015), and a Google Faculty Research Award (2010). An ACM Distinguished Member, he has made significant contributions to database systems and algorithms, particularly in data security, privacy, and distributed computing. His teaching has been recognized with multiple Best Teaching Awards for his course on Big Data Computing. Beyond his academic work, Yi maintains active research collaborations with industry partners including Alibaba, Huawei, Microsoft, and Google, serves as an associate editor for prestigious journals, and regularly chairs major conferences in the field. His research spans theoretical computer science and practical database systems, with particular emphasis on designing algorithms that offer both theoretical guarantees and practical effectiveness.
Testimonials
Testimonials and success stories are a testament to the quality of this program and its impact on your career and learning journey. Be the first to help others make an informed decision by sharing your review of the course.